Finding Homologs to Nucleic Acid or Protein Sequences Using the Framesearch Program
Abstract
The Framesearch algorithm allows for frameshift errors during alignment and can therefore find alignments that span different reading frames. The protocols in this unit describe how to use Framesearch to search a protein sequence database for sequences similar to a query nucleotide sequence, and to search a nucleotide sequence database for sequences similar to a query protein sequence. Three alternate protocols describe ways to speed up Framesearch and thus make it practical for routine use. Framesearch is especially appropriate for low-quality, single-read nucleotide sequence data, such as ESTs (expressed sequence tags) or early drafts of genomic sequences; it offers no significant advantage over less CPU-intensive algorithms for relatively high-quality nucleotide sequences that contain few single-nucleotide insertion or deletion errors.
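The central idea, an alignment that may switch reading frames partway through, can be illustrated with a small dynamic-programming sketch. The Python below is a simplified local alignment of a nucleotide sequence against a protein, not the scoring scheme of Framesearch or of any particular implementation: it scores translated codons with a hypothetical match/mismatch scheme instead of an amino acid substitution matrix, and it models a frameshift as skipping one or two nucleotides at a fixed penalty. The constants `MATCH`, `MISMATCH`, `GAP`, and `FRAMESHIFT`, the function `frameshift_align`, and the example peptide are all illustrative assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical scoring values, chosen only for illustration.
MATCH, MISMATCH = 5, -3
GAP = 8          # cost of an unmatched residue or an unmatched whole codon
FRAMESHIFT = 12  # cost of skipping 1 or 2 nucleotides (changes the reading frame)


def frameshift_align(dna: str, protein: str, allow_frameshift: bool = True) -> int:
    """Best local alignment score of an uppercase DNA string against a protein."""
    n, m = len(dna), len(protein)
    # H[i][j]: best score of an alignment ending after dna[:i] and protein[:j].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            score = 0  # local alignment: a path may start anywhere
            if i >= 3 and j >= 1:
                aa = CODON_TABLE.get(dna[i - 3:i], "X")
                s = MATCH if aa == protein[j - 1] else MISMATCH
                score = max(score, H[i - 3][j - 1] + s)
            if j >= 1:
                score = max(score, H[i][j - 1] - GAP)  # residue with no codon
            if i >= 3:
                score = max(score, H[i - 3][j] - GAP)  # codon with no residue
            if allow_frameshift:
                for k in (1, 2):  # skip 1 or 2 nucleotides: switch frame
                    if i >= k:
                        score = max(score, H[i - k][j] - FRAMESHIFT)
            H[i][j] = score
            best = max(best, score)
    return best


# Hypothetical example: back-translate a short peptide, then insert one base.
BACK = {aa: codon for codon, aa in CODON_TABLE.items()}
protein = "MKTAYIAKQRQISFVKSHFSRQ"
dna = "".join(BACK[aa] for aa in protein)
read = dna[:30] + "A" + dna[30:]  # single-base insertion after codon 10

print(frameshift_align(dna, protein))                           # 110
print(frameshift_align(read, protein))                          # 98
print(frameshift_align(read, protein, allow_frameshift=False))  # lower: frame is lost
```

With frameshifts disabled, the sketch behaves like a fixed-frame translated search (any of the three forward frames), so the single inserted base splits the match into two shorter pieces; with frameshifts enabled, one penalty bridges the error and nearly the full score is recovered.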
Table of Contents
- Basic Protocol 1: Framesearch Using a Nucleic Acid Query Sequence
- Basic Protocol 2: Framesearch Using a Protein Query Sequence
- Alternate Protocol 1: Prefiltering with a Search Algorithm to Improve the Speed of Framesearch with a Nucleic Acid Query Sequence
- Alternate Protocol 2: Prefiltering with a Search Algorithm to Improve the Speed of Framesearch with a Protein Query Sequence
- Alternate Protocol 3: Improving Speed of Framesearch by Using Specialized Hardware
- Support Protocol 1: Downloading and Converting Sequence Files for the Examples Used in the Protocols
- Guidelines for Understanding Results
- Commentary
- Figures
Materials
Basic Protocol 1: Framesearch Using a Nucleic Acid Query Sequence
Necessary Resources
Basic Protocol 2: Framesearch Using a Protein Query Sequence
Necessary Resources
Alternate Protocol 1: Prefiltering with a Search Algorithm to Improve the Speed of Framesearch with a Nucleic Acid Query Sequence
Necessary Resources
Alternate Protocol 2: Prefiltering with a Search Algorithm to Improve the Speed of Framesearch with a Protein Query Sequence
Necessary Resources
Alternate Protocol 3: Improving Speed of Framesearch by Using Specialized Hardware
Necessary Resources
Figures
Figure 3.2.1 Six‐frame‐translated search versus Framesearch.
Figure 3.2.2 Distribution of scores generated by using Framesearch to compare nucleotides 52500 through 55000 of gi‐15829254_55.seq with all peptide sequences from the example bacterial genome. Because the selected region comprises one complete gene and parts of two flanking genes, there are three very strong hits, highlighted by arrows. There are also many lower‐quality hits with scores below 400. Hits with scores above 200 most likely represent genes related to the three genes contained in this region, and hits with scores between 100 and 200 may represent borderline matches; hits with scores below 100 probably do not represent biologically significant matches.
Figure 3.2.3 The list of hits from a Framesearch run in which a nucleic acid sequence was used to search a number of peptide sequences. The name of the query sequence, the wildcard expression specifying the target sequences, and the name of the peptide sequence with the best match have been boldfaced in the sample output.
Figure 3.2.4 Alignment of a nucleotide query sequence against a peptide database sequence, generated by Framesearch. Note that the middle portion has been omitted here. The names of the query and database sequences, just above this alignment, have been boldfaced for emphasis.
Figure 3.2.5 The list of hits from a Framesearch run in which an amino acid sequence was used to search a number of nucleotide sequences. The name of the query sequence, the wildcard expression specifying the target sequences, and the name of the nucleotide sequence with the best match have been boldfaced in the sample output.
Figure 3.2.6 Alignment of an amino acid query sequence against a nucleotide database sequence, generated by Framesearch. Note that the middle portion has been omitted here. The names of the query and database sequences, just above this alignment, have been boldfaced for emphasis. Also note that the name of the nucleotide sequence in this example is followed by the string “/rev”, which means this alignment is to the reverse complement of this nucleotide sequence.
Figure 3.2.7 Illustration of how insertion and deletion errors affect alignments generated by the six‐frame‐translated Smith‐Waterman algorithm. Note that this example was generated on a DeCypher genomics accelerator, manufactured by TimeLogic. SSEARCH in the GCG environment would give very similar results, in a slightly different format. The nucleotides selected are the reverse complement of those nucleotides from the E. coli O157:H7 genome, NCBI REFSEQ number NC_002695, which correspond to amino acids 1 to 84 of the protein with NCBI gi number 13361126.
Figure 3.2.8 A Framesearch alignment between a nucleotide query sequence and a peptide target sequence, in the format generated by a TimeLogic DeCypher genomics accelerator system. Framesearch in the GCG environment would generate the same output, in a slightly different format. The nucleotides selected for this example are the reverse complement of those nucleotides from the E. coli O157:H7 genome, NCBI REFSEQ number NC_002695, which correspond to amino acids 1 to 84 of the protein with NCBI gi number 13361126. Compare this figure with Figure , which shows how Framesearch dynamically follows the correct reading frame despite the frameshift errors created when indel errors are deliberately introduced into the nucleotide sequence.
Figure 3.2.9 This is a continuation of Figure , and should be compared with it.
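Figures 3.2.1 and 3.2.7 contrast a six‐frame‐translated search with Framesearch. The Python sketch below shows six-frame translation with the standard genetic code, and why a single indel defeats it: after the error, the downstream codons are read in a different frame, so the true protein is split across two separate translations. The sequence is a made-up example, not the E. coli region used in the figures, and `six_frame_translation` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


coding = "ATGAAGACGGCGTACATAGCGAAG"  # hypothetical ORF encoding MKTAYIAK
damaged = coding[:12] + coding[13:]  # delete the first base of codon 5

print(six_frame_translation(coding)["+1"])  # MKTAYIAK
frames = six_frame_translation(damaged)
print(frames["+1"])  # MKTAT*R  (correct only up to the deletion)
print(frames["+3"])  # EDGDIAK  (correct again only after the deletion)
```

Each frame, searched on its own, matches only part of the protein; an algorithm that can change frame inside one alignment, as Framesearch does, can join the two pieces.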
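The caption of Figure 3.2.2 gives rule-of-thumb cutoffs for that example run: scores above 200 most likely indicate related genes, scores from 100 to 200 are borderline, and scores below 100 are probably not biologically significant. The short Python sketch below sorts a hit list into those bins. The cutoffs apply only to the scoring parameters and database of that example, treating scores of exactly 100 or 200 as borderline is an arbitrary choice, and the hit names and scores are hypothetical.

```python
# Cutoffs from the Figure 3.2.2 example only; they depend on the scoring
# parameters and database used there and are not general thresholds.
LIKELY_RELATED_ABOVE = 200
BORDERLINE_FROM = 100


def classify_score(score: float) -> str:
    """Bin one score; values of exactly 100 or 200 are treated as borderline."""
    if score > LIKELY_RELATED_ABOVE:
        return "likely related"
    if score >= BORDERLINE_FROM:
        return "borderline"
    return "probably not significant"


def triage(hits: list[tuple[str, float]]) -> dict[str, list[str]]:
    """Group hit names by bin, highest-scoring hit first within each bin."""
    groups: dict[str, list[str]] = {
        "likely related": [],
        "borderline": [],
        "probably not significant": [],
    }
    for name, score in sorted(hits, key=lambda hit: hit[1], reverse=True):
        groups[classify_score(score)].append(name)
    return groups


# Hypothetical hit list (names and scores are made up).
hits = [("hitA", 1450.0), ("hitB", 180.0), ("hitC", 64.5), ("hitD", 230.0)]
print(triage(hits))
# {'likely related': ['hitA', 'hitD'], 'borderline': ['hitB'], 'probably not significant': ['hitC']}
```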
Literature Cited
Accelrys. 2001. Announcement of new features in SeqWeb version 2. http://www.accelerys.com/products/seqweb/whats_new2p0.html.
Edelman, I., Faigler, S., Mintz, E., Natan, A., and Devereux, J. 1995. Framesearch: A rigorous alignment program for searching protein databases with nucleic acid queries. Poster, Genome Sequence and Analysis Conference, Hilton Head, South Carolina, 1995.
NOTE: The text of this poster can be found at http://sulu.gcg.com/company/posters/framesearch.html.
GCG. 1995. GCG Transcript 3:2. Genetics Computer Group, Madison, Wisconsin.
NOTE: The GCG Transcript, subtitled “Bio‐Computing News for Users of the Wisconsin Package,” was published by the company for a number of years. The text of this issue, which features a discussion of the newly added Framesearch program, can be found at http://sulu.gcg.com/pub/newsletter/vol3_no2_nov95.html.
Halperin, E., Faigler, S., and Gill‐More, R. 1999. FramePlus: Aligning DNA to protein sequences. Bioinformatics 15(11):867‐873.
TimeLogic. 2001. Manuals supplied with a DeCypher bioinformatics accelerator. TimeLogic Corporation, Incline Village, Nevada.
Zhang, Z., Pearson, W.R., and Miller, W. 1997. Aligning a DNA sequence with a protein sequence. Journal of Computational Biology 4(3):339‐349.
Key References
Edelman et al., 1995. See above.
The key reference for the Framesearch algorithm is the poster by Edelman et al. (1995). The key reference for a particular implementation of Framesearch is the documentation supplied with that implementation.
Internet Resources
http://www.accelerys.com/
Web site of Accelrys, the corporate parent of GCG.
http://www.cgen.com/
Web site of the Compugen company.
http://www.paracel.com/
Web site of the Paracel company.
http://www.timelogic.com
Web site of the TimeLogic company.