Sequence Similarity Searching Using the BLAST Family of Programs

Internet, 2013-12-31

Abstract

Database sequence similarity searching is carried out thousands of times each day by researchers worldwide and has become a valuable tool. Over the years, a number of algorithms have been implemented to facilitate database searching. The BLAST (Basic Local Alignment Search Tool) family of sequence similarity search programs allows searches to be performed quickly and easily while remaining sensitive and resting on rigorous statistical expectations. In this unit, a completely new version of its predecessor of the same title, the user learns how to access the databases, choose appropriate search strategies, and apply the example BLAST searches to his or her own data.

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Table of Contents

Accessing BLAST Programs and Documentation
Introduction to BLAST
Examples of BLAST Searches
Searching Strategies
Sequence Alignment Algorithms
Appendix A: BLAST Parameters
Appendix B: Sequence Identifier Syntax
Literature Cited
Figures
Tables

Figures

Figure 19.3.1 Submitting a BLASTP search using the NCBI's World Wide Web interface.

Figure 19.3.2 Example of the top portion of a BLASTP report.

Figure 19.3.3 Example of the graphical view of a BLASTP report. The bars are color coded by the strength of the database match. The strongest matches (those with a bit score >200) are red, followed by pink (bit score 80 to 200), green (50 to 80), blue (40 to 50), and black (<40).
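
The color key in this caption amounts to a simple threshold lookup. The sketch below is only an illustration of that rule, not part of the BLAST software: the function name `bit_score_color` is hypothetical, and because the caption does not say which band a score of exactly 80 or 50 falls into, the code assumes such boundary values go to the stronger band.

```python
def bit_score_color(bit_score: float) -> str:
    """Map a bit score to the bar color named in the Figure 19.3.3 caption.

    Assumption (not stated in the caption): scores of exactly 80 or 50
    are placed in the stronger (higher) band.
    """
    if bit_score > 200:
        return "red"      # strongest matches, bit score > 200
    if bit_score >= 80:
        return "pink"     # 80 to 200
    if bit_score >= 50:
        return "green"    # 50 to 80
    if bit_score >= 40:
        return "blue"     # 40 to 50
    return "black"        # weakest matches, bit score < 40


if __name__ == "__main__":
    # Made-up scores used purely for illustration.
    for score in (312.0, 95.5, 41.2, 12.0):
        print(score, bit_score_color(score))
```

A lookup of this kind can be handy when reading a text-only hit list and wanting the same strength categories that the graphical view conveys by color.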

Figure 19.3.4 Example of the hit list from a BLASTP report.

Figure 19.3.5 Example of a BLASTP alignment.

Figure 19.3.6 Example of the graphical view of a BLASTX report.

Figure 19.3.7 Example of the hit list from a BLASTX report.

Figure 19.3.8 Example of selected BLASTX alignments.

Figure 19.3.9 Example of the graphical view of a TBLASTN report.

Figure 19.3.10 Example of the hit list from a TBLASTN report.

Figure 19.3.11 Example of a TBLASTN alignment.

Figure 19.3.12 Example of the graphical view of a BLASTN report.

Figure 19.3.13 Example of the hit list from a BLASTN report.

Figure 19.3.14 Example of a BLASTN alignment.

Figure 19.3.15 Example of the hit list from a PSI‐BLAST report.

Figure 19.3.16 Example of a hit list from a BLASTP report in which the query sequence was not filtered. Black squares, added manually by the authors, indicate hits that would not appear if the query had been filtered.
