
Sequence Similarity Searching Using the BLAST Family of Programs

Abstract

 

Database sequence similarity searching is carried out thousands of times each day by researchers worldwide and has become a valuable tool. Over the years, a number of algorithms have been implemented to facilitate database searching. The BLAST (Basic Local Alignment Search Tool) family of sequence similarity search programs allows searches to be run quickly and easily while remaining sensitive and statistically rigorous. In this unit, which is a completely new version of its predecessor of the same title, the user learns how to access the databases, determine the correct searching strategies, and apply examples of BLAST searches to his or her own data.

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Accessing BLAST Programs and Documentation
  • Introduction to BLAST
  • Examples of BLAST Searches
  • Searching Strategies
  • Sequence Alignment Algorithms
  • Appendix A: BLAST Parameters
  • Appendix B: Sequence Identifier Syntax
  • Literature Cited
  • Figures
  • Tables
     
 


Figures

  • Figure 19.3.1 Submitting a BLASTP search using the NCBI's World Wide Web interface.
  • Figure 19.3.2 Example of the top portion of a BLASTP report.
  • Figure 19.3.3 Example of the graphical view of a BLASTP report. The bars are color coded by the strength of the database match. The strongest matches (those with a bit score >200) are red, followed by pink (bit score 80 to 200), green (50 to 80), blue (40 to 50), and black (<40).
  • Figure 19.3.4 Example of the hit list from a BLASTP report.
  • Figure 19.3.5 Example of a BLASTP alignment.
  • Figure 19.3.6 Example of the graphical view of a BLASTX report.
  • Figure 19.3.7 Example of the hit list from a BLASTX report.
  • Figure 19.3.8 Example of selected BLASTX alignments.
  • Figure 19.3.9 Example of the graphical view of a TBLASTN report.
  • Figure 19.3.10 Example of the hit list from a TBLASTN report.
  • Figure 19.3.11 Example of a TBLASTN alignment.
  • Figure 19.3.12 Example of the graphical view of a BLASTN report.
  • Figure 19.3.13 Example of the hit list from a BLASTN report.
  • Figure 19.3.14 Example of a BLASTN alignment.
  • Figure 19.3.15 Example of the hit list from a PSI‐BLAST report.
  • Figure 19.3.16 Example of a hit list from a BLASTP report in which the query sequence was not filtered. Black squares, added manually by the authors, indicate hits that would not appear if the query had been filtered.
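
The color legend given in the caption of Figure 19.3.3 can be read as a simple binning of bit scores. The sketch below is only an illustration of that legend, not code from the unit or from NCBI; the function name `bit_score_color` is hypothetical. The caption's ranges share their endpoints (for example, 80 appears in both "80 to 200" and "50 to 80"), so assigning a shared endpoint to the higher-scoring bin is an assumption made here.

```python
def bit_score_color(bit_score: float) -> str:
    """Map a BLAST bit score to the bar color listed in the Figure 19.3.3 caption.

    Assumption: where the caption's ranges share an endpoint (40, 50, 80),
    the score is placed in the higher-scoring bin. The caption itself states
    that red is strictly above 200 and black is strictly below 40.
    """
    if bit_score > 200:
        return "red"    # strongest matches
    if bit_score >= 80:
        return "pink"   # 80 to 200
    if bit_score >= 50:
        return "green"  # 50 to 80
    if bit_score >= 40:
        return "blue"   # 40 to 50
    return "black"      # below 40


if __name__ == "__main__":
    for score in (250, 120, 65, 45, 12):
        print(score, bit_score_color(score))
```

Under these assumptions, a hit with a bit score of 120 would be drawn in pink and one with a bit score of 12 in black.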
