Finding Similar Nucleotide Sequences Using Network BLAST Searches

Abstract

The Basic Local Alignment Search Tool (BLAST) is a keystone of bioinformatics because of its performance and user‐friendliness. Beginner and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages of the National Center for Biotechnology Information (NCBI). We map nucleic acid sequences to genomes, find identical or similar mRNA, expressed sequence tag (EST), and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Taxonomy reports, genomic views, and multiple alignments help in understanding the results. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits provide no evidence, only hints for further analyses. We use translated BLAST to find genes that may code for homologous proteins. We reduce false positives by filtering out low‐complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez, PubMed, and structure, sequence, interaction, and expression databases, which facilitates integration with a wide spectrum of biological knowledge. Curr. Protoc. Bioinform. 26:3.3.1‐3.3.26. © 2009 by John Wiley & Sons, Inc.
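The expected frequency threshold (E‐value) that BLAST reports comes from Karlin‐Altschul statistics (Karlin and Altschul, 1990): for a raw score S, E = K·m·n·e^(−λS), or equivalently E = m·n·2^(−S′) with the bit score S′ = (λS − ln K)/ln 2, where m and n are the query and database lengths. The short Python sketch below only illustrates that relationship. The λ and K values in the demo are arbitrary placeholders, not the parameters of any real scoring system, and the sketch ignores the effective‐length (edge‐effect) corrections that BLAST applies to m and n.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_raw(raw_score: float, lam: float, k: float,
                    query_len: int, db_len: int) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def expect_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """The same expectation written with bit scores: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Placeholder lambda and K, chosen only for illustration.
    lam, k = 1.3, 0.4
    s = 50              # hypothetical raw alignment score
    m, n = 100, 10**9   # hypothetical query and database lengths
    b = bit_score(s, lam, k)
    print(f"bit score = {b:.1f}")
    print(f"E (raw)   = {expect_from_raw(s, lam, k, m, n):.3g}")
    print(f"E (bits)  = {expect_from_bits(b, m, n):.3g}")
```

Because E grows in proportion to m·n, the same bit score is less significant against a larger database, and each additional bit halves E.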

Keywords: BLAST; sequence alignment; database search; homology search; mapping; nucleic acid; DNA; RNA; genome; blastn; Megablast
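Low‐complexity filtering for nucleotide searches is done with DUST (Morgulis et al., 2006). That paper scores a DNA segment by its overlapping triplets: if triplet t occurs c_t times among the l triplets of the segment, the score is Σ_t c_t(c_t − 1)/2 divided by (l − 1), so repetitive segments score high. The Python sketch below is a simplified fixed‐window version of that score, not NCBI's implementation (which searches for maximal high‐scoring intervals). The window of 64 and threshold of 20 are assumed to mirror the BLAST+ -dust defaults, and lowercase soft‐masking is only one of the conventions used to mark filtered bases.

```python
from collections import Counter
from typing import List, Tuple


def triplet_score(segment: str) -> float:
    """DUST-style score: sum over distinct triplets t of c_t*(c_t-1)/2,
    divided by (l - 1), where l is the number of overlapping triplets."""
    segment = segment.upper()
    l = len(segment) - 2
    if l < 2:
        return 0.0
    counts = Counter(segment[i:i + 3] for i in range(l))
    pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return pairs / (l - 1)


def low_complexity_windows(seq: str, window: int = 64,
                           threshold: float = 20.0) -> List[Tuple[int, int]]:
    """Slide a fixed window along seq and return merged half-open (start, end)
    intervals covering every window whose score exceeds threshold."""
    flagged = []
    for start in range(max(len(seq) - window, 0) + 1):
        if triplet_score(seq[start:start + window]) > threshold:
            flagged.append((start, min(start + window, len(seq))))
    merged: List[Tuple[int, int]] = []
    for start, end in flagged:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def soft_mask(seq: str, intervals: List[Tuple[int, int]]) -> str:
    """Lowercase the bases inside the given intervals (soft-masking)."""
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    # Made-up sequence: two short flanks around a poly-A stretch.
    example = "GATTACACGTTGCAAGCTT" + "A" * 70 + "GCATCGATCCGTAGCTAGG"
    print(soft_mask(example, low_complexity_windows(example)))
```

A stretch of 64 identical bases scores 31 (62 AAA triplets give 62·61/2 = 1891 pairs, divided by 61), far above the threshold, whereas a non‐repetitive window typically scores below 1. Masking whole windows is cruder than DUST proper and can spill a few bases into the flanks.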

GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Introduction
  • Basic Protocol 1: Using the Web‐Interface BLAST from the NCBI BLAST Server for Nucleotide Sequences
  • Basic Protocol 2: The Default Blastn Result Output
  • Support Protocol 1: Setting Optional Parameters
  • Support Protocol 2: Formatting Results of a BLAST Search
  • Alternate Protocol 1: Megablast Search for Ribosomal RNA
  • Alternate Protocol 2: Finding Transcribed Gene Copies and Splice Variants Using Megablast
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Figures

  •   Figure 3.3.1 The home page of the NCBI BLAST server (http://blast.ncbi.nlm.nih.gov).
  •   Figure 3.3.2 The basic search screen for nucleic acid BLAST at NCBI. The title line and the sequence of the human let‐7c microRNA in FASTA format were pasted into the search field.
  •   Figure 3.3.3 The results of a blastn search. (A) Administrative section and the color‐coded graphical display of the best hits to the query sequence. (B) One‐line descriptions of the database sequences similar to the query, with maximal and total scores, total coverage, E‐value, maximal percent identity, and links to other databases.
  •   Figure 3.3.4 Pairwise local alignment of the query and the mouse BAC clone RP24‐270A10 from chromosome 13. Note that the query matches this clone at two distant locations.
  •   Figure 3.3.5 Algorithm parameters.
  •   Figure 3.3.6 Reformatting BLAST results.
  •   Figure 3.3.7 The hit table view for automatic parsing.
  •   Figure 3.3.8 The query, database, and limit selection page for a Megablast search with the human 5.8S ribosomal RNA (NR_003285).
  •   Figure 3.3.9 The Algorithm parameter selection page for the Megablast search with the human 5.8S ribosomal RNA (NR_003285).
  •   Figure 3.3.10 Alignment number 1000: human 5.8S ribosomal RNA (NR_003285) versus the 18S ribosomal RNA of Pisidium nitidum, a bivalve mollusc.
  •   Figure 3.3.11 The query, database, and limit selection page for a Megablast search with the human Duchenne muscular dystrophy gene (NM_000109).
  •   Figure 3.3.12 The Algorithm parameter selection page for the Megablast search with the human Duchenne muscular dystrophy gene (NM_000109).
  •   Figure 3.3.13 Splice variants of the human Duchenne muscular dystrophy gene (NM_000109). Splice variants are indicated by interrupted lines representing sequences.
  •   Figure 3.3.14 The Genomic View of the locations of the sequences similar to the human Duchenne muscular dystrophy gene (NM_000109).
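The hit table view (Figure 3.3.7) is intended for automatic parsing. The Python sketch below assumes the 12 tab‐separated columns of NCBI's tabular BLAST output (query id, subject id, % identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E‐value, bit score), with comment lines beginning with "#"; check the column order against the header of your own download before relying on it. The class, field names, and filter thresholds are illustrative choices, not NCBI defaults.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Hit:
    """One row of a 12-column tabular BLAST report (assumed column order)."""
    query_id: str
    subject_id: str
    pct_identity: float
    aln_length: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float

    @property
    def subject_strand(self) -> str:
        # In nucleotide searches a hit on the minus strand of the subject
        # is reported with s_start > s_end.
        return "minus" if self.s_start > self.s_end else "plus"


def parse_hit_table(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield Hit objects, skipping blank lines and '#' comment lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) != 12:
            raise ValueError(f"expected 12 columns, got {len(f)}: {line!r}")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
                  int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                  float(f[10]), float(f[11]))


def filter_hits(hits: Iterable[Hit], max_evalue: float = 1e-5,
                min_identity: float = 90.0) -> List[Hit]:
    """Keep hits at or below an E-value cutoff and at or above a % identity cutoff."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.pct_identity >= min_identity]


if __name__ == "__main__":
    # Two made-up rows in the assumed column order, for illustration only.
    sample = (
        "# Fields: query id, subject id, % identity, alignment length, ...\n"
        "query_1\tsubject_A\t100.00\t22\t0\t0\t1\t22\t501\t522\t1e-04\t44.1\n"
        "query_1\tsubject_B\t95.45\t22\t1\t0\t1\t22\t900\t879\t0.002\t36.2\n"
    )
    for hit in filter_hits(parse_hit_table(sample.splitlines()), max_evalue=1e-3):
        print(hit.subject_id, hit.subject_strand, hit.evalue)
```

Parsing into typed records rather than splitting strings ad hoc makes the E‐value and identity cutoffs explicit, which helps when the parsed hits feed a larger analysis pipeline.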

Literature Cited

   Altschul, S.F. 1991. Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219:555‐565.
   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Altschul, S.F., Boguski, M.S., Gish, W., and Wootton, J.C. 1994. Issues in searching molecular sequence databases. Nat. Genet. 6:119‐129.
   Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucl. Acids Res. 25:3389‐3402.
   Baker, M.E., Yan, L., and Pear, M.R. 2000. Three‐dimensional model of human TIP30, a coactivator for HIV‐1 Tat‐activated transcription, and CC3, a protein associated with metastasis suppression. Cell Mol. Life Sci. 57:851‐858.
   Barrett, C., Hughey, R., and Karplus, K. 1997. Scoring hidden Markov models. Comput. Appl. Biosci. 13:191‐199.
   Baxevanis, A.D. 2005. Assessing pairwise sequence similarity: BLAST and FASTA. In Bioinformatics. A Practical Guide to the Analysis of Genes and Proteins (A.D. Baxevanis and B.F. Ouellette, eds.), pp. 295‐324. John Wiley & Sons, Hoboken, N.J.
   Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucl. Acids Res. 28:235‐242.
   Birney, E. and Durbin, R. 1997. Dynamite: A flexible code generating language for dynamic programming methods used in sequence comparison. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5:56‐64.
   Bolten, E., Schliep, A., Schneckener, S., Schomburg, D., and Schrader, R. 2001. Clustering protein sequences–structure prediction by transitive homology. Bioinformatics 17:935‐941.
   Dayhoff, M.O. and Eck, R.V. 1968. A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure (M.O. Dayhoff, ed.), pp. 33‐45. National Biomedical Research Foundation, Silver Spring, Md.
   Eddy, S.R. 1998. Profile hidden Markov models. Bioinformatics 14:755‐763.
   Elbashir, S.M., Harborth, J., Weber, K., and Tuschl, T. 2002. Analysis of gene function in somatic mammalian cells using small interfering RNAs. Methods 26:199‐213.
   Finn, R.D., Tate, J., Mistry, J., Coggill, P.C., Sammut, S.J., Hotz, H.R., Ceric, G., Forslund, K., Eddy, S.R., Sonnhammer, E.L., and Bateman, A. 2008. The Pfam protein families database. Nucl. Acids Res. 36:D281‐D288.
   Gerstein, M. 1998. Measurement of the effectiveness of transitive sequence comparison, through a third ‘intermediate’ sequence. Bioinformatics 14:707‐714.
   Henikoff, S. and Henikoff, J.G. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89:10915‐10919.
   Holm, L. and Sander, C. 1998. Removing near‐neighbor redundancy from large protein sequence collections. Bioinformatics 14:423‐429.
   Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., and Madden, T.L. 2008. NCBI BLAST: A better web interface. Nucl. Acids Res. 36:W5‐W9.
   Jurka, J., Kapitonov, V.V., Kohany, O., and Jurka, M.V. 2007. Repetitive sequences in complex genomes: Structure and evolution. Annu. Rev. Genomics Hum. Genet. 8:241‐259.
   Karlin, S. and Altschul, S.F. 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. U.S.A. 87:2264‐2268.
   Karlin, S. and Bucher, P. 1992. Correlation analysis of amino acid usage in protein classes. Proc. Natl. Acad. Sci. U.S.A. 89:12165‐12169.
   Karolchik, D., Kuhn, R.M., Baertsch, R., Barber, G.P., Clawson, H., Diekhans, M., Giardine, B., Harte, R.A., Hinrichs, A.S., Hsu, F., Kober, K.M., Miller, W., Pedersen, J.S., Pohl, A., Raney, B.J., Rhead, B., Rosenbloom, K.R., Smith, K.E., Stanke, M., Thakkapallayil, A., Trumbower, H., Wang, T., Zweig, A.S., Haussler, D., and Kent, W.J. 2008. The UCSC Genome Browser Database: 2008 update. Nucl. Acids Res. 36:D773‐D779.
   Kent, W.J. 2002. BLAT–the BLAST‐like alignment tool. Genome Res. 12:656‐664.
   Korf, I., Yandell, M., and Bedell, J. 2003. BLAST. An Essential Guide to the Basic Local Alignment Tool. O'Reilly, Sebastopol, Calif.
   Letunic, I., Copley, R.R., Pils, B., Pinkert, S., Schultz, J., and Bork, P. 2006. SMART 5: Domains in the context of genomes and networks. Nucl. Acids Res. 34:D257‐D260.
   Liang, Y.D. 2006. Introduction to Java programming: Comprehensive Version, 3rd Ed. Pearson Prentice Hall, Lebanon, Ind.
   Møller, A. and Schwartzbach, M.I. 2006. An introduction to XML and Web technologies. Addison‐Wesley, New York.
   Morgulis, A., Gertz, E.M., Schaffer, A.A., and Agarwala, R. 2006. A fast and symmetric DUST implementation to mask low‐complexity DNA sequences. J. Comput. Biol. 13:1028‐1040.
   Morgulis, A., Coulouris, G., Raytselis, Y., Madden, T.L., Agarwala, R., and Schaffer, A.A. 2008. Database indexing for production MegaBLAST searches. Bioinformatics 24:1757‐1764.
   Ning, Z., Cox, A.J., and Mullikin, J.C. 2001. SSAHA: A fast search method for large DNA databases. Genome Res. 11:1725‐1729.
   Schultz, J., Milpetz, F., Bork, P., and Ponting, C.P. 1998. SMART, a simple modular architecture research tool: Identification of signaling domains. Proc. Natl. Acad. Sci. U.S.A. 95:5857‐5864.
   Stajich, J.E. 2007. An introduction to BioPerl. Methods Mol. Biol. 406:535‐548.
   Stein, L. 1998. Official Guide to Programming with CGI.pm. The Standard for Building Web Scripts. John Wiley and Sons, New York.
   Tisdall, J.D. 2001. Beginning Perl for Bioinformatics. An Introduction to Perl for Biologists. O'Reilly, Sebastopol, Calif.
   Ullman, L. 2006. MySQL: Visual Quickstart Guide. Peachpit Press, Berkeley, Calif.
   Wang, Y., Addess, K.J., Chen, J., Geer, L.Y., He, J., He, S., Lu, S., Madej, T., Marchler‐Bauer, A., Thiessen, P.A., Zhang, N., and Bryant, S.H. 2007. MMDB: Annotating protein sequences with Entrez's 3D‐structure database. Nucl. Acids Res. 35:D298‐D300.
   Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L.Y., Helmberg, W., Kapustin, Y., Khovayko, O., Landsman, D., Lipman, D.J., Madden, T.L., Maglott, D.R., Miller, V., Ostell, J., Pruitt, K.D., Schuler, G.D., Shumway, M., Sequeira, E., Sherry, S.T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov, R.L., Tatusova, T.A., Wagner, L., and Yaschenko, E. 2008. Database resources of the National Center for Biotechnology Information. Nucl. Acids Res. 36:D13‐D21.
   Wootton, J.C. and Federhen, S. 1996. Analysis of compositionally biased regions in sequence databases. Methods Enzymol. 266:554‐571.
   Wu, T.D. and Watanabe, C.K. 2005. GMAP: A genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21:1859‐1875.
   Zhang, Z., Schwartz, S., Wagner, L., and Miller, W. 2000. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7:203‐214.
   Zweig, A.S., Karolchik, D., Kuhn, R.M., Haussler, D., and Kent, W.J. 2008. UCSC genome browser tutorial. Genomics 92:75‐84.
Key References
   Altschul et al., 1994. See above.
   An excellent review on the application of pairwise BLAST tools to the identification of possible coding regions and the elucidation of gene structure and protein function. This review discusses significance, sequence filtering, database issues, alignment statistics, gap costs, scoring systems, and other topics.
   Altschul et al., 1997. See above.
   This is the original research paper on gapped BLAST and position‐specific iterated BLAST (PSI‐BLAST). It covers a series of algorithmic and performance improvements, gap penalties, and statistical considerations, as well as biological examples with marginal similarities.
   Baxevanis, A.D. and Ouellette, B.F. 2005. Bioinformatics. A Practical Guide to the Analysis of Genes and Proteins. John Wiley & Sons, Hoboken, N.J.
   A widely taught, clearly written textbook that introduces pairwise sequence similarity searches, biological databases, and many other areas of bioinformatics. Reviews the general concepts of alignments, scoring matrices, and BLAST with practical applications and guidelines for interpretation.
   Korf et al., 2003. See above.
   An excellent overview of the theory and practice of the BLAST tools as of 2003. This comprehensive, easy‐to‐understand textbook is highly recommended to everyone in bioinformatics or computational biology.
Internet Resources
   http://blast.ncbi.nlm.nih.gov
   The NCBI BLAST Web site.
   http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.chapter.EntrezHelp
   The Entrez Documentation at NCBI.
   http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar
   The Entrez site for nucleic acid searches at NCBI.
   http://www.bioperl.org
   The BioPerl site.
   http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs
   The full documentation for BLAST at NCBI.
   http://www.ebi.ac.uk/blast2/nucleotide.html
   The European Bioinformatics Institute server for Washington University BLAST (WU‐BLAST).
   http://repeatmasker.genome.washington.edu
   The RepeatMasker Website.
   http://www.girinst.org/Censor_Server.html
   The Genetic Information Research Institute (GIRI) website, which hosts the Censor server.
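The NCBI BLAST Web site listed above is served by the Blast.cgi program, which can also be driven from scripts. The Python sketch below only builds request URLs and extracts the request ID (RID) from a submission response; it performs no network access. The parameter names (CMD=Put and CMD=Get, PROGRAM, DATABASE, QUERY, MEGABLAST, EXPECT, FILTER, RID, FORMAT_TYPE) follow NCBI's BLAST URL API documentation as we understand it and should be checked against the current version, as should NCBI's limits on how often a script may submit and poll. The helper functions are hypothetical, not part of any NCBI library.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_url(fasta: str, database: str = "nt", megablast: bool = True,
                  expect: float = 10.0, filter_low_complexity: bool = True) -> str:
    """URL for submitting a nucleotide search (CMD=Put)."""
    params = {
        "CMD": "Put",
        "PROGRAM": "blastn",
        "DATABASE": database,
        "QUERY": fasta,
        "EXPECT": str(expect),
        # Assumed convention: "L" requests low-complexity filtering, "F" disables it.
        "FILTER": "L" if filter_low_complexity else "F",
    }
    if megablast:
        params["MEGABLAST"] = "on"
    return f"{BLAST_URL}?{urlencode(params)}"


def build_get_url(rid: str, format_type: str = "Tabular") -> str:
    """URL for retrieving the results of a submitted search (CMD=Get)."""
    query = urlencode({"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type})
    return f"{BLAST_URL}?{query}"


def parse_rid(put_response: str) -> Optional[str]:
    """Extract the request ID from a 'RID = ...' line of the submission page."""
    match = re.search(r"\bRID\s*=\s*(\S+)", put_response)
    return match.group(1) if match else None


if __name__ == "__main__":
    url = build_put_url(">example\nACGTTGCAAGCTTAGGCATCGATCC")  # hypothetical query
    print(parse_qs(urlparse(url).query))
```

Because FASTA queries can be long, a real client would more likely send the same parameters in an HTTP POST body and wait between status checks rather than polling continuously.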