
Finding Similar Nucleotide Sequences Using Network BLAST Searches


The Basic Local Alignment Search Tool (BLAST) is a keystone of bioinformatics because of its performance and user-friendliness. Beginner and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages of the National Center for Biotechnology Information (NCBI). We map nucleic acid sequences to genomes; find identical or similar mRNA, expressed sequence tag, and noncoding RNA sequences; and run Megablast searches, which are much faster than blastn. Taxonomy reports, genomic views, and multiple alignments help in understanding the results. We explain how to interpret expectation (E-value) thresholds, biological significance, and statistical significance. Weak hits are not evidence in themselves, but they can suggest further analyses. We use translated BLAST to find genes that may code for homologous proteins. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez, PubMed, and structural, sequence, interaction, and expression databases, which facilitates integration with a wide spectrum of biological knowledge. Curr. Protoc. Bioinform. 26:3.3.1–3.3.26. © 2009 by John Wiley & Sons, Inc.
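As general background on E-value thresholds (standard BLAST statistics, not a procedure taken from this unit): BLAST converts the raw alignment score S into a bit score S' = (λS − ln K)/ln 2, and the expectation value is then E = m · n · 2^(−S'), where m and n are the effective lengths of the query and of the whole database. The Python sketch below illustrates only this relation. It assumes the effective lengths are already known; BLAST computes them internally with edge-effect corrections that this sketch omits, and the example lengths are hypothetical.

```python
import math


def evalue_from_bitscore(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score.

    Implements E = m * n * 2**(-S'), with m and n the effective query and
    database lengths. Edge-effect length corrections are NOT applied here.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def bitscore_threshold(max_evalue: float, query_len: int, db_len: int) -> float:
    """Bit score an alignment needs so that its E-value is <= max_evalue."""
    return math.log2(query_len * db_len / max_evalue)


if __name__ == "__main__":
    # Hypothetical numbers: a 100-nt query against a 10**10-nt database.
    m, n = 100, 10**10
    print(evalue_from_bitscore(50.0, m, n))   # about 8.9e-04
    print(bitscore_threshold(1e-3, m, n))     # about 49.8
```

Because E grows linearly with database size, the same alignment becomes less significant when it is found in a larger database. The second function inverts the relation to give the bit score needed to pass a chosen E-value cutoff.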

Keywords: BLAST; sequence alignment; database search; homology search; mapping; nucleic acid; DNA; RNA; genome; blastn; Megablast

  •   Figure 3.3.1 The home page of the NCBI BLAST server (http://blast.ncbi.nlm.nih.gov).
  •   Figure 3.3.2 The basic search screen for nucleic acid BLAST at NCBI. The title line and the sequence of the human let‐7c microRNA in FASTA format were pasted into the search field.
  •   Figure 3.3.3 The results of a blastn search using . (A) Administrative section and the color‐coded graphical display of the best hits to the query sequence. (B) One‐line descriptions of the database sequences similar to the query, with maximal and total scores, total coverage, E‐value, maximal percent identity, and links to other databases.
  •   Figure 3.3.4 Pairwise local alignment of the query and the mouse BAC clone RP24‐270A10 from chromosome 13. Note that the query matches this clone at two distant locations.
  •   Figure 3.3.5 Algorithm parameters.
  •   Figure 3.3.6 Reformatting BLAST results.
  •   Figure 3.3.7 The hit table view for automatic parsing.
  •   Figure 3.3.8 The query, database and limit selection page for Megablast for the human 5.8S ribosomal RNA (NR_003285).
  •   Figure 3.3.9 The Algorithm parameter selection page for the Megablast project of Figure .
  •   Figure 3.3.10 Alignment number 1000: human 5.8S ribosomal RNA (NR_003285) versus the 18S ribosomal RNA of Pisidium nitidum, a bivalve mollusc.
  •   Figure 3.3.11 The query, database and limit selection page for Megablast for the human Duchenne muscular dystrophy gene (NM_000109).
  •   Figure 3.3.12 The Algorithm parameter selection page for the Megablast for the human Duchenne muscular dystrophy gene (NM_000109).
  •   Figure 3.3.13 Splice variants of the human Duchenne muscular dystrophy gene (NM_000109). Splice variants are indicated by interrupted lines representing sequences.
  •   Figure 3.3.14 The Genomic View of the localizations of the sequences similar to the human Duchenne muscular dystrophy gene (NM_000109).
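Figure 3.3.7 refers to the hit table view intended for automatic parsing, which is how parsed BLAST results enter analysis pipelines. The Python sketch below is a hedged illustration, not NCBI's own parser. It assumes the hit table was saved as tab-separated text in the standard 12-column BLAST tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score), with "#" comment lines; if an exported table uses a different column order, the hypothetical COLUMNS list must be adjusted. The 1e-3 cutoff and the sample lines are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Assumed column order: the standard 12-column BLAST tabular layout.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float

    @property
    def strand(self) -> str:
        # In blastn tabular output, subject start > subject end marks a
        # match on the minus strand of the subject sequence.
        return "-" if self.sstart > self.send else "+"


def parse_hit_table(text: str, max_evalue: float = 1e-3) -> List[Hit]:
    """Keep hits with E-value <= max_evalue, sorted by decreasing bit score."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and '#' comment/header lines
        rec = dict(zip(COLUMNS, row))
        hit = Hit(
            query=rec["qseqid"], subject=rec["sseqid"],
            pident=float(rec["pident"]), length=int(rec["length"]),
            qstart=int(rec["qstart"]), qend=int(rec["qend"]),
            sstart=int(rec["sstart"]), send=int(rec["send"]),
            evalue=float(rec["evalue"]), bitscore=float(rec["bitscore"]),
        )
        if hit.evalue <= max_evalue:
            hits.append(hit)
    hits.sort(key=lambda h: h.bitscore, reverse=True)
    return hits


# Hypothetical example lines (invented IDs and numbers, for illustration only).
SAMPLE = (
    "# Fields: query id, subject id, % identity, ...\n"
    "q1\tsubjA\t100.00\t22\t0\t0\t1\t22\t101\t122\t2e-05\t44.1\n"
    "q1\tsubjB\t95.45\t22\t1\t0\t1\t22\t500\t479\t5e-04\t38.2\n"
    "q1\tsubjC\t90.00\t20\t2\t0\t3\t22\t10\t29\t0.5\t26.3\n"
)

if __name__ == "__main__":
    for h in parse_hit_table(SAMPLE):
        print(h.subject, h.strand, h.evalue, h.bitscore)
```

Reporting the strand is useful when, as in Figure 3.3.4, one query matches a single database sequence at several distant locations, possibly on opposite strands.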
Key References
   Altschul et al., 1994. See above.
   An excellent review of the application of pairwise BLAST tools to the identification of possible coding regions and the elucidation of gene structure and protein function. This review discusses significance, sequence filtering, database issues, alignment statistics, gap costs, scoring systems, and other topics.
   Altschul et al., 1997. See above.
   This is the original research paper on Gapped BLAST and Position‐Specific Iterated BLAST (PSI‐BLAST). It covers a series of algorithmic and performance improvements, gap penalties, and statistical considerations, as well as biological examples with marginal similarities.
   Baxevanis, A.D. and Ouellette, B.F. 2005. Bioinformatics. A Practical Guide to the Analysis of Genes and Proteins. John Wiley & Sons, Hoboken, N.J.
   A widely taught, clearly written textbook that introduces pairwise sequence similarity searches, biological databases, and many other areas of bioinformatics. Reviews the general concepts of alignments, scoring matrices, and BLAST with practical applications and guidelines for interpretation.
   Korf et al., 2003. See above.
   An excellent overview of the theory and practice of the BLAST tools as of 2003. This comprehensive, easy‐to‐understand book is highly recommended to everyone in bioinformatics or computational biology.
Internet Resources
   The NCBI BLAST Web site.
   The Entrez Documentation at NCBI.
   The Entrez site for nucleic acid searches at NCBI.
   The BioPerl site.
   The full documentation for BLAST at NCBI.
   The European Bioinformatics Institute Server for the Washington University BLAST.
   The RepeatMasker Website.
   The Genetic Research Institute Website.
