Finding Homologs in Amino Acid Sequences Using Network BLAST Searches

Source: Internet, 2013-12-31

Abstract

The Basic Local Alignment Search Tool (BLAST) is the most fundamental (and most misused) algorithm and software in bioinformatics/computational biology for functional assessment of unknown proteins or discovery of similar proteins with potentially common evolutionary origins. We show how to balance sensitivity with selectivity (without generating massive output) by selecting and demonstrating proper database, algorithm, and alignment display options of the user‐friendly Web sites of the National Center for Biotechnology Information (NCBI). We discuss protein query searches against protein databases and submission of all combinations of translated searches. Careful biological and statistical inferences are drawn to possible functions, taking into account the highly nonrandom nature of proteins. Guidelines for such inferences, using real‐life biological examples (e.g., protein kinases with widely distributed structural and functional domains), are provided. We show how to avoid incorrect functional inference from misleading similarities, using the divergent evolution of a serine protease domain that erodes the protease function in haptoglobins. Curr. Protoc. Bioinform. 25:3.4.1‐3.4.34. © 2009 by John Wiley & Sons, Inc.

Keywords: BLAST; bioinformatics; computational biology; database search; functional assessment; statistical inference; local alignment; translated database search
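
The statistical inference mentioned in the abstract rests on the expectation value (E value) of Karlin and Altschul (1990). As a minimal sketch, and not the NCBI implementation, the Python below computes a bit score and an E value from a raw alignment score using the textbook formulas S' = (λS − ln K)/ln 2 and E = Kmn·exp(−λS). The values of λ and K, the query length, and the database length are illustrative assumptions; real values depend on the scoring matrix and gap costs and are listed in each BLAST report. Edge‐effect corrections to the sequence lengths are deliberately ignored.

```python
import math

# Illustrative Karlin-Altschul parameters (assumptions for this sketch only;
# the real values depend on the scoring matrix and gap costs).
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalize a raw score S to bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance alignments scoring at least S:
    E = K * m * n * exp(-lambda * S), with no edge-effect correction."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    m = 350            # hypothetical query length in residues
    n = 200_000_000    # hypothetical database length in residues
    for s in (60, 100, 200):
        print(f"S={s}  bits={bit_score(s):.1f}  E={expect_value(s, m, n):.3g}")
```

Because the bit score absorbs λ and K, the same E value can be written as E = mn·2^(−S'), which is why bit scores from searches with different scoring systems can be compared directly.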

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Table of Contents

Introduction
Basic Protocol 1: Using the BLAST Web Interface to Perform a Protein‐to‐Protein Search (blastp)
Support Protocol 1: Setting Algorithm Parameters for Advanced BLAST
Support Protocol 2: Reformatting Results from a BLAST Search
Basic Protocol 2: Translated BLAST Searches
Basic Protocol 3: bl2seq for Comparing Two Sequences
Guidelines for Understanding Results
Commentary
Literature Cited
Figures
Tables

Figures

Figure 3.4.1 The top page for the Basic Local Alignment Search Tool (BLAST) at the National Center for Biotechnology Information (NCBI) Web server.

Figure 3.4.2 The top screen of the home page for protein‐to‐protein BLAST searches on the NCBI Web server.

Figure 3.4.3 Algorithm parameters for advanced BLAST searches on the NCBI Web server can be displayed by clicking on “Algorithm parameter” (shown at the bottom of Fig. ).

Figure 3.4.4 The blastp result page displayed in three panels. (A) The top section of the results with links to the BLAST home page, recent results, saved search strategies, help, reformatting, and resubmission, and for saving the search strategies. Database and query sequence information is also shown. (B) The graphical summary of the alignments and their one‐line descriptions, with Link Out icons representing linked external databases (see Table in ). (C) Sequence retrieval links and detailed pairwise alignments between the query sequence and the search sequences.

Figure 3.4.5 The taxonomy report displays found sequences, sorted by organism.

Figure 3.4.6 The distance tree of results shows results based on protein similarity.

Figure 3.4.7 The related structures display shows related proteins with known three‐dimensional structures.

Figure 3.4.8 The Format Request page. Note the Request ID number.

Figure 3.4.9 The multiple alignment view shows an alignment that is a result of choosing “Query‐anchored with letters for identities.”

Figure 3.4.10 View of a “Hit table.” This view displays a separate row for each hit, with tab‐delimited fields describing the high‐scoring segment pairs for each database sequence.

Figure 3.4.11 The top page for translated BLAST searches at the WebBLAST server at NCBI. This screen appears when the blastx program is selected.

Figure 3.4.12 Results of a blastx search of a part of the human dystrophin gene submitted from the page shown in Figure .

Figure 3.4.13 blastp search of the amino acid sequence of the human dystrophin protein (NP_000100.2 in RefSeq) against the Swiss‐Prot database.

Figure 3.4.14 Launching bl2seq to perform BLAST comparisons of two sequences.

Figure 3.4.15 The bl2seq alignment of the human haptoglobin and complement C1r‐B subcomponent precursor.
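
The hit table described for Figure 3.4.10 is plain tab‐delimited text, so it can be read with a few lines of code. The Python below is a minimal sketch that assumes the common 12‐column tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E value, bit score) with comment lines beginning with "#". The column order, the `Hit` type, the function names, and the sample rows are all assumptions made for illustration and are not taken from the protocol; check the "# Fields:" comment line of a real download before relying on the order.

```python
import csv
import io
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One row of a tabular hit table (assumed 12-column layout)."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_hit_table(text: str) -> List[Hit]:
    """Parse tab-delimited hit-table text, skipping blank and '#' comment lines."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 12:
            raise ValueError(f"expected 12 fields, got {len(row)}: {row!r}")
        hits.append(Hit(row[0], row[1], float(row[2]),
                        *(int(x) for x in row[3:10]),
                        float(row[10]), float(row[11])))
    return hits


def filter_by_evalue(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits whose E value does not exceed the chosen threshold."""
    return [h for h in hits if h.evalue <= max_evalue]


# Invented sample rows, for illustration only (not real search results).
SAMPLE = (
    "# hypothetical hit table\n"
    "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t400\n"
    "query1\tsubjB\t31.20\t180\t110\t5\t10\t189\t40\t215\t0.002\t45.1\n"
)

if __name__ == "__main__":
    for hit in filter_by_evalue(parse_hit_table(SAMPLE)):
        print(hit.subject_id, hit.percent_identity, hit.evalue)
```

The E‐value cutoff of 1e-5 is an arbitrary default chosen for the sketch; an appropriate threshold depends on the database size and on the biological question.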

Literature Cited

	Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
	Altschul, S.F., Boguski, M.S., Gish, W., and Wootton, J.C. 1994. Issues in searching molecular sequence databases. Nat. Genet. 6:119‐129.
	Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucl. Acids Res. 25:3389‐3402.
	Altschul, S.F., Wootton, J.C., Gertz, E.M., Agarwala, R., Morgulis, A., Schaffer, A.A., and Yu, Y.K. 2005. Protein database searches using compositionally adjusted substitution matrices. FEBS J. 272:5101‐5109.
	Andreeva, A., Howorth, D., Chandonia, J.M., Brenner, S.E., Hubbard, T.J., Chothia, C., and Murzin, A.G. 2008. Data growth and its impact on the SCOP database: New developments. Nucl. Acids Res. 36:D419‐D425.
	Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N., and Yeh, L.S. 2005. The Universal Protein Resource (UniProt). Nucl. Acids Res. 33:D154‐D159.
	Bajic, V.B., Brent, M.R., Brown, R.H., Frankish, A., Harrow, J., Ohler, U., Solovyev, V.V., and Tan, S.L. 2006. Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment. Genome Biol. 7:S3.1‐S3.13.
	Baxevanis, A.D. 2005. Assessing pairwise sequence similarity: BLAST and FASTA. In Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd ed. (A.D. Baxevanis and B.F.F. Ouellette, eds.) pp. 295‐324. John Wiley & Sons, New York.
	Baxevanis, A.D. and Ouellette, B.F. (eds.) 2005. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd ed. John Wiley & Sons, New York.
	Benson, D.A., Karsch‐Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. 2008. GenBank. Nucl. Acids Res. 36:D25‐D30.
	Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucl. Acids Res. 28:235‐242.
	Birney, E., Clamp, M., and Durbin, R. 2004. GeneWise and Genomewise. Genome Res. 14:988‐995.
	Boutet, E., Lieberherr, D., Tognolli, M., Schneider, M., and Bairoch, A. 2007. UniProtKB/Swiss‐Prot: The manually annotated section of the UniProt knowledge base. Methods Mol. Biol. 406:89‐112.
	Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78‐94.
	Carter, K. and Worwood, M. 2007. Haptoglobin: A review of the major allele frequencies worldwide and their association with diseases. Int. J. Lab. Hematol. 29:92‐110.
	Dayhoff, M.O. and Eck, R.V. 1968. A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure (M.O. Dayhoff, ed.) pp. 33‐45. National Biomedical Research Foundation, Silver Spring, Md.
	Elnitski, L., Riemer, C., Petrykowska, H., Florea, L., Schwartz, S., Miller, W., and Hardison, R. 2002. PipTools: A computational toolkit to annotate and analyze pairwise comparisons of genomic sequences. Genomics 80:681‐690.
	Finn, R.D., Tate, J., Mistry, J., Coggill, P.C., Sammut, S.J., Hotz, H.R., Ceric, G., Forslund, K., Eddy, S.R., Sonnhammer, E.L., and Bateman, A. 2008. The Pfam protein families database. Nucl. Acids Res. 36:D281‐D288.
	Florea, L., Hartzell, G., Zhang, Z., Rubin, G.M., and Miller, W. 1998. A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8:967‐974.
	Gish, W. and States, D.J. 1993. Identification of protein coding regions by database similarity search. Nat. Genet. 3:266‐272.
	Henikoff, S. and Henikoff, J.G. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89:10915‐10919.
	Huang, X. and Zhang, J. 1996. Methods for comparing a DNA sequence with a protein sequence. Comput. Appl. Biosci. 12:497‐506.
	Hubbard, T.J., Ailey, B., Brenner, S.E., Murzin, A.G., and Chothia, C. 1998. SCOP, Structural Classification of Proteins database: Applications to evaluation of the effectiveness of sequence alignment methods and statistics of protein structural data. Acta Crystallogr. D Biol. Crystallogr. 54:1147‐1154.
	Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., and Walichiewicz, J. 2005. Repbase update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110:462‐467.
	Karlin, S. and Altschul, S.F. 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. U.S.A. 87:2264‐2268.
	Karolchik, D., Kuhn, R.M., Baertsch, R., Barber, G.P., Clawson, H., Diekhans, M., Giardine, B., Harte, R.A., Hinrichs, A.S., Hsu, F., Kober, K.M., Miller, W., Pedersen, J.S., Pohl, A., Raney, B.J., Rhead, B., Rosenbloom, K.R., Smith, K.E., Stanke, M., Thakkapallayil, A., Trumbower, H., Wang, T., Zweig, A.S., Haussler, D., and Kent, W.J. 2008. The UCSC genome browser database: 2008 update. Nucl. Acids Res. 36:D773‐D779.
	Korf, I., Yandell, M., and Bedell, J. 2003. BLAST: An Essential Guide to the Basic Local Alignment Search Tool. O'Reilly, Sebastopol, Calif.
	Krogh, A., Brown, M., Mian, I.S., Sjolander, K., and Haussler, D. 1994. Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol. 235:1501‐1531.
	Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J.P., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange‐Thomann, N., Stojanovic, N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley, D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A., Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray, S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S., Mullikin, J.C., Mungall, A., Plumb, R., Ross, M., Shownkeen, R., Sims, S., Waterston, R.H., Wilson, R.K., Hillier, L.W., McPherson, J.D., Marra, M.A., Mardis, E.R., Fulton, L.A., Chinwalla, A.T., Pepin, K.H., Gish, W.R., Chissoe, S.L., Wendl, M.C., Delehaunty, K.D., Miner, T.L., Delehaunty, A., Kramer, J.B., Cook, L.L., Fulton, R.S., Johnson, D.L., Minx, P.J., Clifton, S.W., Hawkins, T., Branscomb, E., Predki, P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J.F., Olsen, A., Lucas, S., Elkin, C., Uberbacher, E., Frazier, M., Gibbs, R.A., Muzny, D.M., Scherer, S.E., Bouck, J.B., Sodergren, E.J., Worley, K.C., Rives, C.M., Gorrell, J.H., Metzker, M.L., Naylor, S.L., Kucherlapati, R.S., Nelson, D.L., Weinstock, G.M., Sakaki, Y., Fujiyama, A., Hattori, M., Yada, T., Toyoda, A., Itoh, T., Kawagoe, C., Watanabe, H., Totoki, Y., Taylor, T., Weissenbach, J., Heilig, R., Saurin, W., Artiguenave, F., Brottier, P., Bruls, T., Pelletier, E., Robert, C., Wincker, P., Smith, D.R., Doucette‐Stamm, L., Rubenfield, M., Weinstock, K., Lee, H.M., Dubois, J., Rosenthal, A., Platzer, M., Nyakatura, G., Taudien, S., Rump, A., Yang, H., Yu, J., Wang, J., Huang, G., Gu, J., Hood, L., Rowen, L., Madan, A., Qin, S., Davis, R.W., Federspiel, N.A., Abola, A.P., Proctor, M.J., Myers, R.M., Schmutz, J., Dickson, M., Grimwood, J., Cox, D.R., Olson, M.V., Kaul, R., Raymond, C., Shimizu, N., Kawasaki, K., Minoshima, S., Evans, G.A., Athanasiou, M., Schultz, R., Roe, B.A., Chen, F., Pan, H., Ramser, J., Lehrach, H., Reinhardt, R., McCombie, W.R., de la Bastide, M., Dedhia, N., Blöcker, H., Hornischer, K., Nordsiek, G., Agarwala, R., Aravind, L., Bailey, J.A., Bateman, A., Batzoglou, S., Birney, E., Bork, P., Brown, D.G., Burge, C.B., Cerutti, L., Chen, H.C., Church, D., Clamp, M., Copley, R.R., Doerks, T., Eddy, S.R., Eichler, E.E., Furey, T.S., Galagan, J., Gilbert, J.G., Harmon, C., Hayashizaki, Y., Haussler, D., Hermjakob, H., Hokamp, K., Jang, W., Johnson, L.S., Jones, T.A., Kasif, S., Kasprzyk, A., Kennedy, S., Kent, W.J., Kitts, P., Koonin, E.V., Korf, I., Kulp, D., Lancet, D., Lowe, T.M., McLysaght, A., Mikkelsen, T., Moran, J.V., Mulder, N., Pollara, V.J., Ponting, C.P., Schuler, G., Schultz, J., Slater, G., Smit, A.F., Stupka, E., Szustakowski, J., Thierry‐Mieg, D., Thierry‐Mieg, J., Wagner, L., Wallis, J., Wheeler, R., Williams, A., Wolf, Y.I., Wolfe, K.H., Yang, S.P., Yeh, R.F., Collins, F., Guyer, M.S., Peterson, J., Felsenfeld, A., Wetterstrand, K.A., Patrinos, A., Morgan, M.J., de Jong, P., Catanese, J.J., Osoegawa, K., Shizuya, H., Choi, S., and Chen, Y.J. (International Human Genome Sequencing Consortium) 2001. Initial sequencing and analysis of the human genome. Nature 409:860‐921.
	Letunic, I., Copley, R.R., Pils, B., Pinkert, S., Schultz, J., and Bork, P. 2006. SMART 5: Domains in the context of genomes and networks. Nucl. Acids Res. 34:D257‐D260.
	Liang, Y.D. 2006. Introduction to JAVA programming: Comprehensive Version, 3rd ed. Pearson Prentice Hall, Lebanon, Ind.
	Lipman, D.J. and Pearson, W.R. 1985. Rapid and sensitive protein similarity searches. Science 227:1435‐1441.
	Møller, A. and Schwartzbach, M.I. 2006. An Introduction to XML and Web Technologies. Addison‐Wesley, Harlow, England.
	Pruitt, K.D., Tatusova, T., and Maglott, D.R. 2007. NCBI reference sequences (RefSeq): A curated non‐redundant sequence database of genomes, transcripts and proteins. Nucl. Acids Res. 35:D61‐D65.
	Reeck, G.R., de Haen, C., Teller, D.C., Doolittle, R.F., Fitch, W.M., Dickerson, R.E., Chambon, P., McLachlan, A.D., Margoliash, E., Jukes, T.H., and Zuckerkandl, E. 1987. “Homology” in proteins and nucleic acids: A terminology muddle and a way out of it. Cell 50:667.
	Schaffer, A.A., Aravind, L., Madden, T.L., Shavirin, S., Spouge, J.L., Wolf, Y.I., Koonin, E.V., and Altschul, S.F. 2001. Improving the accuracy of PSI‐BLAST protein database searches with composition‐based statistics and other refinements. Nucl. Acids Res. 29:2994‐3005.
	Schones, D.E. and Zhao, K. 2008. Genome‐wide approaches to studying chromatin modifications. Nat. Rev. Genet. 9:179‐191.
	Shi, J., Blundell, T.L., and Mizuguchi, K. 2001. FUGUE: Sequence‐structure homology recognition using environment‐specific substitution tables and structure‐dependent gap penalties. J. Mol. Biol. 310:243‐257.
	Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195‐197.
	Solovyev, V., Kosarev, P., Seledsov, I., and Vorobyev, D. 2006. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7:S10.11‐S10.12.
	Stajich, J.E. 2007. An Introduction to BioPerl. Methods Mol. Biol. 406:535‐548.
	Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and Morgenstern, B. 2006. AUGUSTUS: ab initio prediction of alternative transcripts. Nucl. Acids Res. 34:W435‐W439.
	Stein, L. 1998. Official Guide to Programming with CGI.pm. The Standard for Building Web Scripts. John Wiley & Sons, New York.
	Tatusova, T.A. and Madden, T.L. 1999. BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174:247‐250.
	Uberbacher, E.C., Xu, Y., and Mural, R.J. 1996. Discovering and understanding genes in human DNA sequence using GRAIL. Methods Enzymol. 266:259‐281.
	Ullman, L. 2006. MySQL, 2nd ed. Peachpit Press, Berkeley, Calif.
	Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., Gocayne, J.D., Amanatides, P., Ballew, R.M., Huson, D.H., Wortman, J.R., Zhang, Q., Kodira, C.D., Zheng, X.H., Chen, L., Skupski, M., Subramanian, G., Thomas, P.D., Zhang, J., Gabor Miklos, G.L., Nelson, C., Broder, S., Clark, A.G., Nadeau, J., McKusick, V.A., Zinder, N., Levine, A.J., Roberts, R.J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu‐Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Di Francesco, V., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A.E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T.J., Higgins, M.E., Ji, R.R., Ke, Z., Ketchum, K.A., Lai, Z., Lei, Y., Li, Z., Li, J., Liang, Y., Lin, X., Lu, F., Merkulov, G.V., Milshina, N., Moore, H.M., Naik, A.K., Narayan, V.A., Neelam, B., Nusskern, D., Rusch, D.B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z., Wang, A., Wang, X., Wang, J., Wei, M., Wides, R., Xiao, C., Yan, C., Yao, A., Ye, J., Zhan, M., Zhang, W., Zhang, H., Zhao, Q., Zheng, L., Zhong, F., Zhong, W., Zhu, S.C., Zhao, S., Gilbert, D., Baumhueter, S., Spier, G., Carter, C., Cravchik, A., Woodage, T., Ali, F., An, H., Awe, A., Baldwin, D., Baden, H., Barnstead, M., Barrow, I., Beeson, K., Busam, D., Carver, A., Center, A., Cheng, M.L., Curry, L., Danaher, S., Davenport, L., Desilets, R., Dietz, S., Dodson, K., Doup, L., Ferriera, S., Garg, N., Gluecksmann, A., Hart, B., Haynes, J., Haynes, C., Heiner, C., Hladun, S., Hostin, D., Houck, J., Howland, T., Ibegwam, C., Johnson, J., Kalush, F., Kline, L., Koduru, S., Love, A., Mann, F., May, D., McCawley, S., McIntosh, T., McMullen, I., Moy, M., Moy, L., Murphy, B., Nelson, K., Pfannkoch, C., Pratts, E., Puri, V., Qureshi, H., Reardon, M., Rodriguez, R., Rogers, Y.H., Romblad, D., Ruhfel, B., Scott, R., Sitter, C., Smallwood, M., Stewart, E., Strong, R., Suh, E., Thomas, R., Tint, N.N., Tse, S., Vech, C., Wang, G., Wetter, J., Williams, S., Williams, M., Windsor, S., Winn‐Deen, E., Wolfe, K., Zaveri, J., Zaveri, K., Abril, J.F., Guigó, R., Campbell, M.J., Sjolander, K.V., Karlak, B., Kejariwal, A., Mi, H., Lazareva, B., Hatton, T., Narechania, A., Diemer, K., Muruganujan, A., Guo, N., Sato, S., Bafna, V., Istrail, S., Lippert, R., Schwartz, R., Walenz, B., Yooseph, S., Allen, D., Basu, A., Baxendale, J., Blick, L., Caminha, M., Carnes‐Stine, J., Caulk, P., Chiang, Y‐H., Coyne, M., Dahlke, C., Mays, A.D., Dombroski, M., Donnelly, M., Ely, D., Esparham, S., Fosler, C., Gire, H., Glanowski, S., Glasser, K., Glodek, A., Gorokhov, M., Graham, K., Gropman, B., Harris, M., Heil, J., Henderson, S., Hoover, J., Jennings, D., Jordan, C., Jordan, J., Kasha, J., Kagan, L., Kraft, C., Levitsky, A., Lewis, M., Liu, X., Lopez, J., Ma, D., Majoros, W., McDaniel, J., Murphy, S., Newman, M., Nguyen, T., Nguyen, N., Nodell, M., Pan, S., Peck, J., Peterson, M., Rowe, W., Sanders, R., Scott, J., Simpson, M., Smith, T., Sprague, A., Stockwell, T., Turner, R., Venter, E., Wang, M., Wen, M., Wu, D., Wu, M., Xia, A., Zandieh, A., and Zhu, X. 2001. The sequence of the human genome. Science 291:1304‐1351.
	Wang, Y., Addess, K.J., Chen, J., Geer, L.Y., He, J., He, S., Lu, S., Madej, T., Marchler‐Bauer, A., Thiessen, P.A., Zhang, N., and Bryant, S.H. 2007. MMDB: Annotating protein sequences with Entrez's 3D‐structure database. Nucl. Acids Res. 35:D298‐D300.
	Westbrook, J., Feng, Z., Jain, S., Bhat, T.N., Thanki, N., Ravichandran, V., Gilliland, G.L., Bluhm, W., Weissig, H., Greer, D.S., Bourne, P.E., and Berman, H.M. 2002. The Protein Data Bank: Unifying the archive. Nucl. Acids Res. 30:245‐248.
	Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L.Y., Helmberg, W., Kapustin, Y., Khovayko, O., Landsman, D., Lipman, D.J., Madden, T.L., Maglott, D.R., Miller, V., Ostell, J., Pruitt, K.D., Schuler, G.D., Shumway, M., Sequeira, E., Sherry, S.T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov, R.L., Tatusova, T.A., Wagner, L., and Yaschenko, E. 2008. Database resources of the National Center for Biotechnology Information. Nucl. Acids Res. 36:D13‐D21.
	Wootton, J.C. and Federhen, S. 1996. Analysis of compositionally biased regions in sequence databases. Methods Enzymol. 266:554‐571.
	Wu, C.H., Apweiler, R., Bairoch, A., Natale, D.A., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Mazumder, R., O'Donovan, C., Redaschi, N., and Suzek, B. 2006. The Universal Protein Resource (UniProt): An expanding universe of protein information. Nucl. Acids Res. 34:D187‐D191.
	Zweig, A.S., Karolchik, D., Kuhn, R.M., Haussler, D., and Kent, W.J. 2008. UCSC genome browser tutorial. Genomics 92:75‐84.
Key References
	Altschul et al., 1994. See above.
	Probably the best description of the BLAST program as it stood at the time, when it produced only nongapped alignments. This review discusses the underlying statistics and their biological interpretation, the scoring schemes, the search itself, and sensitivity and selectivity, all illustrated with biological examples.
	Altschul et al., 1997. See above.
	The original research paper on gapped and PSI‐BLAST. Both are significant improvements over earlier BLAST versions. Computational speed, increased sensitivity, and decreased selectivity are analyzed.
	Baxevanis and Ouellette, 2005. See above.
	A widely taught textbook that introduces pairwise sequence similarity searches, biological databases and many other areas of bioinformatics. Reviews the general concepts of alignments, scoring matrices and BLAST with practical applications and guidelines for interpretation.
	Gish and States, 1993. See above.
	Another original research paper, this one about translated BLAST. The authors evaluate the advantages and pitfalls of this application when processing introns, frameshifts, and similar issues. Besides the theory, the implications for statistical significance are illustrated with examples.
	Korf et al., 2003. See above.
	An excellent overview of theory and practice of the BLAST tools as of 2003. This most comprehensive and easy‐to‐understand textbook is highly recommended to everyone in bioinformatics or computational biology.
Internet Resources
	http://blast.ncbi.nlm.nih.gov/
	The NCBI BLAST Web site.
	http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs
	The full documentation for BLAST at NCBI.
	http://www.ebi.ac.uk/blast2
	The European Bioinformatics Institute Server for the Washington University BLAST.
	http://repeatmasker.genome.washington.edu/cgi‐bin/RepeatMasker
	The RepeatMasker Web site.
	http://www.girinst.org/Censor_Server.html
	The Genetic Information Research Institute Web site.
	http://www.ch.embnet.org/software/COILS_form.html
	Coiled coil predictions.