Using the Blocks Database to Recognize Functional Domains

Internet, 2013-12-31

Abstract

Blocks are ungapped multiple alignments of related protein sequence segments that correspond to the most conserved regions of the proteins. The Blocks Database is a collection of blocks representing known protein families that can be used to compare a protein or DNA sequence with documented families of proteins. Protocols in this unit describe the analysis of proteins and families using Blocks‐based tools, including searching, exploring relationships with trees, making new blocks, and designing PCR primers from blocks for isolating homologous sequences.
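
As a rough illustration of what "ungapped multiple alignment" means in practice, the following minimal Python sketch stores a block as equal‐width sequence segments and tallies the residues in each column. The sequence names and segments are invented for illustration and are not taken from the Blocks Database; the function names are likewise hypothetical and do not correspond to any Blocks software.

```python
from collections import Counter

# Hypothetical toy block: equal-width, ungapped segments (invented data).
block = {
    "SEQ_A": "GDLHCG",
    "SEQ_B": "GDLFCG",
    "SEQ_C": "GELHSG",
    "SEQ_D": "GDIHCG",
}


def column_counts(segments):
    """Return one Counter of residues per alignment column."""
    widths = {len(s) for s in segments}
    if len(widths) != 1:
        raise ValueError("a block is ungapped: all segments must share one width")
    if any("-" in s for s in segments):
        raise ValueError("gap characters are not allowed in a block")
    # zip(*segments) walks the alignment column by column.
    return [Counter(column) for column in zip(*segments)]


def consensus(segments):
    """Most common residue in each column."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(segments))


counts = column_counts(list(block.values()))
print(consensus(list(block.values())))
```

Column counts of this kind are the raw material for position‐specific scoring matrices; real implementations also apply sequence weights and pseudocounts, which this sketch omits.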

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Basic Protocol 1: Exploring Protein Families Using the Blocks Database
Support Protocol 1: Search Blocks Versus Other Databases
Basic Protocol 2: Analyzing Protein Sequences with the Block Searcher
Basic Protocol 3: Analyzing DNA Sequences with the Block Searcher
Basic Protocol 4: Viewing Trees Based on Blocks
Basic Protocol 5: Using Block Maker
Basic Protocol 6: Designing Primers from Blocks
Guidelines for Understanding Results
Commentary
Figures

Materials

Basic Protocol 1: Exploring Protein Families Using the Blocks Database

Necessary Resources

Hardware
- Workstation, personal computer, or terminal connected to the Internet

Software
- Any type of Web browser for the Web interface
- Either Chime or Rasmol helper application to view protein structures using a browser

Support Protocol 1: Search Blocks Versus Other Databases

Necessary Resources

Hardware
- Workstation, personal computer, or terminal connected to the Internet. The programs can be installed on common Unix workstations.

Software
- E‐mail program for the E‐mail interface
- Web browser for the Web interface
- Pre‐compiled versions of the programs are provided for Sun Solaris and Linux systems. Other Unix systems need an ANSI C compiler. See the downloaded INSTALL file for installation instructions.

Files
- Query sequences are accepted in FASTA or GenBank format (appendix 1B)
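
For readers preparing query files, the sketch below shows one way to read FASTA‐formatted text into (header, sequence) pairs in Python. It is a minimal example, not part of the Blocks software; the function name `parse_fasta` and the example records are made up for illustration, and GenBank format is not handled.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example input: two short records, the first split over two lines.
example = """>query1 hypothetical protein (made-up example)
MKTAYIAKQR
QISFVKSHFS
>query2
ACGTACGT
"""
records = parse_fasta(example)
for name, seq in records:
    print(name, len(seq))
```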

Basic Protocol 2: Analyzing Protein Sequences with the Block Searcher

Necessary Resources

Hardware
- Workstation, personal computer, or terminal connected to the Internet. The programs can be installed on common Unix workstations.

Software
- E‐mail program for the E‐mail interface
- Web browser for the Web interface
- Pre‐compiled versions of the programs are provided for Sun Solaris and Linux systems. Other Unix systems need an ANSI C compiler. See the downloaded INSTALL file for installation instructions.

Files
- Query sequences are accepted in FASTA or GenBank format (appendix 1B)

Basic Protocol 3: Analyzing DNA Sequences with the Block Searcher

Necessary Resources

Hardware
- Workstation, personal computer, or terminal connected to the Internet

Software
- Any type of Web browser

Files
- None

Basic Protocol 4: Viewing Trees Based on Blocks

Necessary Resources

Hardware
- Workstation, personal computer, or terminal connected to the Internet. The programs can be installed on common Unix workstations.

Software
- E‐mail program for the E‐mail interface
- Web browser for the Web interface
- Pre‐compiled versions of the programs are provided for Sun Solaris and Linux systems. Other Unix systems need an ANSI C compiler. See the downloaded INSTALL file for installation instructions.

Files
- Query sequences are accepted in FASTA or GenBank format (appendix 1B)

Basic Protocol 5: Using Block Maker

Necessary Resources

Hardware
- Workstation, personal computer, or terminal connected to the Internet for the Web interface. The programs can be installed on common Unix workstations.

Software
- Web browser for the Web interface
- Pre‐compiled versions of the programs are provided for Sun Solaris and Linux systems. Other Unix systems need an ANSI C compiler. See the downloaded INSTALL file for installation instructions.

Files
- Input is in Blocks format as described at http://blocks.fhcrc.org/block_format.html.
- Utilities are available at http://blocks.fhcrc.org/process_blocks.html to convert common multiple alignment formats to Blocks format.

Figures

Figure 2.2.1 The Blocks Web site home page (http://blocks.fhcrc.org).

Figure 2.2.2 Top of the Blocks Database entry page for the C‐5 cytosine‐specific DNA methylase family. The blocks accession number is IPB001525 and the sequences used to make the blocks were taken from InterPro entry IPR001525.

Figure 2.2.3 One page of the second block representing the C‐5 cytosine‐specific DNA methylase family, IPB001525B, showing the block header lines and some of the 158 sequence segments included in the block.

Figure 2.2.4 Sequence logos for the IPB001525 blocks, showing the multiple alignments graphically.

Figure 2.2.5 3D Blocks output for 6MHT showing the IPB001525 blocks on the structure.

Figure 2.2.6 Part of the MAST output generated by selecting the MAST Search link in Figure . The query in this search was constructed from six position‐specific scoring matrices computed from the six IPB001525 blocks, and the database was Drosophila protein sequences. GenBank entry AAF53163.1 is the top hit.

Figure 2.2.7 The upper part of the Block Searcher input form.

Figure 2.2.8 The Block Searcher result from searching GenBank entry AAF53163.1 against the Blocks Database, showing five of the six IPB001525 blocks in the top hit.

Figure 2.2.9 Segment of GenBank AE003635.1 that includes the AAF53163.1 coding region. The DNA sequence is used with Block Searcher to overcome an error in AAF53163.1.

Figure 2.2.10 The Block Searcher result from searching a segment of GenBank entry AE003635.1 against the Blocks Database, showing all six of the IPB001525 blocks in the top hit. The query most closely resembles PMT1_SCHPO in blocks A, C, E, and F.

Figure 2.2.11 Corrected version of protein sequence AAF53163.1.

Figure 2.2.12 The first page of a phylogenetic tree made from the block regions of the 158 sequences included in IPB001525 as displayed by the ProWeb TreeViewer.

Figure 2.2.13 The subclade of the IPB001525 tree that includes PMT1_SCHPO as displayed by the ProWeb TreeViewer showing options for subclade analysis, including a link to Block Maker.

Figure 2.2.14 The Block Maker input form obtained by clicking the Block Maker link in Figure with the full‐length sequences for the subclade inserted.

Figure 2.2.15 The Block Maker result from the subclade selected in Figure plus the corrected Dnmt2 sequence (Fig. ) using the MOTIF motif finder.

Figure 2.2.16 The Block Maker result from the subclade selected in Figure plus the corrected Dnmt2 sequence (Fig. ) using the Gibbs motif finder.

Figure 2.2.17 The CODEHOP input form obtained by clicking the CODEHOP link in Figure with the Block Maker Gibbs blocks inserted.

Figure 2.2.18 Part of the CODEHOP result showing suggested PCR primers with maximum degeneracy set to 32.
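
The degeneracy of a primer is conventionally the number of distinct sequences in the primer pool, that is, the product of the number of bases each IUPAC code stands for. The Python sketch below applies that definition and checks a primer against a limit of 32, the setting mentioned in the caption above. The function names and the example primers are hypothetical, and CODEHOP's own calculation may differ in detail.

```python
from math import prod

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(IUPAC_COUNTS[base] for base in primer.upper())


def within_limit(primer, max_degeneracy=32):
    """True if the primer's degeneracy does not exceed the chosen maximum."""
    return degeneracy(primer) <= max_degeneracy


# Made-up degenerate cores, for illustration only.
for core in ("GGNTAYTGYAA", "TGGNTNYTNGA"):
    print(core, degeneracy(core), within_limit(core))
```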

Literature Cited
	Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
	Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
	Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Holo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N.J., Oinn, T.M., Pagni, M., Servant, F., Sigrist, C.J., and Zdobnov, E.M. 2000. InterPro—An integrated documentation resource for protein families, domains and functional sites. Bioinformatics 16:1145‐1150.
	Attwood, T.K., Croning, M.D.R., Flower, D.R., Lewis, A.P., Mabey, J.E., Scordia, P., Selley, J.N., and Wright, W. 2000. PRINTS‐S: The database formerly known as PRINTS. Nucleic Acids Res. 28:225‐227.
	Bailey, T. and Elkan, C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology pp. 28‐36. AAAI Press, Menlo Park, Calif.
	Bailey, T.L. and Gribskov, M. 1998. Combining evidence using p‐values: Application to sequence homology searches. Bioinformatics 14:48‐54.
	Bairoch, A. and Apweiler, R. 2000. The SWISS‐PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28:45‐48.
	Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths‐Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30:276‐280.
	Hall, B.G. 2001. Phylogenetic Trees Made Easy: A How‐To Manual for Molecular Biologists. Sinauer Press, Sunderland, Mass.
	Henikoff, S. 1991. Playing with blocks: Some pitfalls of forcing multiple alignments. New Biol. 3:1148‐1154.
	Henikoff, S. and Henikoff, J.G. 1991. Automated assembly of protein blocks for database searching. Nucleic Acids Res. 19:6565‐6572.
	Henikoff, S. and Henikoff, J.G. 1994. Position‐based sequence weights. J. Mol. Biol. 243:574‐578.
	Henikoff, J.G. and Henikoff, S. 1996. Using substitution probabilities to improve position‐specific scoring matrices. Comput. Appl. Biosci. 12:135‐143.
	Henikoff, S. and Henikoff, J.G. 1997. Embedding strategies for effective use of multiple sequence alignment information. Protein Sci. 6:698‐705.
	Henikoff, S., Henikoff, J.G., Alford, W.J., and Pietrokovski, S. 1995. Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163:GC17‐GC26.
	Huang, J.Y. and Brutlag, D.L. 2001. The eMOTIF database. Nucleic Acids Res. 29:202‐204.
	Kunin, V., Chan, B., Sitbon, E., Lithwick, G., and Pietrokovski, S. 2001. Consistency analysis of similarity between multiple alignments: Prediction of protein function and fold structure from analysis of local sequence motifs. J. Mol. Biol. 307:939‐949.
	Mount, D.W. 2001. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
	Neuwald, A.F., Liu, J.S., and Lawrence, C.E. 1995. Gibbs motif sampling: Detection of bacterial outer membrane protein repeats. Protein Sci. 4:1618‐1632.
	Ng, P.C. and Henikoff, S. 2001. Predicting deleterious amino acid substitutions. Genome Res. 11:863‐874.
	Ng, P.C. and Henikoff, S. 2002. Accounting for human polymorphisms predicted to affect protein function. Genome Res. 12:436‐446.
	Pearson, W.R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Meth. Enzymol. 183:63‐98.
	Pietrokovski, S. 1996. Searching databases of conserved sequence regions by aligning protein multiple‐alignments. Nucleic Acids Res. 24:3836‐3845.
	Pietrokovski, S. and Henikoff, S. 1997. A helix‐turn‐helix DNA‐binding motif predicted for transposases of DNA transposons. Mol. Gen. Genet. 254:689‐695.
	Pietrokovski, S., Henikoff, J.G., and Henikoff, S. 1998. Exploring protein homology with the Blocks server. Trends Genet. 14:162‐163.
	Pinarbasi, E., Elliott, J., and Hornby, D.P. 1996. Activation of a yeast pseudo DNA methyltransferase by deletion of a single amino acid. J. Mol. Biol. 257:804‐813.
	Rose, T.M., Schultz, E.R., Henikoff, J.G., Pietrokovski, S., McCallum, C.M., and Henikoff, S. 1998. Consensus‐degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res. 26:1628‐1635.
	Saitou, N. and Nei, M. 1987. The neighbor‐joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406‐425.
	Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L., and Altschul, S.F. 1999. IMPALA: Matching a protein sequence against a collection of PSI‐BLAST‐constructed position‐specific score matrices. Bioinformatics 15:1000‐1011.
	Schaffer, A.A., Aravind, L., Madden, T.L., Shavirin, S., Spouge, J.L., Wolf, Y.I., Koonin, E.V., and Altschul, S.F. 2001. Improving the accuracy of PSI‐BLAST protein database searches with composition‐based statistics and other refinements. Nucleic Acids Res. 29:2994‐3005.
	Schneider, T.D. and Stephens, R.M. 1990. Sequence logos: A new way to display consensus sequences. Nucleic Acids Res. 18:6097‐6100.
	Silverstein, K.A., Shoop, E., Johnson, J.E., and Retzel, E.F. 2001. MetaFam: A unified classification of protein families. I. Overview and statistics. Bioinformatics 17:249‐261.
	Smith, H.O., Annau, T.M., and Chandrasegaran, S. 1990. Finding sequence motifs in groups of functionally related proteins. Proc. Natl. Acad. Sci. U.S.A. 87:826‐830.
	Tatusov, R.L., Altschul, S.F., and Koonin, E.V. 1994. Detection of conserved segments in proteins: Iterative scanning of sequence databases with alignment blocks. Proc. Natl. Acad. Sci. U.S.A. 91:12091‐12095.
	Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position‐specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673‐4680.
	Waskiewicz, A.J., Rikhof, H.A., Hernandez, R.E., and Moens, C.B. 2001. Zebrafish Meis functions to stabilize Pbx proteins and regulate hindbrain patterning. Development 128:4139‐4151.
	Wootton, J.C. and Federhen, S. 1993. Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17:149‐163.
Key References
	Henikoff and Henikoff, 1991. See above.
	Introduces the Blocks Database, how it is constructed using PROTOMAT and how it is searched using Block Searcher.
	Pietrokovski, 1996. See above.
	Introduces LAMA for searching blocks versus a database of blocks as an example of searching multiple alignments against one another for sensitive detection of motifs.
	Rose et al., 1998. See above.
	Describes the CODEHOP strategy for detecting distant homologs using PCR and the Web‐based implementation for designing optimal CODEHOP primers.
Internet Resources
	http://blocks.fhcrc.org
	This is the Blocks Web page.
	http://www.proweb.org
	This is the ProWeb Web page.