Using the Blocks Database to Recognize Functional Domains
Abstract
Blocks are ungapped multiple alignments of related protein sequence segments that correspond to the most conserved regions of the proteins. The Blocks Database is a collection of blocks representing known protein families that can be used to compare a protein or DNA sequence with documented families of proteins. Protocols in this unit describe the analysis of proteins and families using Blocks‐based tools, including searching, exploring relationships with trees, making new blocks, and designing PCR primers from blocks for isolating homologous sequences.
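Searching with blocks rests on turning each ungapped block into a position‐specific scoring matrix (PSSM) and sliding it along a query without gaps. The Python sketch below illustrates that idea on a hypothetical toy block; it is not the Blocks Database's scoring procedure. It applies the position‐based sequence weights of Henikoff and Henikoff (1994), but the flat pseudocount, the uniform background, and the helper names (`position_based_weights`, `build_pssm`, `best_window`) are simplifying assumptions. The published Blocks tools derive pseudocounts from substitution probabilities (see Henikoff and Henikoff, 1996) and report calibrated statistics rather than raw scores.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_based_weights(block):
    """Position-based sequence weights (Henikoff and Henikoff, 1994).

    In each column a sequence receives 1/(r*s), where r is the number of
    distinct residues in the column and s is how often its own residue
    occurs there. Contributions are summed over columns and normalized.
    """
    weights = [0.0] * len(block)
    for col in range(len(block[0])):
        column = [seq[col] for seq in block]
        counts = Counter(column)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (len(counts) * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]


def build_pssm(block, pseudocount=1.0):
    """Log-odds PSSM (bits) from an ungapped block.

    Assumptions of this sketch: uniform background frequencies and a flat
    pseudocount spread evenly over the 20 amino acids.
    """
    if len({len(seq) for seq in block}) != 1:
        raise ValueError("a block is ungapped: all segments must have the same width")
    weights = position_based_weights(block)
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(block[0])):
        freqs = {aa: pseudocount * background for aa in AMINO_ACIDS}
        for seq, w in zip(block, weights):
            if seq[col] in freqs:  # ignore non-standard residues such as X
                freqs[seq[col]] += w
        total = sum(freqs.values())
        pssm.append({aa: math.log2(freqs[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(pssm, query):
    """Slide the PSSM along the query without gaps; return (start, score).

    Residues outside the 20 standard amino acids score 0 at a position.
    """
    width = len(pssm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(query) - width + 1):
        score = sum(pssm[i].get(query[start + i], 0.0) for i in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical toy block (not taken from the Blocks Database).
toy_block = ["GDLFCGAG", "GDLFAGIG", "GDLFSGCG", "LDLFSGIG"]
start, score = best_window(build_pssm(toy_block), "MKTAYGDLFCGAGRLS")
print(start, round(score, 2))  # start is 5, where the block-like segment begins
```

Real blocks hold far more segments (block IPB001525B includes 158), which is why down‐weighting closely related, redundant segments matters before frequencies are counted.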
Table of Contents
- Basic Protocol 1: Exploring Protein Families Using the Blocks Database
- Support Protocol 1: Search Blocks Versus Other Databases
- Basic Protocol 2: Analyzing Protein Sequences with the Block Searcher
- Basic Protocol 3: Analyzing DNA Sequences with the Block Searcher
- Basic Protocol 4: Viewing Trees Based on Blocks
- Basic Protocol 5: Using Block Maker
- Basic Protocol 6: Designing Primers from Blocks
- Guidelines for Understanding Results
- Commentary
- Figures
Materials
Basic Protocol 1: Exploring Protein Families Using the Blocks Database
Necessary Resources
Support Protocol 1: Search Blocks Versus Other Databases
Necessary Resources
Basic Protocol 2: Analyzing Protein Sequences with the Block Searcher
Necessary Resources
Basic Protocol 3: Analyzing DNA Sequences with the Block Searcher
Necessary Resources
Basic Protocol 4: Viewing Trees Based on Blocks
Necessary Resources
Basic Protocol 5: Using Block Maker
Necessary Resources
Figures
- Figure 2.2.1 The Blocks Web site home page (http://blocks.fhcrc.org).
- Figure 2.2.2 Top of the Blocks Database entry page for the C‐5 cytosine‐specific DNA methylase family. The blocks accession number is IPB001525 and the sequences used to make the blocks were taken from InterPro entry IPR001525.
- Figure 2.2.3 One page of the second block representing the C‐5 cytosine‐specific DNA methylase family, IPB001525B, showing the block header lines and some of the 158 sequence segments included in the block.
- Figure 2.2.4 Sequence logos for the IPB001525 blocks, showing the multiple alignments graphically.
- Figure 2.2.5 3D Blocks output for 6MHT showing the IPB001525 blocks on the structure.
- Figure 2.2.6 Part of the MAST output generated by selecting the MAST Search link in Figure . The query in this search was constructed from six position‐specific scoring matrices computed from the six IPB001525 blocks, and the database was Drosophila protein sequences. GenBank entry AAF53163.1 is the top hit.
- Figure 2.2.7 The upper part of the Block Searcher input form.
- Figure 2.2.8 The Block Searcher result from searching GenBank entry AAF53163.1 against the Blocks Database, showing five of the six IPB001525 blocks in the top hit.
- Figure 2.2.9 Segment of GenBank AE003635.1 that includes the AAF53163.1 coding region. The DNA sequence is used with Block Searcher to overcome an error in AAF53163.1.
- Figure 2.2.10 The Block Searcher result from searching a segment of GenBank entry AE003635.1 against the Blocks Database, showing all six of the IPB001525 blocks in the top hit. The query most closely resembles PMT1_SCHPO in blocks A, C, E, and F.
- Figure 2.2.11 Corrected version of protein sequence AAF53163.1.
- Figure 2.2.12 The first page of a phylogenetic tree made from the block regions of the 158 sequences included in IPB001525 as displayed by the ProWeb TreeViewer.
- Figure 2.2.13 The subclade of the IPB001525 tree that includes PMT1_SCHPO as displayed by the ProWeb TreeViewer showing options for subclade analysis, including a link to Block Maker.
- Figure 2.2.14 The Block Maker input form obtained by clicking the Block Maker link in Figure with the full‐length sequences for the subclade inserted.
- Figure 2.2.15 The Block Maker result from the subclade selected in Figure plus the corrected Dnmt2 sequence (Fig. ) using the MOTIF motif finder.
- Figure 2.2.16 The Block Maker result from the subclade selected in Figure plus the corrected Dnmt2 sequence (Fig. ) using the Gibbs motif finder.
- Figure 2.2.17 The CODEHOP input form obtained by clicking the CODEHOP link in Figure with the Block Maker Gibbs blocks inserted.
- Figure 2.2.18 Part of the CODEHOP result showing suggested PCR primers with maximum degeneracy set to 32.
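Sequence logos (Schneider and Stephens, 1990), such as those in Figure 2.2.4, draw each block column as a stack whose total height is the column's information content and in which each residue's height is proportional to its frequency. The Python sketch below computes those heights for a single column. It assumes 20 equiprobable amino acids as the reference, uses unweighted counts, and omits the small‐sample correction of Schneider and Stephens; the function names are illustrative, and the logos on the Blocks site may be computed with additional corrections or weighting.

```python
import math
from collections import Counter


def column_information(column):
    """Information content in bits: log2(20) minus the Shannon entropy of the column."""
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return math.log2(20) - entropy


def letter_heights(column):
    """Height of each residue in the logo stack: its frequency times the information content."""
    info = column_information(column)
    n = len(column)
    return {aa: (c / n) * info for aa, c in Counter(column).items()}


# Each string is one column read top to bottom through a hypothetical block.
print(round(column_information("DDDD"), 3))  # 4.322, a fully conserved column
print(letter_heights("DDEE"))
```

A fully conserved column reaches the maximum of log2(20) bits, while a column split evenly between two residues loses exactly one bit.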
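Figures 2.2.9 and 2.2.10 show a genomic DNA segment (GenBank AE003635.1) being searched with Block Searcher to work around an error in the predicted protein AAF53163.1. Comparing DNA with protein blocks requires translating the DNA, and the usual approach is to translate all six reading frames. The Python sketch below shows only that translation step, assuming the standard genetic code (NCBI translation table 1); it is not Block Searcher's code, and the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Reverse complement of a DNA string (case-insensitive input)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translations of the three forward and three reverse-complement frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```

Because every frame is translated, a block can still be matched in whichever frame encodes it, even if a predicted protein was built from the wrong frame or exon boundaries.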
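The tree in Figure 2.2.12 was made from the block regions of the 158 IPB001525 sequences. Because blocks are ungapped, the block regions of any two sequences can be compared position by position with no further alignment step. The Python sketch below computes the simplest such measure, the proportion of differing positions (p‐distance), producing the kind of distance matrix that methods such as neighbor joining (Saitou and Nei, 1987) take as input. The distance measure, the function names, and the toy segments are assumptions for illustration; this is not necessarily how the ProWeb tree was computed.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two ungapped segments differ."""
    if len(a) != len(b):
        raise ValueError("block regions must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(segments):
    """Pairwise p-distances for a {name: concatenated block regions} mapping."""
    names = list(segments)
    matrix = [[p_distance(segments[i], segments[j]) for j in names] for i in names]
    return names, matrix


# Hypothetical concatenated block regions for three sequences.
toy = {"seqA": "GDLFCGAGPCQ", "seqB": "GDLFAGIGPCQ", "seqC": "LDLFSGIGSCS"}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.2f}" for d in row))
```

Raw p‐distances underestimate divergence between distant sequences, so tree programs usually apply a correction before clustering.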
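Figure 2.2.18 shows CODEHOP primers suggested with the maximum degeneracy set to 32. The degeneracy of a primer written in IUPAC nucleotide codes is the number of distinct oligonucleotides it represents: the product, over positions, of the number of bases each code allows. In CODEHOP primers the degenerate positions are confined to the 3′ core, behind a non‐degenerate 5′ consensus clamp (Rose et al., 1998). The Python sketch below computes that product and checks it against a limit; the function names and the example core are hypothetical and are not taken from Figure 2.2.18.

```python
from math import prod

# Number of nucleotides each IUPAC code stands for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    try:
        return prod(IUPAC_DEGENERACY[base] for base in primer.upper())
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]}") from None


def within_limit(primer, max_degeneracy=32):
    """True if the primer's degeneracy does not exceed the chosen maximum."""
    return degeneracy(primer) <= max_degeneracy


# Hypothetical degenerate 3' core (Y, Y, M and N contribute 2 * 2 * 2 * 4).
core = "tgggayaaymgn"
print(degeneracy(core), within_limit(core))  # 32 True
```

A single extra N would raise this core to 128 variants, so a limit of 32 allows only a few ambiguous positions.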
Literature Cited
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Hulo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N.J., Oinn, T.M., Pagni, M., Servant, F., Sigrist, C.J., and Zdobnov, E.M. 2000. InterPro—An integrated documentation resource for protein families, domains and functional sites. Bioinformatics 16:1145‐1150.
Attwood, T.K., Croning, M.D.R., Flower, D.R., Lewis, A.P., Mabey, J.E., Scordis, P., Selley, J.N., and Wright, W. 2000. PRINTS‐S: The database formerly known as PRINTS. Nucleic Acids Res. 28:225‐227.
Bailey, T. and Elkan, C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28‐36. AAAI Press, Menlo Park, Calif.
Bailey, T.L. and Gribskov, M. 1998. Combining evidence using p‐values: Application to sequence homology searches. Bioinformatics 14:48‐54.
Bairoch, A. and Apweiler, R. 2000. The SWISS‐PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28:45‐48.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths‐Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30:276‐280.
Hall, B.G. 2001. Phylogenetic Trees Made Easy: A How‐To Manual for Molecular Biologists. Sinauer Associates, Sunderland, Mass.
Henikoff, S. 1991. Playing with blocks: Some pitfalls of forcing multiple alignments. New Biol. 3:1148‐1154.
Henikoff, S. and Henikoff, J.G. 1991. Automated assembly of protein blocks for database searching. Nucleic Acids Res. 19:6565‐6572.
Henikoff, S. and Henikoff, J.G. 1994. Position‐based sequence weights. J. Mol. Biol. 243:574‐578.
Henikoff, J.G. and Henikoff, S. 1996. Using substitution probabilities to improve position‐specific scoring matrices. Comput. Appl. Biosci. 12:135‐143.
Henikoff, S. and Henikoff, J.G. 1997. Embedding strategies for effective use of multiple sequence alignment information. Protein Sci. 6:698‐705.
Henikoff, S., Henikoff, J.G., Alford, W.J., and Pietrokovski, S. 1995. Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163:GC17‐GC26.
Huang, J.Y. and Brutlag, D.L. 2001. The eMOTIF database. Nucleic Acids Res. 29:202‐204.
Kunin, V., Chan, B., Sitbon, E., Lithwick, G., and Pietrokovski, S. 2001. Consistency analysis of similarity between multiple alignments: Prediction of protein function and fold structure from analysis of local sequence motifs. J. Mol. Biol. 307:939‐949.
Mount, D.W. 2001. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Neuwald, A.F., Liu, J.S., and Lawrence, C.E. 1995. Gibbs motif sampling: Detection of bacterial outer membrane protein repeats. Protein Sci. 4:1618‐1632.
Ng, P.C. and Henikoff, S. 2001. Predicting deleterious amino acid substitutions. Genome Res. 11:863‐874.
Ng, P.C. and Henikoff, S. 2002. Accounting for human polymorphisms predicted to affect protein function. Genome Res. 12:436‐446.
Pearson, W.R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Meth. Enzymol. 183:63‐98.
Pietrokovski, S. 1996. Searching databases of conserved sequence regions by aligning protein multiple‐alignments. Nucleic Acids Res. 24:3836‐3845.
Pietrokovski, S. and Henikoff, S. 1997. A helix‐turn‐helix DNA‐binding motif predicted for transposases of DNA transposons. Mol. Gen. Genet. 254:689‐695.
Pietrokovski, S., Henikoff, J.G., and Henikoff, S. 1998. Exploring protein homology with the Blocks server. Trends Genet. 14:162‐163.
Pinarbasi, E., Elliott, J., and Hornby, D.P. 1996. Activation of a yeast pseudo DNA methyltransferase by deletion of a single amino acid. J. Mol. Biol. 257:804‐813.
Rose, T.M., Schultz, E.R., Henikoff, J.G., Pietrokovski, S., McCallum, C.M., and Henikoff, S. 1998. Consensus‐degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res. 26:1628‐1635.
Saitou, N. and Nei, M. 1987. The neighbor‐joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406‐425.
Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L., and Altschul, S.F. 1999. IMPALA: Matching a protein sequence against a collection of PSI‐BLAST‐constructed position‐specific score matrices. Bioinformatics 15:1000‐1011.
Schaffer, A.A., Aravind, L., Madden, T.L., Shavirin, S., Spouge, J.L., Wolf, Y.I., Koonin, E.V., and Altschul, S.F. 2001. Improving the accuracy of PSI‐BLAST protein database searches with composition‐based statistics and other refinements. Nucleic Acids Res. 29:2994‐3005.
Schneider, T.D. and Stephens, R.M. 1990. Sequence logos: A new way to display consensus sequences. Nucleic Acids Res. 18:6097‐6100.
Silverstein, K.A., Shoop, E., Johnson, J.E., and Retzel, E.F. 2001. MetaFam: A unified classification of protein families. I. Overview and statistics. Bioinformatics 17:249‐261.
Smith, H.O., Annau, T.M., and Chandrasegaran, S. 1990. Finding sequence motifs in groups of functionally related proteins. Proc. Natl. Acad. Sci. U.S.A. 87:826‐830.
Tatusov, R.L., Altschul, S.F., and Koonin, E.V. 1994. Detection of conserved segments in proteins: Iterative scanning of sequence databases with alignment blocks. Proc. Natl. Acad. Sci. U.S.A. 91:12091‐12095.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position‐specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673‐4680.
Waskiewicz, A.J., Rikhof, H.A., Hernandez, R.E., and Moens, C.B. 2001. Zebrafish Meis functions to stabilize Pbx proteins and regulate hindbrain patterning. Development 128:4139‐4151.
Wootton, J.C. and Federhen, S. 1993. Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17:149‐163.
Key References
Henikoff and Henikoff, 1991. See above.
Introduces the Blocks Database and describes how it is constructed using PROTOMAT and searched using Block Searcher.
Pietrokovski, 1996. See above.
Introduces LAMA for searching blocks against a database of blocks, as an example of comparing multiple alignments with one another for sensitive detection of motifs.
Rose et al., 1998. See above.
Describes the CODEHOP strategy for detecting distant homologs using PCR and the Web‐based implementation for designing optimal CODEHOP primers.
Internet Resources
http://blocks.fhcrc.org
This is the Blocks Web page.
http://www.proweb.org
This is the ProWeb Web page.