
Using the DFCI Gene Index Databases for Biological Discovery

Abstract

 

The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as to a number of resources derived from these analyses. Each species-specific database is presented in a common format with its own home page. Several methods are available for searching each species-specific database. Those currently implemented include nucleotide or protein sequence queries using WU-BLAST; text-based searches using various sequence identifiers; searches by gene, tissue, and library name; and searches by functional class through Gene Ontology assignments. This protocol provides guidance on using the Gene Index databases to extract information. Curr. Protoc. Bioinform. 29:1.6.1–1.6.36. © 2010 by John Wiley & Sons, Inc.

Keywords: gene index database; gene index; databases; DFCI
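
Searching by functional class depends on the hierarchical structure of GO (see the GO browser in Figure 1.6.6): a search for a broad class should also return TCs that carry only more specific terms beneath it. The Python sketch below illustrates that idea with a breadth-first walk down the hierarchy. It is a minimal sketch under stated assumptions: the term names, TC identifiers, and `IS_A` links are hypothetical toy data, not real GO terms or Gene Index records, and this is not the Gene Index site's own implementation.

```python
from collections import defaultdict, deque

# Hypothetical toy hierarchy: child term -> parent terms ("is_a" links).
# A term may list several parents, as in the real GO graph.
IS_A: dict[str, list[str]] = {
    "hydrolase_activity": ["catalytic_activity"],
    "peptidase_activity": ["hydrolase_activity"],
    "transferase_activity": ["catalytic_activity"],
    "kinase_activity": ["transferase_activity"],
}

# Hypothetical TC -> directly assigned terms.
TC_TERMS: dict[str, set[str]] = {
    "TC0001": {"peptidase_activity"},
    "TC0002": {"kinase_activity"},
    "TC0003": {"hydrolase_activity"},
    "TC0004": {"binding"},
}


def descendants(term: str, is_a: dict[str, list[str]]) -> set[str]:
    """Return `term` plus every more specific term below it (breadth-first)."""
    children: defaultdict[str, list[str]] = defaultdict(list)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].append(child)
    found = {term}
    queue = deque([term])
    while queue:
        for child in children[queue.popleft()]:
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def tcs_in_class(term: str) -> list[str]:
    """List TCs annotated with `term` or with any of its descendants."""
    wanted = descendants(term, IS_A)
    return sorted(tc for tc, terms in TC_TERMS.items() if terms & wanted)


print(tcs_in_class("hydrolase_activity"))  # ['TC0001', 'TC0003']
print(tcs_in_class("catalytic_activity"))  # ['TC0001', 'TC0002', 'TC0003']
```

Walking down from the queried term, rather than matching it literally, is what lets TC0001 (annotated only as a peptidase) appear in a search for hydrolase activity.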

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Introduction
  • Basic Protocol 1: Identifying a Tentative Consensus (TC) Representing a Specific Sequence with BLAST
  • Alternate Protocol 1: Searching by Tentative Consensus, Expressed Transcripts, Expressed Sequence Tag, or GenBank Identifier
  • Alternate Protocol 2: Searching by Gene Ontology Functional Classification
  • Alternate Protocol 3: Searching by Radiation Hybrid Map Location (for Human, Mouse, and Rat Only)
  • Alternate Protocol 4: Search Gene Expression by Library Annotation
  • Alternate Protocol 5: Searching by Metabolic Pathway
  • Basic Protocol 2: Using the Genomic Maps with the DFCI Gene Indices
  • Basic Protocol 3: Using EGO to Identify Orthologous Groups
  • Basic Protocol 4: Using RESOURCERER
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 

Figures

  •   Figure 1.6.1 The DFCI Gene Index home page at http://compbio.dfci.harvard.edu/tgi/tgipage.html has links to the 114 species‐specific databases currently available. Other resources available include the Eukaryotic Gene Ortholog (EGO) database, the RESOURCERER utility for annotating and cross‐referencing mammalian microarray resources, and maps of the TCs to completed genome sequences.
  •   Figure 1.6.2 The home page for the Maize Gene Index.
  •   Figure 1.6.3 The BLAST search page allows users to query any of the DFCI Gene Index databases, as well as the EGO and RESOURCERER databases, using protein or DNA sequences.
  •   Figure 1.6.4 The main search page for the Maize Gene Index allows users to search the database using a variety of accession numbers, including DFCI TC numbers, transcript identifiers, GenBank accessions, and clone identifiers.
  •   Figure 1.6.5 Gene Ontology (GO) terms and Enzyme Commission (EC) identifiers are assigned to the TCs to provide functional annotation and to provide links to metabolic pathway databases.
  •   Figure 1.6.6 The GO browser shows the hierarchy of functional assignments for TCs identified as members of a particular functional class.
  •   Figure 1.6.7 For human, mouse, and rat, TCs are mapped to their respective genomes using the available radiation hybrid (RH) maps.
  •   Figure 1.6.8 RH mapping data: a portion of the list of mouse TCs containing markers mapped to chromosome 1.
  •   Figure 1.6.9 The expression summary page allows each Gene Index database to be explored using information on the libraries from which the ESTs were derived.
  •   Figure 1.6.10 The Expression Search page allows the frequencies of ESTs from different libraries to be compared, so that differentially expressed genes can be identified based on the sources of the libraries from which the ESTs were derived.
  •   Figure 1.6.11 An example of a library‐based expression comparison. The relative abundance of ESTs is depicted using a hot/cold (red/blue) color map, and significant differences between classes of ESTs are denoted by the associated R statistic (Stekel et al., 2000).
  •   Figure 1.6.12 GBrowse. ESTs from the various plant Gene Index databases are aligned to the Arabidopsis thaliana genome sequence.
  •   Figure 1.6.13 The home page for the Eukaryotic Gene Ortholog (EGO) database.
  •   Figure 1.6.14 A TOG alignment from the EGO database showing alignments of a possible transcription factor from A. Salmon, C. posadasii, cattle, dog, Medicago, oilseed rape, and trout. (A) A table listing all TC components of the group and their putative functions, followed by a table of BLAST results. (B) A portion of the sequence alignments.
  •   Figure 1.6.15 The RESOURCERER home page allows users to select a variety of widely used microarray resources for human, mouse, and rat for annotation or cross‐platform and cross‐species comparisons. Users can also enter their own microarray platform for annotation by providing GenBank accession numbers.
  •   Figure 1.6.16 Annotation for the Affymetrix HG_U95Av2 array provided by RESOURCERER includes Affymetrix probe IDs, clone names (when available), GenBank accessions, UniGene identifiers, DFCI human TC numbers identified through EGO, GO terms and annotated function, and physical map locations based on alignments of the DFCI THCs, with links to the appropriate databases.
  •   Figure 1.6.17 RESOURCERER also allows microarray platforms to be compared. Here, annotations for the Affymetrix HG_U95Av2 and HG_U95C human GeneChips are compared through EGO. Only elements common to both datasets are shown (intersection). The annotation includes Affymetrix probe IDs, clone names (when available), GenBank IDs with links to NCBI, and the TGI TC numbers for human (THCs).
  •   Figure 1.6.18 A sample TC report for Aedes aegypti TC57832. (A) At the top of each record is a FASTA‐formatted sequence representing the consensus produced by the clustering and assembly process. This is followed by the predicted open reading frames and a graphical representation of the ESTs and gene sequences that make up the TC. (B) A table with links to a variety of resources, including GenBank records and source laboratories; it also shows the predicted coding strand and the evidence supporting that assignment. (C) Buttons link to expression summaries based on the libraries represented in each TC assembly, to SNPs identified in the TC, and to predicted 70‐mer oligonucleotides. The top five hits from searches against a protein database, GO term and EC number assignments, and links to metabolic pathways in KEGG are also given.
  •   Figure 1.6.19 A schematic overview of the Gene Index assembly process. For each species represented, EST sequences are downloaded from the dbEST database at the NCBI (http://www.ncbi.nlm.nih.gov/dbEST). Sequences are cleaned to remove contaminating vector, adapter, mitochondrial, ribosomal, and other sequences wherever possible. Coding sequences (annotated CDS regions) representing genes are parsed from GenBank records. All EST and gene sequences are compared pairwise using megaBLAST and grouped based on shared sequence similarity. Each cluster is then assembled at high stringency to produce Tentative Consensus (TC) sequences, which are annotated by a sequence similarity search against a local copy of UniProt and released through the DFCI Web site.
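
The grouping step in the assembly process (sequences "compared pairwise using megaBLAST and grouped based on shared sequence similarity") can be pictured as transitive, single-linkage clustering: if A overlaps B and B overlaps C, all three end up in one cluster. The Python sketch below assumes that pairwise hits have already been parsed (for example, from megaBLAST tabular output) into hypothetical `Hit` records, and merges clusters with a union-find structure. The identity and overlap cutoffs are illustrative placeholders, not the parameters of the Gene Index pipeline, and the subsequent high-stringency assembly into TCs is not modeled.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One pairwise alignment (hypothetical fields, e.g. parsed from tabular BLAST output)."""
    query: str
    subject: str
    pct_identity: float
    overlap_len: int


def cluster_sequences(seq_ids, hits, min_identity=94.0, min_overlap=40):
    """Group sequences by single linkage: any qualifying hit joins two clusters.

    The cutoffs are placeholders for illustration only. Every ID named in a
    hit is assumed to appear in seq_ids.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for h in hits:
        if h.pct_identity >= min_identity and h.overlap_len >= min_overlap:
            root_q, root_s = find(h.query), find(h.subject)
            if root_q != root_s:
                parent[root_s] = root_q  # merge the two clusters

    clusters = {}
    for s in seq_ids:
        clusters.setdefault(find(s), []).append(s)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


ids = ["EST1", "EST2", "EST3", "EST4", "CDS1"]
hits = [
    Hit("EST1", "EST2", 98.5, 120),
    Hit("EST2", "CDS1", 99.0, 300),
    Hit("EST3", "EST4", 91.0, 200),  # below the identity cutoff: not joined
]
print(cluster_sequences(ids, hits))
# [['CDS1', 'EST1', 'EST2'], ['EST3'], ['EST4']]
```

Note that EST1 and CDS1 share no direct hit yet fall into one cluster through EST2; this transitivity is what lets overlapping ESTs be gathered together before assembly.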

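The R statistic in the library-based expression comparison (Figure 1.6.11) scores how unevenly a gene's ESTs are distributed across libraries. As we read Stekel et al. (2000), for a gene with x_j ESTs in library j containing N_j ESTs in total, R = Σ_j x_j ln(x_j / (N_j f)), where f = Σ_j x_j / Σ_j N_j is the gene's pooled frequency across all libraries. R is 0 when the gene's frequency is the same in every library and grows as the frequencies diverge. The Python function below is a minimal sketch of that formula only; the function name and example counts are hypothetical, and it makes no attempt to assess statistical significance.

```python
import math


def stekel_r(counts, library_sizes):
    """R statistic for one gene (TC) across several cDNA libraries.

    counts[j]        -- ESTs from this gene in library j
    library_sizes[j] -- total ESTs sequenced from library j
    R = sum_j x_j * ln(x_j / (N_j * f)), with f = sum(counts) / sum(library_sizes).
    Libraries with x_j == 0 contribute nothing (x * ln(x) -> 0 as x -> 0).
    """
    if len(counts) != len(library_sizes):
        raise ValueError("one count per library is required")
    total = sum(counts)
    if total == 0:
        return 0.0
    f = total / sum(library_sizes)  # pooled frequency of this gene
    return sum(x * math.log(x / (n * f)) for x, n in zip(counts, library_sizes) if x > 0)


# Three equally sized libraries: an evenly expressed gene versus a skewed one.
print(stekel_r([10, 10, 10], [5000, 5000, 5000]))
print(round(stekel_r([28, 1, 1], [5000, 5000, 5000]), 2))  # 24.22
```

Because R sums over any number of libraries, the same calculation covers comparisons among several libraries at once rather than only pairwise contrasts.
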
Literature Cited
   Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel‐Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., and Sherlock, G. 2000. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25:25‐29.
   Boguski, M.S. and Schuler, G.D. 1995. Establishing a human transcript map. Nat. Genet. 10:369‐371.
   Cariaso, M., Folta, P., Wagner, M., Kuczmarski, T., and Lennon, G. 1999. IMAGEne I: Clustering and ranking of I.M.A.G.E. cDNA clones corresponding to known genes. Bioinformatics 15:965‐973.
   Christoffels, A., van Gelder, A., Greyling, G., Miller, R., Hide, T., and Hide, W. 2001. STACK: Sequence Tag Alignment and Consensus Knowledgebase. Nucleic Acids Res. 29:234‐238.
   Fitch, W.M. 1970. Distinguishing homologous from analogous proteins. Syst. Zool. 19:99‐113.
   Hatzigeorgiou, A.G., Fiziev, P., and Reczko, M. 2001. DIANA‐EST: A statistical analysis. Bioinformatics 17:913‐919.
   Hogenesch, J.B., Ching, K.A., Batalov, S., Su, A.I., Walker, J.R., Zhou, Y., Kay, S.A., Schultz, P.G., and Cooke, M.P. 2001. A comparison of the Celera and Ensembl predicted gene sets reveals little overlap in novel genes. Cell 106:413‐415.
   Huang, X. and Madan, A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9:868‐877.
   International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860‐921.
   Iseli, C., Jongeneel, C.V., and Bucher, P. 1999. ESTScan: A program for detecting, evaluating and reconstructing potential coding regions in EST sequences. In ISMB ’99 (Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology) pp. 138‐148. AAAI Press, Menlo Park, Calif.
   Kanehisa, M. and Goto, S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28:27‐30.
   Lee, Y., Sultana, R., Pertea, G., Cho, J., Karamycheva, S., Tsai, T., Parvizi, B., Cheung, F., Antonescu, V., White, J., Holt, I., Liang, F., and Quackenbush, J. 2002. Cross‐referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res. 12:493‐502.
   Liang, F., Holt, I., Pertea, G., Karamycheva, S., Salzberg, S.L., and Quackenbush, J. 2000. An optimized protocol for analysis of EST sequences. Nucleic Acids Res. 28:3657‐3665.
   Makalowski, W. and Boguski, M.S. 1998. Evolutionary parameters of the transcribed mammalian genome: An analysis of 2,820 orthologous rodent and human sequences. Proc. Natl. Acad. Sci. U.S.A. 95:9407‐9412.
   Pertea, G., Huang, X., Liang, F., Antonescu, V., Sultana, R., Karamycheva, S., Lee, Y., White, J., Cheung, F., Parvizi, B., Tsai, J., and Quackenbush, J. 2003. TIGR Gene Indices clustering tools (TGICL): A software system for fast clustering of large EST datasets. Bioinformatics 19:651‐652.
   Quackenbush, J., Liang, F., Holt, I., Pertea, G., and Upton, J. 2000. The TIGR gene indices: Reconstruction and representation of expressed gene sequences. Nucleic Acids Res. 28:141‐145.
   Quackenbush, J., Cho, J., Lee, D., Liang, F., Holt, I., Karamycheva, S., Parvizi, B., Pertea, G., Sultana, R., and White, J. 2001. The TIGR Gene Indices: Analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 29:159‐164.
   Schena, M., Shalon, D., Davis, R.W., and Brown, P.O. 1995. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467‐470.
   Schuler, G.D. 1997. Sequence mapping by electronic PCR. Genome Res. 7:541‐550.
   Smith, T.P., Grosse, W.M., Freking, B.A., Roberts, A.J., Stone, R.T., Casas, E., Wray, J.E., White, J., Cho, J., Fahrenkrug, S.C., Bennett, G.L., Heaton, M.P., Laegreid, W.W., Rohrer, G.A., Chitko‐McKown, C.G., Pertea, G., Holt, I., Karamycheva, S., Liang, F., Quackenbush, J., and Keele, J.W. 2001. Sequence evaluation of four pooled‐tissue normalized bovine cDNA libraries and construction of a gene index for cattle. Genome Res. 11:626‐630.
   Stekel, D.J., Git, Y., and Falciani, F. 2000. The comparison of gene expression from multiple cDNA libraries. Genome Res. 10:2055‐2061.
   Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position‐specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673‐4680.
   Tsai, J., Sultana, R., Lee, Y., Pertea, G., Karamycheva, S., Antonescu, V., Cho, J., Parvizi, B., Cheung, F., and Quackenbush, J. 2001. RESOURCERER: A database for annotating and linking microarray resources within and across species. Genome Biol. 2:software0002.1‐software0002.4.
   Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K.W. 1995. Serial analysis of gene expression. Science 270:484‐487.
   Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., Gocayne, J.D., Amanatides, P., Ballew, R.M., Huson, D.H., Wortman, J.R., Zhang, Q., Kodira, C.D., Zheng, X.H., Chen, L., Skupski, M., Subramanian, G., Thomas, P.D., Zhang, J., Gabor Miklos, G.L., Nelson, C., Broder, S., Clark, A.G., Nadeau, J., McKusick, V.A., Zinder, N., Levine, A.J., Roberts, R.J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu‐Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Di Francesco, V., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A.E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T.J., Higgins, M.E., Ji, R.R., Ke, Z., Ketchum, K.A., Lai, Z., Lei, Y., Li, Z., Li, J., Liang, Y., Lin, X., Lu, F., Merkulov, G.V., Milshina, N., Moore, H.M., Naik, A.K., Narayan, V.A., Neelam, B., Nusskern, D., Rusch, D.B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z., Wang, A., Wang, X., Wang, J., Wei, M., Wides, R., Xiao, C., Yan, C., Yao, A., Ye, J., Zhan, M., Zhang, W., Zhang, H., Zhao, Q., Zheng, L., Zhong, F., Zhong, W., Zhu, S., Zhao, S., Gilbert, D., Baumhueter, S., Spier, G., Carter, C., Cravchik, A., Woodage, T., Ali, F., An, H., Awe, A., Baldwin, D., Baden, H., Barnstead, M., Barrow, I., Beeson, K., Busam, D., Carver, A., Center, A., Cheng, M.L., Curry, L., Danaher, S., Davenport, L., Desilets, R., Dietz, S., Dodson, K., Doup, L., Ferriera, S., Garg, N., Gluecksmann, A., Hart, B., Haynes, J., Haynes, C., Heiner, C., Hladun, S., Hostin, D., Houck, J., Howland, T., Ibegwam, C., Johnson, J., Kalush, F., Kline, L., Koduru, S., Love, A., Mann, F., May, D., McCawley, S., McIntosh, T., McMullen, I., Moy, M., Moy, L., Murphy, B., Nelson, K., Pfannkoch, C., Pratts, E., Puri, V., Qureshi, H., Reardon, M., Rodriguez, R., Rogers, Y.H., Romblad, D., Ruhfel, B., Scott, R., Sitter, C., Smallwood, M., Stewart, E., Strong, R., Suh, E., Thomas, R., Tint, N.N., Tse, S., Vech, C., Wang, G., Wetter, J., Williams, S., Williams, M., Windsor, S., Winn‐Deen, E., Wolfe, K., Zaveri, J., Zaveri, K., Abril, J.F., Guigó, R., Campbell, M.J., Sjolander, K.V., Karlak, B., Kejariwal, A., Mi, H., Lazareva, B., Hatton, T., Narechania, A., Diemer, K., Muruganujan, A., Guo, N., Sato, S., Bafna, V., Istrail, S., Lippert, R., Schwartz, R., Walenz, B., Yooseph, S., Allen, D., Basu, A., Baxendale, J., Blick, L., Caminha, M., Carnes‐Stine, J., Caulk, P., Chiang, Y.H., Coyne, M., Dahlke, C., Mays, A., Dombroski, M., Donnelly, M., Ely, D., Esparham, S., Fosler, C., Gire, H., Glanowski, S., Glasser, K., Glodek, A., Gorokhov, M., Graham, K., Gropman, B., Harris, M., Heil, J., Henderson, S., Hoover, J., Jennings, D., Jordan, C., Jordan, J., Kasha, J., Kagan, L., Kraft, C., Levitsky, A., Lewis, M., Liu, X., Lopez, J., Ma, D., Majoros, W., McDaniel, J., Murphy, S., Newman, M., Nguyen, T., Nguyen, N., Nodell, M., Pan, S., Peck, J., Peterson, M., Rowe, W., Sanders, R., Scott, J., Simpson, M., Smith, T., Sprague, A., Stockwell, T., Turner, R., Venter, E., Wang, M., Wen, M., Wu, D., Wu, M., Xia, A., Zandieh, A., and Zhu, X. 2001. The sequence of the human genome. Science 291:1304‐1351.
   Yu, J., Hu, S., Wang, J., Wong, G.K., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., Cao, M., Liu, J., Sun, J., Tang, J., Chen, Y., Huang, X., Lin, W., Ye, C., Tong, W., Cong, L., Geng, J., Han, Y., Li, L., Li, W., Hu, G., Huang, X., Li, W., Li, J., Liu, Z., Li, L., Liu, J., Qi, Q., Liu, J., Li, L., Li, T., Wang, X., Lu, H., Wu, T., Zhu, M., Ni, P., Han, H., Dong, W., Ren, X., Feng, X., Cui, P., Li, X., Wang, H., Xu, X., Zhai, W., Xu, Z., Zhang, J., He, S., Zhang, J., Xu, J., Zhang, K., Zheng, X., Dong, J., Zeng, W., Tao, L., Ye, J., Tan, J., Ren, X., Chen, X., He, J., Liu, D., Tian, W., Tian, C., Xia, H., Bao, Q., Li, G., Gao, H., Cao, T., Wang, J., Zhao, W., Li, P., Chen, W., Wang, X., Zhang, Y., Hu, J., Wang, J., Liu, S., Yang, J., Zhang, G., Xiong, Y., Li, Z., Mao, L., Zhou, C., Zhu, Z., Chen, R., Hao, B., Zheng, W., Chen, S., Guo, W., Li, G., Liu, S., Tao, M., Wang, J., Zhu, L., Yuan, L., and Yang, H. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79‐92.
   Zhang, Z., Schwartz, S., Wagner, L., and Miller, W. 2000. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7:203‐214.