Using Chado to Store Genome Annotation Data
Abstract
Chado is a relational database schema that can be used to manage a wide variety of biological information, including genome annotation, genetic, phenotypic, and expression data. Its flexibility comes from its use of “ontologies,” which are controlled vocabularies that describe data types and the relationships among them. By changing its ontologies, Chado can be customized to suit many different needs. Chado's modular design adds to this flexibility: users can choose to use only those features of Chado that are suitable for their needs. XORT is the main software tool used to move data in and out of Chado databases. XORT uses an XML‐based file format for data import and export; this format is called ChadoXML. The protocols described in this chapter show how to use XORT and related software to import genome annotation data into Chado databases, and how to export data stored in Chado databases into different file formats for reporting and data mining purposes.
Keywords: Chado; genome; annotation; database; XORT; GAME; GMOD
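As a rough illustration of how ontology terms, rather than table definitions, determine data types, the following minimal sketch uses Python's built-in sqlite3 module with heavily simplified stand-ins for Chado's cvterm and feature tables. Real Chado runs on PostgreSQL and these tables carry more columns; the feature names and the helper features_by_type are hypothetical and exist only for this example.

```python
import sqlite3

# Simplified stand-ins for Chado's cvterm and feature tables.
# Assumption: real Chado tables have more columns and run on PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
""")

# Each feature's type is a reference to a controlled-vocabulary term.
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [(1, "gene"), (2, "mRNA")])
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?)",
    [(10, "example_gene", 1), (11, "example_transcript", 2)],  # hypothetical rows
)

# Supporting a new kind of feature needs only a new term row, not a schema change.
conn.execute("INSERT INTO cvterm VALUES (3, 'exon')")
conn.execute("INSERT INTO feature VALUES (12, 'example_exon', 3)")


def features_by_type(connection):
    """Return (type name, feature uniquename) pairs in feature_id order."""
    return connection.execute(
        """
        SELECT c.name, f.uniquename
        FROM feature AS f
        JOIN cvterm AS c ON c.cvterm_id = f.type_id
        ORDER BY f.feature_id
        """
    ).fetchall()


typed = features_by_type(conn)
for type_name, uniquename in typed:
    print(type_name, uniquename)
```

The point of the sketch is only that adding the 'exon' term required an INSERT rather than an ALTER TABLE, which is one way to read the claim that Chado is customized by changing its ontologies.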
Table of Contents
- Basic Protocol 1: Installing Chado and XORT in the Unix/Linux Environment
- Basic Protocol 2: Building a Chado Annotation Database
- Basic Protocol 3: Loading a GenBank File
- Basic Protocol 4: Querying a Chado Annotation Database Using SQL
- Basic Protocol 5: Generating Standard Reports from a Chado Annotation Database
- Support Protocol 1: Installing Software for a Unix‐Like Environment on a PC
- Commentary
- Literature Cited
- Figures
- Tables
Figures
- Figure 9.6.1 A schematic representation of the protocols and the organizational relationship between the protocols and data flow for Chado.
- Figure 9.6.2 GAME XML format, which is one of the input formats for the annotation editor Apollo.
- Figure 9.6.3 Structure for ChadoXML, which serves as an intermediate format between the Chado database and other file formats.
- Figure 9.6.4 FEATURES section of GenBank record to be loaded into Chado.
- Figure 9.6.5 FEATURES section of GenBank record to be loaded into Chado, modified to reflect chromosomal coordinates.
- Figure 9.6.6 Example query to retrieve location information for the gene oaf.
- Figure 9.6.7 Results returned for the query depicted in Figure .
- Figure 9.6.8 Example query to get transcripts and their locations for the gene oaf.
- Figure 9.6.9 Results returned for the query depicted in Figure .
- Figure 9.6.10 Example query to get exons and their locations for a given transcript, “oaf‐RB.”
- Figure 9.6.11 Results returned for the query depicted in Figure .
- Figure 9.6.12 Example query to get exons and their locations for the gene oaf.
- Figure 9.6.13 Results returned for the query depicted in Figure .
- Figure 9.6.14 Example query to list types of analysis available and sets of data used in the analysis for a given genomic region (arm 2L, bases 1 to 49,999).
- Figure 9.6.15 Results returned for the query depicted in Figure .
- Figure 9.6.16 Example query to list aligned objects for a given genomic region (arm 2L, bases 1 to 49,999).
- Figure 9.6.17 Results returned for the query depicted in Figure .
- Figure 9.6.18 Example query to retrieve the alignment details for the alignment of a given sequence against the chromosome arm (e.g., GenBank record “AY129461”).
- Figure 9.6.19 Results returned for the query depicted in Figure .
- Figure 9.6.20 Examples of conf/bulkfiles/fbreleases.xml and conf/bulkfiles/fbbulk‐hetr3.xml files modified to reflect the database.
- Figure 9.6.21 Example of commands used for generating report files.
- Figure 9.6.22 Screen shot of the Cygwin setup window.
- Figure 9.6.23 Groups that must be installed in order to install Cygwin.
- Figure 9.6.24 The Central Dogma model for a protein‐coding gene with one known spliced transcript. The dashed lines denote the featureloc records of features aligned to the genomic contig, while the solid lines denote the feature_relationship records between two features (subject and object).
- Figure 9.6.25 Data implementation of prediction and alignment evidence in Chado to support genome annotation. The dashed line denotes the featureloc of features aligned to the genomic contig, while the solid line denotes the feature_relationship between two features.
- Figure 9.6.26 The “rebase” error message from Cygwin.
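The query figures themselves are not reproduced here. As a hedged sketch of the kind of lookup the captions describe (a gene's location via featureloc, and its transcripts via feature_relationship), the example below builds a tiny in-memory SQLite database with simplified versions of the relevant Chado tables. The names oaf, oaf-RB, and 2L come from the captions; every coordinate, ID, the term name 'part_of', and both helper functions are made up for illustration, and this is not the query shown in the original figures.

```python
import sqlite3

# Simplified stand-ins for Chado tables; real Chado uses PostgreSQL and more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT, type_id INTEGER);
CREATE TABLE featureloc (
    feature_id INTEGER, srcfeature_id INTEGER,
    fmin INTEGER, fmax INTEGER, strand INTEGER
);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER, type_id INTEGER);
""")

# All IDs and coordinates below are invented for this example.
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "chromosome_arm"), (2, "gene"), (3, "mRNA"), (4, "part_of")])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                 [(1, "2L", 1), (2, "oaf", 2), (3, "oaf-RB", 3)])
# featureloc rows: features located on the source feature (the dashed lines).
conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?)",
                 [(2, 1, 1000, 5000, 1), (3, 1, 1200, 4800, 1)])
# feature_relationship row: transcript (subject) part_of gene (object), the solid lines.
conn.execute("INSERT INTO feature_relationship VALUES (3, 2, 4)")


def gene_location(connection, gene_name):
    """Return (source name, fmin, fmax, strand) rows for a named gene."""
    return connection.execute(
        """
        SELECT src.name, fl.fmin, fl.fmax, fl.strand
        FROM feature AS g
        JOIN cvterm AS t ON t.cvterm_id = g.type_id
        JOIN featureloc AS fl ON fl.feature_id = g.feature_id
        JOIN feature AS src ON src.feature_id = fl.srcfeature_id
        WHERE g.name = ? AND t.name = 'gene'
        """,
        (gene_name,),
    ).fetchall()


def transcripts_of(connection, gene_name):
    """Return (transcript name, fmin, fmax) rows for features related to the gene."""
    return connection.execute(
        """
        SELECT tr.name, fl.fmin, fl.fmax
        FROM feature AS g
        JOIN feature_relationship AS fr ON fr.object_id = g.feature_id
        JOIN feature AS tr ON tr.feature_id = fr.subject_id
        JOIN featureloc AS fl ON fl.feature_id = tr.feature_id
        WHERE g.name = ?
        ORDER BY tr.name
        """,
        (gene_name,),
    ).fetchall()


print(gene_location(conn, "oaf"))
print(transcripts_of(conn, "oaf"))
```

Against a real Chado instance the same joins would be issued through a PostgreSQL client, and the relationship type and feature type names would need to match the ontologies actually loaded into that database.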
Literature Cited
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78‐94.
Florea, L., Hartzell, G., Zhang, Z., Rubin, G.M., and Miller, W. 1998. A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8:967‐974.
Lewis, S.E., Searle, S.M.J., Harris, N., Gibson, M., Iyer, V., Richter, J., Wiel, C., Bayraktaroglu, L., Birney, E., Crosby, M.A., Kaminker, J.S., Matthews, B.B., Prochnik, S.E., Smith, C.D., Tupy, J.L., Rubin, G.M., Misra, S., Mungall, C.J., and Clamp, M.E. 2002. Apollo: A sequence annotation editor. Genome Biol. 3(12).
Mungall, C.J., Misra, S., Berman, B.P., Carlson, J., Frise, E., Harris, N., Marshall, B., Shu, S., Kaminker, J.S., Prochnik, S.E., Smith, C.D., Smith, E., Tupy, J.L., Wiel, C., Rubin, G.M., and Lewis, S.E. 2002. An integrated computational pipeline and database to support whole‐genome sequence annotation. Genome Biol. 3(12).
Reese, M.G., Kulp, D., Tammana, H., and Haussler, D. 2000. Genie: Gene finding in Drosophila melanogaster. Genome Res. 10:529‐538.
Stein, L.D., Mungall, C., Shu, S., Caudy, M., Mangone, M., Day, A., Nickerson, E., Stajich, J.E., Harris, T.W., Arva, A., and Lewis, S. 2002. The generic genome browser: A building block for a model organism system database. Genome Res. 12:1599‐1610.
Internet Resources
http://www.gmod.org
Web site of GMOD.
http://www.flybase.org
Web site of FlyBase.
http://www.fruitfly.org/annot/gamexml.dtd.txt
Location of GAME XML DTD.