Using Chado to Store Genome Annotation Data

Abstract

Chado is a relational database schema that can be used to manage a wide variety of biological information, including genome annotation, genetic, phenotypic, and expression data. Its flexibility comes from its use of “ontologies,” which are controlled vocabularies that describe data types and the relationships among them. By changing its ontologies, Chado can be customized to suit many different needs. Its modular design adds further flexibility: users can choose to use only those modules of Chado that suit their needs. XORT is the main software tool used to move data into and out of Chado databases. XORT uses an XML‐based file format, called ChadoXML, for data import and export. The protocols described in this chapter show how to use XORT and related software to import genome annotation data into Chado databases, and how to export data stored in Chado databases into different file formats for reporting and data mining.

Keywords: Chado; genome; annotation; database; XORT; GAME; GMOD
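The following Python sketch is a rough illustration of two ideas: ontology terms supply the data types, and ChadoXML mirrors the schema. It uses the standard-library `xml.etree.ElementTree` module to build a ChadoXML-style fragment for a single feature. The sketch assumes the general ChadoXML convention that element names follow Chado table and column names. It also assumes that a foreign key such as `type_id` is written by nesting the referenced row (a `cvterm` and its `cv`) inside the column element.

The vocabulary name `SO`, the feature names `toy_gene` and `toy_arm`, and the coordinates are hypothetical. The fragment is deliberately simplified. It is not output generated by XORT, which has further conventions not shown here. A real lookup of the source feature would also need more identifying columns than `uniquename`.

```python
import xml.etree.ElementTree as ET


def add_cvterm_ref(parent, column, term, cv):
    """Express a foreign key to cvterm by nesting the term and its vocabulary."""
    col = ET.SubElement(parent, column)
    cvterm = ET.SubElement(col, "cvterm")
    ET.SubElement(cvterm, "name").text = term
    cv_el = ET.SubElement(ET.SubElement(cvterm, "cv_id"), "cv")
    ET.SubElement(cv_el, "name").text = cv
    return col


def feature_element(uniquename, term, cv="SO", src=None, fmin=None, fmax=None):
    """Build a <feature> element typed by an ontology term (simplified sketch).

    If src is given, a nested <featureloc> places the feature on the source
    feature named src, using interbase fmin/fmax coordinates.
    """
    feature = ET.Element("feature")
    ET.SubElement(feature, "uniquename").text = uniquename
    add_cvterm_ref(feature, "type_id", term, cv)
    if src is not None:
        loc = ET.SubElement(feature, "featureloc")
        src_el = ET.SubElement(ET.SubElement(loc, "srcfeature_id"), "feature")
        ET.SubElement(src_el, "uniquename").text = src
        ET.SubElement(loc, "fmin").text = str(fmin)
        ET.SubElement(loc, "fmax").text = str(fmax)
    return feature


if __name__ == "__main__":
    root = ET.Element("chado")
    # Hypothetical feature names and coordinates, for illustration only.
    root.append(feature_element("toy_gene", "gene", src="toy_arm",
                                fmin=100, fmax=900))
    print(ET.tostring(root, encoding="unicode"))
```

Retyping the feature (for example, from `gene` to `pseudogene`) changes only the term passed to `feature_element`. The shape of the XML and the schema stay the same, which is the kind of flexibility the abstract attributes to ontologies.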

GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Basic Protocol 1: Installing Chado and XORT in the Unix/Linux Environment
  • Basic Protocol 2: Building a Chado Annotation Database
  • Basic Protocol 3: Loading a GenBank File
  • Basic Protocol 4: Querying a Chado Annotation Database Using SQL
  • Basic Protocol 5: Generating Standard Reports from a Chado Annotation Database
  • Support Protocol 1: Installing Software for a Unix‐Like Environment on a PC
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Figures

  •   Figure 9.6.1 A schematic representation of the protocols, their organizational relationships, and the data flow for Chado.
  •   Figure 9.6.2 GAME XML format, one of the input formats for the annotation editor Apollo.
  •   Figure 9.6.3 Structure of ChadoXML, which serves as an intermediate format between a Chado database and other file formats.
  •   Figure 9.6.4 FEATURES section of the GenBank record to be loaded into Chado.
  •   Figure 9.6.5 FEATURES section of the GenBank record to be loaded into Chado, modified to reflect chromosomal coordinates.
  •   Figure 9.6.6 Example query to retrieve location information for the gene oaf.
  •   Figure 9.6.7 Results returned for the query depicted in the preceding figure.
  •   Figure 9.6.8 Example query to retrieve the transcripts of the gene oaf and their locations.
  •   Figure 9.6.9 Results returned for the query depicted in the preceding figure.
  •   Figure 9.6.10 Example query to retrieve the exons of a given transcript, “oaf‐RB”, and their locations.
  •   Figure 9.6.11 Results returned for the query depicted in the preceding figure.
  •   Figure 9.6.12 Example query to retrieve the exons of the gene oaf and their locations.
  •   Figure 9.6.13 Results returned for the query depicted in the preceding figure.
  •   Figure 9.6.14 Example query to list the types of analysis available, and the data sets used in each analysis, for a given genomic region (arm 2L, bases 1 to 49,999).
  •   Figure 9.6.15 Results returned for the query depicted in the preceding figure.
  •   Figure 9.6.16 Example query to list aligned objects for a given genomic region (arm 2L, bases 1 to 49,999).
  •   Figure 9.6.17 Results returned for the query depicted in the preceding figure.
  •   Figure 9.6.18 Example query to retrieve the details of the alignment of a given sequence (e.g., GenBank record “AY129461”) against the chromosome arm.
  •   Figure 9.6.19 Results returned for the query depicted in the preceding figure.
  •   Figure 9.6.20 Examples of the conf/bulkfiles/fbreleases.xml and conf/bulkfiles/fbbulk-hetr3.xml files, modified to reflect the database.
  •   Figure 9.6.21 Example commands used to generate report files.
  •   Figure 9.6.22 Screenshot of the Cygwin setup window.
  •   Figure 9.6.23 Groups that must be installed in order to install Cygwin.
  •   Figure 9.6.24 The Central Dogma model for a protein‐coding gene with one known spliced transcript. The dashed lines denote the featureloc records of features aligned to the genomic contig, while the solid lines denote the feature_relationship records between two features (subject and object).
  •   Figure 9.6.25 How prediction and alignment evidence supporting genome annotation is represented in Chado. The dashed lines denote the featureloc records of features aligned to the genomic contig, while the solid lines denote the feature_relationship records between two features.
  •   Figure 9.6.26 The “rebase” error message from Cygwin.
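The example queries listed above retrieve gene locations, transcripts, exons, and the contents of a genomic region. The model in Figure 9.6.24 expresses all of these through three mechanisms: features typed by ontology terms, featureloc records that place a feature on a source sequence, and feature_relationship records that link features.

The following Python sketch imitates that model using the standard-library `sqlite3` module. It works with a hypothetical, much-reduced subset of four Chado tables and invented toy data (`toy_arm`, `toy_gene`, and so on). It is not the chapter's SQL and would need adapting before it could run against a real Chado database on PostgreSQL.

The sketch relies on two Chado conventions:

- featureloc stores interbase coordinates, so fmin is the 1-based start minus one and fmax is the 1-based end.
- In a `part_of` relationship, the child is the subject and the parent is the object.

```python
import sqlite3

# Hypothetical, much-reduced subset of Chado; real Chado (PostgreSQL) has more.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,
                      uniquename TEXT UNIQUE, type_id INTEGER);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                         fmin INTEGER, fmax INTEGER, strand INTEGER);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER,
                                   type_id INTEGER);
"""


def to_interbase(start, end):
    """Convert 1-based, end-inclusive coordinates to interbase (fmin, fmax)."""
    return start - 1, end


def term_id(con, name):
    """Return the cvterm_id for name, creating the term if needed."""
    con.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    row = con.execute("SELECT cvterm_id FROM cvterm WHERE name = ?", (name,))
    return row.fetchone()[0]


def add_feature(con, name, ftype, src=None, start=None, end=None, strand=1):
    """Insert a feature typed by a cvterm; optionally locate it on src."""
    fid = con.execute("INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
                      (name, term_id(con, ftype))).lastrowid
    if src is not None:
        fmin, fmax = to_interbase(start, end)
        con.execute("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?)",
                    (fid, src, fmin, fmax, strand))
    return fid


def build_toy_db():
    """One gene with two transcripts that share their first exon (toy data)."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    arm = add_feature(con, "toy_arm", "chromosome_arm")
    gene = add_feature(con, "toy_gene", "gene", arm, 101, 900)
    ra = add_feature(con, "toy_gene-RA", "mRNA", arm, 101, 900)
    rb = add_feature(con, "toy_gene-RB", "mRNA", arm, 101, 900)
    e1 = add_feature(con, "toy_exon1", "exon", arm, 101, 300)
    e2 = add_feature(con, "toy_exon2", "exon", arm, 501, 900)
    e3 = add_feature(con, "toy_exon3", "exon", arm, 601, 900)
    # Each pair is (subject = child, object = parent) of a part_of relation.
    for child, parent in [(ra, gene), (rb, gene), (e1, ra), (e2, ra),
                          (e1, rb), (e3, rb)]:
        con.execute("INSERT INTO feature_relationship VALUES (?, ?, ?)",
                    (child, parent, term_id(con, "part_of")))
    return con, arm


def children(con, parent_name, child_type):
    """Features of child_type that are part_of the named parent feature."""
    return con.execute("""
        SELECT c.uniquename, fl.fmin, fl.fmax
        FROM feature p
        JOIN feature_relationship r ON r.object_id = p.feature_id
        JOIN feature c ON c.feature_id = r.subject_id
        JOIN cvterm t ON t.cvterm_id = c.type_id AND t.name = ?
        JOIN featureloc fl ON fl.feature_id = c.feature_id
        WHERE p.uniquename = ?
        ORDER BY fl.fmin, c.uniquename""", (child_type, parent_name)).fetchall()


def exons_of_gene(con, gene_name):
    """Exon part_of transcript part_of gene; a shared exon is listed once."""
    return con.execute("""
        SELECT DISTINCT e.uniquename, fl.fmin, fl.fmax
        FROM feature g
        JOIN feature_relationship r1 ON r1.object_id = g.feature_id
        JOIN feature_relationship r2 ON r2.object_id = r1.subject_id
        JOIN feature e ON e.feature_id = r2.subject_id
        JOIN cvterm t ON t.cvterm_id = e.type_id AND t.name = 'exon'
        JOIN featureloc fl ON fl.feature_id = e.feature_id
        WHERE g.uniquename = ?
        ORDER BY fl.fmin, e.uniquename""", (gene_name,)).fetchall()


def features_in_region(con, src, start, end):
    """Features overlapping the 1-based, inclusive region [start, end] on src."""
    lo, hi = to_interbase(start, end)
    return con.execute("""
        SELECT f.uniquename, t.name
        FROM featureloc fl
        JOIN feature f ON f.feature_id = fl.feature_id
        JOIN cvterm t ON t.cvterm_id = f.type_id
        WHERE fl.srcfeature_id = ? AND fl.fmin < ? AND fl.fmax > ?
        ORDER BY fl.fmin, f.uniquename""", (src, hi, lo)).fetchall()


if __name__ == "__main__":
    con, arm = build_toy_db()
    print(exons_of_gene(con, "toy_gene"))
    print(features_in_region(con, arm, 1, 250))
```

Interbase intervals are half-open. Two intervals therefore overlap exactly when each starts before the other ends (`fl.fmin < hi AND fl.fmax > lo`), so the only off-by-one adjustment lives in `to_interbase`. For example, a region such as arm 2L, bases 1 to 49,999, becomes fmin 0, fmax 49,999.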
