Using geneid to Identify Genes


Abstract

 

This unit describes the usage of geneid, an efficient gene‐finding program that allows for the analysis of large genomic sequences, including whole mammalian chromosomes. These sequences can be partially annotated, and geneid can be used to refine this initial annotation. Training geneid is relatively easy, and parameter configurations exist for a number of eukaryotic species. Geneid produces output in a variety of standard formats. The results, thus, can be processed by a variety of software tools, including visualization programs. Geneid software is in the public domain, and it is undergoing constant development. It is easy to install and use. Exhaustive benchmark evaluations show that geneid compares favorably with other existing gene‐finding tools.

Keywords: Gene identification; genes; exons; splicing; genome annotation; bioinformatics

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Basic Protocol 1: Using the geneid Unix Application to Predict Genes
  • Basic Protocol 2: Visualizing geneid Predictions
  • Basic Protocol 3: Using External Information to Solidify geneid Predictions
  • Alternate Protocol 1: Using the geneid Web Server to Predict Genes
  • Support Protocol 1: How to Get geneid and Visualization Programs
  • Guidelines for Understanding Results
  • Commentary
  • Figures
  • Tables
     
 


Figures

  •   Figure 4.3.1 Default geneid prediction on sequence example1. The fields, from left to right, are defined in Table .
  •   Figure 4.3.2 Predicted Start codons (top) and First exons (bottom) on sequence example1 (partial output). The fields, from left to right, are defined in Table and steps 3 and 4 of .
  •   Figure 4.3.3 geneid prediction in extended format.
  •   Figure 4.3.4 geneid prediction in GFF format.
  •   Figure 4.3.5 Using gff2ps to visualize geneid output. Graphical representation of geneid output on sequence example1 with default gff2ps.
  •   Figure 4.3.6 Using Apollo to visualize geneid output.
  •   Figure 4.3.7 Using the UCSC genome browser to visualize geneid output.
  •   Figure 4.3.8 Improving gene prediction by using external information (). (A) Default geneid prediction on sequence example2. (B) geneid prediction when the exon coordinates of gene AC004463.3 are given to geneid. (C) Ensembl annotation of the sequence.
  •   Figure 4.3.9 Using external information to investigate alternative splicing forms with geneid (). (A) Default geneid prediction on sequence example3. (B, C) Prediction of two alternative transcripts. The EST1 and EST2 tracks display the exonic structure of partial EST matches whose coordinates have been given to geneid. geneid+EST1 and geneid+EST2 show the resulting geneid predictions. Isoform1 and Isoform2 correspond to the coordinates of the two isoforms. (D) Prediction of a third alternative transcript. The EST3 track displays the exonic structure of the EST whose genomic coordinates have been given to geneid. geneid+EST3a and geneid+EST3b display the geneid predictions before and after the exon filtering process. The Isoform3 track contains the annotation for this isoform. (E) The coordinates of a promoter element (Promoter; may be obtained by experimental means) are given to geneid, which improves the prediction of the first coding exon (geneid+Prom).
  •   Figure 4.3.10 geneid Web server: DNA and external information area.
  •   Figure 4.3.11 geneid Web server: Prediction Options area.
  •   Figure 4.3.12 geneid Web server: Output Options area.
  •   Figure 4.3.13 geneid Web server output with the sequence example1.fa.
  •   Figure 4.3.14 geneid Default Gene Model.


Literature Cited
   Abril, J.F. and Guigó, R. 2000. gff2ps: Visualizing genomic annotations. Bioinformatics 16:743‐744.
   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Aury, J.M., Jaillon, O., Duret, L., Noel, B., Jubin, C., Porcel, B.M., Segurens, B., Daubin, V., Anthouard, V., Aiach, N., Arnaiz, O., Billaut, A., Beisson, J., Blanc, I., Bouhouche, K., Camara, F., Duharcourt, S., Guigó, R., Gogendeau, D., Katinka, M., Keller, A.M., Kissmehl, R., Klotz, C., Koll, F., Le Mouel, A., Lepere, G., Malinsky, S., Nowacki, M., Nowak, J.K., Plattner, H., Poulain, J., Ruiz, F., Serrano, V., Zagulski, M., Dessen, P., Betermier, M., Weissenbach, J., Scarpelli, C., Schachter, V., Sperling, L., Meyer, E., Cohen, J., and Wincker, P. 2006. Global trends of whole‐genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444:171‐178.
   Birney, E. and Durbin, R. 2000. Using GeneWise in the Drosophila annotation experiment. Genome Res. 10:547‐548.
   Brent, M.R. and Guigó, R. 2004. Recent advances in gene structure prediction. Curr. Opin. Struct. Biol. 14:264‐272.
   Castellano, S., Novoselov, S.V., Kryukov, G.V., Lescure, A., Blanco, E., Krol, A., Gladyshev, V.N., and Guigó, R. 2004. Reconsidering the evolution of eukaryotic selenoproteins: A novel nonmammalian family with scattered phylogenetic distribution. EMBO Rep. 5:71‐77.
   Castellano, S., Morozova, N., Morey, M., Berry, M.J., Serras, F., Corominas, M., and Guigó, R. 2001. In silico identification of novel selenoproteins in the Drosophila melanogaster genome. EMBO Rep. 2:697‐702.
   Fagioli, M., Alcalay, M., Pandolfi, P.P., Venturini, L., Mencarelli, A., Simeone, A., Acampora, D., Grignani, F., and Pelicci, P.G. 1992. Alternative splicing of PML transcripts predicts coexpression of several carboxy‐terminally different protein isoforms. Oncogene 7:1083‐1091.
   Glockner, G., Eichinger, L., Szafranski, K., Pachebat, J.A., Bankier, A.T., Dear, P.H., Lehmann, R., Baumgart, C., Parra, G., Abril, J.F., Guigó, R., Kumpf, K., Tunggal, B., Cox, E., Quail, M.A., Platzer, M., Rosenthal, A., Noegel, A.A.; Dictyostelium Genome Sequencing Consortium. 2002. Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 418:79‐85.
   Guigó, R. 1998. Assembling genes from predicted exons in linear time with dynamic programming. J. Comp. Biol. 5:681‐702.
   Guigó, R., Knudsen, S., Drake, N., and Smith, T. 1992. Prediction of gene structure. J. Mol. Biol. 226:141‐157.
   Guigó, R., Flicek, P., Abril, J.F., Reymond, A., Lagarde, J., Denoeud, F., Antonarakis, S., Ashburner, M., Bajic, V.B., Birney, E., Castelo, R., Eyras, E., Ucla, C., Gingeras, T.R., Harrow, J., Hubbard, T., Lewis, S.E., and Reese, M.G. 2006. EGASP: The human ENCODE Genome Annotation Assessment Project. Genome Biol. 7:S2.1‐S2.31.
   Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., Hillman‐Jackson, J., Kuhn, R.M., Pedersen, J.S., Pohl, A., Raney, B.J., Rosenbloom, K.R., Siepel, A., Smith, K.E., Sugnet, C.W., Sultan‐Qurraie, A., Thomas, D.J., Trumbower, H., Weber, R.J., Weirauch, M., Zweig, A.S., Haussler, D., and Kent, W.J. 2006. The UCSC Genome Browser Database: Update 2006. Nucl. Acids Res. 34:D590‐D598.
   International Chicken Genome Sequencing Consortium. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432:695‐716.
   Jaillon, O., Aury, J.M., Brunet, F., Petit, J.L., Stange‐Thomann, N., Mauceli, E., Bouneau, L., Fischer, C., Ozouf‐Costaz, C., Bernot, A., Nicaud, S., Jaffe, D., Fisher, S., Lutfalla, G., Dossat, C., Segurens, B., Dasilva, C., Salanoubat, M., Levy, M., Boudet, N., Castellano, S., Anthouard, V., Jubin, C., Castelli, V., Katinka, M., Vacherie, B., Biemont, C., Skalli, Z., Cattolico, L., Poulain, J., De Berardinis, V., Cruaud, C., Duprat, S., Brottier, P., Coutanceau, J.P., Gouzy, J., Parra, G., Lardier, G., Chapple, C., McKernan, K.J., McEwan, P., Bosak, S., Kellis, M., Volff, J.N., Guigó, R., Zody, M.C., Mesirov, J., Lindblad‐Toh, K., Birren, B., Nusbaum, C., Kahn, D., Robinson‐Rechavi, M., Laudet, V., Schachter, V., Quetier, F., Saurin, W., Scarpelli, C., Wincker, P., Lander, E.S., Weissenbach, J., and Roest Crollius, H. 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto‐karyotype. Nature 431:946‐957.
   Lewis, S.E., Searle, S.M.J., Harris, N., Gibson, M., Iyer, V., Richter, J., Wiel, C., Bayraktaroglu, L., Birney, E., Crosby, M.A., Kaminker, J.S., Matthews, B., Prochnik, S.E., Smith, C.D., Tupy, J.L., Rubin, G.M., Misra, S., Mungall, C.J., and Clamp, M.E. 2002. Apollo: A sequence annotation editor. Genome Biol. 3:research0082.
   Mott, R. 1997. EST_GENOME: A program to align spliced DNA sequences to unspliced genomic DNA. Comp. Appl. Biosci. 13:477‐478.
   Mouse Genome Sequencing Consortium. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520‐562.
   Parra, G., Blanco, E., and Guigó, R. 2000. geneid in Drosophila. Genome Res. 10:511‐515.
   Parra, G., Agarwal, P., Abril, J.F., Wiehe, T., Fickett, J.W., and Guigó, R. 2003. Comparative gene prediction in human and mouse. Genome Res. 13:108‐117.
   Pearson, W.R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183:63‐98.
   Rat Genome Sequencing Project Consortium. 2004. Genome sequence of the brown Norway rat yields insights into mammalian evolution. Nature 428:493‐521.
   Stormo, G.D. 2000. Gene‐finding approaches for eukaryotes. Genome Res. 10:394‐397.
   Zhang, M.Q. 2002. Computational prediction of eukaryotic protein‐coding genes. Nat. Rev. Genet. 3:698‐709.
Key References
   Guigó et al., 1992. See above.
   Description of the first implementation of geneid.
   Guigó et al., 2006. See above.
   A community experiment to assess the state‐of‐the‐art in one percent of the human genome sequence.
   Parra et al., 2000. See above.
   Description of geneid v 1.0 used in the Adh region of Drosophila melanogaster.
Internet Resources
   http://genome.imim.es/software/geneid/index.html
   This is the geneid Web page.
   http://genome.imim.es/software/gfftools/GFF2PS.html
   This is the gff2ps Web page.
   http://www.fruitfly.org/annot/apollo/
   This is the Apollo Web page (see UNIT )
   http://genome.ucsc.edu/
   This is the UCSC genome browser (golden path; UNIT ).
   http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
   This is the GFF format Web page.
   http://www.w3.org/XML/
   This is the XML format Web page.
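
Because geneid can report predictions in GFF, its output can be post‐processed with a short script. The following is a minimal sketch in Python, assuming tab‐separated GFF records with the standard columns (seqname, source, feature, start, end, score, strand, frame, group). The sample records, the gene names, and the helper names parse_gff, exons_by_gene, and coding_length are made up for illustration; they are not actual geneid output or part of the geneid distribution.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class Feature(NamedTuple):
    """One GFF record (standard nine-column layout)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    frame: str
    group: str


def parse_gff(lines: Iterable[str]) -> List[Feature]:
    """Parse tab-separated GFF lines, skipping comments and blank lines."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a complete GFF record
        score = None if cols[5] == "." else float(cols[5])
        group = cols[8] if len(cols) > 8 else ""
        features.append(Feature(cols[0], cols[1], cols[2], int(cols[3]),
                                int(cols[4]), score, cols[6], cols[7], group))
    return features


def exons_by_gene(features: Iterable[Feature]) -> Dict[str, List[Feature]]:
    """Group records by their group field and sort each group by start."""
    genes = defaultdict(list)
    for feat in features:
        genes[feat.group].append(feat)
    for exons in genes.values():
        exons.sort(key=lambda f: f.start)
    return dict(genes)


def coding_length(exons: Iterable[Feature]) -> int:
    """Total exon length; GFF coordinates are 1-based and inclusive."""
    return sum(f.end - f.start + 1 for f in exons)


# Hypothetical records for illustration only (not real geneid output).
SAMPLE = [
    "# made-up example records",
    "example1\tgeneid\tFirst\t100\t250\t1.50\t+\t0\tgene_1",
    "example1\tgeneid\tInternal\t400\t520\t3.20\t+\t1\tgene_1",
    "example1\tgeneid\tTerminal\t900\t1010\t2.10\t+\t2\tgene_1",
    "example1\tgeneid\tSingle\t2000\t2600\t5.00\t-\t0\tgene_2",
]

if __name__ == "__main__":
    for name, exons in exons_by_gene(parse_gff(SAMPLE)).items():
        print(name, len(exons), "exon(s),", coding_length(exons), "bp")
```

To use the sketch on a real prediction, the SAMPLE list could be replaced by the lines of a GFF file written by geneid; the grouping step relies only on the group column, so it should carry over to other GFF‐producing tools as well.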