
Using geneid to Identify Genes


This unit describes the use of geneid, an efficient gene-finding program that can analyze large genomic sequences, including whole mammalian chromosomes. These sequences can be partially annotated, and geneid can be used to refine this initial annotation. Training geneid is relatively easy, and parameter configurations exist for a number of eukaryotic species. Geneid produces output in a variety of standard formats, so the results can be processed by a variety of software tools, including visualization programs. The geneid software is in the public domain and is under constant development. It is easy to install and use. Exhaustive benchmark evaluations show that geneid compares favorably with other existing gene-finding tools.

Keywords: Gene identification; genes; exons; splicing; genome annotation; bioinformatics

Table of Contents

  • Basic Protocol 1: Using the geneid Unix Application to Predict Genes
  • Basic Protocol 2: Visualizing geneid Predictions
  • Basic Protocol 3: Using External Information to Solidify geneid Predictions
  • Alternate Protocol 1: Using the geneid Web Server to Predict Genes
  • Support Protocol 1: How to Get geneid and Visualization Programs
  • Guidelines for Understanding Results
  • Commentary
  •   Figure 4.3.1 Default geneid prediction on sequence example1. The fields, from left to right, are defined in Table .
  •   Figure 4.3.2 Predicted Start codons (top) and First exons (bottom) on sequence example1 (partial output). The fields, from left to right, are defined in Table and steps 3 and 4 of .
  •   Figure 4.3.3 geneid prediction in extended format.
  •   Figure 4.3.4 geneid prediction in GFF format.
  •   Figure 4.3.5 Using gff2ps to visualize geneid output. Graphical representation of geneid output on sequence example1 with default gff2ps.
  •   Figure 4.3.6 Using Apollo to visualize geneid output.
  •   Figure 4.3.7 Using the UCSC genome browser to visualize geneid output.
  •   Figure 4.3.8 Improving gene prediction by using external information. (A) Default geneid prediction on sequence example2. (B) geneid prediction when the exon coordinates of gene AC004463.3 are given to geneid. (C) Ensembl annotation of the sequence.
  •   Figure 4.3.9 Using external information to investigate alternative splicing forms with geneid. (A) Default geneid prediction on sequence example3. (B, C) Prediction of two alternative transcripts. The EST1 and EST2 tracks display the exonic structure of partial EST matches whose coordinates have been given to geneid. geneid+EST1 and geneid+EST2 show the resulting geneid predictions. Isoform1 and Isoform2 correspond to the coordinates of the two isoforms. (D) Prediction of a third alternative transcript. The EST3 track displays the exonic structure of the EST whose genomic coordinates have been given to geneid. geneid+EST3a and geneid+EST3b display the geneid predictions before and after the exon filtering process. The Isoform3 track contains the annotation for this isoform. (E) The coordinates of a promoter element (Promoter; may be obtained by experimental means) are given to geneid, which improves the prediction of the first coding exon (geneid+Prom).
  •   Figure 4.3.10 geneid Web server: DNA and external information area.
  •   Figure 4.3.11 geneid Web server: Prediction Options area.
  •   Figure 4.3.12 geneid Web server: Output Options area.
  •   Figure 4.3.13 geneid Web server output with the sequence example1.fa.
  •   Figure 4.3.14 geneid Default Gene Model.
Literature Cited

