GrailEXP and Genome Analysis Pipeline for Genome Annotation

Internet, 2013-12-31

Abstract

The Basic Protocol describes the use of GrailEXP, the latest version of the gene-finding system from Oak Ridge National Laboratory. GrailEXP builds gene models using sequence similarity to Expressed Sequence Tags (ESTs) and known genes, and it also proposes alternatively spliced constructs for each gene based on the available EST evidence. The Support Protocol describes the Genome Analysis Pipeline, a web application that supports comprehensive sequence analysis by offering a wide choice of gene finders, other biological feature finders, and database searches.
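To make the role of EST evidence more concrete, the sketch below shows one simple way spliced EST alignments can point to alternative transcripts. The gaps between consecutive aligned blocks of an EST on the genomic sequence are treated as candidate introns, and ESTs that share the same intron chain are grouped as one candidate splice form. This is an illustrative toy, not GrailEXP's actual algorithm; the input layout (a dictionary mapping EST identifiers to aligned genomic intervals), the function names `intron_chain` and `splice_forms`, and the 30-base minimum intron length are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout, assumed for illustration only.
Block = Tuple[int, int]      # aligned genomic interval (start, end), 1-based inclusive
Chain = Tuple[Block, ...]    # ordered candidate introns implied by one EST


def intron_chain(blocks: List[Block], min_intron: int = 30) -> Chain:
    """Treat gaps between consecutive aligned blocks as candidate introns."""
    ordered = sorted(blocks)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1
        # Short gaps are more likely alignment indels than introns (assumed cutoff).
        if gap >= min_intron:
            introns.append((prev_end + 1, next_start - 1))
    return tuple(introns)


def splice_forms(est_alignments: Dict[str, List[Block]],
                 min_intron: int = 30) -> Dict[Chain, List[str]]:
    """Group EST identifiers by identical intron chains (candidate splice forms)."""
    forms: Dict[Chain, List[str]] = {}
    for est_id, blocks in est_alignments.items():
        chain = intron_chain(blocks, min_intron)
        if chain:  # unspliced ESTs carry no splice-junction evidence
            forms.setdefault(chain, []).append(est_id)
    return forms


# Toy example: est3 skips the middle exon seen in est1 and est2.
ests = {
    "est1": [(100, 200), (500, 600), (900, 1000)],
    "est2": [(150, 200), (500, 600), (900, 950)],
    "est3": [(120, 200), (900, 980)],
}
for chain, members in splice_forms(ests).items():
    print(chain, members)
```

In this toy input, est1 and est2 share the intron chain ((201, 499), (601, 899)), while est3 alone supports the exon-skipping chain ((201, 899),). A real system would also have to merge partial ESTs whose chains are compatible (an EST covering only the last two exons yields a shorter chain), which this sketch deliberately ignores.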

Full protocol: PDF or HTML at Wiley Online Library

Table of Contents

Basic Protocol 1: Performing Gene Predictions Using the GrailEXP Web Interface
Alternate Protocol 1: Using Genome Analysis Pipeline for Comprehensive Analysis of DNA Sequences
Guidelines for Understanding Results
Commentary
Literature Cited
Figures

Materials

Basic Protocol 1: Performing Gene Predictions Using the GrailEXP Web Interface

Necessary Resources

Hardware
- Any computer workstation (PC, Macintosh, Unix, Linux) with Web access

Software
- Web browser (e.g., Netscape Navigator, Microsoft Internet Explorer)

Files
- DNA sequence of interest in Raw or FASTA format (Appendix 1B)
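Sequence copied from a database record or a sequencing report often carries position numbers and spaces. The following sketch cleans such raw text and wraps it as FASTA before it is pasted into a submission form. The function name `to_fasta`, the default header, the 60-column line width, and the restriction to A, C, G, T, and N are assumptions for illustration; check the submission form's own requirements (for example, whether IUPAC ambiguity codes are accepted).

```python
# Assumption: accept only unambiguous bases plus N; extend if the server allows IUPAC codes.
VALID_BASES = set("ACGTN")


def to_fasta(raw: str, header: str = "query_sequence", width: int = 60) -> str:
    """Clean a raw DNA string and return it as FASTA text (hypothetical helper)."""
    # Drop whitespace and position numbers often left over from copied records.
    seq = "".join(ch for ch in raw if not (ch.isspace() or ch.isdigit())).upper()
    if not seq:
        raise ValueError("no sequence characters found")
    bad = set(seq) - VALID_BASES
    if bad:
        raise ValueError(f"unexpected characters: {''.join(sorted(bad))}")
    # Wrap the sequence into fixed-width lines under a single '>' header line.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"
```

For example, `to_fasta("1 acgtacgt\n9 ggcc", header="demo")` returns `">demo\nACGTACGTGGCC\n"`. Rejecting unexpected characters up front, rather than silently dropping them, avoids shifting the coordinates of any predicted exons.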

Figures

Figure 4.9.1 GrailEXP submission form.

Figure 4.9.2 Pipeline submission form.

Figure 4.9.3 Pipeline summary page.

Figure 4.9.4 GrailEXP genes text table.

Figure 4.9.5 Java Pipeline Viewer.
