Digital Gene Expression by Tag Sequencing on the Illumina Genome Analyzer
Abstract
This unit provides a protocol for performing digital gene expression profiling on the Illumina Genome Analyzer sequencing platform. Tag sequencing (Tag‐seq) is an implementation of the LongSAGE protocol on the Illumina sequencing platform that increases utility while reducing both the cost and time required to generate gene expression profiles. The ultra‐high‐throughput sequencing capability of the Illumina platform allows the cost‐effective generation of libraries containing an average of 20 million tags, a 200‐fold improvement over classical LongSAGE. Tag‐seq has less sequence composition bias, leading to a better representation of AT‐rich tag sequences, and allows a more accurate profiling of a subset of the transcriptome characterized by AT‐rich genes expressed at levels below the threshold of detection of LongSAGE (Morrissy et al., 2009). Curr. Protoc. Hum. Genet. 65:11.11.1‐11.11.36 © 2010 by John Wiley & Sons, Inc.
Keywords: gene expression; Tag‐seq; Illumina; RNA; cDNA; tag; PCR
Table of Contents
- Introduction
- Basic Protocol 1: First‐ and Second‐Strand cDNA Synthesis for Tag‐seq Library Construction
- Basic Protocol 2: Tag Generation
- Basic Protocol 3: PCR and Fragment Isolation
- Basic Protocol 4: Preparing the Library for Illumina Sequencing
- Alternate Protocol 1: Amplified Tag‐seq Library Construction (Tag‐seqLite)
- Basic Protocol 5: Data Analysis
- Reagents and Solutions
- Commentary
- Literature Cited
- Figures
- Tables
Materials
Basic Protocol 1: First‐ and Second‐Strand cDNA Synthesis for Tag‐seq Library Construction
Materials
Basic Protocol 2: Tag Generation
Materials
Basic Protocol 3: PCR and Fragment Isolation
Materials
Basic Protocol 4: Preparing the Library for Illumina Sequencing
Materials
Alternate Protocol 1: Amplified Tag‐seq Library Construction (Tag‐seqLite)
Figures
Figure 11.11.1 Tag‐seq library generation. Polyadenylated mRNAs (open rectangles) are captured using oligo(dT) beads, and double‐stranded cDNA is subsequently synthesized. The cDNA (double clear rectangles) is digested with the NlaIII anchoring restriction enzyme (vertical arrows), leaving a 4‐bp overhang (GTAC). Only cDNA fragments anchored to oligo(dT) beads are retained. Adapter A (light gray rectangle) is ligated to the overhang, adding a recognition site for the Type IIS tagging enzyme MmeI. Following MmeI digestion (gray vertical arrow), a second adapter (Adapter B, light gray rectangle) is ligated to the resulting 2‐bp overhang. PCR primers (horizontal gray arrows) annealing to adapters A and B are used to enrich tags (dark gray rectangles). Cluster generation and sequencing (horizontal black arrow) are performed on the Illumina cluster station and analyzer. The resulting image files are processed to extract the read sequences, and 21‐bp SAGE tags are then extracted from the reads. Each tag consists of the 4‐bp NlaIII recognition site and 17 bp of unique sequence, for a total of 21 bases that can be mapped back to the original mRNA.
Figure 11.11.2 Overview of the flow of analysis (A) and output of three analysis scripts. (B) SdCompare: tag sequences with expression counts above 20 in each of two compared libraries are binned by the log‐ratio of their expression (x axis), and the number of tags in each bin is plotted (y axis). This provides a measure of the similarity between two libraries (ce0068 and ce0069). (C) CorrelatePlot: a scatterplot of the (log) expression levels of tags sequenced in two libraries (ce0068 and ce0069; x axis and y axis, respectively) is shown along with the Pearson correlation coefficient of the two libraries, the linear regression equation (top right), and the linear regression line. (D) SdSageTree: a hierarchical tree representation of the distance matrix calculated for five libraries. The distance matrix is constructed from the standard deviations of the log ratios of the tag expression values in the five libraries. See Table for an overview of the scripts and Table for a summary of the data files.
Figure 11.11.3 Purified 85‐bp PCR product on the Agilent DNA1000 chip is seen as a 96‐bp peak.
Figure 11.11.4 Gel image of PCR products along with the no‐template control. Both the 13‐cycle and 15‐cycle 85‐bp PCR product bands were excised, gel purified, and ethanol precipitated. In general, if the two PCR products are of similar purity and both yield enough product for sequencing, use the one generated with fewer PCR cycles.
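As an illustration of the tag structure described for Figure 11.11.1 and the library comparison described for Figure 11.11.2, the following Python sketch extracts 21‐base tags from reads, counts them, and computes the standard deviation of the log ratios for tags with counts above 20 in both libraries. It does not reproduce the unit's analysis scripts: the function names, the `site_in_read` switch (whether a read itself begins with the CATG site or starts just downstream of it), and the use of base‐2 logarithms are assumptions made for illustration.

```python
import math
import statistics
from collections import Counter

ANCHOR_SITE = "CATG"  # NlaIII recognition sequence
UNIQUE_LEN = 17       # bases of transcript-derived sequence following the site
TAG_LEN = len(ANCHOR_SITE) + UNIQUE_LEN  # 21 bases in total


def extract_tag(read, site_in_read=True):
    """Return the 21-base tag for one read, or None if no clean tag is present.

    Assumption: with site_in_read=True the read starts with CATG; with
    site_in_read=False the read starts right after the site, so CATG is
    prepended to the first 17 bases.
    """
    read = read.upper()
    if site_in_read:
        if not read.startswith(ANCHOR_SITE):
            return None
        tag = read[:TAG_LEN]
    else:
        tag = ANCHOR_SITE + read[:UNIQUE_LEN]
    # Reject truncated tags and tags containing ambiguous base calls (e.g. N).
    if len(tag) != TAG_LEN or set(tag) - set("ACGT"):
        return None
    return tag


def count_tags(reads, site_in_read=True):
    """Count how often each tag sequence occurs in an iterable of reads."""
    tags = (extract_tag(read, site_in_read) for read in reads)
    return Counter(tag for tag in tags if tag is not None)


def sd_of_log_ratios(counts_a, counts_b, min_count=20):
    """Sample standard deviation of log2(count_a / count_b).

    Only tags with counts above min_count in both libraries are used.
    """
    ratios = [
        math.log2(counts_a[tag] / counts_b[tag])
        for tag in counts_a
        if counts_a[tag] > min_count and counts_b.get(tag, 0) > min_count
    ]
    if len(ratios) < 2:
        raise ValueError("need at least two shared tags above the threshold")
    return statistics.stdev(ratios)
```

A call such as `sd_of_log_ratios(count_tags(reads_a), count_tags(reads_b))` gives one number per pair of libraries, with smaller values indicating more similar profiles. Because a uniform difference in sequencing depth shifts every log ratio by the same constant, it does not change the standard deviation, so this sketch applies no depth normalization.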
Literature Cited
Boon, K., Osorio, E.C., Greenhut, S.F., Schaefer, C.F., Shoemaker, J., Polyak, K., Morin, P.J., Buetow, K.H., Strausberg, R.L., De Souza, S.J., and Riggins, G.J. 2002. An anatomy of normal and malignant gene expression. Proc. Natl. Acad. Sci. U.S.A. 99:11287‐11292.
Gerhard, D.S., Wagner, L., Feingold, E.A., Shenmen, C.M., Grouse, L.H., Schuler, G., Klein, S.L., Old, S., Rasooly, R., Good, P., Guyer, M., Peck, A.M., Derge, J.G., Lipman, D., Collins, F.S., Jang, W., Sherry, S., Feolo, M., Misquitta, L., Lee, E., Rotmistrovsky, K., Greenhut, S.F., Schaefer, C.F., Buetow, K., Bonner, T.I., Haussler, D., Kent, J., Kiekhaus, M., Furey, T., Brent, M., Prange, C., Schreiber, K., Shapiro, N., Bhat, N.K., Hopkins, R.F., Hsie, F., Driscoll, T., Soares, M.B., Casavant, T.L., Scheetz, T.E., Brownstein, M.J., Usdin, T.B., Toshiyuki, S., Carninci, P., Piao, Y., Dudekula, D.B., Ko, M.S., Kawakami, K., Suzuki, Y., Sugano, S., Gruber, C.E., Smith, M.R., Simmons, B., Moore, T., Waterman, R., Johnson, S.L., Ruan, Y., Wei, C.L., Mathavan, S., Gunaratne, P.H., Wu, J., Garcia, A.M., Hulyk, S.W., Fuh, E., Yuan, Y., Sneed, A., Kowis, C., Hodgson, A., Muzny, D.M., McPherson, J., Gibbs, R.A., Fahey, J., Helton, E., Ketteman, M., Madan, A., Rodrigues, S., Sanchez, A., Whiting, M., Madari, A., Young, A.C., Wetherby, K.D., Granite, S.J., Kwong, P.N., Brinkley, C.P., Pearson, R.L., Bouffard, G.G., Blakesly, R.W., Green, E.D., Dickson, M.C., Rodriguez, A.C., Grimwood, J., Schmutz, J., Myers, R.M., Butterfield, Y.S., Griffith, M., Griffith, O.L., Krzywinski, M.I., Liao, N., Morin, R., Palmquist, D., Petrescu, A.S., Skalska, U., Smailus, D.E., Stott, J.M., Schnerch, A., Schein, J.E., Jones, S.J., Holt, R.A., Baross, A., Marra, M.A., Clifton, S., Makowski, K.A., Bosak, S., Malek, J.; MGC Project Team. 2004. The Status, Quality, and Expansion of the NIH Full‐Length cDNA Project: The Mammalian Gene Collection (MGC). Genome Res. 14:2121‐2127.
Gowda, M., Jantasuriyarat, C., Dean, R.A., and Wang, G.L. 2004. Robust‐LongSAGE (RL‐SAGE): A substantially improved LongSAGE method for gene discovery and transcriptome analysis. Plant Physiol. 134:890‐897.
Heidenblut, A.M., Luttges, J., Buchholz, M., Heinitz, C., Emmersen, J., Nielsen, K.L., Schreiter, P., Souquet, M., Nowacki, S., Herbrand, U., Klöppel, G., Schmiegel, W., Gress, T., and Hahn, S.A. 2004. aRNA‐longSAGE: A new approach to generate SAGE libraries from microdissected cells. Nucleic Acids Res. 32:E131.
Khattra, J., Delaney, A.D., Zhao, Y., Siddiqui, A.S., Asano, J., McDonald, H., Pandoh, P., Dhalla, N., Prabhu, A., Ma, K., Lee, S., Ally, A., Tam, A., Sa, D., Rogers, S., Charest, D., Stott, J., Zuyderduyn, S., Varhol, R., Eaves, C., Jones, S., Holt, R.A., Hirst, M., Hoodless, P.A., and Marra, M.A. 2007. Large‐scale production of SAGE libraries from microdissected tissues, flow‐sorted cells, and cell lines. Genome Res. 17:108‐116.
Kodzius, R., Kojima, M., Nishiyori, H., Nakamura, M., Fukuda, S., Tagami, M., Sasaki, D., Imamura, K., Kai, C., Harbers, M., Hayashizaki, Y., and Carninci, P. 2006. CAGE: Cap Analysis of Gene Expression. Nat. Methods 3:211‐222.
Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M., and Gilad, Y. 2008. RNA‐seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18:1509‐1517.
Matsumura, H., Reich, S., Ito, A., Saitoh, H., Kamoun, S., Winter, P., Kahl, G., Reuter, M., Kruger, D.H., and Terauchi, R. 2003. Gene expression analysis of plant host‐pathogen interactions by SuperSAGE. Proc. Natl. Acad. Sci. U.S.A. 100:15718‐15723.
Morrissy, A.S., Morin, R.D., Delaney, A., Zeng, T., McDonald, H., Jones, S., Zhao, Y., Hirst, M., and Marra, M.A. 2009. Next‐generation tag sequencing for cancer gene expression profiling. Genome Res. 19:1825‐1835.
Peters, D.G., Kassam, A.B., Yonas, H., O'Hare, E.H., Ferrell, R.E., and Brufsky, A.M. 1999. Comprehensive transcript analysis in small quantities of mRNA by SAGE‐lite. Nucleic Acids Res. 27:e39.
Pontius, J.U., Wagner, L., and Schuler, G.D. 2003. UniGene: A unified view of the transcriptome. In The NCBI Handbook, National Center for Biotechnology Information, Bethesda, Md. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books.
Rosenkranz, R., Borodina, T., Lehrach, H., and Himmelbauer, H. 2008. Characterizing the mouse ES cell transcriptome with Illumina sequencing. Genomics 92:187‐194.
Saha, S., Sparks, A.B., Rago, C., Akmaev, V., Wang, C.J., Vogelstein, B., Kinzler, K.W., and Velculescu, V.E. 2002. Using the transcriptome to annotate the genome. Nat. Biotechnol. 20:508‐512.
Siddiqui, A.S., Khattra, J., Delaney, A.D., Zhao, Y., Astell, C., Asano, J., Babakaiff, R., Barber, S., Beland, J., Bohacec, S., Brown‐John, M., Chand, S., Charest, D., Charters, A.M., Cullum, R., Dhalla, N., Featherstone, R., Gerhard, D.S., Hoffman, B., Holt, R.A., Hou, J., Kuo, B.Y.‐L., Lee, L.L.C., Lee, S., Leung, D., Ma, K., Matsuo, C., Mayo, M., McDonald, H., Prabhu, A., Pandoh, P., Riggins, G.J., Ruiz de Algara, T., Rupert, J.L., Smailus, D., Stott, J., Tsai, M., Varhol, R., Vrljicak, P., Wong, D., Wu, M.K., Xie, Y., Yang, G., Zhang, I., Hirst, M., Jones, S.J.M., Helgason, C.D., Simpson, E.M., Hoodless, P.A., and Marra, M.A. 2005. A mouse atlas of gene expression: Large‐scale digital gene‐expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proc. Natl. Acad. Sci. U.S.A. 102:18485‐18490.
Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K.W. 1995. Serial analysis of gene expression. Science 270:484‐487.
Wei, C.L., Ng, P., Chiu, K.P., Wong, C.H., Ang, C.C., Lipovich, L., Liu, E.T., and Ruan, Y. 2004. 5′ Long serial analysis of gene expression (LongSAGE) and 3′ LongSAGE for transcriptome characterization and genome annotation. Proc. Natl. Acad. Sci. U.S.A. 101:11701‐11706.
Internet Resources
http://bioinfo.au.tsinghua.edu.cn/micrornadb/
MicroRNAdb: A Comprehensive Database for MicroRNAs. MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books
The Reference Sequence (RefSeq) Project. 2002. Chapter 18, The NCBI Handbook. National Library of Medicine (US), National Center for Biotechnology Information. Bethesda, Md.