A Few Words on SAGE

丁香园论坛 (DXY Forum), 2015-06-29

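The genetic information carried on DNA must pass through RNA as an intermediary to produce the proteins that tissues and normal physiological functions require; this process is called gene expression. Different tissues and organs in an organism express different sets of genes, and the expression status of such a set is called a gene expression profile. At present there are two main high-throughput methods for studying expression profiles: biochips and serial analysis of gene expression (SAGE). A gene chip can only detect genes that are already known: the probes for the handful of genes placed on the chip can report the expression of those genes and no others. By contrast, SAGE can detect changes in the expression profile under pathological conditions with accuracy and reproducibility far higher than a DNA chip, regardless of whether the genes being measured are known or unknown. So for finding new disease-related genes, especially low-expression pathogenic genes that cannot be detected with gene chips, SAGE is currently the best method available and has no substitute.
SAGE is a patented technology owned by Genzyme. A brief introduction follows: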

Theoretical basis of SAGE

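First, a "tag", a short nucleotide sequence only 9-14 bp long taken from a specific region of any transcript, already contains enough information to identify that transcript specifically. For example, a 9-base sequence has 4^9 = 262,144 possible arrangements, while the human genome is estimated to encode only about 80,000 transcripts, so in theory each 9-base tag can serve as the signature sequence of a single transcript.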

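Second, if short tags are ligated to one another into long DNA molecules, then sequencing a clone of such a molecule yields a large number of consecutive individual tags, which can be processed as a continuous stream of data. This makes it possible to analyze thousands of mRNA transcripts in batch.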

Third, the expression level of each transcript can be quantified by the number of times its tag is observed.
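
The three points above can be sketched in a few lines of Python. This is only a toy illustration, not the actual SAGE analysis software: it assumes the read has already been trimmed to fixed-length tags joined end to end with no linker or enzyme-site sequence between them, and the function names `tag_space` and `count_tags` are made up for this example.

```python
from collections import Counter


def tag_space(tag_len: int) -> int:
    """Number of distinct sequences of tag_len bases over A, C, G, T."""
    return 4 ** tag_len


def count_tags(concatemer: str, tag_len: int) -> Counter:
    """Split a concatemer into consecutive fixed-length tags and count them.

    Assumption: tags are joined directly, with no spacer sequence.
    Any trailing bases that do not fill a whole tag are ignored.
    """
    usable = len(concatemer) - len(concatemer) % tag_len
    tags = (concatemer[i:i + tag_len] for i in range(0, usable, tag_len))
    return Counter(tags)


# Point 1: a 9-base tag has more possible values than the estimated
# number of transcripts quoted in the text (about 80,000).
print(tag_space(9))

# Points 2 and 3: one toy "read" made of three 9-base tags, two identical.
toy_read = "AAAAAAAAA" + "CCCCCCCCC" + "AAAAAAAAA"
counts = count_tags(toy_read, 9)
for tag, n in counts.most_common():
    print(tag, n)
```

The count for each tag is the quantity that point three treats as the expression level of the corresponding transcript.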

Serial Analysis of Gene Expression (SAGE)

SAGE, or Serial Analysis of Gene Expression, is a powerful analytical tool that can be used in a wide variety of applications. These include identifying disease-related genes, analyzing the effects of drugs on tissues, and providing insight into disease pathways. SAGE is an integral part of Genzyme Molecular Oncology's therapeutic discovery efforts and is used extensively in its identification of novel tumor antigens and antiangiogenic targets.

SAGE was invented by a group of researchers led by Kenneth Kinzler, Ph.D., of The Johns Hopkins University and Bert Vogelstein, M.D., of The Johns Hopkins University and Howard Hughes Medical Institute. The commercial application of the technology is exclusively licensed to Genzyme Molecular Oncology. Genzyme Molecular Oncology makes the SAGE technology and data available under non-exclusive licenses.

Principle of SAGE

SAGE is a powerful technology for the analysis of gene expression. SAGE characterizes a 14-base pair segment of DNA, called a SAGE tag, from a defined location in each expressed gene, as a unique identifier for that gene. It is possible to sequence many thousands of such tags from a tissue or cell specimen in order to obtain an accurate quantitative analysis of the relative levels of the genes expressed in that specimen. (Poster's note: the tag is not picked at random.) The ability to count many thousands of genes allows the detection of those that are expressed at very low levels in a high-throughput manner.
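
To see why counting many thousands of tags helps with low-abundance genes, here is a rough sketch. It assumes each sequenced tag is an independent draw from the transcript pool, which is a simplification, and the abundance figure (1 tag in 10,000) and the function names are hypothetical values chosen for illustration, not numbers from Genzyme.

```python
import math


def detection_probability(fraction: float, tags_sequenced: int) -> float:
    """Chance of seeing at least one tag from a transcript that makes up
    `fraction` of all tags, assuming independent sampling."""
    return 1.0 - (1.0 - fraction) ** tags_sequenced


def tags_needed(fraction: float, confidence: float) -> int:
    """Smallest number of tags that gives at least `confidence`
    probability of detecting the transcript at least once."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


# Hypothetical rare transcript: 1 tag in 10,000.
rare = 1e-4
p = detection_probability(rare, 50_000)
n = tags_needed(rare, 0.95)
print(f"P(detect) with 50,000 tags: {p:.3f}")
print(f"tags needed for 95% detection: {n}")
```

Under these assumptions, the deeper the library is sequenced, the more likely a rare transcript shows up at least once.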

An additional benefit of SAGE is that the data generated may be used in multiple comparisons. As the use of SAGE has increased, it has been possible to combine data from different experiments and sites to produce large-scale analyses of the genes expressed in a particular species. A new development of SAGE characterizes a 21-base pair segment. This new method, called LongSAGE™, enables matching of SAGE tags back to primary genome data and is anticipated to be of significant value in moving scientists closer to conclusive identification of all the genes in the human genome.

Applications

SAGE is applicable to any situation where relative levels of gene expression are important. In human disease, SAGE has been used to identify genes that are involved in the disease process. Genes identified by SAGE may be validated by other technologies and used as potential targets for therapeutics or diagnostics. SAGE may also be used once a therapeutic has been identified to determine the mode of action in an experimental animal or cell culture model.

At Genzyme Molecular Oncology, SAGE is used at different stages of the drug discovery and development program. SAGE is used in association with another proprietary platform, SPHERE™, as an antigen discovery tool in support of the cancer vaccine program. SAGE is also used to identify tumor endothelial markers and as a pathway elucidation tool to rationalize candidates in the antiangiogenesis cancer program. SAGE has been used to analyze the most prevalent cancer types and corresponding normal tissues. This has resulted in a proprietary database of over seven million SAGE tags representing over 125,000 unique transcripts.

Publications

The availability of SAGE to academic laboratories has led to more than 100 publications. In a key publication in 1999 (Velculescu et al. Nature Genetics), SAGE data from a number of public sources, as well as the Genzyme Molecular Oncology database, were combined. The analysis gave a unique interpretation on the diversity of genes expressed from the human genome. This work was further supplemented in 2002 with an article in Nature Biotechnology highlighting the utility of LongSAGE as a tool to enable completion of the human genome.

SAGE technology also was used to identify the distinctive molecular signatures of two of the most common forms of lung cancer. These findings were highlighted in an article published in The Proceedings of the National Academy of Sciences.

SAGE publications have covered the areas of human cardiovascular, dermatological, inflammatory, neurological and genetic diseases, as well as cancer. Publications on other species include yeast and agricultural crops. SAGE has also been selected by the NCI as a method of choice for the Cancer Genome Anatomy Project (CGAP).

SAGE Database

The Genzyme Molecular Oncology proprietary SAGE database covers all of the major cancer types as well as normal tissues. Access to this database is available through collaborations with Compugen Ltd., where the database may be accessed at www.labonweb.com, and also with Celera Genomics. In addition, public SAGE data is available at a number of sites including the National Cancer Institute.

Commercial Access to SAGE

SAGE is offered under sub-license to pharmaceutical, biotechnology and agricultural companies. Genzyme Molecular Oncology provides training, software and ongoing technical support to customers adopting this approach. In addition, Genzyme Molecular Oncology provides a service in SAGE library construction, sequencing and analysis. This is appropriate where companies have a requirement for a smaller number of SAGE libraries or where an evaluation of SAGE in practice is needed. SAGE is also available through use of the I-SAGE™ kit supplied by Invitrogen.

Using short tags in place of ESTs gives higher throughput: one sequencing read yields roughly 30 SAGE tags (10 bases each), so 20 plates of sequencing give about 56,000 SAGE tags.
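
As a quick check on that estimate, here is the arithmetic written out. The post does not say how many reads a plate holds, so 96 wells per plate with one read per well is my assumption, as is the helper name `tags_from_run`; only the 20 plates and the roughly 30 tags per read come from the text.

```python
PLATES = 20            # from the post
TAGS_PER_READ = 30     # from the post (approximate)
WELLS_PER_PLATE = 96   # assumption: standard 96-well plates, one read per well


def tags_from_run(plates: int, wells_per_plate: int, tags_per_read: int,
                  success_rate: float = 1.0) -> int:
    """Estimated number of tags from a sequencing run.

    success_rate is the fraction of wells that give a usable read
    (1.0 means every well succeeds).
    """
    reads = plates * wells_per_plate
    return round(reads * success_rate * tags_per_read)


total = tags_from_run(PLATES, WELLS_PER_PLATE, TAGS_PER_READ)
print(total)
```

With every well succeeding this comes to 57,600 tags, in the same range as the figure of about 56,000 quoted above; a few failed reads would account for the gap, though that is a guess.
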
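There will certainly be some non-specificity problems, but our practical experience is that the impact is small and can be remedied with GLGI, which uses PCR to trace a 10-base SAGE tag back to its 3' EST and from there to the full gene information.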
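LongSAGE has its drawbacks too: the longer tag does increase specificity, but the sequencing cost goes up accordingly, so the gain is not worth what it costs.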

The company website has a fairly detailed description:
http://www.hgbiochip.com/cservices-6.html