Understanding the Data in the GenBank File Format
I. LOCUS
In the GenBank format, a record opens with the LOCUS line, followed by the DEFINITION line:
LOCUS NM_001469 2156 bp mRNA linear (molecule topology) PRI (primate division) 16-DEC-2004
DEFINITION Homo sapiens thyroid autoantigen 70kDa (Ku antigen) (G22P1), mRNA.
The LOCUS field contains a number of different data elements, including locus name, sequence length, molecule type, GenBank division, and modification date. Each element is described below.
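The elements listed above can be pulled out of the LOCUS line with a few lines of Python. This is only a minimal sketch: the function name `parse_locus` and the dictionary keys are made up for illustration, and it assumes the line carries all eight whitespace-separated tokens seen in the example (including the topology token `linear`), which is not guaranteed for every record.

```python
def parse_locus(line):
    """Split a GenBank LOCUS line into named elements.

    Assumption: the line has exactly eight whitespace-separated tokens,
    as in the NM_001469 example; other layouts are rejected.
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError("unexpected LOCUS layout: " + line)
    return {
        "locus_name": tokens[1],
        "sequence_length": int(tokens[2]),  # length in base pairs
        "molecule_type": tokens[4],
        "topology": tokens[5],              # linear or circular
        "division": tokens[6],              # GenBank division, e.g. PRI
        "modification_date": tokens[7],
    }


record = parse_locus("LOCUS NM_001469 2156 bp mRNA linear PRI 16-DEC-2004")
print(record["locus_name"], record["sequence_length"], record["division"])
```

A LOCUS line that omits a token or uses different units raises `ValueError` here instead of being guessed at.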
II. COMMENT
1. REVIEWED REFSEQ: describes how this RefSeq record was produced.
2. Summary: describes the function of the sequence.
III. Feature
Definition: information about genes and gene products, as well as regions of biological significance reported in the sequence. These can include regions of the sequence that code for proteins and RNA molecules.
The subfields under Feature are too complex to cover in full; when necessary, look them up in The DDBJ/EMBL/GenBank Feature Table.
1. key: the feature type (for example, CDS); it is generally followed by Location/Qualifiers.
2. complement: the complementary strand (not cDNA). If a feature is located on the complementary strand, the word "complement" will appear before the base span.
3. < and >: partial ends. If the "<" symbol precedes a base span, the sequence is partial on the 5' end (e.g., CDS <1..206). If the ">" symbol follows a base span, the sequence is partial on the 3' end (e.g., CDS 435..>915).
4. /db_xref: its value is a link to an entry in another database.
/db_xref="taxon:9606" links to Taxonomy (species classification).
/db_xref="GeneID:2547" links to Gene.
/db_xref="LocusID:2547" links to LocusLink.
/db_xref="MIM:152690" links to OMIM.
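Each /db_xref value has the shape database:identifier, so the cross-references can be split mechanically. The sketch below is illustrative only: the names `DB_XREF` and `parse_db_xrefs` are hypothetical, and it assumes each qualifier is written on one line with a double-quoted value, as in the examples above.

```python
import re

# Assumption: a qualifier looks like /db_xref="database:identifier".
DB_XREF = re.compile(r'/db_xref="([^":]+):([^"]+)"')


def parse_db_xrefs(text):
    """Return (database, identifier) pairs for every /db_xref found in text."""
    return DB_XREF.findall(text)


qualifiers = '''
/db_xref="taxon:9606"
/db_xref="GeneID:2547"
/db_xref="LocusID:2547"
/db_xref="MIM:152690"
'''
xrefs = parse_db_xrefs(qualifiers)
for database, identifier in xrefs:
    print(database, identifier)
```

The identifiers are kept as strings, since not every database uses purely numeric accessions.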
IV. Two examples:
Key=Location/Qualifiers
CDS=23..400
====/product="alcohol dehydrogenase"
====/gene="adhI"
might be read as:
The feature CDS is a coding sequence beginning at base 23 and ending at base 400, has a product called 'alcohol dehydrogenase' and is coded for by a gene called 'adhI'.
A more complex description:
Key=Location/Qualifiers
CDS=join(544..589,688..>1032)
====/product="T-cell receptor beta-chain"
which might be read as:
This feature, which is a partial coding sequence, is formed by joining elements indicated to form one contiguous sequence encoding a product called T-cell receptor beta-chain.
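The location strings in these examples (a plain span, join(...), complement(...), and the "<" and ">" partial markers) can be read programmatically. The following is a rough sketch, not a full implementation of the feature table location syntax: `parse_location` and its result keys are hypothetical names, and only plain spans, one join(...), and one enclosing complement(...) are handled.

```python
import re

# One span such as 23..400, <1..206 or 688..>1032.
SPAN = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def parse_location(location):
    """Parse a simple feature location into strand, spans, and partial flags."""
    strand = "+"
    if location.startswith("complement(") and location.endswith(")"):
        strand = "-"
        location = location[len("complement("):-1]
    if location.startswith("join(") and location.endswith(")"):
        location = location[len("join("):-1]
    spans = []
    partial_left = partial_right = False
    for part in location.split(","):
        match = SPAN.match(part.strip())
        if match is None:
            raise ValueError("unsupported location: " + part)
        lt, start, gt, end = match.groups()
        # On the forward strand, "left" is the 5' end and "right" the 3' end.
        partial_left = partial_left or lt == "<"
        partial_right = partial_right or gt == ">"
        spans.append((int(start), int(end)))
    return {
        "strand": strand,
        "spans": spans,
        "partial_left": partial_left,
        "partial_right": partial_right,
    }


simple = parse_location("23..400")
joined = parse_location("join(544..589,688..>1032)")
print(simple["spans"])   # [(23, 400)]
print(joined["spans"], joined["partial_right"])
```

For the second example this yields two spans with the right-hand (3') end flagged as partial, which matches the reading "partial coding sequence formed by joining elements".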