The TIGR Human cDNA Database

Source: Internet, 2014-02-13

The Human cDNA Database (HCD) is a repository for human cDNA sequences and related data, curated and maintained at The Institute for Genomic Research (TIGR). The foundation of the database is ∼160,000 partial cDNA sequences generated by TIGR and Human Genome Sciences (HGS). These expressed sequence tags (ESTs) were derived from 250 cDNA libraries that represent the expressed genes of 37 distinct human organs and tissues (1). These ESTs, combined with the human ESTs from dbEST, have been assembled to yield ∼56,000 consensus sequences and ∼106,000 nonoverlapping ESTs. Of these ∼162,000 distinct sequences, approximately 18% display statistically significant similarity to previously known genes, whereas the remainder identify previously unknown cDNAs. The HCD data include nucleotide sequences, putative identifications of the sequences, and tissue-based expression information. New EST sequences acquired by TIGR are assembled and curated in an ongoing process that aims to provide a comprehensive database of human genes and their expression patterns. The development of HCD required the construction of a sister database, the Expressed Gene Anatomy Database (EGAD), which contains a nonredundant dataset of human transcript sequences together with information on expression patterns and cellular roles. Extensive links between HCD and EGAD permit browsing of sequence information, functional classifications, and tissue expression patterns for particular genes of interest. Because large-scale EST sampling is random and a wide range of tissues has been sampled, the data that can be extracted from HCD are of particular value for several common types of cDNA sequence analyses.
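
The sequence counts quoted above can be cross-checked with a short calculation. The sketch below is illustrative only: the variable names are hypothetical, and because every input is a rounded figure from the text, the derived counts are rough estimates rather than values reported by the database.

```python
# Rounded figures quoted in the text (all approximate).
consensus_sequences = 56_000      # assembled consensus sequences
nonoverlapping_ests = 106_000     # ESTs that did not assemble with any other
percent_matching_known = 18       # share similar to previously known genes

# Total distinct sequences: consensus sequences plus nonoverlapping ESTs.
distinct_sequences = consensus_sequences + nonoverlapping_ests

# Estimated split between known and previously unknown cDNAs.
# Integer arithmetic avoids floating-point rounding noise.
matching_known = distinct_sequences * percent_matching_known // 100
previously_unknown = distinct_sequences - matching_known

print(f"distinct sequences:       ~{distinct_sequences:,}")
print(f"similar to known genes:   ~{matching_known:,}")
print(f"previously unknown cDNAs: ~{previously_unknown:,}")
```

Under these rounded inputs, the total agrees with the ∼162,000 distinct sequences stated in the text, and roughly 29,000 of them would be expected to match previously known genes.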