Disease and Phenotype Data at Ensembl
Abstract
Biological databases are an important resource for the life sciences community. Accessing the hundreds of databases supporting molecular biology and related fields is a daunting and time-consuming task. Integrating this information into one access point is a necessity for the life sciences community, which includes researchers focusing on human disease. Here we discuss the Ensembl genome browser, which acts as a single entry point, with a graphical user interface, to data from multiple projects, including OMIM, dbSNP, and the NHGRI GWAS catalog. Ensembl provides a comprehensive source of annotation for the human genome, along with other species of biomedical interest. In this unit, we explore how to use the Ensembl genome browser in example queries related to human genetic diseases. Support protocols demonstrate quick sequence export using the BioMart tool. Curr. Protoc. Hum. Genet. 69:6.11.1-6.11.34 © 2011 by John Wiley & Sons, Inc.
Keywords: computer graphics; databases; genetic variation; genomics; cytogenetics; sequence homology; sequence alignment; informatics; computational biology
Table of Contents
- Introduction
- Basic Protocol 1: Exploring an SNP Associated with Hemochromatosis
- Basic Protocol 2: Exploring a Nonsynonymous Variation in the MYC Gene
- Basic Protocol 3: Sequence Matches and Individual Genomes
- Basic Protocol 4: A Cytogeneticist's View
- Support Protocol 1: Sequence Export
- Support Protocol 2: Variation Export
- Commentary
- Literature Cited
- Figures
Materials
Basic Protocol 1: Exploring an SNP Associated with Hemochromatosis
Materials
Basic Protocol 2: Exploring a Nonsynonymous Variation in the MYC Gene
Materials
Basic Protocol 3: Sequence Matches and Individual Genomes
Materials
Basic Protocol 4: A Cytogeneticist's View
Necessary Resources
Support Protocol 1: Sequence Export
Necessary Resources
Support Protocol 2: Variation Export
Necessary Resources
Figures
Figure 6.11.1 The Ensembl home page at http://www.ensembl.org. A link to all available vertebrate and invertebrate species is circled in the figure. News for each release is shown at the bottom right hand corner of the page, and older releases can be accessed through the “View in archive site” link at the very bottom left.
Figure 6.11.2 The variation tab for rs1800562. Ensembl genes and transcripts containing this variation are available through the Gene/Transcript link (1), genotype information for this variation can be accessed by “Individual genotypes” (2), phenotypes associated with rs1800562 through the GWAS catalog are shown in the Phenotype Data view (3), and external data in DAS format (SNPedia) can be accessed using the External Data link (4).
Figure 6.11.3 Phylogenetic context for rs1333049. Variations are highlighted within the mammals in the alignment. The view is centered on rs1333049 in human.
Figure 6.11.4 (A) Transcript table in the gene tab for the human HFE gene. Fourteen transcripts are shown. The twelve protein‐coding transcripts are listed first. Seven transcripts are found in the CCDS set, and all transcripts have been identified by manual annotation (the HAVANA project), as indicated by transcript numbers beginning with “0.” (B) Transcript diagrams in the gene summary view in the gene tab. Gold transcripts are agreed upon by HAVANA and Ensembl automatic annotation. Boxes are exons, and connecting lines are introns. Filled boxes show coding sequence, while unfilled boxes indicate untranslated sequence (UTR). Transcripts are on the forward strand of the chromosome, as they are drawn above the blue line corresponding to the assembly. A greater‐than or less‐than sign after the transcript identifier indicates the strand by showing the direction of translation.
Figure 6.11.5 Sequence alignment of a short query sequence to the human genome using BLAT. (1) Matches on the human karyotype are indicated by filled triangles, with the best match boxed. (2) The query sequence is drawn in an alternating black and white “racetrack.” High‐scoring pairs (HSP) are drawn along the query. In this case, one HSP matches the full length of the query. (3) A table of BLAT hits is shown. Links to the sequence alignment (A) and a graphical view of the BLAT hit (C) are circled in the diagram.
Figure 6.11.6 Alignment of the query sequence to chromosome 8 on the human genome. Clicking on “A” (circled in Fig. ) reveals this view. The positive signs (circled) indicate the forward strand of the chromosome. A mismatch is seen in the 33rd nucleotide of the query sequence.
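The mismatch reported in the alignment view can also be located programmatically once the aligned query and target strings have been copied out of the browser. The following is a minimal sketch, assuming a gap‐free alignment of equal‐length strings; the sequences used here are hypothetical placeholders, not the query from the protocol.

```python
from typing import List


def mismatch_positions(query: str, target: str) -> List[int]:
    """Return 1-based positions at which two gap-free aligned sequences differ."""
    if len(query) != len(target):
        raise ValueError("aligned sequences must have the same length")
    pairs = zip(query.upper(), target.upper())
    return [pos for pos, (q, t) in enumerate(pairs, start=1) if q != t]


# Hypothetical 40-nt target and a query that differs only at nucleotide 33.
target = "ACGT" * 10
query = target[:32] + "T" + target[33:]  # target[32] is "A"; the query carries "T"

print(mismatch_positions(query, target))  # [33]
```

Positions are reported 1‐based to match the way the alignment view numbers the query sequence.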
Figure 6.11.7 Location tab: “region in detail” view, centered on the human MYC gene. (1) The chromosome panel reveals banding from homochromatin/heterochromatin staining. The red box shows the position of the MYC gene. (2) The “top panel” is centered on the MYC gene, indicated by the red box. Contigs are colored in light and dark blue, to differentiate them in the assembly. Clicking on a contig will show its identifier. Neighboring genes to MYC are indicated along the genomic assembly. Click on any gene to recenter the display. (3) The “main panel” is zoomed in to the MYC gene, further explored in Figure . The BLAT hit is circled. All panels can be exported as images using a button at the lower right‐hand corner of each panel.
Figure 6.11.8 The main panel of the “region in detail” view shown in Figure , zoomed out. A region of chromosome 8 (base pairs 128725525‐128775524) is shown. The zoom ladder is circled. Sequences from the NCBI reference sequence set and EMBL nucleotides are drawn in collapsed format above the blue bar, indicating they are on the forward strand of the chromosome. One coding sequence in the CCDS set is displayed, along with six MYC protein‐coding transcripts. The BLAT hit matches to part of a common exon in the MYC transcripts. Below the blue bar are regulatory features indicating regions of open chromatin detected in the ENCODE project.
Figure 6.11.9 A zoomed‐in view of the main panel shown in Figure . Decreasing the range to chromosome 8, base pairs 128,747,219 to 128,751,603 shows, with more clarity, the BLAT hit alignment to the 5′ end of the MYC‐201 transcript. The constrained elements block and GERP scoring indicate this exon is highly conserved across sixteen species.
Figure 6.11.10 The configuration dialog for the location tab: “region in detail” view. Click on the “configure this page” link at the left of the “region in detail” to select or deselect data tracks. Active tracks are shown by default, and list the selected data already shown in “region in detail.” (1) Data options are separated into menus, such as Germline Variations (circled). (2) A search box at the top right allows the name of a data source to be entered, revealing the appropriate track. (3) Once data are selected and/or deselected, the “check mark” must be clicked on to redraw the “region in detail” page according to the new configuration.
Figure 6.11.11 Searching with the term MeDIP in the configuration dialog reveals multiple tracks showing DNA methylation across different tissue types. These tracks can be found in the “Functional genomics” menu.
Figure 6.11.12 The location tab: “region in detail” view displaying the “MeDIP‐chip B‐cell,” “CTCF peaks” (CTCF binding sites), and “Sequence variants” tracks. A CTCF binding site corresponds to the 5′ end and upstream region of MYC‐201. DNA methylation sites are found in the region shown in B‐cells, indicated by the “MeDIP‐chip B‐cells” track. Variations are drawn as vertical lines, and are color‐coded according to the legend below.
Figure 6.11.13 A zoomed‐in view of the main panel shown in Figure . The filled, yellow box in the “sequence variants” track represents a nonsynonymous SNP at nucleotide position 128,750,540 on chromosome 8. This location, along with the dbSNP ID (rs4645959) and possible alleles (A and G), is revealed by clicking on the yellow box, which opens the pop‐up box shown in the figure.
Figure 6.11.14 The “variation image” in the gene tab. (1) All variations in MYC transcripts are displayed as vertical lines, color‐coded according to the legend at the bottom of the view (not shown in the figure). (2) The six MYC transcripts are drawn. (3) The MYC‐001 transcript is drawn, along with variations. Synonymous and nonsynonymous SNPs show encoded amino acid(s) in single letter code. For example, the yellow box showing “N/S” reveals that asparagine or serine can be coded for at that position. Clicking on a variation will open an information box, and a link to the variation tab. (4) Protein domains from various sources are drawn along the transcripts. For example, a transcript regulation domain in Pfam maps to the second and third exons of MYC‐001.
Figure 6.11.15 A selection of sequence from the transcript tab: cDNA view. Sequence and line numbering in the top line corresponds to the transcript, including UTR, highlighted in bright yellow. The second line corresponds to the coding sequence only. Codons in the first two lines are revealed by light yellow highlighting, alternating with no highlighting. The third line shows the amino acid sequence. The “N” at position 26 in the amino acid sequence is shown in red, indicating another amino acid is possible, depending on the nucleotide allele. The “R” above the highlighted nucleotide is the IUPAC code for purine (A or G), and can be clicked to open the variation tab.
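The relationship between the alleles, the IUPAC ambiguity code, and the “N/S” label can be illustrated with a short sketch. It uses the standard genetic code and the standard IUPAC nucleotide codes. The codon AAC and the position of the variant within it are assumptions chosen for illustration only; they are not taken from the MYC transcript sequence.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def iupac_code(alleles):
    """Return the single-letter IUPAC code covering all given alleles."""
    return IUPAC_CODES[frozenset(a.upper() for a in alleles)]


def residues_for_alleles(codon, offset, alleles):
    """Translate the codon once per allele placed at `offset` (0-based)."""
    residues = []
    for allele in alleles:
        variant = (codon[:offset] + allele + codon[offset + 1:]).upper()
        residue = CODON_TABLE[variant]
        if residue not in residues:
            residues.append(residue)
    return "/".join(residues)


# Illustrative codon only: an A/G variant in the second position of AAC.
print(iupac_code(["A", "G"]))                       # R
print(residues_for_alleles("AAC", 1, ["A", "G"]))   # N/S
```

A variant for which every allele gives the same residue would be synonymous; in this sketch it would print a single letter rather than a pair.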
Figure 6.11.16 The transcript tab: population comparison view. Variations are shown in Jim Watson's genome. One synonymous coding SNP (rs12628) and one intronic SNP (rs61877782) differ in allele, when compared to the reference genome.
Figure 6.11.17 The location tab: Linkage Data view. This view is reachable from the variation tab: summary view, if linkage disequilibrium (LD) values have been calculated for the specific variant. Clicking on the LD plot for the population “CSHL‐HAPMAP:CHB” in the variation summary view for rs12628 will open the “Linkage Data” view. Click “Export data” at the left of the view to export the table of LD values shown in this figure.
Figure 6.11.18 Location tab: “region in detail” view. The main panel is shown. Constrained elements (circled) and GERP scoring (labeled “1”) of each nucleotide in the 34 species alignment are shown. The circled constrained element falls in the first intron of the TALDO1 transcript.
Figure 6.11.19 Gene tab: regulation view. A graphical display of predicted and known sequences associated with gene regulation is shown for ENSG00000177156. Features from the Ensembl Regulatory Build are shown, along with sequences from cisRED (circled). Click on any feature for an information box.
Figure 6.11.20 Location tab: “region in detail” view. The main panel is centered on the TALDO1 transcript. Markers are displayed as pink blocks. Clicking on either RH36444 or D11S3271 will reveal more information about the marker.
Figure 6.11.21 Location tab: “region in detail” view. The main panel has been zoomed in to chromosome 11, base pairs 755,875 to 756,005. Sequence and translated sequence are selected in the configuration dialog. Sequence is only displayed if the base pair range is small enough.
Figure 6.11.22 Location tab: region overview. Contigs, genes, and tilepath clones are displayed. Gold clones indicate finished sequence, and a black triangle at the upper left‐hand corner of a gold rectangle indicates the clone was mapped using fluorescence in situ hybridization. For regions of over 1 Mb, the region overview should be used rather than “region in detail.”
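The size of a region decides which location view is practical. The sketch below parses a location string of the form chromosome:start-end, computes its inclusive length, and applies the 1‐Mb guideline stated in the caption above. The function names and the exact handling of the threshold are assumptions made for illustration; they are not part of the Ensembl interface.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int

    @property
    def length(self) -> int:
        """Inclusive length in base pairs."""
        return self.end - self.start + 1


# Accepts e.g. "8:128725525-128775524" or "11:755,875-756,005".
REGION_PATTERN = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")

# Guideline from the caption: above 1 Mb, prefer the region overview.
OVERVIEW_THRESHOLD_BP = 1_000_000


def parse_region(text: str) -> Region:
    match = REGION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a chromosome:start-end location: {text!r}")
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return Region(match.group(1), start, end)


def suggested_view(region: Region) -> str:
    if region.length > OVERVIEW_THRESHOLD_BP:
        return "region overview"
    return "region in detail"


for location in ("8:128725525-128775524", "8:128,747,219-128,751,603", "11:755,875-756,005"):
    region = parse_region(location)
    print(location, region.length, suggested_view(region))
```

For the three coordinate ranges quoted in the figure captions, this gives lengths of 50,000, 4,385, and 131 bp, all of which fall within the “region in detail” range.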
Figure 6.11.23 BioMart: the database and dataset have been selected to be Ensembl Genes 60 and Homo sapiens genes.
Figure 6.11.24 BioMart: the HGNC symbol MYC has been entered in the “ID list limit” filter, and the “count” button reveals that 1 gene out of 52,580 potential human noncoding and coding genes passes the filter (circled).
Figure 6.11.25 BioMart: the “results” button shows a preview window.
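The same dataset, filter, and attribute choices made in the BioMart interface can be written down as an XML query document, which BioMart can also display for a configured query. The following is a minimal sketch that only builds such a document and does not contact any server. The dataset name hsapiens_gene_ensembl, the filter name hgnc_symbol, and the attribute names are assumptions based on commonly used Ensembl BioMart identifiers; they may differ between releases and should be checked against the interface before use.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query string (no network access is performed)."""
    query = ET.Element(
        "Query",
        {
            "virtualSchemaName": "default",
            "formatter": "TSV",
            "header": "1",
            "uniqueRows": "1",
            "datasetConfigVersion": "0.6",
        },
    )
    dataset_node = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(dataset_node, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(dataset_node, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body


# Assumed identifiers: restrict human genes to the HGNC symbol MYC.
xml_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"hgnc_symbol": "MYC"},
    attributes=["ensembl_gene_id", "chromosome_name", "start_position", "end_position"],
)
print(xml_query)
```

Keeping the query as a document makes an export reproducible: the same filters and attributes can be rerun against a later release and the results compared.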
Internet Resources
http://www.ensembl.org/
Ensembl project home page.
http://www.ensembl.org/info/website/tutorials/index.html
Support videos and other tutorials for Ensembl.
http://www.biomart.org/
BioMart Project.
http://biodas.org/
Distributed Annotation System (DAS) and BioDAS.
http://www.ncbi.nlm.nih.gov/projects/SNP/
dbSNP: a repository of polymorphisms.
http://www.geneontology.org/
Gene Ontology Consortium.
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/index.shtml
Genome Reference Consortium: houses the reference human genome.
http://www.genome.gov/26525384#1
NHGRI GWAS catalog.
http://www.hapmap.org
An international organization working towards a haplotype map of the human genome.
http://www.genenames.org/
HUGO Gene Nomenclature Committee (HGNC).
http://www.ebi.ac.uk/interpro/
InterPro, a collection of protein signatures.
http://www.ncbi.nlm.nih.gov/omim
Online Mendelian Inheritance in Man, a set of human genes and phenotypes.
http://www.ncbi.nih.gov/RefSeq/
A multi‐organism, nonredundant database of sequences.
http://www.uniprot.org
UniProtKB, a catalog of information on proteins.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unists
UniSTS, databank for chromosomal markers.
http://vega.sanger.ac.uk/
Vertebrate Genome Annotation (VEGA) at Sanger Institute.
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/index.shtml
International Human Genome Sequencing Consortium.