Obtaining Comparative Genomic Data with the VISTA Family of Computational Tools
Abstract
Comparison of DNA sequences from different species is a fundamental method for identifying functional elements, such as exons or enhancers, as they tend to exhibit significant sequence similarity due to purifying selection. Availability of whole‐genome sequences for a constantly growing number of organisms makes identification of such elements within these genomes possible. There are two distinct phases in comparisons of genomic sequences: in the first, the sequences are aligned, and in the second, the resulting alignments are analyzed to find conservation signals that may be indicative of functional regions. Due to the considerable length of alignments, good visual representation techniques are a necessity for effective isolation of regions of interest. The VISTA family of tools provides biomedical investigators with a unified framework for the alignment of long genomic sequences and whole‐genome assemblies, interactive visual analysis of alignments along with functional annotation, and many other comparative genomics capabilities. Curr. Protoc. Bioinform. 26:10.6.1‐10.6.17. © 2009 by John Wiley & Sons, Inc.
Keywords: comparative genomics; DNA alignment; VISTA; genome browser
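The second phase described in the abstract, scanning an existing alignment for conservation signals, can be illustrated with a minimal Python sketch. It is not the VISTA implementation: the function name `identity_curve`, the handling of gaps as mismatches, and the truncation of windows at the alignment ends are all assumptions made for illustration.

```python
def identity_curve(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Percent identity in a window centred on each alignment column.

    seq_a and seq_b are the two rows of a pairwise alignment (equal length,
    '-' marks a gap). Assumption: a column containing a gap is a mismatch.
    Even window sizes are effectively rounded up to the next odd size, and
    windows are truncated at the ends of the alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if window < 1:
        raise ValueError("window must be at least 1")

    # 1 where both rows carry the same base, 0 otherwise.
    match = [int(x == y and x != "-")
             for x, y in zip(seq_a.upper(), seq_b.upper())]

    # Prefix sums let each window be scored in constant time.
    prefix = [0]
    for m in match:
        prefix.append(prefix[-1] + m)

    half = window // 2
    curve = []
    for i in range(len(match)):
        lo = max(0, i - half)
        hi = min(len(match), i + half + 1)
        curve.append(100.0 * (prefix[hi] - prefix[lo]) / (hi - lo))
    return curve
```

Plotting the returned values against the column index gives a conservation curve in which peaks mark candidate conserved regions.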
Table of Contents
- Introduction
- Basic Protocol 1: Analyzing Comparative Genomic Data with the VISTA Browser
- Basic Protocol 2: Browsing the Alignment and Retrieving SNP Information Using Base‐Pair Level Alignment Panel
- Basic Protocol 3: Obtaining Detailed Comparative Data Including Genomic Coordinates and Parameters of Conserved Regions with the Text Browser
- Basic Protocol 4: Finding Candidate Orthologous Regions on a Base Genome Using the GenomeVISTA Server
- Basic Protocol 5: Finding Putative Transcription Factor Binding Sites (TFBS) Using rVISTA Server
- Basic Protocol 6: Finding Experimentally Verified Enhancers Using the VISTA Enhancer Browser
- Understanding Results
- Commentary
- Literature Cited
- Figures
- Tables
Figures
Figure 10.6.1 The VISTA Browser gateway page consists of three main parts. The top panel contains a toolbar with links to the different VISTA tools and servers. The middle panel contains a menu for selecting a base (reference) genome and a position on it, specified as genome coordinates, a gene name, a SNP index, or a contig name. The bottom panel contains links to the sources of the genome assemblies.
Figure 10.6.2 VISTA Genome Browser shows the results of the multiple alignment. To change the order in which the curves are displayed, select the curve you want to move up or down and click the Up/Down buttons next to the curve name at the bottom of the screen.
Figure 10.6.3 Changing curve parameters. Calc Window: the size of the sliding window used to calculate conservation scores at each base pair for the VISTA curve. Min Cons Width: minimum width of a conserved region. Cons Identity: minimum percent identity over the window (Min Cons Width) for a region to be considered conserved. Minimum Y, Maximum Y: lower and upper boundaries of the graph; lowering the Minimum Y value in areas of low conservation lets you see the smaller peaks. Curve Name: the label that is associated with the curve. Provider: the source of the reference genome assembly.
Figure 10.6.4 Changing the base genome. A pull‐down context menu is invoked on the Human/Mouse graph. Click the Change base genome button to use a selected organism as a base or reference. In this case, more than one region of the Mouse genome was aligned to the human interval displayed in the browser. Using the mouse, click on the selected region to bring up the VISTA browser with the VISTA curves generated for all mouse alignments covering this region.
Figure 10.6.5 Zooming in on a highly conserved noncoding interval of the alignment. Highly conserved noncoding regions are shown in pink. To zoom in, highlight the area you want to see in detail by holding down the left mouse button while dragging the mouse over the region of interest, the same way you would highlight a sentence in Word. The browser zooms in on the selected area once you release the mouse button.
Figure 10.6.6 Base‐pair level alignment panel (the bottom panel). The base organism's name is displayed in red. Strand directions are indicated by a (+) or (‐) sign. Coordinates on the base genome are drawn above the sequences. A single nucleotide polymorphism (SNP) is indicated by a red border around a base pair. Positioning the mouse pointer over a SNP displays its summary information.
Figure 10.6.7 Text Browser shows detailed information about the aligned regions. The annotation of the base genome can be retrieved by clicking on the Download RefSeq genes link. If there are several annotations available, you can change the annotation to download by using the Change Annotation list.
Figure 10.6.8 Genome VISTA submission form. The list of reference genomes available for comparison is shown in the drop‐down list.
Figure 10.6.9 VISTA Browser displays the alignments with the Human May 2004 genome assembly as the base genome. Sequence1: sequence from the Mouse July 2007 genome assembly. The rest of the graphs belong to the multiple alignment.
Figure 10.6.10 The rVISTA submission options. rVISTA makes predictions using the Match program, based on the TRANSFAC Professional database, user‐defined consensus sequences, or user‐defined matrices. TRANSFAC searches are performed using a default matrix‐similarity value of 0.70 and a core similarity value of 0.75. Consult the extensive help page supplied with the tool for an explanation of the TRANSFAC cut‐off selections and of the format of user‐defined consensus sequences or matrices.
Figure 10.6.11 The rVISTA visualization options. Clustering allows users to identify transcription factor binding sites that are present in groups or clusters. Conserved binding sites are defined as predicted binding sites located in sequence fragments conserved between two species at a level of over 80% over a 24‐bp window. Aligned binding sites are those where the core positions of the potential binding sites on the sequences correspond to each other in the alignment. “All” binding sites shows all sites, regardless of alignment and conservation.
Figure 10.6.12 rVISTA results. Transcription factor binding sites are shown as tick marks above a regular VISTA curve. Green tick marks represent conserved binding sites, red tick marks represent aligned sites, and blue tick marks represent all found sites.
Figure 10.6.13 VISTA Enhancer Browser home page. Searchable keywords include Gene Symbols, GenBank Accession Numbers, and Entrez Gene Numbers. Additional search options are available in the Advanced Query form.
Figure 10.6.14 VISTA Enhancer Browser advanced query form. The search shown is restricted to ultra‐conserved elements (i.e., conserved in human/mouse/rat at 100% identity over 200 base pairs or more) that are also conserved in human‐chicken, expressed in the neural tube, and positive for enhancer activity.
Figure 10.6.15 VISTA Enhancer Browser search results. The search was restricted to ultra‐conserved elements that are also conserved in human‐chicken, expressed in the neural tube, and positive for enhancer activity.
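Several captions above define conservation by a pair of thresholds: a minimum width and a minimum percent identity (Min Cons Width and Cons Identity in Figure 10.6.3, 80% over 24 bp in Figure 10.6.11, 100% over 200 bp in Figure 10.6.14). The following Python sketch shows one way such a rule could be applied to a pairwise alignment. It is an illustration under stated assumptions, not the algorithm used by the VISTA servers: the function name `conserved_regions`, the treatment of gap columns as mismatches, the use of "greater than or equal to" for the identity cut‐off, and the merging of overlapping windows are all hypothetical choices.

```python
def conserved_regions(seq_a: str, seq_b: str,
                      width: int = 24,
                      min_identity: float = 80.0) -> list[tuple[int, int]]:
    """Return merged [start, end) alignment-column intervals that are covered
    by at least one window of `width` columns whose percent identity is at
    least `min_identity`.

    seq_a and seq_b are rows of a pairwise alignment ('-' marks a gap).
    Assumption: gap columns count as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if width < 1:
        raise ValueError("width must be at least 1")

    n = len(seq_a)
    match = [int(x == y and x != "-")
             for x, y in zip(seq_a.upper(), seq_b.upper())]

    # Prefix sums: matches in columns [s, e) equal prefix[e] - prefix[s].
    prefix = [0]
    for m in match:
        prefix.append(prefix[-1] + m)

    regions: list[list[int]] = []
    for start in range(0, n - width + 1):
        end = start + width
        identity = 100.0 * (prefix[end] - prefix[start]) / width
        if identity >= min_identity:
            if regions and start <= regions[-1][1]:
                regions[-1][1] = end          # overlaps or touches: extend
            else:
                regions.append([start, end])  # start a new region
    return [(s, e) for s, e in regions]
```

Calling the function with `width=200` and `min_identity=100.0` corresponds to the ultra‐conserved criterion quoted in Figure 10.6.14, applied here to a single pair of sequences rather than to human/mouse/rat jointly.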