Galaxy: A Web‐Based Genome Analysis Tool for Experimentalists
Abstract
High-throughput data production has revolutionized molecular biology. However, massive increases in data-generation capacity require more sophisticated, and often very computationally intensive, analysis approaches. Making sense of high-throughput data therefore requires informatics support. Galaxy (http://galaxyproject.org) is a software system that provides this support through a framework that gives experimentalists simple interfaces to powerful tools while automatically managing the computational details. Galaxy is distributed both as a publicly available Web service, which provides tools for the analysis of genomic, comparative genomic, and functional genomic data, and as a downloadable package that can be deployed in individual laboratories. Either way, it allows experimentalists without informatics or programming expertise to perform complex, large-scale analyses with just a Web browser. Curr. Protoc. Mol. Biol. 89:19.10.1–19.10.21. © 2010 by John Wiley & Sons, Inc.
Keywords: Galaxy; analysis; bioinformatics; workflow; algorithm; pipeline; genomics; SNPs
Table of Contents
- Introduction
- Basic Protocol 1: An Introduction to the Galaxy Approach: Finding Promoters Containing TAF1 Binding Sites Identified From a ChIP‐Seq Experiment
- Basic Protocol 2: Combining and Filtering Genome Annotations: Finding Exons with the Highest Number of Nucleotide Polymorphisms
- Support Protocol 1: Saving Results in Galaxy and Sharing Data with Others
- Basic Protocol 3: Generating a Workflow From a History in Galaxy
- Support Protocol 2: Modify a Parameter in the Workflow in Galaxy
- Support Protocol 3: Running Workflows with Galaxy
- Support Protocol 4: Sharing Workflows with Galaxy
- Basic Protocol 4: Generating Workflows from Scratch with Galaxy
- Basic Protocol 5: Extracting Sequences and Alignments with Galaxy: An SNPs in Exons Example
- Commentary
- Literature Cited
- Figures
Figures
Figure 19.10.1 Galaxy's Analyze Data interface consists of four regions: the masthead (A) at the top, the tool menu (B) on the left-hand side, the work area (C) in the middle, and the history panel (D) on the right. The Get Data section has been expanded in the tool menu and the Upload File tool has been selected. In the work area, a local file containing TAF1 ChIP-Seq data has been chosen (see , step 1); clicking the Execute button uploads the data, which then appear in the history panel. See the TAF1 screencast (http://galaxycast.org/cpmb‐2009‐1) for more details.
Figure 19.10.2 To change the properties of a dataset (see , step 2), click the question mark (or the pencil icon) associated with the dataset in the history panel (A). This opens the Edit Attributes page in the center panel (B), where the datatype has been changed from tabular to interval. Clicking Save refreshes the page so that additional interval-specific information can be set (C).
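Changing a dataset's datatype from tabular to interval essentially tells Galaxy which columns hold the chromosome, start, end and (optionally) strand. The sketch below illustrates that idea in plain Python; it is not Galaxy's internal code, and the function name `parse_interval_rows`, the 0-based column indices and the "+" default strand are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


def parse_interval_rows(lines, chrom_col: int, start_col: int, end_col: int,
                        strand_col: Optional[int] = None):
    """Interpret tab-separated rows as intervals using the given column indices.

    Column indices are 0-based in this sketch (an assumption). When no strand
    column is given, every interval is assumed to be on the '+' strand.
    """
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        strand = fields[strand_col] if strand_col is not None else "+"
        intervals.append(Interval(fields[chrom_col], int(fields[start_col]),
                                  int(fields[end_col]), strand))
    return intervals
```

For a ChIP-Seq peak file whose first column is a peak name, `parse_interval_rows(lines, 1, 2, 3)` would read columns 2 to 4 (counting from 1) as chromosome, start and end.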
Figure 19.10.3 The UCSC Table Browser tool has been selected and its interface (A) appears in the center panel. The refGene table has been selected and the output has been set to be sent to Galaxy (see , step 3). Once the output format is specified (B), clicking Send query to Galaxy creates a new dataset in the history panel. The history item has been renamed RefSeq by clicking the pencil icon next to its name and making the required changes in the Edit Attributes page that appears (see Fig. ).
Figure 19.10.4 Selecting the Get flanks tool (see , step 4) from the Operate on Genomic Intervals section (A) creates a new dataset containing the region 1000 nucleotides upstream of each RefSeq gene (B).
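What "upstream" means depends on the strand of each gene. The following minimal sketch, assuming BED-style 0-based half-open coordinates, shows the strand-aware calculation behind a 1000-nucleotide upstream flank; the function `upstream_flank` is hypothetical and does not reproduce the Get flanks tool's other options.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (BED-style half-open)
    strand: str


def upstream_flank(gene: Interval, size: int = 1000) -> Interval:
    """Return the region of `size` nucleotides immediately upstream of `gene`.

    For '-' strand genes, upstream lies after the end coordinate; for '+'
    strand genes, it lies before the start coordinate.
    """
    if gene.strand == "-":
        # Not clipped at the chromosome end (that would need chromosome sizes).
        return Interval(gene.chrom, gene.end, gene.end + size, gene.strand)
    # '+' strand: clip at 0 so coordinates never become negative.
    return Interval(gene.chrom, max(0, gene.start - size), gene.start, gene.strand)
```

Handling strand explicitly matters here: taking the 1000 nucleotides before `start` for every gene would place the putative promoters of minus-strand genes downstream of their transcripts.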
Figure 19.10.5 The Join tool is used to create a dataset that contains the coordinates of putative promoters and TAF1 binding sites side by side (see , step 6).
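Placing promoters and binding sites "side by side" amounts to an overlap join: each row of one dataset is concatenated with each row of the other dataset that it overlaps. The naive sketch below assumes rows whose first three fields are (chrom, start, end) in half-open coordinates; the function names are hypothetical, and the Galaxy tool's own options are not modeled here.

```python
def overlaps(a, b) -> bool:
    """Half-open intervals on the same chromosome overlap if each starts before the other ends."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def join_overlapping(left, right):
    """Pair every left row with every right row whose interval it overlaps.

    Each row is a tuple beginning with (chrom, start, end); matching rows are
    concatenated side by side. This brute-force version is
    O(len(left) * len(right)), which is fine for illustration only.
    """
    return [l + r for l in left for r in right if overlaps(l, r)]
```

For genome-scale data, an indexed or sorted sweep would be used instead of the nested loop; the output rows would be the same.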
Figure 19.10.6 The Build custom track tool (see , step 7) lets the user design a custom track suitable for display at the UCSC Genome Browser (D) by progressively adding new tracks containing different datasets (A–C).
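A UCSC custom track file can hold several tracks, each introduced by its own "track" line (with attributes such as name, description and color) and followed by data lines. The sketch below assembles such a text from several datasets; it illustrates the general format only, and the helper names are hypothetical rather than the exact output of Galaxy's Build custom track tool.

```python
def track_block(name: str, description: str, rows, color: str = "0,0,0") -> str:
    """Render one custom track: a track line followed by tab-separated BED rows."""
    header = f'track name="{name}" description="{description}" color={color}'
    body = ["\t".join(str(field) for field in row) for row in rows]
    return "\n".join([header] + body)


def build_custom_track(tracks) -> str:
    """Concatenate several tracks into one text; each begins with its own track line."""
    return "\n".join(track_block(**t) for t in tracks) + "\n"
```

Each dictionary passed to `build_custom_track` describes one dataset (for example, promoters in one color and TAF1 sites in another), mirroring the way tracks are added one at a time in panels A to C.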
Figure 19.10.7 A dataset containing exons and overlapping SNPs was created with the Join tool (see , step 4) and is displayed in the middle panel after clicking the eye icon next to dataset 3. A red rectangle has been drawn around an exon that overlaps four SNPs. See the Exons and SNPs screencast (http://galaxycast.org/cpmb‐2009‐2) for more details.
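Finding the exons with the highest number of SNPs comes down to counting, for each exon, the SNP positions that fall inside it and then ranking exons by that count. A minimal sketch, assuming half-open exon intervals, 0-based SNP positions and unique exon names (all assumptions for illustration; the function names are hypothetical):

```python
from collections import Counter


def snps_per_exon(exons, snps):
    """Count how many SNP positions fall inside each exon.

    exons: (chrom, start, end, name) half-open intervals with unique names.
    snps:  (chrom, pos, snp_id) with 0-based positions.
    Exons without SNPs keep a count of 0.
    """
    counts = Counter({name: 0 for _, _, _, name in exons})
    for chrom, start, end, name in exons:
        for s_chrom, pos, _ in snps:
            if s_chrom == chrom and start <= pos < end:
                counts[name] += 1
    return counts


def top_exons(exons, snps, n=1):
    """Return the n exons with the most overlapping SNPs, highest count first."""
    return snps_per_exon(exons, snps).most_common(n)
```

An exon such as the one boxed in red would appear in this ranking with a count of 4.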
Figure 19.10.8 To create a workflow from an existing history (see ), the user must be logged in, then open History Options and click Extract Workflow. A new workflow is populated from the current history as shown; it can then be renamed and created. See the Workflow screencast (http://galaxycast.org/cpmb‐2009‐4) for more details.
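Extracting a workflow relies on the history recording, for each dataset, which tool produced it and from which inputs. The sketch below, which uses a hypothetical dictionary layout rather than Galaxy's internal data model, shows how such provenance can be turned into an ordered list of steps with a topological sort (Python's `graphlib`, available from Python 3.9).

```python
from graphlib import TopologicalSorter


def extract_workflow(history):
    """Turn history provenance into an ordered list of workflow steps.

    `history` maps each dataset id to (tool_name, input_dataset_ids);
    uploaded datasets use tool_name None and become workflow inputs.
    Each step is returned after every step that produced its inputs.
    """
    # The graph maps each dataset to the datasets it depends on.
    graph = {ds: set(inputs) for ds, (_, inputs) in history.items()}
    steps = []
    for ds in TopologicalSorter(graph).static_order():
        tool, inputs = history[ds]
        steps.append(("input" if tool is None else tool, ds, tuple(inputs)))
    return steps
```

With a history in which dataset 3 was made by Get flanks from dataset 2 and dataset 4 by Join from datasets 3 and 1, the Join step is guaranteed to come after both of the steps it depends on.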
Figure 19.10.9 The Workflow Editor lets users add new tools with a click and connect the output of one tool to the input of another by clicking and dragging. Here, the output of the Sort tool is being connected to the Select first tool (see , step 9), as shown by the green rope; when the mouse button is released, the connection is created and the rope turns white.
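Connecting Sort to Select first means the rows produced by one step become the input of the next. The sketch below models that chaining with plain Python functions; the step names only mimic the tools named in the caption and are hypothetical stand-ins, not Galaxy's implementations.

```python
from typing import Callable, Iterable

Step = Callable[[list], list]


def sort_by_column(col: int, descending: bool = True) -> Step:
    """Build a step that sorts rows numerically on one column (like a Sort step)."""
    return lambda rows: sorted(rows, key=lambda r: float(r[col]), reverse=descending)


def select_first(n: int) -> Step:
    """Build a step that keeps only the first n rows (like a Select first step)."""
    return lambda rows: rows[:n]


def run_pipeline(rows: list, steps: Iterable[Step]) -> list:
    """Feed the output of each step into the next, as a workflow connection does."""
    for step in steps:
        rows = step(rows)
    return rows
```

For example, `run_pipeline(rows, [sort_by_column(1), select_first(2)])` keeps the two rows with the largest values in the second column.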
Figure 19.10.10 Several options exist for obtaining multi-species alignments (see ). The Extract MAF blocks tool (A) creates a MAF dataset that contains only the trimmed alignment blocks overlapping a specified set of intervals. The Stitch MAF blocks tool (B) creates a FASTA file that contains a single alignment block per provided interval. See the SeqAlign screencast (http://galaxycast.org/cpmb‐2009‐6) for more details.
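Trimming an alignment block to an interval requires mapping genomic coordinates of the reference sequence onto alignment columns, skipping gap characters. A minimal sketch, assuming the block is already parsed (MAF parsing is not shown), the reference row is on the plus strand, and `ref_start` is the 0-based position of its first base; `trim_block` is a hypothetical name, not the tool's code.

```python
def trim_block(ref_start: int, rows: dict, ref_name: str, start: int, end: int):
    """Trim one alignment block to the reference interval [start, end).

    rows maps species name -> aligned sequence (all rows the same length,
    '-' for gaps). Returns the trimmed rows, or None if nothing overlaps.
    """
    pos = ref_start
    first_col = last_col = None
    for col, base in enumerate(rows[ref_name]):
        if base == "-":
            continue  # gap columns do not advance the reference coordinate
        if start <= pos < end:
            if first_col is None:
                first_col = col
            last_col = col
        pos += 1
    if first_col is None:
        return None
    # Slice every species' row to the same column range so the alignment stays intact.
    return {name: seq[first_col:last_col + 1] for name, seq in rows.items()}
```

Gap columns inside the retained range are kept, so the trimmed rows remain aligned to one another; producing one block per interval, as the Stitch tool does, would additionally require combining pieces from adjacent blocks.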