Some Phenotype Association Tools in Galaxy: Looking for Disease SNPs in a Full Genome

Internet 2013-12-31

Abstract

This unit focuses on some of the tools available on the public Galaxy server that are useful for exploring possible associations between human genetic variants and phenotypes. We walk step by step through an example illustrating several methods for examining a single full-coverage genome to look for single-nucleotide polymorphisms (SNPs) that are either known to be associated with disease or suspected to have an impact for other reasons. The example uses public genomic data, tools designed specifically for working with variants, and some general tools for text manipulation and operations on genomic coordinates. Curr. Protoc. Bioinform. 39:15.2.1-15.2.27. © 2012 by John Wiley & Sons, Inc.

Keywords: disease; SNP; genome variation; coding; non-coding; gene-based analysis; Web application
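
As a rough illustration of the kind of "operation on genomic coordinates" mentioned in the abstract, the sketch below keeps only the SNPs that fall inside a set of regions. It is not the Galaxy tool itself; the function name, the tuple layout, the 0-based half-open coordinate convention, and the sample values are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import defaultdict


def snps_in_regions(snps, regions):
    """Return the SNPs (chrom, pos) that fall inside at least one region.

    Assumption: regions are (chrom, start, end) with 0-based, half-open
    coordinates [start, end), as in the BED format.
    """
    # Group the regions by chromosome and sort them by start position.
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    hits = []
    for chrom, pos in snps:
        intervals = by_chrom.get(chrom, [])
        # Only intervals that start at or before pos can contain it.
        idx = bisect_right(intervals, (pos, float("inf")))
        if any(start <= pos < end for start, end in intervals[:idx]):
            hits.append((chrom, pos))
    return hits


# Hypothetical example data, not taken from the protocol.
example_snps = [("chr1", 150), ("chr1", 500), ("chr2", 75)]
example_regions = [("chr1", 100, 200), ("chr2", 10, 50)]
print(snps_in_regions(example_snps, example_regions))  # [('chr1', 150)]
```

For whole-genome inputs, an interval-aware tool such as the ones provided in Galaxy is the better choice; the linear scan over candidate intervals here is kept only for readability.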

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Basic Protocol 1: Using Galaxy to Look for Disease SNPs in a Full Genome: Preparing Input Data
Basic Protocol 2: Selecting Known Coding SNPs Predicted to be Damaging, then Finding Their Genes and Associated Pathways
Basic Protocol 3: Running New Predictions of Coding SNPs Likely to be Detrimental
Basic Protocol 4: Finding SNPs that Fall in Suspected Functional Regions
Guidelines for Understanding Results
Commentary
Literature Cited
Figures
Tables

Figures

Figure 15.2.1 Uploading a data file. See text for details.
Figure 15.2.2 Converting to pgSnp format. See text for details.
Figure 15.2.3 Putative SNP Phenotypes library. See text for details.
Figure 15.2.4 Removing SNPs found in healthy individuals. See text for details.
Figure 15.2.5 Completed input dataset. See text for details.
Figure 15.2.6 Details about the PolyPhen‐2 dataset. See text for details.
Figure 15.2.7 Joining on genomic intervals. See text for details.
Figure 15.2.8 Selecting damaging results. See text for details.
Figure 15.2.9 PolyPhen‐2 results. See text for details.
Figure 15.2.10 Mapping between identifiers. See text for details.
Figure 15.2.11 Choosing the identifier fields. See text for details.
Figure 15.2.12 Joining on identifiers. See text for details.
Figure 15.2.13 CTD. See text for details.
Figure 15.2.14 CTD results. See text for details.
Figure 15.2.15 Input for SIFT. See text for details.
Figure 15.2.16 Viewing the workflow. See text for details.
Figure 15.2.17 Running the workflow. See text for details.
Figure 15.2.18 SIFT. See text for details.
Figure 15.2.19 Selecting damaging SNPs. See text for details.
Figure 15.2.20 SIFT results. See text for details.
Figure 15.2.21 Intersecting with the PRPs. See text for details.
Figure 15.2.22 SNPs in PRPs. See text for details.
Figure 15.2.23 DNase hypersensitive sites (HSSs) from ENCODE. See text for details.
Figure 15.2.24 Intersecting with the HSSs. See text for details.
Figure 15.2.25 SNPs in HSSs. See text for details.
Figure 15.2.26 PhyloP. See text for details.
Figure 15.2.27 Distribution of phyloP scores. See text for details.
Figure 15.2.28 Histogram. See text for details.
Figure 15.2.29 Filtering the SNPs based on phyloP score. See text for details.
Figure 15.2.30 Highly conserved SNPs. See text for details.

Literature Cited

	Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P., Kondrashov, A.S., and Sunyaev, S.R. 2010. A method and server for predicting damaging missense mutations. Nat. Methods 7:248‐249.
	Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. 2010. Galaxy: A web‐based genome analysis tool for experimentalists. Curr. Protoc. Mol. Biol. 89:19.10.1‐19.10.21.
	Davis, A.P., Murphy, C.G., Saraceni‐Richards, C.A., Rosenstein, M.C., Wiegers, T.C., and Mattingly, C.J. 2009. Comparative Toxicogenomics Database: A knowledgebase and discovery tool for chemical–gene–disease networks. Nucleic Acids Res. 37:D786‐D792.
	Drmanac, R., Sparks, A.B., Callow, J.M., Halpern, A.L., Burns, N.L., Kermani, B.G., Carnevali, P., Nazarenko, I., Nilsen, G.B., Yeung, G., Dahl, F., Fernandez, A., Staker, B., Pant, K.P., Baccash, J., Borcherding, A.P., Brownley, A., Cedeno, R., Chen, L., Chernikoff, D., Cheung, A., Chirita, R., Curson, B., Ebert, J.C., Hacker, C.R., Hartlage, R., Hauser, B., Huang, S., Jiang, Y., Karpinchyk, V., Koenig, M., Kong, C., Landers, T., Le, C., Liu, J., McBride, C.E., Morenzoni, M., Morey, R.E., Mutch, K., Perazich, H., Perry, K., Peters, B.A., Peterson, J., Pethiyagoda, C.L., Pothuraju, K., Richter, C., Rosenbaum, A.M., Roy, S., Shafto, J., Sharanhovich, U., Shannon, K.W., Sheppy, C.G., Sun, M., Thakuria, J.V., Tran, A., Vu, D., Zaranek, A.W., Wu, X., Drmanac, S., Oliphant, A.R., Banyai, W.C., Martin, B., Ballinger, D.G., Church, G.M., and Reid, C.A. 2009. Human genome sequencing using unchained base reads on self‐assembling DNA nanoarrays. Science 327:78‐81.
	Ferretti, V., Poitras, C., Bergeron, D., Coulombe, B., Robert, F., and Blanchette, M. 2007. PReMod: A database of genome‐wide mammalian cis‐regulatory module predictions. Nucleic Acids Res. 35:D122‐D126.
	Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W.J., and Nekrutenko, A. 2005. Galaxy: A platform for interactive large‐scale genome analysis. Genome Res. 15:1451‐1455.
	Giardine, B., Riemer, C., Hefferon, T., Thomas, D., Hsu, F., Zielenski, J., Sang, Y., Elnitski, L., Cutting, G., Trumbower, H., Kern, A., Kuhn, R., Patrinos, G.P., Hughes, J., Higgs, D., Chui, D., Scriver, C., Phommarinh, M., Patnaik, S.K., Blumenfeld, O., Gottlieb, B., Vihinen, M., Väliaho, J., Kent, J., Miller, W., and Hardison, R.C. 2007. PhenCode: Connecting ENCODE data with mutations and phenotype. Hum. Mutat. 28:554‐562.
	Goecks, J., Nekrutenko, A., Taylor, J.; Galaxy Team. 2010. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11:R86.
	Huang, D.W., Sherman, B.T., and Lempicki, R.A. 2009. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4:44‐57.
	Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32:D493‐D496.
	Kumar, P., Henikoff, S., and Ng, P.C. 2009. Predicting the effects of coding non‐synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4:1073‐1081.
	Reimand, J., Kull, M., Peterson, H., Hansen, J., and Vilo, J. 2007. g:Profiler: A web‐based toolset for functional profiling of gene lists from large‐scale experiments. Nucleic Acids Res. 35:W193‐W200.
	Seal, R.L., Gordon, S.M., Lush, M.J., Wright, M.W., and Bruford, E.A. 2011. genenames.org: The HGNC resources in 2011. Nucleic Acids Res. 39:D514‐D519.
	Siepel, A., Pollard, K.S., and Haussler, D. 2006. New methods for detecting lineage‐specific selection. In Proceedings of the 10th International Conference on Research in Computational Molecular Biology (RECOMB 2006), pp. 190‐205, Venice, Italy.
	Taylor, J., Tyekucheva, S., King, D.C., Hardison, R.C., Miller, W., and Chiaromonte, F. 2006. ESPERR: Learning strong and weak signals in genomic sequence alignments to identify functional elements. Genome Res. 16:1596‐1604.
Internet Resources
	http://galaxyproject.org
	The main public instance of Galaxy.
	http://phencode.bx.psu.edu
	A collection of human phenotype‐associated SNPs from Locus‐Specific Databases.
	http://www.bx.psu.edu/miller_lab/docs/galaxy_phen_assoc/tutorial/
	A version of this tutorial in HTML format.
	http://genome.ucsc.edu/FAQ/FAQformat.html
	Descriptions of file formats used by the UCSC Table Browser.
Supplementary File
	http://www.currentprotocols.com/protocol/bi1502
	This is an alternate URL to access the file “test.masterVar.gz” cited in , Necessary Resources, Files on page 15.2.3.