
Genotyping in the Cloud with Crossbow


Abstract

 

Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework, which lets Crossbow distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, perform the underlying read alignment and variant calling, respectively. Within Crossbow, they have been shown to analyze approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit demonstrates how to use Crossbow to identify variants in three operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service. Curr. Protoc. Bioinform. 39:15.3.1-15.3.15. © 2012 by John Wiley & Sons, Inc.
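The following minimal, single-process Python sketch makes the MapReduce division of labor concrete. It assumes the structure described by Langmead et al. (2009a): a map phase aligns reads, a shuffle groups the alignments by genomic region, and a reduce phase calls SNPs one region at a time. Everything in it is a hypothetical stand-in. The brute-force aligner is not Bowtie, the majority-vote caller is not SOAPsnp's model, and the names `map_align`, `shuffle`, `reduce_call_snps`, `run_job`, and `BIN_SIZE`, along with the toy reference, are illustrative only. Hadoop would run each phase in parallel across many machines, but here the phases simply run in sequence.

```python
from collections import Counter, defaultdict

# Hypothetical toy inputs; real Crossbow jobs read FASTQ files listed in a manifest.
REFERENCE = {"chr1": "ACGTACGTTAGCCGATACGT"}
BIN_SIZE = 10        # hypothetical partition width (bases) used as the shuffle key
MAX_MISMATCHES = 1   # the stand-in aligner tolerates at most one mismatch


def map_align(read):
    """Map phase: align one read and emit ((chrom, bin), (pos, base)) pairs.

    Stand-in for Bowtie: brute-force search of the forward strand only,
    keeping the first alignment with the fewest mismatches.
    """
    best = None  # (mismatches, chrom, start)
    for chrom, seq in REFERENCE.items():
        for start in range(len(seq) - len(read) + 1):
            mm = sum(a != b for a, b in zip(read, seq[start:start + len(read)]))
            if mm <= MAX_MISMATCHES and (best is None or mm < best[0]):
                best = (mm, chrom, start)
    if best is None:
        return  # unaligned reads produce no output
    _, chrom, start = best
    for offset, base in enumerate(read):
        pos = start + offset
        yield (chrom, pos // BIN_SIZE), (pos, base)


def shuffle(pairs):
    """Shuffle phase: group mapped values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_call_snps(key, values, min_depth=2):
    """Reduce phase: call SNPs within one genomic bin.

    Stand-in for SOAPsnp: a majority vote per position instead of
    SOAPsnp's Bayesian genotype model.
    """
    chrom, _ = key
    counts = defaultdict(Counter)
    for pos, base in values:
        counts[pos][base] += 1
    calls = []
    for pos in sorted(counts):
        base, _ = counts[pos].most_common(1)[0]
        depth = sum(counts[pos].values())
        if depth >= min_depth and base != REFERENCE[chrom][pos]:
            calls.append((chrom, pos, REFERENCE[chrom][pos], base))
    return calls


def run_job(reads):
    """Run map, shuffle, and reduce in sequence; return (chrom, pos, ref, alt) calls."""
    mapped = (pair for read in reads for pair in map_align(read))
    calls = []
    for key, values in sorted(shuffle(mapped).items()):
        calls.extend(reduce_call_snps(key, values))
    return calls
```

For example, `run_job(["AGCTGA", "GCTGAT", "ACGTAC"])` returns `[("chr1", 12, "C", "T")]`, because two reads support a T where the toy reference has a C. Keying the shuffle by genomic bin lets the reduce tasks proceed independently: every base observed at a given position lands in the same group.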

Keywords: short reads; read alignment; SNP calling; cloud computing; Hadoop; software package

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Introduction
  • Basic Protocol 1: Running Crossbow on a Local Hadoop Cluster
  • Basic Protocol 2: Running Crossbow in Single‐Computer Mode
  • Basic Protocol 3: Running Crossbow on Amazon Web Services via the Command Line
  • Alternate Protocol 1: Running Crossbow on Amazon Web Services via the Web Interface
  • Support Protocol 1: Obtaining and Installing Crossbow
  • Support Protocol 2: Preparing Manifest Files with Sequence Read Information
  • Support Protocol 3: Preparing Reference Jars with Reference Genome Information
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
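Support Protocol 2 covers manifest files, which tell Crossbow where to fetch the input reads. The Python sketch below writes and checks such a file. It relies on an assumption that should be verified against the Crossbow manual. Under that assumption, each line holds a read-file URL and an MD5 checksum (`0` to skip the check), separated by a tab. Paired-end files take four fields (URL, MD5, mate URL, mate MD5), and lines beginning with `#` are comments. The helper names `manifest_line`, `write_manifest`, and `parse_manifest` and the `s3://example-bucket/...` URLs are hypothetical.

```python
# ASSUMED manifest layout (verify against the Crossbow manual):
#   unpaired: URL <tab> MD5            (MD5 "0" = skip integrity check)
#   paired:   URL1 <tab> MD5_1 <tab> URL2 <tab> MD5_2
#   lines starting with '#' are comments


def manifest_line(url, md5="0", mate_url=None, mate_md5="0"):
    """Format one manifest line; adding a mate URL makes it a paired-end entry."""
    fields = [url, md5]
    if mate_url is not None:
        fields += [mate_url, mate_md5]
    return "\t".join(fields)


def write_manifest(entries, comment=None):
    """Build manifest text from a list of keyword-argument dicts."""
    lines = []
    if comment:
        lines.append("# " + comment)
    for entry in entries:
        lines.append(manifest_line(**entry))
    return "\n".join(lines) + "\n"


def parse_manifest(text):
    """Return one tuple per non-comment line, rejecting malformed field counts."""
    records = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) not in (2, 4):
            raise ValueError(
                f"line {lineno}: expected 2 or 4 tab-separated fields, got {len(fields)}"
            )
        records.append(tuple(fields))
    return records
```

Checking field counts before submitting a job catches a common mistake, such as spaces typed in place of tabs, before any cluster time is spent.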
     
 

Materials

 

Figures

  •   Figure 15.3.1 Screenshot of the Crossbow Web interface for executing Crossbow on the Amazon cloud entirely from within your Web browser.


Literature Cited

   Dean, J. and Ghemawat, S. 2008. MapReduce: Simplified data processing on large clusters. Commun. ACM 51:107‐113.
   Langmead, B., Schatz, M.C., Lin, J., Pop, M., and Salzberg, S.L. 2009a. Searching for SNPs with cloud computing. Genome Biol. 10:R134.
   Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. 2009b. Ultrafast and memory‐efficient alignment of short DNA sequences to the human genome. Genome Biol. 10:R25.
   Langmead, B., Hansen, K.D., and Leek, J.T. 2010. Cloud‐scale RNA‐sequencing differential expression analysis with Myrna. Genome Biol. 11:R83.
   Li, R., Li, Y., Fang, X., Yang, H., Wang, J., Kristiansen, K., and Wang, J. 2009. SNP detection for massively parallel whole‐genome resequencing. Genome Res. 19:1124‐1132.
   Parkhomchuk, D., Amstislavskiy, V., Soldatov, A., and Ogryzko, V. 2009. Use of high throughput sequencing to observe genome dynamics at a single cell level. Proc. Natl. Acad. Sci. U.S.A. 106:20830‐20835.
   Pennisi, E. 2011. Human genome 10th anniversary: Will computers crash genomics? Science 331:666‐668.
   Schatz, M.C., Langmead, B., and Salzberg, S.L. 2010. Cloud computing and the DNA data race. Nat. Biotechnol. 28:691‐693.
   Sudbery, I., Stalker, J., Simpson, J.T., Keane, T., Rust, A.G., Hurles, M.E., Walter, K., Lynch, D., Teboul, L., Brown, S.D., Li, H., Ning, Z., Nadeau, J.H., Croniger, C.M., Durbin, R., and Adams, D.J. 2009. Deep short‐read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels. Genome Biol. 10:R112.
Key Reference
   Langmead et al., 2009a. See above.
   The original paper describing the Crossbow software tool.
Internet Resources
   http://bowtie-bio.sourceforge.net/crossbow
   Web site hosting the latest version of the software and an extensive manual.
   http://hadoop.apache.org
   Web site with the Hadoop documentation and software.
   http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/
   Describes how to get started with Amazon Web Services, including the Elastic Compute Cloud (EC2) and the Simple Storage Service (S3).