Next Generation Sequence Assembly with AMOS

Internet, 2013-12-31

Abstract

A Modular Open‐Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including the lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of next‐generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality. Curr. Protoc. Bioinform. 33:11.8.1‐11.8.18. © 2011 by John Wiley & Sons, Inc.

Keywords: next‐generation sequencing; genome assembly; open source

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Basic Protocol 1: Assembly of a Small Bacterial Genome
Basic Protocol 2: Assembly of a Phage Genome
Basic Protocol 3: Metagenomic Example Assembly
Support Protocol 1: Downloading and Installing AMOS
Support Protocol 2: Modifying the Minimus/Minimo Pipeline
Support Protocol 3: Validating an Assembly Inside AMOS
Guidelines for Understanding Results
Commentary
Literature Cited
Figures
Tables

Figures

Figure 11.8.1 Overview of the AMOS assembly pipeline. On the left‐hand side, we see the various AMOS assembly pipelines and modules, including Minimus/Minimo and Bambus 2.0/goBambus2. On the right‐hand side, we see the interaction between each individual module and the AMOS bank, a database that stores the reads, overlaps, layouts, contigs, contig links, contig edges, repeats, scaffolds, and assembly information.

Figure 11.8.2 Screenshot of Minimus output.

Figure 11.8.3 Screenshot of C. ruddii assembled contigs.

Figure 11.8.4 Screenshot of analyze‐read‐depth output.

Figure 11.8.5 Visualization of an assembly with Hawkeye.

Figure 11.8.6 Screenshot of Minimo usage and options.

Figure 11.8.7 Screenshot of Minimo output.

Figure 11.8.8 Screenshot of the number of contigs, average contig length, and N50 calculated by getN50.

Figure 11.8.9 Screenshot of the assembly statistics generated with getN50.

Figure 11.8.10 Screenshot of the assembly statistics generated with getN50.

Figure 11.8.11 Screenshot of goBambus2.py output.

Figure 11.8.12 Screenshot of the modules comprising the Minimus pipeline.

Figure 11.8.13 Screenshot of the number of contigs, average contig length, and N50 calculated by getN50.

Figure 11.8.14 Screenshot of amosvalidate output.
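
Several of the captions above report the number of contigs, the average contig length, and the N50 as calculated by getN50. As a rough illustration of what those three statistics mean, the following Python sketch computes them from a list of contig lengths. It is not the getN50 source and makes no claim about how AMOS implements the calculation; the function name assembly_stats and the example lengths are hypothetical. It uses the common definition of N50: the length of the contig at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembled bases.

```python
def assembly_stats(contig_lengths):
    """Return (number of contigs, mean contig length, N50).

    Hypothetical helper for illustration; not part of AMOS.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")

    # Walk the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    running = 0
    n50 = lengths[-1]
    for length in lengths:
        running += length
        # N50 is the length of the contig that brings the running sum
        # to at least half of the total assembled bases.
        if 2 * running >= total:
            n50 = length
            break

    return len(lengths), total / len(lengths), n50


# Made-up contig lengths, for illustration only.
count, mean_len, n50 = assembly_stats([100, 200, 300, 400, 500])
print(count, mean_len, n50)  # prints: 5 300.0 400
```

In this made-up example the total is 1500 bases, and the two longest contigs (500 and 400) together cover 900 bases, which is the first point at which at least half of the total is reached, so the N50 is 400.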

Literature Cited

	Angly, F.E., Willner, D., Prieto‐Davó, A., Edwards, R.A., Schmieder, R., Vega‐Thurber, R., Antonopoulos, D.A., Barott, K., Cottrell, M.T., Desnues, C., Dinsdale, E.A., Furlan, M., Haynes, M., Henn, M.R., Hu, Y., Kirchman, D.L., McDole, T., McPherson, J.D., Meyer, F., Miller, R.M., Mundt, E., Naviaux, R.K., Rodriguez‐Mueller, B., Stevens, R., Wegley, L., Zhang, L., Zhu, B., and Rohwer, F. 2009. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Comput. Biol. 5:e1000593.
	Berger, B., Laserson, J., Jojic, V., and Koller, D. 2010. Research in Computational Molecular Biology (B. Berger, ed.). Springer, Heidelberg, Germany.
	Dalloul, R.A., Long, J.A., Zimin, A.V., Aslam, L., Beal, K., Ann Blomberg, L., Bouffard, P., Burt, D.W., Crasta, O., Crooijmans, R.P., Cooper, K., Coulombe, R.A., De, S., Delany, M.E., Dodgson, J.B., Dong, J.J., Evans, C., Frederickson, K.M., Flicek, P., Florea, L., Folkerts, O., Groenen, M.A., Harkins, T.T., Herrero, J., Hoffmann, S., Megens, H.J., Jiang, A., de Jong, P., Kaiser, P., Kim, H., Kim, K.W., Kim, S., Langenberger, D., Lee, M.K., Lee, T., Mane, S., Marcais, G., Marz, M., McElroy, A.P., Modise, T., Nefedov, M., Notredame, C., Paton, I.R., Payne, W.S., Pertea, G., Prickett, D., Puiu, D., Qioa, D., Raineri, E., Ruffier, M., Salzberg, S.L., Schatz, M.C., Scheuring, C., Schmidt, C.J., Schroeder, S., Searle, S.M., Smith, E.J., Smith, J., Sonstegard, T.S., Stadler, P.F., Tafer, H., Tu, Z.J., Van Tassell, C.P., Vilella, A.J., Williams, K.P., Yorke, J.A., Zhang, L., Zhang, H.B., Zhang, X., Zhang, Y., and Reed, K.M. 2010. Multi‐platform next‐generation sequencing of the domestic turkey (Meleagris gallopavo): Genome assembly and analysis. PLoS Biol. 8:e1000475.
	Fleischmann, R., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.F., Dougherty, B.A., Merrick, J.M., et al. 1995. Whole‐genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496‐512.
	Idury, R.M. and Waterman, M.S. 1995. A new algorithm for DNA sequence assembly. J. Comput. Biol. 2:291‐306.
	Kelley, D.R., Schatz, M.C., and Salzberg, S.L. 2010. Quake: quality‐aware detection and correction of sequencing errors. Genome Biol. 11:R116.
	Kislyuk, A.O., Katz, L.S., Agrawal, S., Hagen, M.S., Conley, A.B., Jayaraman, P., Nelakuditi, V., Humphrey, J.C., Sammons, S.A., Govil, D., Mair, R.D., Tatti, K.M., Tondella, M.L., Harcourt, B.H., Mayer, L.W., and Jordan, I.K. 2010. A computational genomics pipeline for prokaryotic sequencing projects. Bioinformatics 26:1819‐1826.
	Li, Y., Hu, Y., Bolund, L., and Wang, J. 2010. State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum. Genomics 4:271‐277.
	Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., and Rothberg, J.M. 2005. Genome sequencing in microfabricated high‐density picolitre reactors. Nature 437:376‐380.
	Miller, J.R., Koren, S., and Sutton, G. 2010. Assembly algorithms for next‐generation sequencing data. Genomics 95:315‐327.
	Myers, E.W. 2000. A whole‐genome assembly of Drosophila. Science 287:2196‐2204.
	Myers, E.W. 2005. The fragment assembly string graph. Bioinformatics 21:ii79‐ii85.
	Nagarajan, N. and Pop, M. 2010. Sequencing and genome assembly using next‐generation technologies. Methods Mol. Biol. 673:1‐17.
	Pevzner, P.A., Tang, H., and Waterman, M.S. 2001. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U.S.A. 98:9748‐9753.
	Phillippy, A.M., Schatz, M.C., and Pop, M. 2008. Genome assembly forensics: Finding the elusive mis‐assembly. Genome Biol. 9:R55.
	Pop, M. and Salzberg, S.L. 2008. Bioinformatics challenges of new sequencing technology. Trends Genet. 24:142‐149.
	Pop, M., Kosack, D.S., and Salzberg, S.L. 2004. Hierarchical scaffolding with Bambus. Genome Res. 14:149‐159.
	Salmela, L. 2010. Correction of sequencing errors in a mixed set of reads. Bioinformatics 26:1284‐1290.
	Sanger, F. and Coulson, A. 1975. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94:441‐448.
	Schatz, M.C., Phillippy, A.M., Shneiderman, B., and Salzberg, S.L. 2007. Hawkeye: An interactive visual analytics tool for genome assemblies. Genome Biol. 8:R34.
	Schatz, M.C., Delcher, A.L., and Salzberg, S.L. 2010. Assembly of large genomes using second‐generation sequencing. Genome Res. 20:1165‐1173.
	Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J.M., and Birol, I. 2009. ABySS: A parallel assembler for short read sequence data. Genome Res. 19:1117‐1123.
	Sommer, D.D., Delcher, A.L., Salzberg, S.L., and Pop, M. 2007. Minimus: A fast, lightweight genome assembler. BMC Bioinformatics 8:64.
	Yang, X., Dorman, K.S., and Aluru, S. 2010. Reptile: Representative tiling for short read error correction. Bioinformatics 26:2526‐2533.
	Zerbino, D.R. and Birney, E. 2008. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18:821‐829.
	Zhao, X., Palmer, L.E., Bolanos, R., Mircean, C., Fasulo, D., and Wittenberg, G.M. 2010. EDAR: an efficient error detection and removal algorithm for next generation sequencing data. J. Comput. Biol. 17:1549‐1560.
Key References
	Sommer et al., 2007. See above.
	This first publication describing Minimus focused on the algorithm and implementation details. It also includes assemblies of a gene and bacterium.
	Pop et al., 2004. See above.
	This is the original Bambus publication describing the scaffolder's algorithm and implementation. A new publication describing Bambus 2, the updated scaffolder referenced throughout this manuscript, is currently under review.
Internet Resources
	http://sourceforge.net/apps/mediawiki/amos/index.php?title=AMOS
	AMOS Sourceforge website, where code, tutorials and general information on AMOS can be accessed.
