Next Generation Sequence Assembly with AMOS
Abstract
A Modular Open‐Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of assembly tools, including the lightweight de novo assemblers Minimus and Minimo, as well as Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of next‐generation sequence data. Additionally, we provide three tutorial examples covering bacterial, viral, and metagenomic datasets, with specific tips for improving assembly quality.
Curr. Protoc. Bioinform. 33:11.8.1‐11.8.18. © 2011 by John Wiley & Sons, Inc.
Keywords: next‐generation sequencing; genome assembly; open source
Table of Contents
- Introduction
- Basic Protocol 1: Assembly of a Small Bacterial Genome
- Basic Protocol 2: Assembly of a Phage Genome
- Basic Protocol 3: Metagenomic Example Assembly
- Support Protocol 1: Downloading and Installing AMOS
- Support Protocol 2: Modifying the Minimus/Minimo Pipeline
- Support Protocol 3: Validating an Assembly Inside AMOS
- Guidelines for Understanding Results
- Commentary
- Literature Cited
- Figures
- Tables
Figures
- Figure 11.8.1 Overview of the AMOS assembly pipeline. On the left‐hand side, we see the various AMOS assembly pipelines and modules, including Minimus/Minimo and Bambus 2.0/goBambus2. On the right‐hand side, we see the interaction between each individual module and the AMOS bank, a database that stores the reads, overlaps, layouts, contigs, contig links, contig edges, repeats, scaffolds, and assembly information.
- Figure 11.8.2 Screenshot of Minimus output.
- Figure 11.8.3 Screenshot of C. ruddii assembled contigs.
- Figure 11.8.4 Screenshot of analyze‐read‐depth output.
- Figure 11.8.5 Visualization of an assembly with Hawkeye.
- Figure 11.8.6 Screenshot of Minimo usage and options.
- Figure 11.8.7 Screenshot of Minimo output.
- Figure 11.8.8 Screenshot of the number of contigs, average contig length, and N50 calculated by getN50.
- Figure 11.8.9 Screenshot of the assembly statistics generated with getN50.
- Figure 11.8.10 Screenshot of the assembly statistics generated with getN50.
- Figure 11.8.11 Screenshot of goBambus2.py output.
- Figure 11.8.12 Screenshot of the modules comprising the Minimus pipeline.
- Figure 11.8.13 Screenshot of the number of contigs, average contig length, and N50 calculated by getN50.
- Figure 11.8.14 Screenshot of amosvalidate output.
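Several figures (11.8.8 through 11.8.10 and 11.8.13) show assembly statistics generated with getN50, including the number of contigs, average contig length, and N50. To make these numbers concrete, here is a minimal Python sketch that reads contig lengths from FASTA lines and computes the three statistics using the common definition of N50: the length L such that contigs of length L or longer together contain at least half of all assembled bases. This is not the getN50 implementation; the function names `contig_lengths_from_fasta` and `assembly_stats` are hypothetical, and getN50 may differ in details such as minimum contig length cutoffs.

```python
from typing import Iterable, List, Tuple


def contig_lengths_from_fasta(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines.

    Header lines start with '>'; sequence lines before the first header are ignored.
    """
    lengths: List[int] = []
    current = None  # length of the record being read, or None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: List[int]) -> Tuple[int, float, int]:
    """Return (number of contigs, average contig length, N50).

    Hypothetical helper: N50 is the length L such that contigs of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one contig is required")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    total = sum(ordered)
    covered = 0
    n50 = ordered[0]
    for length in ordered:
        covered += length
        if 2 * covered >= total:  # cumulative length has reached half the assembly
            n50 = length
            break
    return len(ordered), total / len(ordered), n50


if __name__ == "__main__":
    contigs = [">ctg1", "ACGTACGT", "ACGT", ">ctg2", "ACG", ">ctg3", "ACGTACGTAC"]
    sizes = contig_lengths_from_fasta(contigs)
    count, mean, n50 = assembly_stats(sizes)
    print(f"contigs={count} average={mean:.1f} N50={n50}")
```

Run on the three‐contig example, the script prints `contigs=3 average=8.3 N50=10`. To apply it to an assembly file, pass an open file handle (for example, the FASTA of contigs produced by Minimus) to `contig_lengths_from_fasta`, since any iterable of lines works.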
Literature Cited
Angly, F.E., Willner, D., Prieto‐Davó, A., Edwards, R.A., Schmieder, R., Vega‐Thurber, R., Antonopoulos, D.A., Barott, K., Cottrell, M.T., Desnues, C., Dinsdale, E.A., Furlan, M., Haynes, M., Henn, M.R., Hu, Y., Kirchman, D.L., McDole, T., McPherson, J.D., Meyer, F., Miller, R.M., Mundt, E., Naviaux, R.K., Rodriguez‐Mueller, B., Stevens, R., Wegley, L., Zhang, L., Zhu, B., and Rohwer, F. 2009. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Comput. Biol. 5:e1000593.
Berger, B., Laserson, J., Jojic, V., and Koller, D. 2010. Research in Computational Molecular Biology (B. Berger, ed.). Springer, Heidelberg, Germany.
Dalloul, R.A., Long, J.A., Zimin, A.V., Aslam, L., Beal, K., Ann Blomberg, L., Bouffard, P., Burt, D.W., Crasta, O., Crooijmans, R.P., Cooper, K., Coulombe, R.A., De, S., Delany, M.E., Dodgson, J.B., Dong, J.J., Evans, C., Frederickson, K.M., Flicek, P., Florea, L., Folkerts, O., Groenen, M.A., Harkins, T.T., Herrero, J., Hoffmann, S., Megens, H.J., Jiang, A., de Jong, P., Kaiser, P., Kim, H., Kim, K.W., Kim, S., Langenberger, D., Lee, M.K., Lee, T., Mane, S., Marcais, G., Marz, M., McElroy, A.P., Modise, T., Nefedov, M., Notredame, C., Paton, I.R., Payne, W.S., Pertea, G., Prickett, D., Puiu, D., Qioa, D., Raineri, E., Ruffier, M., Salzberg, S.L., Schatz, M.C., Scheuring, C., Schmidt, C.J., Schroeder, S., Searle, S.M., Smith, E.J., Smith, J., Sonstegard, T.S., Stadler, P.F., Tafer, H., Tu, Z.J., Van Tassell, C.P., Vilella, A.J., Williams, K.P., Yorke, J.A., Zhang, L., Zhang, H.B., Zhang, X., Zhang, Y., and Reed, K.M. 2010. Multi‐platform next‐generation sequencing of the domestic turkey (Meleagris gallopavo): Genome assembly and analysis. PLoS Biol. 8:e1000475.
Fleischmann, R., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.F., Dougherty, B.A., Merrick, J.M., et al. 1995. Whole‐genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496‐512.
Idury, R.M. and Waterman, M.S. 1995. A new algorithm for DNA sequence assembly. J. Comput. Biol. 2:291‐306.
Kelley, D.R., Schatz, M.C., and Salzberg, S.L. 2010. Quake: quality‐aware detection and correction of sequencing errors. Genome Biol. 11:R116.
Kislyuk, A.O., Katz, L.S., Agrawal, S., Hagen, M.S., Conley, A.B., Jayaraman, P., Nelakuditi, V., Humphrey, J.C., Sammons, S.A., Govil, D., Mair, R.D., Tatti, K.M., Tondella, M.L., Harcourt, B.H., Mayer, L.W., and Jordan, I.K. 2010. A computational genomics pipeline for prokaryotic sequencing projects. Bioinformatics 26:1819‐1826.
Li, Y., Hu, Y., Bolund, L., and Wang, J. 2010. State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum. Genomics 4:271‐277.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., and Rothberg, J.M. 2005. Genome sequencing in microfabricated high‐density picolitre reactors. Nature 437:376‐380.
Miller, J.R., Koren, S., and Sutton, G. 2010. Assembly algorithms for next‐generation sequencing data. Genomics 95:315‐327.
Myers, E.W. 2000. A whole‐genome assembly of Drosophila. Science 287:2196‐2204.
Myers, E.W. 2005. The fragment assembly string graph. Bioinformatics 21:ii79‐ii85.
Nagarajan, N. and Pop, M. 2010. Sequencing and genome assembly using next‐generation technologies. Methods Mol. Biol. 673:1‐17.
Pevzner, P.A., Tang, H., and Waterman, M.S. 2001. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U.S.A. 98:9748‐9753.
Phillippy, A.M., Schatz, M.C., and Pop, M. 2008. Genome assembly forensics: Finding the elusive mis‐assembly. Genome Biol. 9:R55.
Pop, M. and Salzberg, S.L. 2008. Bioinformatics challenges of new sequencing technology. Trends Genet. 24:142‐149.
Pop, M., Kosack, D.S., and Salzberg, S.L. 2004. Hierarchical scaffolding with Bambus. Genome Res. 14:149‐159.
Salmela, L. 2010. Correction of sequencing errors in a mixed set of reads. Bioinformatics 26:1284‐1290.
Sanger, F. and Coulson, A. 1975. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94:441‐448.
Schatz, M.C., Phillippy, A.M., Shneiderman, B., and Salzberg, S.L. 2007. Hawkeye: An interactive visual analytics tool for genome assemblies. Genome Biol. 8:R34.
Schatz, M.C., Delcher, A.L., and Salzberg, S.L. 2010. Assembly of large genomes using second‐generation sequencing. Genome Res. 20:1165‐1173.
Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J.M., and Birol, I. 2009. ABySS: A parallel assembler for short read sequence data. Genome Res. 19:1117‐1123.
Sommer, D.D., Delcher, A.L., Salzberg, S.L., and Pop, M. 2007. Minimus: A fast, lightweight genome assembler. BMC Bioinformatics 8:64.
Yang, X., Dorman, K.S., and Aluru, S. 2010. Reptile: Representative tiling for short read error correction. Bioinformatics 26:2526‐2533.
Zerbino, D.R. and Birney, E. 2008. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18:821‐829.
Zhao, X., Palmer, L.E., Bolanos, R., Mircean, C., Fasulo, D., and Wittenberg, G.M. 2010. EDAR: an efficient error detection and removal algorithm for next generation sequencing data. J. Comput. Biol. 17:1549‐1560.
Key References
Sommer et al., 2007. See above.
This first publication describing Minimus focused on the algorithm and implementation details. It also includes assemblies of a gene and a bacterium.
Pop et al., 2004. See above.
This is the original Bambus publication describing the scaffolder's algorithm and implementation. A new publication describing Bambus 2, the updated scaffolder referenced throughout this manuscript, is currently under review.
Internet Resources
http://sourceforge.net/apps/mediawiki/amos/index.php?title=AMOS
AMOS SourceForge website, where code, tutorials, and general information on AMOS can be accessed.