454 Sequencing
454 Life Sciences is a biotechnology company based in Branford, Connecticut, that specializes in high-throughput DNA sequencing using a novel, massively parallel sequencing-by-synthesis approach. 454 has grown rapidly since its partnership with Roche Diagnostics and the release of its GS20 sequencing machine in 2005 and its GS FLX machine in 2007. In 2005, most of 454's revenue came from sales of sequencing machines ($5.4 million) rather than from in-house sequencing projects ($2.3 million).[2] 454 was founded by Jonathan Rothberg. The underlying technology, based on pyrosequencing, was conceived while he was on paternity leave: his newborn son had been placed in neonatal intensive care, and he wanted a way to sequence the child's genome. For this invention, Dr. Rothberg and 454 Life Sciences received the Wall Street Journal's Gold Medal for Innovation in 2005.
In November 2006, Dr. Rothberg, Michael Egholm, and colleagues at 454 published a cover article in Nature with Svante Pääbo describing the first million base pairs of the Neanderthal genome. They also initiated the Neanderthal Genome Project, with the aim of completing the Neanderthal genome sequence by 2009.
In late March 2007, Roche Diagnostics announced an agreement to purchase 454 Life Sciences for US$154.9 million; 454 was to remain a separate business unit.
In May 2007, Project "Jim", initiated by Dr. Rothberg and 454 Life Sciences to determine the first genome sequence of an individual, was completed. Its result, the complete genome sequence of James Dewey Watson, was presented to Dr. Watson at a ceremony at Baylor College of Medicine.
Technology
454 Sequencing is a massively parallel pyrosequencing system capable of sequencing roughly 100 megabases of raw DNA per 7-hour run on the company's current machine, the GS FLX. The system relies on fixing nebulized, adapter-ligated DNA fragments to small DNA-capture beads in a water-in-oil emulsion. The DNA fixed to these beads is then amplified by PCR. Finally, each DNA-bound bead is placed into a ~44 μm well on a PicoTiterPlate, a fiber-optic chip. A mix of enzymes, including DNA polymerase, ATP sulfurylase, and luciferase, is also packed into the well. The PicoTiterPlate is then placed into the sequencing instrument.
[Figure: DNA-bound beads placed into wells]
At this stage, the four nucleotides (T, A, G, C) are washed in series over the PicoTiterPlate. During each nucleotide flow, the hundreds of thousands of beads, each carrying millions of copies of a DNA fragment, are sequenced in parallel. If a nucleotide complementary to the template strand is flowed into a well, the polymerase extends the existing DNA strand by adding one or more nucleotides. Each incorporation releases pyrophosphate, which ATP sulfurylase converts to ATP; luciferase then uses this ATP to generate a light signal that is recorded by the instrument's CCD camera. This sequencing-by-synthesis technique is called pyrosequencing (Ronaghi et al. 1996 and 1998). The signal strength is proportional to the number of nucleotides incorporated in a single flow, so a homopolymer stretch such as AAA gives roughly three times the signal of a single A. However, the signal is linear only up to eight consecutive identical nucleotides, after which it falls off rapidly.[3]
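The flow-by-flow logic above can be sketched as an idealized flowgram simulator. This is a minimal illustration, not 454's software: the flow order TACG, the name `simulate_flowgram`, and the noise-free model (each flow reports exactly the number of bases incorporated, with no fall-off beyond eight) are all assumptions. The input is the strand being synthesized, i.e. the complement of the template.

```python
from typing import List, Tuple

# Hypothetical flow order: the text only says the four nucleotides are
# washed "in series", not which order the instrument uses.
FLOW_ORDER = "TACG"


def simulate_flowgram(synthesized: str, n_cycles: int,
                      flow_order: str = FLOW_ORDER) -> List[Tuple[str, int]]:
    """Idealized pyrosequencing: one (nucleotide, incorporations) pair per flow.

    `synthesized` is the strand the polymerase builds (the complement of the
    template). Each flow of base b incorporates the whole run of b's at the
    current position, so the count stands in for the light signal. Noise and
    the loss of linearity for long homopolymers are not modeled.
    """
    if set(synthesized) - set(flow_order):
        raise ValueError("sequence contains bases that are never flowed")
    flows: List[Tuple[str, int]] = []
    pos = 0
    for _ in range(n_cycles):
        for base in flow_order:
            run = 0
            # Polymerase keeps extending while the next base matches the flow.
            while pos < len(synthesized) and synthesized[pos] == base:
                run += 1
                pos += 1
            flows.append((base, run))
    return flows


# The homopolymer "GGG" shows up as a single flow with three times the signal.
print(simulate_flowgram("TTAGGG", 1))  # [('T', 2), ('A', 1), ('C', 0), ('G', 3)]
```

Because every base in a run is incorporated during the same flow, run length can only be read from signal intensity. This is why the non-linearity above eight bases matters.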
DNA Library Preparation and emPCR
Genomic DNA is fractionated into smaller fragments (300 to 500 base pairs) that are then polished (blunt-ended). Short adaptors are ligated onto the ends of the fragments; they provide priming sequences for both amplification and sequencing of the library fragments. Adaptor B carries a 5'-biotin tag that enables immobilization of the library onto streptavidin-coated beads. After nick repair, the non-biotinylated strand is released and used as a single-stranded template DNA (sstDNA) library. The library's quality is assessed, and the optimal amount (DNA copies per bead) for emPCR is determined by titration.
The sstDNA library is immobilized onto beads, and each bead that carries a library fragment carries only a single sstDNA molecule. The bead-bound library is emulsified with the amplification reagents in a water-in-oil mixture, so that each bead is captured within its own microreactor where PCR amplification occurs. The result is bead-immobilized, clonally amplified DNA fragments.
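A simple model shows why the copies-per-bead titration matters. Assume, as this section does not state, that the number of sstDNA molecules attaching to a bead follows a Poisson distribution with mean λ. The sketch below then computes the fractions of empty, clonal (exactly one molecule), and mixed beads. `bead_loading` is a hypothetical helper.

```python
import math
from typing import Dict


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def bead_loading(lam: float) -> Dict[str, float]:
    """Fractions of beads carrying 0, 1 or more sstDNA molecules at mean lam per bead."""
    if lam <= 0:
        raise ValueError("mean molecules per bead must be positive")
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,      # clonal beads after emPCR
        "multiple": multiple,  # mixed beads whose signals superimpose two templates
        "clonal_fraction_of_loaded": single / (1.0 - empty),
    }


for lam in (0.1, 0.5, 1.0, 2.0):
    r = bead_loading(lam)
    print(f"lambda={lam}: empty={r['empty']:.2f} single={r['single']:.2f} "
          f"clonal among loaded={r['clonal_fraction_of_loaded']:.2f}")
```

At λ = 1, about 37% of beads carry exactly one molecule, but about 42% of the loaded beads are mixed. Lowering λ to 0.1 makes about 95% of loaded beads clonal, at the cost of leaving about 90% of beads empty. Under this model, that trade-off is what the titration tunes.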
Sequencing
The sstDNA library beads are added to the DNA Bead Incubation Mix (containing DNA polymerase) and layered, together with Enzyme Beads (containing sulfurylase and luciferase), onto the PicoTiterPlate device. The device is centrifuged to deposit the beads into the wells. The layer of Enzyme Beads keeps the DNA beads in place in the wells during the sequencing reaction. The bead-deposition process maximizes the number of wells that contain a single amplified library bead and avoids wells with more than one sstDNA library bead.
The loaded PicoTiterPlate device is placed into the GS20 instrument. During a sequencing run, the fluidics subsystem flows sequencing reagents (containing buffers and nucleotides) across the wells of the plate, one nucleotide at a time in a fixed order. All beads are sequenced in parallel. When a flowed nucleotide is complementary to the template, the polymerase incorporates it, and the instrument's CCD camera records the resulting light signal. The signal is proportional to the number of nucleotides incorporated in that flow, for example in a homopolymer stretch.[3]
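In the simplest case, converting the recorded signals back into sequence means rounding each flow's signal to a whole number of bases. The sketch below assumes a hypothetical T, A, C, G flow order and signals already normalized so that one incorporated base gives a value of about 1.0. `call_bases` is illustrative and is not the instrument's base caller.

```python
from typing import Sequence

# Hypothetical flow order, assuming bases are flowed T, A, C, G.
FLOW_ORDER = "TACG"


def call_bases(signals: Sequence[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive base caller for a normalized flowgram.

    signals[i] is the light signal of flow i, scaled so one incorporated
    nucleotide gives about 1.0. Each value is rounded to a run length.
    """
    called = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        run_length = max(0, round(signal))  # negative noise counts as no incorporation
        called.append(base * run_length)
    return "".join(called)


print(call_bases([2.1, 0.9, 0.1, 2.8]))  # TTAGGG
```

Homopolymer errors arise at this rounding step. A signal of 5.5 could equally be a run of five or six bases.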
Applications
While 454 Sequencing can sequence any double-stranded DNA, its very high throughput and relatively short read lengths make it better suited to some applications than others. The major types of projects are described below.
Whole Genome Assembly
Whole Genome Assembly (WGA) covers projects that sequence the entire genome of an organism, for example humans, dogs, mice, viruses, or bacteria. Historically, the technique has focused on bacterial and viral genomes because they contain relatively few repetitive regions and are comparatively easy to assemble. However, in June 2006 454 launched a project with the Max Planck Institute for Evolutionary Anthropology to sequence the genome of the Neanderthal, the closest extinct relative of humans. The project has implications for understanding human evolution and development. At 3 billion base pairs, the complete Neanderthal genome sequence is expected to take two years to finish.[4][5]
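Back-of-the-envelope arithmetic shows the scale of a 3-billion-base-pair project. The sketch below takes the genome size from this section and the roughly 100 megabases per GS FLX run quoted under Technology. It also uses the standard Lander-Waterman approximation, which estimates the fraction of a genome covered at mean coverage c as about 1 - e^(-c). Read quality, contamination, and assembly losses are ignored, so the run counts are lower bounds rather than a project plan. The function names are hypothetical.

```python
import math

# Figures taken from the text: ~3 billion bp genome, ~100 Mb raw output per GS FLX run.
GENOME_BP = 3e9
BP_PER_RUN = 100e6


def runs_needed(genome_bp: float, coverage: float, bp_per_run: float) -> int:
    """Whole runs needed so total raw output equals coverage x genome size."""
    return math.ceil(genome_bp * coverage / bp_per_run)


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman approximation: expected fraction of bases hit by >= 1 read."""
    return 1.0 - math.exp(-coverage)


for c in (1, 3, 5):
    print(f"{c}x: {runs_needed(GENOME_BP, c, BP_PER_RUN)} runs, "
          f"~{fraction_covered(c):.1%} of bases covered")
```

Under these assumptions, 1x coverage takes 30 runs but covers only about 63% of bases. 5x coverage takes 150 runs and covers about 99.3%.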
Ultra Broad PCR
Ultra Broad PCR includes applications such as cDNA pools, small RNA, SAGE/Ditag libraries, and other amplicon pools. The high throughput of 454 Sequencing lets researchers obtain up to 400,000 reads of about 100 base pairs per run. This suits short amplicons, which do not need to be nebulized before sequencing or reassembled bioinformatically afterwards.
Ultra Deep PCR
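Ultra Deep PCR is a very new field that is largely being enabled by 454 Sequencing technology. Researchers PCR-amplify specific pools of cDNA or genomic segments and sequence them to great depth. Each read derives from a single clonally amplified molecule, so a mutation present in only a small fraction of the molecules appears in a correspondingly small fraction of the reads and can be detected at extremely low levels. Sanger chain-termination sequencing, by contrast, reports the consensus of the whole amplified population and misses such minority variants.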
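A binomial model makes the depth argument quantitative. It assumes that each read independently carries the variant with probability equal to the variant's frequency in the pool, and it ignores PCR bias and sequencing errors (both are simplifying assumptions). `prob_detect` is a hypothetical helper that returns the chance of seeing at least `min_reads` variant reads at a given depth.

```python
import math


def prob_detect(depth: int, variant_freq: float, min_reads: int) -> float:
    """P(at least min_reads of depth reads carry the variant) under a binomial model.

    Assumes reads are independent draws from the amplicon pool and ignores
    PCR bias and sequencing errors.
    """
    p_fewer = sum(
        math.comb(depth, k) * variant_freq ** k * (1 - variant_freq) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


for depth in (10, 100, 1000):
    print(depth, round(prob_detect(depth, 0.01, 1), 4), round(prob_detect(depth, 0.01, 5), 4))
```

For a variant at 1% frequency, 10 reads give under a 10% chance of seeing it even once, while 1,000 reads make at least one observation virtually certain. Setting `min_reads` above 1 is one way to keep sequencing errors from being counted as variants.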
Ultra Deep (Super)SAGE
454 Sequencing allows high-throughput analysis of hundreds of thousands of ditags from Serial Analysis of Gene Expression (SAGE, or its more recent variant SuperSAGE). It avoids the laborious concatemerization of ditags required for Sanger sequencing and permits low-cost quantification of even rarely expressed genes.
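In (Super)SAGE, quantification comes down to counting how often each tag occurs. The sketch below assumes that each sequenced ditag consists of exactly two tags of a fixed length joined tail to tail, so that the second tag reads as a reverse complement. The 26 bp default reflects the tag length usually reported for SuperSAGE. Like the name `count_tags`, it is an assumption to check against the actual protocol.

```python
from collections import Counter
from typing import Iterable

# Hypothetical default: SuperSAGE tags are commonly reported as 26 bp.
TAG_LEN = 26

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_tags(ditags: Iterable[str], tag_len: int = TAG_LEN) -> Counter:
    """Count tags in ditag reads, assuming tag A + reverse complement of tag B."""
    counts: Counter = Counter()
    for ditag in ditags:
        if len(ditag) != 2 * tag_len:
            continue  # skip reads that are not a clean ditag
        counts[ditag[:tag_len]] += 1
        counts[reverse_complement(ditag[tag_len:])] += 1
    return counts


# Toy example with 4 bp tags; the counts serve as the expression estimate.
print(count_tags(["CATGGATG", "CATGGATG", "CATGCATG"], tag_len=4).most_common())
```

Because no concatemers have to be cloned, each read contributes its tags directly. Rare transcripts are then limited only by how many reads a run produces.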
Advantages and Disadvantages
The GS20 produces 20 megabases per 4.5-hour run, allowing large amounts of DNA to be sequenced at low cost compared with Sanger chain-termination methods. GC-rich content is less of a problem, and because the method does not rely on cloning, unclonable segments are not skipped. It can also detect mutations present at low frequency in an amplicon pool, which may have implications for clinical research, especially on cancer and HIV.[6][7]
However, each GS20 read is currently only 100 base pairs long, which causes problems with highly repetitive genomes: repetitive regions longer than 100 base pairs cannot be "bridged" and must be left as separate contigs. The technology is also prone to errors in long homopolymer runs, because the signal is no longer proportional to run length beyond eight nucleotides.
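The read-length limit on repeats can be checked directly on a known sequence. The sketch below finds the length of the longest exactly repeated substring by binary search over k-mer lengths. It then applies a rough rule of thumb, which is an assumption rather than a formal assembly criterion: a read must be at least two bases longer than a repeat to anchor unique sequence on both sides. `longest_repeat_length` and `repeats_bridgeable` are hypothetical names.

```python
def longest_repeat_length(genome: str) -> int:
    """Length of the longest substring that occurs at least twice (overlaps allowed)."""

    def has_repeat(k: int) -> bool:
        seen = set()
        for i in range(len(genome) - k + 1):
            kmer = genome[i:i + k]
            if kmer in seen:
                return True
            seen.add(kmer)
        return False

    # If some k-mer repeats, its (k-1)-prefix repeats too, so binary search is valid.
    lo, hi = 0, max(len(genome) - 1, 0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def repeats_bridgeable(genome: str, read_len: int) -> bool:
    """Rule of thumb (assumption): a read must span the repeat plus one unique base on each side."""
    return read_len >= longest_repeat_length(genome) + 2


genome = "ACGTTTTACGT"
print(longest_repeat_length(genome))  # 4 ("ACGT" occurs twice)
print(repeats_bridgeable(genome, 5), repeats_bridgeable(genome, 6))  # False True
```

The set-based k-mer scan is fine for toy inputs. For a bacterial-scale genome, a suffix array would be the usual choice.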
The newer GS FLX system produces reads of 200 to 300 base pairs, and 454 has said it expects reads of 500 base pairs in 2008.