SOLiD Sequencing

Internet, 2008-09-19


Applied Biosystems has just launched their instrument, which supports their version of high-throughput sequencing chemistry, termed “SOLiD™” (little “i”, please). Acquired from Agencourt Personal Genomics in late 2006, SOLiD is a unique parallel chemistry which enables simultaneous sequencing of thousands of individual DNA molecules.

Here I will present a brief overview of the technology, aimed at those who haven’t had time to become intimately familiar with the chemistry. Figures and information are taken directly from this presentation on ABI’s website.

Sequencing on the SOLiD machine starts with library preparation. In the simplest fragment library, two different adapters are ligated to sheared genomic DNA (left panel of Fig. 1). If more rigorous structural analysis is desired, a “mate-pair” library can be generated in a similar fashion, by incorporating a circularization/cleavage step prior to adapter ligation (right panel of Fig. 1).

Figure 1. Library generation schematic.

Once the adapters are ligated to the library, emulsion PCR is conducted using the common primers to generate “bead clones” which each contain a single nucleic acid species.

Figure 2. Clonal bead library generation via emulsion PCR.

Each bead is then attached to the surface of a flow cell via 3’ modifications to the DNA strands.

Figure 3. Depositing beads into flow cell via end modifications.

At this point, we have a flow cell (basically a microscope slide that can be serially exposed to any liquids desired) whose surface is coated with thousands of beads, each containing a single genomic DNA species with unique adapters on either end. Each microbead can be considered a separate sequencing reaction, and all of them are monitored simultaneously via sequential digital imaging. Up to this point all next-gen sequencing technologies are very similar; this is where ABI/SOLiD diverges dramatically (see Fig. 4).

Figure 4. Schematic of ABI SOLiD sequencing chemistry. Note: Base 4/5 encoded probes as shown above were the initial version of the chemistry. SOLiD 2.0 chemistry utilizes 1/2 encoding (meaning bases 1/2 of the probe are the specific bases linked to the color space calls).

The actual base detection is no longer done by the polymerase-driven incorporation of labeled dideoxy terminators. Instead, SOLiD uses a mixture of labeled oligonucleotides and queries the input strand with ligase. Understanding the labeled oligo mixture is key to understanding SOLiD technology.

Each oligo has degenerate positions at 3’ bases 1-3 (N’s), and one of 16 specific dinucleotides at positions 4-5. Positions 6 through the 5’ end are also degenerate, and the 5’ end carries one of four fluorescent dyes. The sequencing involves:
1. Hybridization and ligation of a specific oligo whose 4th & 5th bases match those of the template
2. Detection of the specific fluor
3. Cleavage of all bases to the 5’ of base 5
4. Repeat, this time querying the 9th & 10th bases
5. After 5-7 cycles of this, perform a “reset”, in which the initial primer and all ligated portions are melted from the template and discarded.
6. Next, a new initial primer is used that is N-1 in length. Repeating the initial cycling (steps 1-4) now generates an overlapping data set (bases 3/4, 8/9, etc.; see Fig. 5).

Figure 5. Sequencing coverage during SOLiD sequencing cycles. Again this is original chemistry. Details of base coverage per cycle are different for SOLiD 2.0. Update coming soon.

Thus, rounds of 5-7 ligation reactions, separated by 4-5 primer resets, generate sequence data for ~35 contiguous bases, in which each base has been queried by two different oligonucleotides.
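To make the coverage pattern concrete, here is a minimal Python sketch of which template positions get queried in each primer round. It assumes the original 4/5 chemistry described above, 5-base probes, 7 ligation cycles per primer, and 5 primers (N through N-4). The function names, and the convention that position 0 is the last adapter base, are my own for illustration and do not come from ABI.

```python
from collections import Counter

def queried_pairs(primer_offset, cycles, probe_len=5, query_pos=(4, 5)):
    """Template position pairs read in one primer round.

    primer_offset: 0 for the full-length primer (N), 1 for N-1, and so on.
    Positions are 1-based relative to the first template base, so
    position 0 is the last adapter base (an assumed convention).
    """
    pairs = []
    for cycle in range(cycles):
        # Each ligation cycle advances by one probe length;
        # each shorter primer shifts the whole ladder back by one base.
        shift = cycle * probe_len - primer_offset
        pairs.append((query_pos[0] + shift, query_pos[1] + shift))
    return pairs

def coverage(num_primers=5, cycles=7):
    """Count how many times each template position is queried overall."""
    counts = Counter()
    for offset in range(num_primers):
        for first, second in queried_pairs(offset, cycles):
            counts[first] += 1
            counts[second] += 1
    return counts

if __name__ == "__main__":
    for offset in range(5):
        print(f"primer N-{offset}:", queried_pairs(offset, cycles=7))
    cov = coverage()
    print({pos: cov[pos] for pos in sorted(cov)})
```

Under these assumptions, the N primer reads 4/5, 9/10, and so on, the N-1 primer reads 3/4, 8/9, and so on, and after all five rounds every interior position has been read twice. Only the two ends of the window (the adapter base and the final base) are read once.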

If you’re doing the math you’ve realized there are 16 possible dinucleotides (4^2) and only 4 dyes. So data from a single color does not tell you what base is at a given position. This is where the brilliance (and potential confusion) comes about with regard to SOLiD. There are 4 oligos for every dye, meaning there are four dinucleotides that are encoded by each dye.

For example (see Fig. 4), the dinucleotides CA, AC, TG, and GT are all encoded by the green dye. Because each base is queried twice it is possible, using the two colors, to determine which bases were at which positions. This two-color query system (known as “color space” in ABI-speak) has some interesting consequences with regard to the identification of errors. A detailed explanation of color space and its unique issues can be found in the PDF file attached to this post (“2Base_Pair_SOLiD_Data_V1.pdf”).
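A small sketch may help show how four colors can resolve sixteen dinucleotides. The code below assumes the commonly described two-base encoding in which each base is given a 2-bit value and the color is the XOR of the two adjacent bases. Only the green group (CA, AC, TG, GT, which comes out as color 1 here) is confirmed by the text above; the numeric color codes, the other three groups, and the function names are assumptions for illustration. Decoding also assumes one known starting base, such as the last base of the adapter.

```python
# Assumed 2-bit values per base; color = XOR of two adjacent bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode(seq, known_first_base):
    """Convert bases to colors; each color spans two adjacent bases."""
    prev = known_first_base
    colors = []
    for base in seq:
        colors.append(BASE_TO_BITS[prev] ^ BASE_TO_BITS[base])
        prev = base
    return colors

def decode(colors, known_first_base):
    """Walk the colors from a known starting base to recover the bases."""
    prev = known_first_base
    bases = []
    for color in colors:
        base = BITS_TO_BASE[BASE_TO_BITS[prev] ^ color]
        bases.append(base)
        prev = base
    return "".join(bases)

def color_groups():
    """Group all 16 dinucleotides by the color that encodes them."""
    groups = {0: [], 1: [], 2: [], 3: []}
    for first in "ACGT":
        for second in "ACGT":
            groups[encode(second, first)[0]].append(first + second)
    return groups

if __name__ == "__main__":
    print(color_groups())
    colors = encode("ATGGC", known_first_base="T")
    print(colors, decode(colors, known_first_base="T"))
    # A single-base change in the template alters two adjacent colors.
    print(encode("ATCGC", known_first_base="T"))
```

In this scheme a real single-base difference changes two adjacent colors, whereas a lone miscalled color does not correspond to any single-base change. That is the basis of the error-identification property mentioned above.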

Hopefully that gives you a brief introduction to ABI’s SOLiD technology. Now if someone would just publish a study using the system so we could see what it can really do…I’ll save that for another day.