[Featured] vol 687 Chapter 5: PCR Primers and PCR Primer Design

丁香园, 2015-04-01

Methods in Molecular Biology vol.687 Park D.J. (ed.) PCR Protocols Chapter 5

PART II: Cloning and Sequencing

CODEHOP PCR and CODEHOP PCR Primer Design

Jeannette P. Staheli, Richard Boyce, Dina Kovarik, and Timothy M. Rose

Abstract

While PCR primer design for the amplification of known sequences is usually quite straightforward, the design and successful application of primers aimed at the detection of as yet unknown genes is often not. The search for genes that are presumed to be distantly related to a known gene sequence, such as homologous genes in different species, paralogs in the same genome, or novel pathogens in diverse hosts, often turns into the proverbial search for the needle in the haystack. PCR-based methods commonly used to address this issue involve the use of either consensus primers or degenerate primers, both of which have significant shortcomings regarding sensitivity and specificity. We have developed a novel primer design approach that diminishes these shortcomings and instead takes advantage of the strengths of both consensus and degenerate primer designs, by combining the two concepts into a Consensus–Degenerate Hybrid Oligonucleotide Primer (CODEHOP) approach. CODEHOP PCR primers contain a relatively short degenerate 3' core and a 5' nondegenerate clamp. The 3' degenerate core consists of a pool of primers containing all possible codons for a 3–4 amino acid motif that is highly conserved in multiply aligned sequences from known members of a protein family. Each primer in the pool also contains a single 5' nondegenerate nucleotide sequence derived from a codon consensus across the aligned amino acid sequences flanking the conserved motif. During the initial PCR amplification cycles, the degenerate core is responsible for specific binding to sequences encoding the conserved amino acid motif. The longer consensus clamp region serves to stabilize the primer and allows the participation of all primers in the pool in the efficient amplification of products during later PCR cycles. We have developed an interactive web site and algorithm (iCODEHOP) for designing CODEHOP PCR primers from multiply aligned protein sequences, which is freely available online. Here, we describe the workflow of a typical CODEHOP PCR assay design and optimization and give a specific implementation example along with “best-practice” advice.

Key words: PCR, CODEHOP, Consensus, Degenerate, Acyl-CoA binding protein

1. Introduction

The development of PCR-based assays to identify unknown distantly related genes or pathogens is problematic and relies upon mixtures of nucleic acid primers and the ability of primers to hybridize to noncomplementary sequences with a required degree of specificity. Pools of related primers carrying known or predicted nucleotide sequence differences throughout the length of the primer have been used with moderate success to amplify unknown or distantly related genes. These are referred to as degenerate primers and can contain hundreds or thousands of individual primers in the pool to cover all possible nucleotide variations in a particular sequence. Alternatively, consensus PCR primers have also been utilized to amplify unknown or related sequence variants. A consensus primer carries the most common actual or predicted nucleotide variant in each position of a primer sequence and relies on its ability to specifically hybridize to a target sequence with mismatched or unpaired bases. When basing primer design on protein coding sequences, standard degenerate primers will contain most or all of the possible nucleotide sequences encoding a large conserved amino acid motif, while consensus primers will contain the most common nucleotide at each codon position in the targeted motif. While useful with adequate concentrations of closely related template targets in noncomplex mixtures, both standard degenerate- and consensus-primer approaches suffer from a lack of specificity and sensitivity when these conditions are not met.

We have developed a PCR approach for detecting and identifying unknown and distantly related gene sequences using consensus–degenerate hybrid oligonucleotide primers (CODEHOPs) (1–3) (Fig. 1). CODEHOPs are designed from short highly conserved motifs identified in multiply aligned protein sequences from members of a gene family and are used in PCR amplification to identify unknown members of the family. Each CODEHOP consists of a short 3' degenerate core region corresponding to all possible codons specifying 3–4 highly conserved amino acids and a longer 5' consensus clamp region containing a single “best guess” nucleotide sequence derived from the consensus sequences flanking the target motif. Thus, a CODEHOP PCR primer consists of a pool of primers that are heterogeneous at the 3' end and homogeneous at the 5' end.

The CODEHOP primer design strategy overcomes problems of both degenerate and consensus PCR primer methods. The limited degeneracy in the short 3' core region minimizes the total number of individual primers in the degenerate pool yet provides a broad specificity during the initial PCR amplification cycles. Hybridization of the 3' degenerate core is stabilized by the 5' consensus clamp, which allows higher annealing temperatures without increasing the degeneracy of the primer pool. Although mismatches between the 5' consensus clamp and the target sequence may occur during the initial PCR cycles, they are situated away from the 3' hydroxyl extension site of the polymerase, thus minimizing their disruptive effects on polymerase priming and extension. Further amplification of primed PCR products during subsequent rounds of primer hybridization and extension is enhanced by the sequence similarity of all primers in the pool. This allows utilization of all primers in the PCR reaction cycles, so a shortage of one or a few specific primers does not become a limiting factor. The CODEHOP PCR approach provides the necessary specificity and sensitivity to allow for the amplification of distantly related genetic homologs or paralogs in diverse species or disparate pathogen species, at low titer, in complex mixtures of genetic material (4–15).

The following protocol describes the general method for the design and use of CODEHOP PCR primers for the amplification of novel genes, with a specific example targeting the acyl-CoA-binding protein (ACBP) family. The ACBP family is involved in multiple essential cellular tasks including modulation of fatty acid biosynthesis, enzyme regulation, regulation of the intracellular acyl-CoA pool size, donation of acyl-CoA esters for β-oxidation, vesicular trafficking, complex lipid synthesis, and gene regulation (16). ACBP homologs have been identified in all four eukaryotic kingdoms, Animalia, Plantae, Fungi, and Protista, and in 11 eubacterial species. Using ACBP-specific CODEHOP primers similar to the ones used in the example below, we discovered a novel ACBP gene in yeast (4) and identified an ACBP pseudogene in humans (5). In 2005, we published an extensive analysis of ACBP sequences from over 50 different species (16). The ACBP protein is highly conserved across phyla, and a number of species, ranging from protozoa to vertebrates, have evolved two to six lineage-specific paralogs through gene duplication and/or retrotransposition events. In a recent collaboration with the Northwest Association for Biomedical Research (NWABR), we have designed CODEHOP PCR primers to detect novel ACBPs in diverse species of plants. In this project, the NWABR partnered with high-school teachers to develop the BIO-ITEST curriculum, which was designed to help secondary school teachers and their students learn how information technology is used in biological research (17). Below, the design and implementation of the ACBP-specific CODEHOP primers targeted at plants is used as an example. However, the same concept can be applied to the discovery of distantly related unknown members of any other gene family of interest.

Fig. 1. CODEHOP PCR assay development. The general flowchart for development of CODEHOP PCR assays is shown. The different steps are described in the text.

2. Materials

2.1. Software and Databases Used for Primer Design

1. NCBI BLAST suite (http://blast.ncbi.nlm.nih.gov/Blast.cgi).

2. NCBI Protein Database (http://www.ncbi.nlm.nih.gov/protein).

3. NCBI Nucleotide Database (http://www.ncbi.nlm.nih.gov/nucleotide).

4. ClustalW multiple sequence alignment software (e.g., http://www.ch.embnet.org/software/ClustalW.html).

5. iCODEHOP primer design software (https://icodehop.cphi.washington.edu/) (3).

2.2. Template DNA Preparation

1. DNeasy Plant Mini Kit (Qiagen) (see Note 1).

2. Cells or tissue from which DNA will be extracted.

3. Proteinase K digestion: 50 µg/ml Proteinase K in 100 µg/ml in 0.01 M Tris–HCl, pH 7.8, 5 mM EDTA, 0.5% SDS.

4. Saturated phenol–chloroform–isoamyl alcohol, 25:24:1, stored at 4°C. Phenol and its vapors are highly corrosive; wear gloves and work in a fume hood when handling. Phenol dissolves polystyrene plastics, so glass or polypropylene pipettes and tubes should be used.

5. Chloroform (CHCl3) is a CNS depressant and a suspected carcinogen, so exposure to vapors should be avoided.

6. 100% Ethanol.

7. TE buffer: 10 mM Tris–HCl, pH 8.0, 1 mM EDTA.

2.3. PCR Analysis

2.3.1. PCR Master Mix (see Note 2)

1. 10× PCR buffer: 200 mM Tris–HCl (pH 8.4), 500 mM KCl (Invitrogen).

2. MgCl2 stock solution, 25 mM (Invitrogen).

3. PCR-grade nucleotide mix: 10 mM dNTP mix (Fermentas).

4. Platinum Taq DNA polymerase, 5 U/µl (Invitrogen).

5. PCR-grade water.

2.3.2. Oligonucleotides/DNA

1. Forward and reverse CODEHOP primers are dissolved at 50 µM in nuclease-free water. Oligonucleotides are stable for several months at -20°C. For long-term storage of stocks, samples can be dried down and kept at -20°C.

2. Template DNA, stored at 4°C in TE solution or at -20°C in nuclease-free water.

2.3.3. Gel Electrophoresis

1. DNA-grade agarose: GenePure LE (ISC BioExpress); 1.5 g of agarose is dissolved in 75 ml TAE running buffer, heated in a microwave to dissolve the agarose, and poured into a gel box to create a 2% gel with space to run 32 samples plus DNA ladders to size the amplicons.

2. Loading buffer: 6× loading buffer (Fermentas).

3. Running buffer: 40 mM Tris–acetate, 1 mM EDTA (TAE buffer).

4. DNA ladders: 1 kb GeneRuler (Fermentas).

5. Ethidium bromide: 0.5 µg/ml ethidium bromide in water.

6. UV imaging: BioRad UV box to image the bands on gels.

3. Methods

The search for a new gene using the CODEHOP approach consists of four steps. First, CODEHOP PCR primers are designed manually or using our recently improved interactive iCODEHOP design web site and algorithm (3). In both cases, online databases and software tools are used to identify protein family members, to download protein and nucleotide sequences, to perform sequence alignments, and to identify conserved sequence blocks from which CODEHOP PCR primers are derived. The second task involves the identification and preparation of suitable sources of control and targeted nucleic acid templates. The third step includes the actual CODEHOP PCR assay and optimization. If necessary, the primer design can be adjusted. The fourth step includes sequencing of amplicons followed by sequence and phylogenetic analysis.

3.1. Primer Design: Manual Method

1. Collect the protein sequences from known members of the targeted gene family. Perform a BLAST search in the NCBI protein database using the protein sequence of a member of the targeted gene family. In our example, the ACBP sequence from Arabidopsis thaliana (NP_174462) was used as query in a BLAST search that yielded ten additional complete plant ACBP sequences (see Fig. 2).

2. Align the protein sequences of interest using the ClustalW multiple alignment program (18). The example in Fig. 2 shows an alignment of the plant ACBP sequences (see Note 3).

3. Identify a conserved N-terminal motif of ~4 amino acids that has limited codon degeneracy and is flanked by additional conserved residues upstream of that motif. In our example, a highly conserved N-terminal motif “Y/FKQA” was identified with restricted degeneracy that also was flanked upstream by additional conserved residues (highlighted in Fig. 2). Identify a second C-terminal conserved motif within 40–500 amino acids downstream of the first motif (see Note 4). In our example, a conserved C-terminal motif “KWDA” was identified 19 amino acids downstream of the “Y/FKQA” motif that also has restricted codon degeneracy due to the conserved tryptophan residue (highlighted in Fig. 2). Although the “Y/FKQA” and “KWDA” motifs are only separated by 19 amino acids, they are interrupted by an intron that would increase the size of the PCR product (see Fig. 2). Analysis of the intron size from the known A. thaliana and O. sativa ACBP genes reveals a range of 104–728 bp. Thus, amplification between primers derived from the “Y/FKQA” and “KWDA” motifs would yield a PCR product of 200–900 bp for plant genes with similar size introns (see Note 5).

Fig. 2. Alignment of multiple protein sequences from the plant ACBP protein family targeted for CODEHOP PCR primer design. ACBP orthologs in plants were identified using a BLAST search of the NCBI protein database using the 92 aa Arabidopsis thaliana ACBP sequence as probe and aligned using ClustalW. Conserved amino acid motifs “YKQA” and “KWDA” selected for CODEHOP PCR primer design are highlighted. Intron positions within the A. thaliana ACBP gene are indicated with an arrow. The sequences and their corresponding GenBank accession numbers are: A. thaliana (NP_174462), Brassica napus (CAA54390), Zea mays (NP_001147418), Panax ginseng (BAB85987), Tropaeolum majus (AAP82942), Ricinus communis (CAA70200), Jatropha curcas (ABE72959), Digitalis lanata (CAB56693), Oryza sativa (japonica cultivar) (NP_001061062), Vitis vinifera (XP_002263421), Picea sitchensis (ABK25059).

4. To design the degenerate core of the “sense” strand CODEHOP PCR primer targeting the desired upstream motif, determine the possible nucleotide sequences encoding the conserved amino acid motif. We recommend not exceeding a primer degeneracy of 32- or 64-fold unless there is strong flanking sequence conservation. In the current example, the degenerate nucleotide sequence 5' TWY AAR CAR GC 3' (using I.U.B. designations for multiple nucleotides: W = A,T; Y = C,T; R = A,G) contains all possible codons for the upstream “Y/FKQA” motif (Fig. 3a). The wobble position of the alanine codons (GCN) was left out of the core to reduce degeneracy. To design the consensus clamp of the “sense” strand CODEHOP PCR primer targeting the desired upstream motif, determine the most common amino acid at each position of the 6–7 amino acids upstream of the targeted motif. The most frequently used codon for each of these amino acids for the targeted organism is determined. Alternatively, obtain the corresponding encoding nucleotide sequences from the NCBI database for each of the proteins aligned and choose the most common nucleotide at each position to obtain a complete primer of ~30–32 nucleotides. In our example, the nucleotide sequences encoding the “YKQA” motif and upstream flanking sequences were obtained from the NCBI database for each of the 11 plant ACBP proteins in the alignment (Fig. 3a). The most common nucleotides at each position of the six codons upstream of the “YKQA” motif were chosen to form the consensus clamp, yielding a clamp sequence of 5' GCTTATTCTCTATGGACTCT 3'. The consensus clamp is positioned at the 5' end of the primer and is shown in upper case. Thus, the proposed sense strand CODEHOP PCR forward primer, named YKQAa (the lowercase a denotes a sense strand primer), is a pool of 16 different primers, heterogeneous at the 3' end and homogeneous at the 5' end with the sequence 5' GCTTATTCTCTATGGACTCTwyaarcargc 3' (Fig. 3a).

5. To design the antisense-strand reverse CODEHOP primer, the same considerations are used. The degenerate core is determined from the possible nucleotide sequences encoding the four amino acid downstream motif. In our example, the sequences encoding the “KWDA” motif were determined to be 5' AAR TGG GAY GC 3'. The wobble position in the alanine codon GCN was left out of the core to reduce degeneracy. The complementary sequence used for the reverse-strand primer is 5' GC RTC CCA YTT 3' (Fig. 3b). The nonconserved glycine in the third position of the Jatropha curcas sequence is ignored in this case. The consensus clamp region of this primer can be determined from the most common codon for the sequences downstream of the motif. The actual sequences present in known members of the gene family can also be used, as described above. In our example, the nucleotide sequences encoding the “KWDA” motif and downstream flanking sequences were obtained for each of the 11 plant ACBP proteins and aligned (Fig. 3b). The consensus nucleotides at each position of the six codons downstream of the “KWDA” motif were chosen, yielding the sense-strand sequence of 5' aartgggaygcATGGAAGGCTGTTGAAG 3'. The reverse complement of this, i.e., 5' CTTCAACAGCCTTCCATgcrtcccaytt 3', gives the desired antisense-strand reverse CODEHOP primer called KWDAb (the lowercase b denotes an antisense strand reverse primer), as seen in Fig. 3b. Thus, the KWDAb reverse primer is a pool of four different primers, heterogeneous at the 3' end and homogeneous at the 5' end.

Fig. 3. Manual design of CODEHOP PCR primers targeting the conserved YKQA and KWDA motifs of plant ACBP orthologs. The nucleotide sequences encoding the conserved (a) YKQA and (b) KWDA motifs and flanking regions from the plant ACBP genes identified by BLAST search were aligned, and the encoded amino acid sequences are shown above. These amino acid motifs were chosen for primer design due to the strong sequence conservation across a 4–5 amino acid region, the presence of amino acids with restricted codons and regions of conserved flanking sequences. The degenerate cores of the CODEHOP PCR primers are shown in lower case using the I.U.B. code for degenerate nucleotides: Y = T,C; R = A,G; W = A,T. The nondegenerate consensus clamp regions are shown in upper case. (a) The CODEHOP PCR primer “YKQAa” designed from the YKQA motif is 16-fold degenerate and corresponds to the sense “coding” strand. (b) The CODEHOP PCR primer “KWDAb” designed from the KWDA motif is fourfold degenerate and corresponds to the antisense strand. The coding strand is shown for reference. The plant genes and corresponding GenBank accession numbers are Arabidopsis thaliana (NM_102916), Brassica napus (X77134), Zea mays (NM_001153946), Panax ginseng (AB071376), Tropaeolum majus (AY319307), Ricinus communis (Y08996), Jatropha curcas (DQ452088), Digitalis lanata (AJ249833), Oryza sativa (japonica cultivar) (NM_001057071), Vitis vinifera (XM_002263385), Picea sitchensis (EF085763).
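The degeneracy and reverse-complement bookkeeping in steps 4 and 5 can be checked with a few lines of Python. This is a minimal sketch and is not part of the iCODEHOP software; the function names `degeneracy`, `expand`, and `reverse_complement` are illustrative. Only the I.U.B. code table and the primer sequences quoted in the text are used as input.

```python
from itertools import product

# I.U.B. (IUPAC) nucleotide codes: each symbol maps to the bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each symbol (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "W": "W", "S": "S",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def degeneracy(primer):
    """Number of distinct sequences in the pool described by a degenerate primer."""
    count = 1
    for symbol in primer.upper():
        count *= len(IUPAC[symbol])
    return count


def expand(primer):
    """List every individual sequence in the degenerate pool."""
    choices = (IUPAC[symbol] for symbol in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


def reverse_complement(primer):
    """Reverse complement, keeping degenerate symbols degenerate."""
    return "".join(COMPLEMENT[symbol] for symbol in reversed(primer.upper()))


# Degenerate cores quoted in the text.
sense_core = "TWYAARCARGC"       # Y/FKQA motif, sense strand
kwda_core = "AARTGGGAYGC"        # KWDA motif, sense strand

print(degeneracy(sense_core))                 # pool size of the YKQAa core
print(reverse_complement(kwda_core))          # core of the KWDAb reverse primer
print(reverse_complement("aartgggaygcATGGAAGGCTGTTGAAG"))
print(expand(reverse_complement(kwda_core)))  # individual reverse-primer cores
```

Because the clamp is nondegenerate, the pool size of a whole CODEHOP primer equals the degeneracy of its core, which is why only the 3' core needs to be counted.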

3.2. Primer Design: Using iCODEHOP Software

To aid in primer design, we have developed and recently improved a web-based site to predict CODEHOP PCR primers from blocks of conserved amino acid sequences (3). The conserved sequence blocks are obtained from multiple related protein sequences from the targeted gene family. The sequence block output is linked directly to the iCODEHOP design software, which predicts and scores possible CODEHOP PCR primers from the different sequence blocks present in the protein of interest.

1. Initiate the iCODEHOP program at [https://icodehop.cphi.washington.edu/i-codehop-context/Welcome] and choose to run the program in a named session, saving data to the server, or in a nonnamed session. Select the “Design primers” option and enter or upload either protein sequences or protein sequence alignments at the prompt and proceed with analysis. In the first case, the program creates the alignment for the user using ClustalW. For our example, the 11 plant ACBP protein sequences were used as input, and a Clustal alignment was generated. Examine the alignment to confirm the expected sequence similarities. The program also provides a phylogenetic tree to determine the relatedness of the different input sequences. Provide a name for the Clustal alignment and proceed. iCODEHOP will carve out conserved sequence blocks from the multiple alignment.

2. To design primers from the sequence blocks, accept the default parameters or provide new parameters to direct the output. Parameters include limitations on the degree of degeneracy (default = 128), primer Tm (which determines primer length; default = 60), the use of different codon tables for design of the consensus clamp (default = Homo sapiens), and the number of primers being displayed. Other parameters are indicated, and their effect is described in the help files. In our example, the primer Tm default value (60°C) was increased to 70°C to obtain a primer of approximately 30 nucleotides (see Note 6).

3. Select “Design Primers”; iCODEHOP will then use the BLOCKMAKER program to identify sequence blocks conserved within the input protein family members and, from these blocks, propose forward and reverse CODEHOP PCR primers for each of the sequence blocks. Choose primers that have low degeneracy and that, together with a proposed reverse primer, will produce an amplicon of the desired length. In our example, a single large conserved sequence block (x7451vuxAA) of 85 amino acids was identified within the 87–92 aa input plant ACBP sequences (Fig. 4a). A consensus sequence was derived, with highly conserved residues labeled with an asterisk. Possible forward and reverse CODEHOP PCR primers are enumerated and displayed graphically, and are linked to the underlying multiple protein alignment (see Fig. 4b). Each primer is displayed as aligned to the protein block from which it has been derived, and values for degeneracy, ranges of predicted melting temperatures, and primer length are provided. The output includes a phylogenetic tree that allows the user to evaluate the phylogenetic relationship of the input protein sequences to aid in primer design (not shown) (see Note 7).

4. After choosing a forward primer, the program will then suggest a list of possible complementary primers, indicating the resulting amplicon length and the degree of predicted melting temperature overlap. Make sure that both primers have a reasonably low degeneracy (we prefer a degeneracy of 64 or lower) and an overlapping Tm range, which is given for each primer.

5. iCODEHOP designs the consensus clamp based on the codon for the amino acid that is most common in each position of the alignment. The actual codon sequence used is the most preferred codon for that amino acid for the targeted organism, taken from a codon usage table specified by the user. Alternatively, the user can determine the actual nucleotides used to encode the amino acids in the consensus region manually, as shown in the manual method above, and utilize these sequences in the primer design. This step will be an integrated part of the next version of iCODEHOP.

Fig. 4. Output of the iCODEHOP automated PCR primer design for the YKQA motif. The ACBP protein sequences from the diverse set of plants in Fig. 2 were used as input in the new interactive iCODEHOP web server to design CODEHOP PCR primers from the multiply aligned sequences. Primer design was performed using the default conditions, except for the use of a 68°C clamp melting temperature (discussed in the text). A portion of the graphical output for the 85-amino acid “(a)” block (x7451vuxAA) containing the YKQA motif is shown (asterisks indicate highly conserved residues). The arrows underneath the block indicate the length and position of possible sense- and antisense-strand CODEHOP primers, labeled with their block and a number designation. The A-12 primer is derived from the complete YKQA motif. The predicted primer and the multiply aligned protein sequences that it is derived from [shown in (b)] are linked to the A-12 primer arrow in (a). The information provided for the A-12 primer includes the length and melting temperature of the consensus clamp region, as well as the length and degeneracy of the core. Dots in the multiple alignment indicate identity with the consensus sequence. Further details regarding the output of the program are described on the iCODEHOP web site (19).
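The clamp logic described in step 5 (most common amino acid per alignment column, then the preferred codon for that amino acid) can be sketched as follows. This is an illustration under stated assumptions and not the iCODEHOP implementation: the function names are hypothetical, the three aligned sequences are toy input invented for the example, and the `preferred_codon` table is a placeholder that a user would replace with a real codon usage table for the target organism.

```python
from collections import Counter


def consensus_residues(aligned):
    """Most common residue in each column of equal-length aligned sequences.

    Ties are resolved in favour of the residue seen first in the column.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


def clamp_from_consensus(residues, preferred_codon):
    """Concatenate the preferred codon for each consensus amino acid."""
    return "".join(preferred_codon[aa] for aa in residues)


# Toy alignment (hypothetical, NOT taken from the ACBP alignment in Fig. 2).
toy_alignment = ["MKW", "MKW", "MRW"]

# Placeholder codon preferences (assumption for illustration only; M and W
# each have a single codon, the K and R entries are arbitrary choices).
preferred_codon = {"M": "ATG", "K": "AAG", "R": "AGA", "W": "TGG"}

consensus = consensus_residues(toy_alignment)
print(consensus)
print(clamp_from_consensus(consensus, preferred_codon))
```

The alternative described in the text, taking the most common nucleotide at each position of the aligned coding sequences, uses the same `consensus_residues` function applied to nucleotide strings instead of protein strings.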

3.3. Template DNA Preparation

The source of template is dependent upon the targeted gene. In the present example, we were interested in identifying novel ACBP genes in different plant species. For this purpose, DNA was isolated from leaves of various plants using a plant DNA isolation kit from Qiagen.

1. Isolate DNA from the tissue source of choice using standard procedures. Proteinase K extractions work well, although different isolation kits may be used.

2. Resuspend DNA in TE buffer at approximately 250 ng/µl.

3.4. CODEHOP PCR Amplification

CODEHOP PCR amplification can be performed using classical and touch-down approaches with a hot-start initiation (1). We recommend using a thermal gradient PCR amplification to empirically determine optimal annealing and amplification conditions for the pool of primers (12). Different buffers, salt concentrations, and enzymes have been employed with varying success due to differences in DNA template preparation and the unknown nature of the targeted sequence. We have had the best success by using the following steps in sequence:

1. Prepare eight PCR reaction mixtures on ice, each containing:

2.5 µl – 10× Gibco PCR buffer

1.0 µl – MgCl2 (50 mM) – 2 mM final concentration

2.0 µl – dNTPs (10 mM each)

0.5 µl – Forward CODEHOP primer (current example is YKQAa primer)

0.5 µl – Reverse CODEHOP primer (current example is KWDAb primer)

0.25 µl – Platinum Taq (5 U/µl)

17.75 µl – Water

1.0 µl – DNA template (current example is soybean DNA, 250 ng/µl)

25 µl – Total

2. Place the eight tubes in a thermal gradient PCR thermocycler (current example was performed on a Bio-Rad iCycler) such that each tube will have a different annealing temperature ranging from 50 to 65°C.

3. Amplify the DNA using the following conditions:

1 min – 95°C

45 cycles of:

● 30 s – 95°C – melting step

● 30 s – annealing step (50–65°C gradient)

● 30 s – 72°C – elongation step

1 min – 72°C

Hold – 4°C

4. Analyze amplification products by electrophoresing 5 µl of the PCR reaction on a 2% agarose gel in TAE running buffer. Gels are stained for 5–20 min in ethidium bromide, destained for 5–20 min in distilled water, and visualized under a UV lamp. We use the GelDoc system, which allows us to save pictures of the gels as TIF or JPEG files for presentations or publications.

Amplification of soybean DNA using the YKQAa and KWDAb primers across a 50–65°C annealing gradient is shown in Fig. 5. PCR fragments of ~270 bp were detected at all annealing temperatures except 65°C (lanes 2–9). Optimal amplification was obtained at the lower annealing temperatures (lanes 3, 4). The amplification product obtained from pea DNA (~400 bp) using the same primers is shown in lane 10.

Fig. 5. Thermal gradient PCR amplification of soybean DNA using the YKQAa and KWDAb CODEHOP PCR primers. The YKQAa and KWDAb CODEHOP PCR primers were used to amplify soybean DNA template (250 ng) with a gradient of annealing temperatures from 50–65°C, and the amplification products were analyzed on an ethidium bromide-stained agarose gel. Lane (1) nontemplate control (nonspecific primer/dimer products below 250 bp); Lanes (2–9) YKQAa-KWDAb PCR amplification of soybean DNA template at different annealing temperatures (amplicon size = 269 bp); Lane (10) YKQAa-KWDAb PCR amplification of pea DNA template at 50°C annealing temperature, amplicon size = 416 bp.
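The final concentrations implied by the reaction mix in step 1 and the programmed length of the cycling protocol in step 3 follow from simple arithmetic, sketched below. The helper names are illustrative, the nominal 25 µl reaction volume is used as the denominator, and the dNTP volume is taken as 2.0 µl. The time estimate counts programmed hold times only; ramping between temperatures and the final 4°C hold are not included, so real run times will be longer.

```python
def final_concentration(stock, volume_ul, total_ul):
    """C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock * volume_ul / total_ul


def program_hold_seconds(initial_s, cycles, step_seconds, final_s):
    """Sum of programmed hold times (no ramp time, no 4 degree C hold)."""
    return initial_s + cycles * sum(step_seconds) + final_s


TOTAL_UL = 25.0  # nominal reaction volume from the recipe

mgcl2_mM = final_concentration(50.0, 1.0, TOTAL_UL)   # MgCl2, 50 mM stock
dntp_mM = final_concentration(10.0, 2.0, TOTAL_UL)    # each dNTP, 10 mM stock
buffer_x = final_concentration(10.0, 2.5, TOTAL_UL)   # 10x PCR buffer
taq_units = 5.0 * 0.25                                # 5 U/ul, 0.25 ul added

# 1 min at 95 C, 45 cycles of 30 s + 30 s + 30 s, 1 min at 72 C.
hold_s = program_hold_seconds(60, 45, [30, 30, 30], 60)

print(f"MgCl2: {mgcl2_mM:.1f} mM, dNTPs: {dntp_mM:.1f} mM each, "
      f"buffer: {buffer_x:.1f}x, Taq: {taq_units:.2f} U")
print(f"Programmed hold time: {hold_s} s ({hold_s / 60:.1f} min)")
```

The MgCl2 result reproduces the 2 mM final concentration stated in the recipe, which is a quick way to confirm that a scaled-up master mix still matches the protocol.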

3.5. Sequence Analysis of PCR Product

The PCR product can be gel-purified and directly submitted forsequencing.

1. Take 20 µl of the PCR reaction and electrophorese on a 2% agarose gel with wide combs.

2. Cut out the band representing the amplicon under a UV lamp with a razor blade. The piece of agarose containing the amplicon DNA is then cut into small pieces and placed into a 1.7-ml microcentrifuge tube.

3. Use a DNA cleanup procedure to extract the DNA out of the agarose. We have good success using the GeneClean III Kit by Q-Biogen, eluting DNA in a final volume of 50 µl buffer. If the band on the gel is very strong, we use up to 200 µl elution buffer.

4. Use 1 µl of purified amplicon DNA to sequence. The sequencing primers each consist of a specific primer derived from the consensus clamp region only of one of the CODEHOP primers. For example, the sequencing primer from the YKQAa CODEHOP is 5' GCTTATTCTCTATGGACTCT, while the sequencing primer from the KWDAb CODEHOP is 5' CTTCAACAGCCTTCCATGC (see Fig. 3a and b).

We use the ABI version 3.1 sequencing reagents as detailedhere:

2 µl – Version 3.1 (ABI)

2 µl – 2.5× Mix (ABI)

0.5 µl – Sequencing primer (use a or b primer)

1 µl – PCR product

4.5 µl – H2O

10 µl – Total

This mix is then amplified on a PCR machine (30 cycles of 96°C for 10 s; 50°C for 5 s; 60°C for 4 min) and submitted for sequence determination.

5. Align the resulting novel sequences with each other and with known sequences. When introns are present, as in the current example of the ACBP gene, the intron–exon boundaries and intron sizes can be determined from such alignments (Fig. 6). Analysis of the PCR fragments obtained from soybean (G. max) and two species of pea revealed introns of different sizes flanked by two conserved regions of ACBP coding sequences. Thus, three novel ACBP genes (partial sequences) were identified using the YKQAa and KWDAb CODEHOP primer pairs. Using similar approaches, novel genes from different gene families can be readily and quickly identified.

Fig. 6. Alignment of nucleotide and encoded amino acid sequences of the YKQAa/KWDAb PCR products obtained from pea, snap pea, and soybean with the corresponding sequences of thale cress and rice. The YKQAa/KWDAb CODEHOP PCR products obtained from pea (Pisum sativum; 416 bp), snap pea (Pisum sativum, macrocarpon cultivar; 254 bp), and soybean (Glycine max; 269 bp) were sequenced, and the nucleotide and encoded amino acid sequences (b) were aligned with the corresponding ACBP sequences from thale cress (A. thaliana) and rice (O. sativa) (a) obtained from the NCBI database. The position and size of the intron interrupting the exons encoding the YKQA and KWDA motifs (see Fig. 2) are shown. The genera Pisum and Glycine both belong to the subfamily Faboideae within the family of Fabaceae.

4. Notes

1. For the example presented here, we used the Plant DNA Isolation kit from Qiagen instead of classical proteinase K extraction. However, both methods work well.

2. This mixture was developed for use in the BioRad iCycler, but generic reagents or commercially available master mixes can be used as necessary. In order to obtain the most consistent results and to facilitate reaction setup, it is a good practice to mix all the reagents that are in common. A master mix including the Taq polymerase is stable at -20°C, and aliquots can be stored at 4°C for convenience.

3. For a gene such as the ACBP gene, which contains several introns of diverse lengths, establish the position of conserved exon junctions within the aligned sequences by a TBLASTN search of genomic contigs of the different species that are targeted. This allows the identification of intron positions that could impact the PCR amplification of unknown genes when using genomic DNA as template (see Fig. 2).

4. A DNA fragment amplified between the motifs would be 120–1,500 bp, a size that is easily amplified under normal PCR conditions.

5. Optimal conserved sequence blocks contain 3–4 highly conserved amino acids with restricted codon degeneracy from which the 3' degenerate core is derived; serines, arginines, and leucines are not favored because each is encoded by six possible codons. On the other hand, the presence of tryptophan or methionine, each encoded by only one codon, is highly desirable to achieve low degeneracy. In the current example, the C-terminal motif contained seven highly conserved amino acids “RAKWDAW” (Fig. 2); however, the initial arginine and alanine positions were not considered for the degenerate core due to high codon degeneracy.

6. The optimal primer length for a particular CODEHOP PCR primer may vary depending on the DNA template used and should be determined empirically. We typically use CODEHOP PCR primers of 30–32 nucleotides in length.

7. If too many possible CODEHOP primers are displayed, try reducing the degree of degeneracy from the default value of 128 to 64 or 32. If no CODEHOP primers are predicted, examine the phylogenetic analysis of the input protein sequences, remove unrelated or very distantly related sequences from consideration, and redo the analysis. In some cases, the input sequences from a targeted gene family may cluster into separate genetic groupings, as is the case with gene paralogs. In this case, primer prediction may be limited to one of the groups to obtain optimal CODEHOP primers.
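
The codon-degeneracy reasoning in Notes 5 and 7 can be illustrated with a short calculation. The Python sketch below is an illustration only and is not part of iCODEHOP: it assumes the standard genetic code, counts every codon of every residue in a motif, and ignores both codon usage and any truncation of the 3' terminal codon, so it gives a simple upper bound rather than the value the program reports. The function names motif_degeneracy and within_limit are hypothetical.

```python
from math import prod

# Number of codons encoding each amino acid in the standard genetic code
# (61 sense codons in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def motif_degeneracy(motif: str) -> int:
    """Return the number of distinct codon combinations encoding the motif."""
    return prod(CODONS_PER_AA[aa] for aa in motif.upper())


def within_limit(motif: str, max_degeneracy: int = 128) -> bool:
    """Check a motif against a degeneracy ceiling (128 is the default in Note 7)."""
    return motif_degeneracy(motif) <= max_degeneracy


if __name__ == "__main__":
    # Motifs taken from the ACBP example in the text.
    for motif in ("RAKWDAW", "KWDAW", "KWDA", "YKQA"):
        print(motif, motif_degeneracy(motif), within_limit(motif))
```

Under these assumptions, the full seven-residue motif RAKWDAW has a degeneracy of 384, which exceeds the default limit of 128, whereas dropping the initial arginine and alanine (KWDAW) reduces it to 16. This is consistent with the choice described in Note 5.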

Acknowledgments

The authors would like to thank Greg Bruce and Elie Karabunarlieva for advice and assistance in preparing this chapter. This work was supported in part by R24 RR021346 from the National Center for Research Resources, NIH, and 0833779 from the National Science Foundation.

References

1. Rose, T.M., Schultz, E.R., Henikoff, J.G., Pietrokovski, S., McCallum, C.M., and Henikoff, S. (1998) Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res 26, 1628–35.

2. Rose, T.M., Henikoff, J.G., and Henikoff, S. (2003) CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design. Nucleic Acids Res 31, 3763–6.

3. Boyce, R., Chilana, P., and Rose, T.M. (2009) iCODEHOP: a new interactive program for designing COnsensus-DEgenerate Hybrid Oligonucleotide primers from multiply aligned protein sequences. Nucleic Acids Res 37, W222–8.

4. Rose, T.M., Schultz, E.R., and Todaro, G.J. (1992) Molecular cloning of the gene for the yeast homolog (ACB) of diazepam binding inhibitor/endozepine/acyl-CoA-binding protein. Proc Natl Acad Sci U S A 89, 11287–91.

5. Gersuk, V.H., Rose, T.M., and Todaro, G.J. (1995) Molecular cloning and chromosomal localization of a pseudogene related to the human acyl-CoA binding protein/diazepam binding inhibitor. Genomics 25, 469–76.

6. VanDevanter, D.R., Warrener, P., Bennett, L., Schultz, E.R., Coulter, S., Garber, R.L., and Rose, T.M. (1996) Detection and analysis of diverse herpes viral species by consensus primer PCR. J Clin Microbiol 34, 1666–71.

7. Rose, T.M., Strand, K.B., Schultz, E.R., Schaefer, G., Rankin, G.W., Jr., Thouless, M.E., Tsai, C.C., and Bosch, M.L. (1997) Identification of two homologs of the Kaposi’s sarcoma-associated herpesvirus (human herpesvirus 8) in retroperitoneal fibromatosis of different macaque species. J Virol 71, 4138–44.

8. CODEHOP: consensus-degenerate hybrid oligonucleotide primers. (1998) (Accessed at http://blocks.fhcrc.org/blocks/codehop.html).

9. Wilson, C.A., Wong, S., Muller, J., Davidson, C.E., Rose, T.M., and Burd, P. (1998) Type C retrovirus released from porcine primary peripheral blood mononuclear cells infects human cells. J Virol 72, 3082–7.

10. Osterhaus, A.D., Pedersen, N., van Amerongen, G., Frankenhuis, M.T., Marthas, M., Reay, E., Rose, T.M., Pamungkas, J., and Bosch, M.L. (1999) Isolation and partial characterization of a lentivirus from talapoin monkeys (Myopithecus talapoin). Virology 260, 116–24.

11. Schultz, E.R., Rankin, G.W., Jr., Blanc, M.P., Raden, B.W., Tsai, C.C., and Rose, T.M. (2000) Characterization of two divergent lineages of macaque rhadinoviruses related to Kaposi’s sarcoma-associated herpesvirus. J Virol 74, 4919–28.

12. Rose, T.M., Ryan, J.T., Schultz, E.R., Raden, B.W., and Tsai, C.C. (2003) Analysis of 4.3 kilobases of divergent locus B of macaque retroperitoneal fibromatosis-associated herpesvirus reveals a close similarity in gene sequence and genome organization to Kaposi’s sarcoma-associated herpesvirus. J Virol 77, 5084–97.

13. Rose, T.M. (2005) CODEHOP-mediated PCR – a powerful technique for the identification and characterization of viral genomes. Virol J 2, 20.

14. Staheli, J.P., Ryan, J.T., Bruce, A.G., Boyce, R., and Rose, T.M. (2009) Consensus-degenerate hybrid oligonucleotide primers (CODEHOPs) for the detection of novel viruses in non-human primates. Methods 49, 32–41.

15. Bruce, A.G., Bakke, A.M., Gravett, C.A., DeMaster, L.K., Bielefeldt-Ohmann, H., Burnside, K.L., and Rose, T.M. (2009) The ORF59 DNA polymerase processivity factor homologs of Old World primate RV2 rhadinoviruses are highly conserved nuclear antigens expressed in differentiated epithelium in infected macaques. Virol J 6, 205.

16. Burton, M., Rose, T.M., Faergeman, N.J., and Knudsen, J. (2005) Evolution of the acyl-CoA binding protein (ACBP). Biochem J 392, 299–307.

17. Bio-itest. Northwest Association for Biomedical Research. (Accessed at http://www.nwabr.org/education/itest.html).

18. Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., and Higgins, D.G. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–8.

19. iCODEHOP: Interactive Program for Creating Consensus-Degenerate Hybrid Oligonucleotide Primers. (2008) (Accessed at https://icodehop.cphi.washington.edu/i-codehop-context/Welcome).