Promoter analysis by saturation mutagenesis

Source: Internet, 2013-09-06


Abstract

Gene expression and its regulation are mediated by DNA sequences that, in most instances, lie directly upstream of the coding sequence and recruit transcription factors, regulators, and an RNA polymerase in a spatially defined fashion. Only a few nucleotides within a promoter make contact with the bound proteins. The minimal set of nucleotides that can recruit a protein factor is called a cis-acting element. This article describes a powerful mutagenesis strategy for defining cis-acting elements at the molecular level. Technical details including primer design, saturation mutagenesis, construction of promoter libraries, phenotypic analysis, data analysis, and interpretation are discussed.

Introduction

Saturation mutagenesis generates a library of mutations within a targeted DNA sequence. This approach allows rapid and unbiased identification and functional evaluation of cis-acting element(s) within a promoter.

1 ) ( 7 , 13 ). Therefore, some groundwork is essential before applying saturation mutagenesis to a scientific problem. This article focuses on the strategy and use of saturation mutagenesis to identify cis-acting element(s) and derive their consensus sequence(s).

2 ).

Fusion of a reporter gene to the minimal promoter can also facilitate this process by allowing high-throughput selection or screening of the library mutants prior to biochemical analysis (Fig 1 ). Table 1 lists a few genes that have been used successfully as reporters in saturation mutagenesis experiments. Direct transcription analysis can also be used when the number of mutants is not too large ( 9 , 11 , 12 ). [The section on mutant analysis details the method that was used to characterize the bacterio-opsin (bop) promoter using a colorimetric screen.]

A construct carrying the reporter gene downstream of the unmutated wild-type promoter should serve as a control to account for anomalous results attributable to mutations outside the targeted region or to any altered behavior of the promoter on a plasmid backbone.

The primer used for mutagenesis should have clamps of at least 15 bp on either side of the randomized sequence to ensure high annealing specificity (e.g., the primer used for saturation mutagenesis of 7 bp of the bop gene TATA box had clamps of 22 and 24 bp). Degenerate primers with shorter clamps may result in non-specific amplification or frameshift mutations.

The number of nucleotides mutagenized directly affects the level of statistical significance at which the promoter elements are defined. The DNA transformation efficiencies of the host strains used for library construction and analysis should also be factored in. Maximal representation of all mutations can be achieved with a library population at least 5 times the total number of mutants expected; e.g., saturation mutagenesis of a 7-bp stretch yields 4^7 = 16,384 possible sequences, hence >80,000 clones are required to constitute a good library (Fig 3 ). Furthermore, secondary mutations at otherwise non-critical nucleotides may compensate for the deleterious effects of a mutation at a critical nucleotide position. The consensus sequence derived from simultaneous mutagenesis of too long a stretch of DNA will be loosely defined, with a greater abundance of ambiguity codes such as R, W, S, and Y instead of G, A, T, and C. In other words, to obtain maximal representation of all mutations and to avoid interference by secondary mutations, the stretch of DNA mutagenized should be kept as short as practical.
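The library-size arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "5 times the number of possible sequences" rule stated in the text; the function name `library_size` and its parameters are hypothetical, not part of any published tool.

```python
def library_size(n_positions, oversampling=5):
    """Return (possible sequences, recommended clones) for n randomized positions.

    Each randomized position can hold any of 4 nucleotides, so there are
    4**n_positions possible sequences. The 5-fold oversampling is the rule
    of thumb given in the text.
    """
    if n_positions < 1:
        raise ValueError("n_positions must be at least 1")
    variants = 4 ** n_positions
    return variants, variants * oversampling


for n in (5, 7, 9):
    variants, clones = library_size(n)
    print(f"{n} bp randomized: {variants:,} sequences, >= {clones:,} clones")
```

For a 7-bp stretch this gives 16,384 sequences and 81,920 clones, consistent with the >80,000 figure quoted above. The required library size grows fourfold with each added position, which is why the randomized stretch should stay short.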

The mutagenesis is conducted by amplifying the promoter with the degenerate oligonucleotide and a second, downstream, non-mutagenic oligonucleotide. The promoter can be amplified with or without the reporter gene; in the latter case, the promoter is spliced back into the promoter-reporter plasmid construct. The mutated-promoter PCR product should be gel-purified away from unincorporated nucleotides, primers, and primer-dimers, which may interfere with downstream processing and cloning.

Nucleotides within long promoters can be mutagenized by mega-primer or recombinant PCR techniques ( 3 , 4 , 5 ). Alternatively, restriction sites can be engineered into the PCR product to splice it back into the promoter-reporter plasmid construct.

A stretch of 28 nucleotides can be scanned for cis-acting elements by mutagenesis in four separate PCR reactions using four oligonucleotides, differing in the location of the seven randomized bases, and the same downstream oligonucleotide (Fig 3 ).
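As a rough sketch of the scanning design just described, the following Python function builds one degenerate oligonucleotide per 7-base window, with fixed wild-type clamps on either side of a run of N (any nucleotide). The function name, the 20-nt clamp length (chosen only to satisfy the 15-bp minimum given earlier), and the placeholder template are all assumptions for illustration. The sketch does not check melting temperature, secondary structure, or strand orientation.

```python
def scanning_primers(template, start, window=7, n_windows=4, clamp=20):
    """Design degenerate primers that tile a region in consecutive windows.

    template : wild-type promoter sequence (5'->3'), as a string
    start    : 0-based index of the first nucleotide to randomize
    window   : number of randomized bases per primer
    n_windows: number of consecutive windows (4 x 7 = 28 nt scanned)
    clamp    : wild-type bases kept on each side of the randomized run
    """
    primers = []
    for i in range(n_windows):
        lo = start + i * window
        hi = lo + window
        if lo - clamp < 0 or hi + clamp > len(template):
            raise ValueError("template too short for the requested clamps")
        left = template[lo - clamp:lo]
        right = template[hi:hi + clamp]
        primers.append(left + "N" * window + right)
    return primers


# Placeholder template, NOT a real promoter sequence.
template = "ACGT" * 20
for primer in scanning_primers(template, start=25):
    print(primer)
```

Each primer randomizes a different 7-base window while the same downstream non-mutagenic oligonucleotide is reused in all four reactions.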

The mutated-promoter library should be constructed in an appropriate vector if phenotypic analysis is desired. The PCR product can be used directly for cloning as a blunt-ended insert using the Sureclone ligation kit (Amersham Pharmacia Biotech) or digested with restriction enzymes for directional cloning (if unique restriction sites are incorporated at the 5' ends of the primers). For most restriction enzymes it may be necessary to include additional bases 5' to the restriction site to improve the efficiency of cutting (see NEB technical resource for details).

To construct the library, electroporate 0.5 μl of the ligation into 20 μl of Electromax DH10B cells (Invitrogen Lifetechnologies Inc.). Repeat the procedure for the entire ligation mix (20 electroporations for a 10 μl ligation reaction). Plate 50 μl of one transformation to estimate the size of the library, and inoculate 1 L of LB + 100 μg/ml ampicillin with the rest of the transformations (after the 1 hr incubation step). Culture overnight at 37°C and prepare plasmid DNA; this is your library of mutated-promoter plasmids. To assess the randomness of the library, sequence plasmid DNA prepared from randomly selected transformants from the efficiency plate (see below).
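The library size can be estimated from the colony count on the efficiency plate, as in the sketch below. The recovery volume per electroporation is not stated in the protocol, so the 1000 μl default is an assumption, as is the premise that all 20 electroporations yield about as many transformants as the one that was plated. The function name is hypothetical.

```python
def estimate_library_size(colonies_on_plate, plated_ul=50.0,
                          recovery_ul=1000.0, n_transformations=20):
    """Extrapolate total transformants from one plated aliquot.

    recovery_ul is an ASSUMED outgrowth volume per electroporation;
    replace it with the volume actually used.
    """
    if colonies_on_plate < 0 or plated_ul <= 0 or recovery_ul < plated_ul:
        raise ValueError("check colony count and volumes")
    per_transformation = colonies_on_plate * (recovery_ul / plated_ul)
    return round(per_transformation * n_transformations)


total = estimate_library_size(250)   # 250 colonies is a made-up count
target = 5 * 4 ** 7                  # 5-fold coverage of a 7-bp library
print(total, "clones estimated;", "sufficient" if total >= target else "too small")
```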

4 ). A distinct bias in nucleotide distribution post-selection/screening is indicative of a requirement for specific nucleotides for promoter function.

Phenotypic analysis allows a quick estimate of the number of functionally critical nucleotides within the mutagenized region. The mutated nucleotides are not important for promoter function if all transformants have functional phenotypes. On the other hand, the presence of only 25% phenotypically positive transformants is indicative of one highly conserved nucleotide position, since only one of the four possible nucleotides is tolerated there. The number of active promoters decreases exponentially with each additional critical nucleotide. The phenotypes should be classified as "negative", "weak", "moderate", and "strong" to attribute relative importance to conserved nucleotides within functional promoters. Colorimetric analysis of mutants from the bop promoter mutagenesis allowed a similar classification of promoter strengths. The phenotypes in this screen were subject to an arbitrary classification of orange (Pum⁻), weak purple (Pum⁺/⁻), purple (Pum⁺), and intense purple (Pum⁺⁺) ( 1 , 2 ). Danner and Soppa ( 8 ) used trimethoprim resistance to classify promoter strength by culturing a randomly selected set of mutants in growth medium supplemented with various dilutions of the drug. The classification in this case was in terms of minimum inhibitory concentration (MIC): MIC <1 μg ml⁻¹ (sensitive), MIC 5-200 μg ml⁻¹ (partially resistant), and MIC >400 μg ml⁻¹ (very resistant).
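The relationship between the fraction of positive transformants and the number of critical positions can be made explicit with a small calculation. The sketch below assumes an idealized case that the text only implies: each critical position tolerates exactly one of the four nucleotides, positions act independently, and the library is unbiased. Real promoters will deviate from this, so the result is a rough guide only. Function names are hypothetical.

```python
import math


def expected_active_fraction(n_critical):
    """Fraction of library members expected to be active if each of
    n_critical positions tolerates exactly one nucleotide (idealized)."""
    return 0.25 ** n_critical


def estimate_critical_positions(active_fraction):
    """Invert the relation above: n = -log4(active fraction)."""
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")
    return -math.log(active_fraction) / math.log(4)


for n in range(4):
    print(f"{n} critical positions -> {expected_active_fraction(n):.4%} active")

# A made-up observation: 6% of transformants are phenotypically positive.
print(round(estimate_critical_positions(0.06), 2), "critical positions (approx.)")
```

A non-integer estimate suggests that some positions are only partially constrained, which is where the weak, moderate, and strong classes become informative.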

Primer extension analysis is routinely used for expression profiling (Fig 5 ) ( 1 , 2 ). This method has advantages over other profiling methods such as RT-PCR and Northern blotting in that it is easy, quantitative, and allows mapping of the transcription start site. It is essential to map the start site, since the mutagenesis might activate secondary promoters, which can interfere with the final analysis. Primer extension kits are commercially available from Promega Corporation. Primer extension with an oligonucleotide specific for a constitutively expressed gene, e.g., 16S rRNA, should be included as an internal control in each primer extension reaction to normalize both priming efficiency and message levels. The message levels can be quantified by densitometry of autoradiograms or by phosphorimager analysis.
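Normalization against the internal control amounts to a ratio of band intensities. The sketch below divides the reporter signal by the 16S rRNA signal from the same reaction and then expresses the result relative to the wild-type promoter. Reporting relative to wild type is an assumed convention, not one specified in the text, and the intensity values are invented.

```python
def normalized_expression(target, control, wt_target, wt_control):
    """Relative promoter activity from band intensities.

    target, control       : reporter and 16S rRNA signals for the mutant
    wt_target, wt_control : the same two signals for the wild-type promoter
    Returns the mutant's (target/control) ratio divided by the wild-type ratio.
    """
    if control <= 0 or wt_control <= 0 or wt_target <= 0:
        raise ValueError("control and wild-type signals must be positive")
    return (target / control) / (wt_target / wt_control)


# Invented densitometry values in arbitrary units.
print(normalized_expression(target=50, control=100, wt_target=200, wt_control=100))
```

A value of 1 would indicate wild-type activity. Values well below 1 would fall in the "weak" or "negative" classes, depending on the thresholds chosen.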

A 1:1 (or direct) correspondence of phenotype to transcript levels confirms that the effect of the mutagenesis is at the level of transcription (Fig 5 ) ( 1 , 2 ). An absence of correlation may be observed if additional mutations exist within the coding sequence or if the targeted region is within the transcript (translational effect). Therefore it is essential to map the transcription start site of the gene prior to oligonucleotide design for mutagenesis.

Deletion or insertion mutations were rarely identified during saturation mutagenesis of the bop promoter. In the rare instance when a single-base deletion was identified, it resulted in a corresponding shift in the transcription start site (Baliga and DasSarma, unpublished). Clearly, such mutants should be excluded from the analysis. I therefore strongly recommend primer extension analysis as the means to quantify expression, since it also maps the transcription start site. Moreover, although the location of the TATA box is relatively flexible between promoters, for a given promoter it is centered at a relatively fixed distance from the transcription initiation site. Therefore, saturation mutagenesis has been successfully used to characterize TATA box consensus sequences ( 1 , 8 ).

The promoter regions are sequenced and aligned with the wild-type promoter sequence to identify the nature and position(s) of the mutations. Sequencing of both strands is essential to confirm the result. Sequencing reads of 400-500 bp on both strands should be analyzed to confirm the absence of mutations at positions outside the mutagenized region.

The consensus sequence represents the conserved nucleotides in active promoter sequences. The importance of a nucleotide to promoter function is reflected in the extent to which mutations are tolerated at that position. Algorithms for generating consensus sequences are commercially available; e.g., the program CONSENSUS of the GCG sequence analysis suite accepts aligned sequences as input and derives a consensus sequence at a user-defined confidence level. The consensus sequence can also be derived manually by tallying the nucleotides at each position (Fig 6 ). If the desired confidence level for the consensus is 75%, i.e., a match with the derived consensus has a 3 in 4 chance of being a functional promoter, then at any given position the nucleotide represented in 75% or more of the active promoters is given consensus status.
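The manual tally described above is easy to reproduce in code. The following sketch is not the GCG CONSENSUS program. It counts nucleotides column by column in a set of aligned, equal-length active promoter sequences and reports a base only when it reaches the chosen threshold. Writing N at positions that fall below the threshold is a simplifying assumption; a fuller treatment would use two-base ambiguity codes such as R, W, S, and Y. The example sequences are invented.

```python
from collections import Counter


def consensus(sequences, threshold=0.75):
    """Derive a consensus from aligned sequences of equal length.

    At each position the most frequent nucleotide is reported if it occurs
    in at least `threshold` of the sequences; otherwise 'N' is written.
    """
    if not sequences:
        raise ValueError("no sequences supplied")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*sequences):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(sequences) >= threshold else "N")
    return "".join(result)


# Invented 7-bp sequences standing in for active promoter variants.
active = ["TTTATAA", "TTTAAAA", "TTTATAC", "TTTATAG"]
print(consensus(active))  # TTTATAN
```

In this toy set the fifth position carries T in 3 of 4 sequences and just meets the 75% threshold, while the last position has no nucleotide above it. Running the same tally separately on strong and weak promoters is one way to distinguish required from merely preferred nucleotides.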

Nucleotides highly conserved in all promoters are required for transcription, whereas those loosely conserved in strong promoters but not in weak promoters are preferred but not absolutely essential for a functional promoter. Nucleotides conserved in most strong promoters are displayed in upper case, and those loosely conserved are displayed in lower case.
