In higher eukaryotes, DNA is methylated primarily at cytosines located 5' to guanines in CpG dinucleotides. In mammals, 3–5% of cytosine residues are modified to 5-methylcytosine (Fig. 1A), and there is now considerable evidence that this post-replicative modification plays an important role in gene function (1, 2). Some CpG dinucleotides are clustered in 1–2-kb stretches of DNA called “CpG islands,” which account for approximately 2% of the genome and have properties distinct from the rest of the genome. CpG islands are often located in the promoter region or the first exon of expressed genes and have a high G + C content (60–70%), whereas the remainder of the genomic DNA has a G + C content of 40% (3 and references therein). Furthermore, bulk genomic DNA contains only 25% of the CpG dinucleotides expected from random base composition, whereas CpG islands contain the expected number. This CpG depletion may result from spontaneous deamination of 5-methylcytosine to thymine, which mutates CpG to TpG on the sense strand and to CpA on the antisense strand.
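The CpG depletion described above is commonly expressed as an observed/expected (o/e) CpG ratio, (number of CpG × sequence length) / (number of C × number of G). As a minimal sketch, assuming that definition, the Python functions below compute G + C content and the CpG o/e ratio for a single strand, and model how deamination of one methylated CpG produces TpG on the sense strand and CpA on the antisense strand. The function names (`gc_content`, `cpg_obs_exp`, `reverse_complement`, `deaminate_cpg`) are hypothetical, and treating the mutation as a direct C→T string edit is a simplifying assumption.

```python
# Sketch: base-composition statistics and CpG deamination for a DNA sequence.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    s = seq.upper()
    if not s:
        return 0.0
    return (s.count("G") + s.count("C")) / len(s)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).

    A value near 1 means CpG occurs as often as base composition predicts;
    about 0.25 corresponds to the depletion seen in bulk genomic DNA.
    """
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG.
    cpg = s.count("CG")
    return cpg * len(s) / (c * g)


def reverse_complement(seq: str) -> str:
    """Return the antisense strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def deaminate_cpg(seq: str, pos: int) -> str:
    """Hypothetical model: the 5-methylcytosine of the CpG at pos deaminates
    to thymine and, if left unrepaired, replication fixes a C:G -> T:A change."""
    s = seq.upper()
    if s[pos:pos + 2] != "CG":
        raise ValueError(f"no CpG at position {pos}")
    return s[:pos] + "T" + s[pos + 1:]


if __name__ == "__main__":
    sense = deaminate_cpg("AACGTT", 2)
    print(sense)                      # AATGTT: CpG became TpG
    print(reverse_complement(sense))  # AACATT: CpA on the antisense strand
```

With this definition, an o/e ratio of about 0.25 corresponds to the depletion described for bulk genomic DNA, and a value near 1 to the expected CpG frequency seen in CpG islands.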
Fig. 1. (A) Cytosine is methylated at its 5-position to form 5-methylcytosine. (B) Cytosine is chemically converted to uracil by high concentrations of bisulfite at low pH. Sulfonation of cytosine at its 6-position destabilizes the amino group at position 4, which is hydrolytically deaminated to form uracil sulfonate. Under alkaline conditions, the SO₃⁻ group is removed again, yielding a PCR-amplifiable uracil. Methylation at position 5, however, prevents sulfonation of cytosine and its conversion to uracil.
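Because of the chemistry in Fig. 1B, unmethylated cytosines read as thymine after bisulfite treatment and PCR, while 5-methylcytosines still read as cytosine. The Python sketch below models that outcome on a single strand, assuming complete conversion of every unmethylated C and no conversion of 5-methylcytosine. The helpers `bisulfite_convert` and `call_methylation` are hypothetical; real data would also have to account for incomplete conversion, the two strand-specific PCR products, and sequencing errors.

```python
# Sketch: idealized bisulfite conversion of one DNA strand followed by PCR.
# Assumes complete conversion of unmethylated C and none of 5-methylcytosine.


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Return the PCR product sequence: unmethylated C reads as T, 5mC stays C."""
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")  # C -> uracil by bisulfite; PCR copies U as T
        else:
            out.append(base)  # 5mC and all other bases are unchanged
    return "".join(out)


def call_methylation(reference: str, converted: str) -> dict[int, bool]:
    """Map each reference C position to True (methylated) or False (unmethylated)."""
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), converted.upper())):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = True   # protected from conversion: 5-methylcytosine
        elif obs == "T":
            calls[i] = False  # converted: unmethylated cytosine
        # any other base (e.g. a sequencing error) is left uncalled
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"
    converted = bisulfite_convert(reference, methylated={1})
    print(converted)                               # ACGTTTGA
    print(call_methylation(reference, converted))  # {1: True, 4: False, 5: False}
```

Comparing the converted sequence with the untreated reference position by position is what lets a retained C be interpreted as a methylated cytosine.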