DNA walks and their transformations
Graphic representation of coding DNA sequences - a spider
To make a graphic representation of a coding DNA sequence in two-dimensional space, we analyzed the displacement of a DNA walker that visited each codon position separately. For the DNA walk we used a modified method of Berthelsen, Glazier and Skolnick (1992). For each sequence we performed three DNA walks, independently for each nucleotide position in triplets (Fig. 1).
Fig. 1. Three DNA walks done independently for each nucleotide position in triplets. Each DNA walk represents the "history" of nucleotide composition of the first, the second or the third position of codons along the DNA sequence. The three walks together have been called a spider and a single walk has been called a spider leg. It is possible to extract some numerical information from these plots:
- the slope (if measured in degrees, it is equal to arctan[(G-C)/(A-T)]), and
- the length of the vector determined by the origin and the end of the spider leg (it is equal to sqrt[(G-C)^2 + (A-T)^2]).
The first walker starts from the first nucleotide position of the first codon and jumps every third nucleotide until the end of the examined sequence has been reached. Similarly, the second and the third walkers start from the second and third nucleotide positions of the first codon, respectively. Every jump of a walker is associated with a unit shift in the two-dimensional space depending on the type of nucleotide visited. The shifts are: (0,1) for G, (1,0) for A, (0,-1) for C and (-1,0) for T. Hence, each DNA walk represents the "history" of nucleotide composition of the first, the second or the third position of codons along the DNA sequence. The three walks together have been called a spider and a single walk has been called a spider leg.
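The walking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `spider_leg` and `spider` are hypothetical, the toy sequence is invented, and the choice to let ambiguous bases (such as N) leave the walker in place is ours, not part of the original method.

```python
# Unit shifts taken from the text: G=(0,1), A=(1,0), C=(0,-1), T=(-1,0).
SHIFTS = {"G": (0, 1), "A": (1, 0), "C": (0, -1), "T": (-1, 0)}


def spider_leg(seq, position):
    """Return the path of one walker.

    position is 0, 1 or 2 (first, second or third position in codons).
    The path starts at the origin and has one point per visited nucleotide.
    """
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper()[position::3]:
        # Assumption: bases other than A, T, G, C do not move the walker.
        dx, dy = SHIFTS.get(base, (0, 0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def spider(seq):
    """Return the three legs (one per codon position) of a spider."""
    return [spider_leg(seq, position) for position in range(3)]


# Invented toy sequence of three codons, for illustration only.
legs = spider("ATGGCCTAA")
for number, leg in enumerate(legs, start=1):
    print(f"leg {number}: end point {leg[-1]}")
```

The end point of each leg is (A-T, G-C) for that codon position, which is the vector used for the slope and length mentioned in the caption of Fig. 1.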
Fig. 2a shows an example of a spider representing a typical gene of the yeast genome, SWP73 (YNR023w), a component of SWI/SNF complex activating transcription. In Fig. 2b a spider representing an intergenic sequence 921 triplets long is presented.
Fig. 2. Three DNA walks for a) a typical gene of the yeast genome, SWP73 (YNR023w); b) an intergenic sequence 921 triplets long.
Fig. 3. The genomic spider made for Saccharomyces
It is possible to make a spider not only for a particular ORF or gene but for all ORFs (genes) found in the analyzed genome. To do that, all ORFs of the genome are spliced in tandem, stop to start. Such spiders are called genomic spiders. Genomic spiders show graphically the trends in nucleotide composition of particular positions in codons. The genomic spider made for spliced yeast genes is presented in Fig. 3.
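A genomic spider can be sketched by concatenating ORFs before walking. The example below is a minimal illustration, assuming every ORF has a length that is a multiple of three so that codon positions stay in frame after splicing; the helper `leg_end_points` and the two toy ORFs are hypothetical.

```python
# Unit shifts taken from the text: G=(0,1), A=(1,0), C=(0,-1), T=(-1,0).
SHIFTS = {"G": (0, 1), "A": (1, 0), "C": (0, -1), "T": (-1, 0)}


def leg_end_points(seq):
    """Return the end point (A-T, G-C) of each of the three spider legs."""
    ends = []
    for position in range(3):
        x = y = 0
        for base in seq.upper()[position::3]:
            # Assumption: bases other than A, T, G, C do not move the walker.
            dx, dy = SHIFTS.get(base, (0, 0))
            x, y = x + dx, y + dy
        ends.append((x, y))
    return ends


# Invented toy ORFs; lengths are multiples of three so frames stay aligned.
orfs = ["ATGGCCTAA", "ATGAAATGA"]

# Splice the ORFs in tandem, stop to start, and walk the joined sequence.
genomic_sequence = "".join(orfs)
print(leg_end_points(genomic_sequence))
```

Because each jump is a simple vector addition, the end point of a genomic spider leg equals the sum of the end points of the corresponding legs of the individual ORFs.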
It is also possible to make spiders for all ORFs coded by the leading strand or the lagging strand in bacterial genomes. Comparing these spiders, one can easily notice differences in nucleotide composition of genes coded by the two strands. Other genomic spiders and their analyses are shown in the sections: Sense-antisense DNA strand asymmetry and Bacterial chromosome asymmetry.
Distribution of ORFs in a torus projection
Spiders depict the nucleotide composition of the three positions in codons, but it is also possible to extract numerical information from these plots and use it to characterize whole sets of ORFs. For each ORF we measured (in degrees) the slopes of the vectors determined by the origins and the ends of the spider legs (Fig. 1). The slope is equal to arctan[(G-C)/(A-T)] for a given position in codons. We assumed that the slopes have positive values in the first two quadrants of the plot and negative values in the third and fourth quadrants. This enabled us to construct a plot where each ORF is represented by a point whose co-ordinates are: (x) the slope of the first leg, and (y) the slope of the second leg. It is also possible to use the slope of the third leg as one of the two co-ordinates or as the third co-ordinate in three-dimensional space. The distribution of intergenic sequences, all ORFs longer than 100 codons and genes from the yeast genome is presented in Fig. 4.
Fig. 4. Distribution of sequences from the Saccharomyces cerevisiae genome on the torus projection for a) intergenic sequences; b) all ORFs longer than 100 codons; c) genes.
Note that the surfaces of these plots are finite projections of tori (Fig. 5).
Fig. 5. Distribution of all ORFs longer than 100 codons from the Saccharomyces cerevisiae genome on the torus.
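The slope co-ordinates used for the torus projection can be sketched as follows. This is an illustration only: the function names are hypothetical, and the use of `math.atan2` is our way of implementing the stated sign convention (positive slopes in the first two quadrants, negative in the third and fourth), which gives angles between -180 and 180 degrees.

```python
import math


def leg_vector(seq, position):
    """Return (A-T, G-C) for one codon position (0, 1 or 2)."""
    bases = seq.upper()[position::3]
    a_minus_t = bases.count("A") - bases.count("T")
    g_minus_c = bases.count("G") - bases.count("C")
    return a_minus_t, g_minus_c


def leg_slope_degrees(seq, position):
    """Slope of a spider leg in degrees, signed by quadrant."""
    a_minus_t, g_minus_c = leg_vector(seq, position)
    # Note: for a zero-length leg the slope is undefined;
    # math.atan2(0, 0) returns 0.0 in Python.
    return math.degrees(math.atan2(g_minus_c, a_minus_t))


def leg_length(seq, position):
    """Length of the vector from the origin to the end of the leg."""
    a_minus_t, g_minus_c = leg_vector(seq, position)
    return math.hypot(a_minus_t, g_minus_c)


def torus_point(seq):
    """(x, y) = slopes of the first and the second leg of one ORF."""
    return leg_slope_degrees(seq, 0), leg_slope_degrees(seq, 1)


# Invented toy ORF, for illustration only.
print(torus_point("ATGGCCTAA"))
```

Since -180 and 180 degrees describe the same direction, opposite edges of the plot join, which is why the plot is a flat projection of a torus.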
Distributions of different sets of ORFs for other genomes are presented in the section: Sense-antisense DNA strand asymmetry.
The distribution of ORFs on the torus projection was the basis for our method of approximating the total number of protein coding ORFs in the yeast genome. See section: Total number of coding ORFs in the yeast genome.
DNA walks show bacterial chromosome asymmetry
To show DNA compositional bias, different DNA walks and their transformations were done. Detailed descriptions of DNA walks, their possible interpretation and nomenclature are according to Cebrat and Dudek (1998).
To show local trends independent of coding functions, we performed "detrended DNA walks" (DDW) in which we eliminated strong trends resulting from the base composition of coding ORFs (Cebrat et al., 1997, Cebrat and Dudek, 1998), which mask the asymmetry of strands introduced by mutational pressure.
To eliminate these "coding trends" we counted for a given ORF the value:
J = N - (F x L), where:
J - is the value of the walker jump for the ORF,
N - is the number of occurrences of the given nucleotide (A, T, G or C) at the analyzed positions of the ORF,
F - is the frequency of the given nucleotide at the examined positions in the whole set of analyzed ORFs,
L - is the length of the given ORF in codons.
When intergenic sequences were analyzed, F was the frequency of the nucleotide in the whole set of intergenic sequences and L was the length of the visited sequence in nucleotides.
We applied an analogous procedure to the analysis of the distribution of codons and amino acids on the chromosome. In this case, in the above equation we used the number of the analyzed codons or coded amino acid residues in a given ORF instead of N, and the frequency of the given codon or amino acid in the set of analyzed ORFs instead of F.
We have used the J values to make detrended DNA walks: walking along the chromosome, the walker accumulated these values.
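The detrending step can be sketched as below. This is a minimal illustration, assuming ORFs are given in chromosome order and have lengths that are multiples of three; the function name `detrended_walk` and the toy ORFs are hypothetical.

```python
from itertools import accumulate


def detrended_walk(orfs, nucleotide="G", position=0):
    """Return (jumps, walk) for one nucleotide at one codon position.

    jumps[i] is J = N - F * L for the i-th ORF; walk is the running sum
    of the jumps as the walker moves along the chromosome.
    """
    visited = [orf.upper()[position::3] for orf in orfs]
    counts = [bases.count(nucleotide) for bases in visited]   # N per ORF
    lengths = [len(bases) for bases in visited]               # L per ORF, in codons
    frequency = sum(counts) / sum(lengths)                    # F over the whole set
    jumps = [n - frequency * length for n, length in zip(counts, lengths)]
    return jumps, list(accumulate(jumps))


# Invented toy ORFs in chromosome order, for illustration only.
orfs = ["ATGGGCTAA", "ATGAAATGA", "GTGGCCTAA"]
jumps, walk = detrended_walk(orfs, nucleotide="G", position=0)
print(jumps)
print(walk)
```

Because F is computed over the same set of ORFs, the jumps sum to zero and the walk returns to the baseline at its end; any excursion in between reflects local deviations from the average composition.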
The idea of elimination of these trends is shown in Fig. 6. Analysis of the Treponema pallidum genome is an example. In Fig. 6a the "direct" method of sliding windows was used. Numbers on the y-axis show the number of G in consecutive 600 nucleotide long sequences. In Fig. 6b numbers on the y-axis indicate differences between the mean value of G content (red line in Fig. 6a) and the found value for a given window. In Fig. 6c the values shown in Fig. 6b for the consecutive windows were cumulated.
Fig. 6. Elimination of the trend for guanine for the Treponema pallidum genome; a) sliding windows, b) deviations from mean value, c) cumulative plot.
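The three panels of Fig. 6 correspond to three simple steps, sketched here in Python. Two points are assumptions on our part, since the text does not state them: windows are taken as non-overlapping, and the deviation is computed as the window value minus the mean (the opposite sign would only mirror the plot). The function names and the toy sequence are hypothetical.

```python
from itertools import accumulate


def window_counts(seq, nucleotide="G", window=600):
    """Step a: count the nucleotide in consecutive windows.

    Assumption: windows do not overlap; an incomplete last window is dropped.
    """
    seq = seq.upper()
    return [seq[start:start + window].count(nucleotide)
            for start in range(0, len(seq) - window + 1, window)]


def deviations_from_mean(counts):
    """Step b: deviation of each window from the mean of all windows.

    Assumption: sign convention is (window value - mean).
    """
    mean = sum(counts) / len(counts)
    return [count - mean for count in counts]


def cumulative(values):
    """Step c: running sum of the deviations."""
    return list(accumulate(values))


# Invented toy sequence with a short window, for illustration only.
counts = window_counts("GGGGAAAAGGAA", nucleotide="G", window=4)
deviations = deviations_from_mean(counts)
print(counts)
print(deviations)
print(cumulative(deviations))
```

A region that is richer in G than average makes the cumulative curve rise, and a poorer region makes it fall, so a change of trend shows up as an extremum of the curve.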
In Fig. 7 the asymmetry of the Treponema pallidum genome is presented. For other examples of chromosome asymmetry, see sections: Bacterial chromosome asymmetry and Asymmetry of chromosomes in their coding properties.
Subtraction and addition of DNA walks
Fig. 7. Subtraction and addition of DNA walks made for ORFs longer than 150 codons of the Treponema pallidum genome.
The idea of transformations of DNA walks (subtraction and addition) is shown in Fig. 7.
When the walks for the Crick strand were subtracted from the walks for the Watson strand, the value of the walker jump for each ORF lying on the Crick strand was multiplied by (-1). When the walks for the two strands were added, the walker visited non-overlapping ORFs of both strands as they appeared on the chromosome, scanned them in the proper reading frame and moved according to the result of scanning. When asymmetry is introduced by replication-associated mechanisms into ORFs located on different strands in the same region of the chromosome, the values of asymmetry should have opposite signs, so if we add the values for the two strands, they will compensate each other and no effect of leading/lagging asymmetry will be observed. Conversely, subtraction of these values will accumulate the effect of asymmetry introduced by replication-associated mechanisms into the leading and lagging strands. At the terminus of replication the trends should reverse, because at this point the strands change their roles from leading to lagging and vice versa. On the other hand, transcription or "coding trends" should introduce a bias independently of the leading or lagging role of the strand. Thus, addition will accumulate these trends, while after subtraction they should diminish or disappear. Note that additions and subtractions are done on detrended DNA walks.
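The effect of addition and subtraction can be illustrated with a small sketch. It assumes the detrended jump values J have already been computed for each ORF in chromosome order; the function name `combine_walks`, the strand labels "W" and "C", and the toy values are hypothetical.

```python
from itertools import accumulate


def combine_walks(jumps, strands, mode):
    """Cumulate detrended jumps of ORFs from both strands.

    jumps   - J value of each ORF, in chromosome order
    strands - "W" (Watson) or "C" (Crick) for each ORF
    mode    - "add" keeps every jump as it is;
              "subtract" multiplies the jumps of Crick-strand ORFs by -1
    """
    if mode == "add":
        signed = list(jumps)
    elif mode == "subtract":
        signed = [j if strand == "W" else -j
                  for j, strand in zip(jumps, strands)]
    else:
        raise ValueError("mode must be 'add' or 'subtract'")
    return list(accumulate(signed))


strands = ["W", "C", "W", "C"]

# Toy replication-associated bias: opposite signs on the two strands.
replication_bias = [1, -1, 1, -1]
print(combine_walks(replication_bias, strands, "add"))
print(combine_walks(replication_bias, strands, "subtract"))

# Toy coding bias: the same sign regardless of strand.
coding_bias = [1, 1, 1, 1]
print(combine_walks(coding_bias, strands, "add"))
print(combine_walks(coding_bias, strands, "subtract"))
```

In this toy case the strand-dependent bias cancels on addition and accumulates on subtraction, while the strand-independent bias behaves the other way round, which is the distinction the transformations are meant to expose.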
Fig. 8. DNA walks made for ORFs longer
There are two significantly different classes of DNA walks analyzing coding sequences. The DNA walks of the first class are performed in the scale of the chromosome (Fig. 8a). In these walks the numbers on the x-axis represent the real co-ordinates of ORFs on the chromosome. Note that detrended walks done in the scale of the chromosome lose their information on the asymmetry in the total length of ORFs on leading and lagging strands (coding density). That is why addition of these walks done for ORFs of the Watson and Crick strands eliminates the effect of replication-associated mutational pressure and does not depend on differences in coding density of leading versus lagging strands.
It is also possible to make a DNA walk on spliced ORFs (Fig. 8b). All ORFs lying on only one strand (Watson or Crick) are spliced together in their proper order, stop to start. In this analysis the x-axis is scaled in co-ordinates of the spliced sequence, not the chromosome. In the case of a bacterial chromosome, when ORFs of only one strand are spliced (the Watson strand, for example), the walk can show the asymmetry in coding density between leading and lagging strands. The shifts of the extrema in Fig. 8b are a measure of the coding density differences between leading and lagging strands (see also section: Bacterial chromosome asymmetry).
Such transformations of DNA walks make it possible to show asymmetry in bacterial chromosomes and asymmetry of chromosomes in their coding properties. They also make it possible to distinguish between the mutational pressure associated with replication and the mutational pressure associated with transcription and/or with other mechanisms introducing asymmetry into prokaryotic chromosomes.