
DNA walks and their transformations


Graphic representation of coding DNA sequences - a spider

        To represent a coding DNA sequence graphically in two-dimensional space, we analyzed the displacement of a DNA walker that visits each codon position separately. For the DNA walk we used a modified version of the method of Berthelsen, Glazier and Skolnick (1992). For each sequence we performed three DNA walks, one for each nucleotide position in triplets (Fig. 1).

Fig. 1. Three DNA walks done independently for each nucleotide position in triplets. Each DNA walk represents the "history" of the nucleotide composition of the first, second or third codon position along the DNA sequence. The three walks together are called a spider and a single walk is called a spider leg. Two numerical values can be extracted from these plots, where A, T, G and C are the numbers of the respective nucleotides at the given codon position:
- the slope, which, measured in degrees, equals arctan[(G-C)/(A-T)], and
- the length of the vector from the origin to the end of the spider leg, which equals sqrt[(G-C)^2 + (A-T)^2].

The first walker starts from the first nucleotide position of the first codon and jumps to every third nucleotide until it reaches the end of the examined sequence. Similarly, the second and the third walkers start from the second and third nucleotide positions of the first codon, respectively. Every jump of a walker is associated with a unit shift in two-dimensional space that depends on the type of nucleotide visited. The shifts are: (0,1) for G, (1,0) for A, (0,-1) for C and (-1,0) for T. Hence, each DNA walk represents the "history" of the nucleotide composition of the first, second or third codon position along the DNA sequence. The three walks together are called a spider and a single walk is called a spider leg.
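The walking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original implementation: the function name `spider`, the handling of a trailing partial codon and the treatment of symbols other than A, T, G and C are our own choices.

```python
# Unit shifts per nucleotide, as given in the text.
SHIFTS = {"G": (0, 1), "A": (1, 0), "C": (0, -1), "T": (-1, 0)}


def spider(sequence):
    """Return three walks (lists of (x, y) points), one per codon position."""
    sequence = sequence.upper()
    # Assumption: a trailing partial codon is ignored.
    usable = len(sequence) - len(sequence) % 3
    legs = []
    for position in range(3):
        x, y = 0, 0
        leg = [(x, y)]  # every leg starts at the origin
        for base in sequence[position:usable:3]:
            # Assumption: unknown symbols (for example N) do not move the walker.
            dx, dy = SHIFTS.get(base, (0, 0))
            x, y = x + dx, y + dy
            leg.append((x, y))
        legs.append(leg)
    return legs


if __name__ == "__main__":
    for number, leg in enumerate(spider("ATGGCCTTTAAA"), start=1):
        print(f"leg {number}: end point {leg[-1]}")
```

Each returned leg can be plotted directly as a polyline; its last point is the end of the spider leg used for the slope and length measurements.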
Fig. 2a shows an example of a spider representing a typical gene of the yeast genome, SWP73 (YNR023w), a component of the SWI/SNF complex, which activates transcription. Fig. 2b shows a spider representing an intergenic sequence 921 triplets long.

Fig. 2. Three DNA walks for a) a typical gene of the yeast genome, SWP73 (YNR023w); b) an intergenic sequence 921 triplets long.
 

Fig. 3. The genomic spider made for Saccharomyces cerevisiae genes.

    A spider can be made not only for a particular ORF or gene but also for all ORFs (genes) found in the analyzed genome. To do that, all ORFs of the genome are spliced in tandem, stop to start. Such spiders are called genomic spiders. Genomic spiders show graphically the trends in the nucleotide composition of particular codon positions. The genomic spider made for spliced yeast genes is presented in Fig. 3.
    It is also possible to make spiders for all ORFs coded by the leading strand or the lagging strand in bacterial genomes. Comparing these spiders, one can easily notice differences in the nucleotide composition of genes coded by the two strands.
    Other genomic spiders and their analyses are shown in the sections: Sense-antisense DNA strand asymmetry and Bacterial chromosome asymmetry.
 
 

 

Distribution of ORFs in a torus projection

         Spiders depict the nucleotide composition of the three codon positions, but a few numerical values extracted from these plots are enough to characterize whole sets of ORFs. For each ORF we measured (in degrees) the slopes of the vectors from the origins to the ends of the spider legs (Fig. 1). These slopes are equal to arctan[(G-C)/(A-T)] for a given codon position. We assumed that the slopes have positive values in the first two quadrants of the plot and negative values in the third and fourth quadrants, that is, they range from -180 to 180 degrees. This enabled us to construct a plot where each ORF is represented by a point whose co-ordinates are: (x) - the slope representing the first leg, and (y) - the slope representing the second leg. It is also possible to use the slope of the third leg as one of the two co-ordinates or as the third co-ordinate in three-dimensional space. The distribution of intergenic sequences, all ORFs longer than 100 codons and genes from the yeast genome is presented in Fig. 4.
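As a minimal sketch of how these co-ordinates could be computed, the Python example below derives the slope and length of a leg from nucleotide counts. We use the two-argument arctangent (`math.atan2`) because it yields positive angles in the first two quadrants and negative angles in the third and fourth, which matches the sign convention above. The function names are hypothetical, and a leg that ends exactly at the origin has no defined slope (here `atan2` simply returns 0 for it).

```python
import math


def leg_counts(orf, position):
    """Count A, T, G and C at one codon position (0, 1 or 2) of an ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # assumption: ignore a trailing partial codon
    bases = orf[position:usable:3]
    return {base: bases.count(base) for base in "ATGC"}


def leg_slope_degrees(counts):
    """Slope of the origin-to-end vector, in degrees, in the range -180 to 180."""
    x = counts["A"] - counts["T"]
    y = counts["G"] - counts["C"]
    return math.degrees(math.atan2(y, x))


def leg_length(counts):
    """Length of the origin-to-end vector: sqrt[(G-C)^2 + (A-T)^2]."""
    return math.hypot(counts["A"] - counts["T"], counts["G"] - counts["C"])


def torus_point(orf):
    """(x, y) = slopes of the first and the second spider leg."""
    return tuple(leg_slope_degrees(leg_counts(orf, p)) for p in (0, 1))


if __name__ == "__main__":
    print(torus_point("ATGGCCTTTAAA"))
```

Applying `torus_point` to every ORF of a genome gives the scatter of points that such a plot displays.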

Fig. 4. Distribution of sequences from the Saccharomyces cerevisiae genome on the torus projection for a) intergenic sequences; b) all ORFs longer than 100 codons; c) genes.

Note that the surfaces of these plots are finite projections of tori (Fig. 5).
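Because both co-ordinates are angles, the left and right edges of the flat plot meet, and so do the top and bottom edges. The sketch below maps a pair of slopes onto a torus in three-dimensional space using the standard torus parametrization. The radii and the choice of which slope runs around the large circle are our assumptions, made only for illustration.

```python
import math


def torus_xyz(slope_x_deg, slope_y_deg, major_radius=2.0, minor_radius=1.0):
    """Map two slopes (degrees) to a point on a torus.

    Assumptions: the first slope runs around the large circle, the second
    around the tube; the radii are arbitrary illustrative values.
    """
    u = math.radians(slope_x_deg)
    v = math.radians(slope_y_deg)
    ring = major_radius + minor_radius * math.cos(v)
    return (ring * math.cos(u), ring * math.sin(u), minor_radius * math.sin(v))


if __name__ == "__main__":
    # Points at -180 and 180 degrees coincide on the torus (up to rounding).
    print(torus_xyz(-180.0, 30.0))
    print(torus_xyz(180.0, 30.0))
```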

Fig. 5. Distribution of all ORFs longer than 100 codons from the Saccharomyces cerevisiae genome on the torus.

Distributions of different sets of ORFs for other genomes are presented in the section: Sense-antisense DNA strand asymmetry.
The distribution of ORFs on the torus projection was the basis for our method of approximating the total number of protein-coding ORFs in the yeast genome. See section: Total number of coding ORFs in the yeast genome.
 

DNA walks show bacterial chromosome asymmetry

        To show DNA compositional bias, different DNA walks and their transformations were done. Detailed descriptions of DNA walks, their possible interpretation and nomenclature follow Cebrat and Dudek (1998). To show local trends independent of coding functions, we performed "detrended DNA walks" (DDW) in which we eliminated strong trends resulting from the base composition of coding ORFs (Cebrat et al., 1997; Cebrat and Dudek, 1998), which mask the asymmetry of strands introduced by mutational pressure.
        To eliminate these "coding trends" we calculated for a given ORF the value:
J = N - (F x L), where:
J - is the value of the walker jump for the ORF,
N - is the number of occurrences of the given nucleotide (A, T, G or C) at the analyzed positions of the ORF,
F - is the frequency of the given nucleotide at the examined positions in the whole set of analyzed ORFs,
L - is the length of the given ORF in codons.
        When intergenic sequences were analyzed, F was the frequency of the nucleotide in the whole set of intergenic sequences and L was the length of the visited sequence in nucleotides.
      We applied an analogous procedure to the analysis of the distribution of codons and amino acids on the chromosome. In this case, N in the above equation was replaced by the number of the analyzed codons or coded amino acid residues in a given ORF, and F by the frequency of the given codon or amino acid in the set of analyzed ORFs.
    We used the J values to make detrended DNA walks: walking along the chromosome, the walker accumulated these values.
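The jump value J = N - (F x L) and its accumulation can be sketched as follows. This is an illustrative Python example, not the original program: the function names are hypothetical, ORFs are assumed to be supplied as strings in chromosome order, and the set must contain at least one complete codon.

```python
from itertools import accumulate


def position_bases(orf, position):
    """Nucleotides found at one codon position (0, 1 or 2) of an ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # assumption: ignore a trailing partial codon
    return orf[position:usable:3]


def detrended_walk(orfs, nucleotide, position):
    """Return (jumps, walk) for one nucleotide at one codon position."""
    counts = [position_bases(orf, position).count(nucleotide) for orf in orfs]  # N
    lengths = [len(position_bases(orf, position)) for orf in orfs]  # L, in codons
    frequency = sum(counts) / sum(lengths)  # F, over the whole set of ORFs
    jumps = [n - frequency * length for n, length in zip(counts, lengths)]  # J
    return jumps, list(accumulate(jumps))


if __name__ == "__main__":
    jumps, walk = detrended_walk(["GGGGGG", "ATGATG", "GCGTTT"], "G", 0)
    print(jumps)
    print(walk)
```

Since F is computed over the same set of ORFs, the jumps sum to zero and the walk returns to its starting level at the end of the chromosome. Any excursion in between reflects a local deviation from the average composition.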
        The idea behind eliminating these trends is shown in Fig. 6, using the Treponema pallidum genome as an example. In Fig. 6a the "direct" method of sliding windows was used: the numbers on the y-axis show the number of G in consecutive 600-nucleotide-long sequences. In Fig. 6b the numbers on the y-axis indicate the differences between the mean G content (red line in Fig. 6a) and the value found for a given window. In Fig. 6c the values shown in Fig. 6b for consecutive windows are accumulated.
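The three steps of this procedure (window counts, deviations from the mean, cumulative sum) can be reproduced with the short Python sketch below. It assumes non-overlapping windows by default and takes the deviation as the window value minus the mean. The text does not fix the sign, so this choice, like the function names, is ours.

```python
from itertools import accumulate


def window_counts(genome, nucleotide="G", window=600, step=600):
    """Count one nucleotide in consecutive windows (incomplete last window dropped)."""
    genome = genome.upper()
    return [
        genome[start:start + window].count(nucleotide)
        for start in range(0, len(genome) - window + 1, step)
    ]


def detrend_and_cumulate(counts):
    """Return the mean, the per-window deviations and their running sum."""
    mean = sum(counts) / len(counts)
    deviations = [count - mean for count in counts]  # assumption: value minus mean
    return mean, deviations, list(accumulate(deviations))


if __name__ == "__main__":
    toy_genome = "GGGA" * 3 + "ATTA" * 3  # G-rich first half, G-poor second half
    counts = window_counts(toy_genome, window=4, step=4)
    mean, deviations, cumulative = detrend_and_cumulate(counts)
    print(counts, mean)
    print(deviations)
    print(cumulative)
```

In the toy sequence the cumulative plot rises over the G-rich half and falls over the G-poor half, so the switch in composition appears as an extremum, which is much easier to see than the step in the raw window counts.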

Fig. 6. Elimination of the trend for guanine for the Treponema pallidum genome; a) sliding windows, b) deviations from mean value, c) cumulative plot.

        Fig. 7 presents the asymmetry of the Treponema pallidum genome. For other examples of chromosome asymmetry, see sections: Bacterial chromosome asymmetry and Asymmetry of chromosomes in their coding properties.

Subtraction and addition of DNA walks
 

Fig. 7. Subtraction and addition of DNA walks made for ORFs longer than 150 codons of the Treponema pallidum genome.

    The idea of transformations of DNA walks (subtraction and addition) is shown in Fig. 7. 
     When the walks for the Crick strand were subtracted from the walks for the Watson strand, the value of the walker jump for each ORF lying in the Crick strand was multiplied by (-1). When the walks for the two strands were added, the walker visited non-overlapping ORFs of both strands as they appeared on the chromosome, scanned them in the proper reading frame and moved according to the result of scanning. 
     When asymmetry is introduced by replication-associated mechanisms into ORFs located on different strands in the same region of the chromosome, the values of asymmetry should have opposite signs. If we add the asymmetry values of the two strands, they compensate each other and no effect of leading/lagging asymmetry is observed. Subtraction of these values, in contrast, accumulates the effect of the asymmetry introduced by replication-associated mechanisms into leading and lagging strands. At the terminus of replication the trends should reverse, because at this point the strands change their role from leading to lagging and vice versa. On the other hand, transcription or "coding trends" should introduce a bias independently of the leading or lagging role of the strand. Thus, addition accumulates these trends, while after subtraction they should diminish or disappear. Note that additions and subtractions are done on detrended DNA walks.
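A minimal Python sketch of the two transformations is given below. It assumes that the detrended jump J has already been computed for each ORF in its own reading frame and that ORFs are listed in chromosome order with a strand label ("W" for Watson, "C" for Crick). The data layout and the function name are our assumptions.

```python
from itertools import accumulate


def combine_walks(orf_jumps, mode):
    """Add or subtract the walks of the two strands.

    orf_jumps: list of (strand, J) pairs in chromosome order.
    mode: "add" keeps every J as it is; "subtract" (Watson minus Crick)
    multiplies the J of each Crick-strand ORF by -1.
    """
    if mode not in ("add", "subtract"):
        raise ValueError("mode must be 'add' or 'subtract'")
    crick_sign = -1 if mode == "subtract" else 1
    signed = [j * (crick_sign if strand == "C" else 1) for strand, j in orf_jumps]
    return list(accumulate(signed))


if __name__ == "__main__":
    # Toy region where the bias has opposite signs on the two strands.
    region = [("W", 1.0), ("C", -1.0), ("W", 1.0), ("C", -1.0)]
    print(combine_walks(region, "add"))
    print(combine_walks(region, "subtract"))
```

In this toy region the added walk stays near zero while the subtracted walk climbs steadily, which is the behaviour expected for a replication-associated bias. A bias with the same sign on both strands would show the opposite pattern.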
 
 

 


 

Fig. 8. DNA walks made for ORFs longer than 150 codons on the Watson strand of the Treponema pallidum genome in the scale of a) chromosome; b) spliced ORFs.

   There are two significantly different classes of DNA walks that analyze coding sequences. The DNA walks of the first class are performed in the scale of the chromosome (Fig. 8a). In these walks the numbers on the x-axis represent the real co-ordinates of ORFs on the chromosome. Note that detrended walks done in the scale of the chromosome lose their information on the asymmetry in the total length of ORFs on leading and lagging strands (coding density). That is why addition of these walks done for ORFs of the Watson and Crick strands eliminates the effect of replication-associated mutational pressure and does not depend on differences in coding density of leading versus lagging strands.
    It is also possible to make a DNA walk on spliced ORFs (Fig. 8b). All ORFs lying on one strand only (Watson or Crick) are spliced together in their proper order, stop to start. In this analysis the x-axis is scaled in the co-ordinates of the spliced sequence, not of the chromosome. In the case of a bacterial chromosome, when ORFs of only one strand are spliced (the Watson strand, for example), the walk can show the asymmetry in coding density between leading and lagging strands. The shifts of the extrema in Fig. 8b are a measure of the coding density differences between leading and lagging strands (see also section: Bacterial chromosome asymmetry).
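The difference between the two scales lies only in the x co-ordinate assigned to each ORF, as the Python sketch below illustrates. It assumes 1-based inclusive ORF co-ordinates in nucleotides, a precomputed jump J per ORF, and that each point is plotted at the end of its ORF. These conventions and the function name are ours.

```python
from itertools import accumulate


def walk_points(orfs, scale):
    """Return (x, cumulative J) points for a walk over ORFs of one strand.

    orfs: list of (start, end, J) sorted by start.
    scale: "chromosome" uses the real end co-ordinate of each ORF;
           "spliced" uses the running total of ORF lengths.
    """
    heights = list(accumulate(j for _, _, j in orfs))
    if scale == "chromosome":
        xs = [end for _, end, _ in orfs]
    elif scale == "spliced":
        xs = list(accumulate(end - start + 1 for start, end, _ in orfs))
    else:
        raise ValueError("scale must be 'chromosome' or 'spliced'")
    return list(zip(xs, heights))


if __name__ == "__main__":
    watson_orfs = [(101, 400, 2.0), (1001, 1600, -1.0)]
    print(walk_points(watson_orfs, "chromosome"))
    print(walk_points(watson_orfs, "spliced"))
```

The y values are identical in both scales. Only the spliced scale compresses the regions of the chromosome where the chosen strand carries few ORFs, which is why the positions of the extrema shift with coding density.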
 
 
 
 
 
 
 

 

        Such transformations of DNA walks make it possible to show asymmetry in bacterial chromosomes and asymmetry of chromosomes in their coding properties. They also make it possible to distinguish between the mutational pressure associated with replication and the mutational pressure associated with transcription and/or other mechanisms that introduce asymmetry into prokaryotic chromosomes.
