DNA walk
A DNA walk of a genome shows how the frequency of each nucleotide within a complementary pair changes locally along the sequence. The analysis measures the local distribution of Gs within the GC content and of Ts within the TA content. Lobry was the first to propose this analysis (1996, 1999). Two complementary representations can be derived from the DNA walk: the cumulative TA-skew and the cumulative GC-skew analyses.
Aim: After reading this description of the algorithm, a reader not trained in genomics should be able to redraw our graphs using the basic genometric data file posted on our web resource for each organism as a zip file (.zip).
DNA walk
1) Drawing a DNA walk by reading a sequence file nucleotide by nucleotide.
A DNA walk is drawn by assigning a direction to each nucleotide. We propose the following assignment, slightly different from Lobry's (Lobry, 1999): T, C, A, and G correspond to the E(ast), S(outh), W(est), and N(orth) directions, respectively. Reading the sequence nucleotide by nucleotide and moving one step in the assigned direction each time traces a path on the graph (Figure 1).
Figure 1: DNA walk of the sequence GTCTGGTGTCTGGAGTTCCTGGGTCTTGAG ACCACAGGACCCACCAGGGACCCAGGACCC. Starting from the bottom left (bold blue line), the curve ends at the bottom left (pink line).
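The nucleotide-by-nucleotide rule can be sketched in Python as follows. This is a minimal illustration, not the program used to produce our graphs: the names `STEPS` and `dna_walk` are hypothetical, and skipping any character other than A, C, G, or T is an assumption.

```python
# Direction assigned to each nucleotide:
# T = East (+x), A = West (-x), G = North (+y), C = South (-y).
STEPS = {"T": (1, 0), "A": (-1, 0), "G": (0, 1), "C": (0, -1)}

def dna_walk(sequence):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for nucleotide in sequence.upper():
        step = STEPS.get(nucleotide)
        if step is None:
            continue  # assumption: ignore spaces, N, and any other symbol
        x += step[0]
        y += step[1]
        path.append((x, y))
    return path

# The 60-nucleotide sequence of Figure 1.
FIGURE_1_SEQUENCE = (
    "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAG"
    "ACCACAGGACCCACCAGGGACCCAGGACCC"
)

path = dna_walk(FIGURE_1_SEQUENCE)
print(path[:4])  # [(0, 0), (0, 1), (1, 1), (1, 0)]
print(path[-1])  # (0, 0)
```

The sequence contains as many Ts as As (11 each) and as many Gs as Cs (19 each), so the walk returns to its starting point.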
2) Drawing a DNA walk by slicing a sequence file into small windows
Lobry (1996) suggested a quicker way to draw this kind of graph: cut the genome into windows of equal length.
Figure 2: DNA walk of the same sequence as the one presented in Figure 1: GTCTGGTGTCTGGAGTTCCTGGGTCTTGAG ACCACAGGACCCACCAGGGACCCAGGACCC. The sequence was sliced into 5-nucleotide windows. Only the fifth nucleotide per window is plotted. We can also work with the mean values of the window…
Comment: this method is less precise than the first one. However, it can be carried out in spreadsheet software without affecting the final resolution of the curve at the genome level.
2.1) The genome is cut into n windows W of equal size (the last window may be smaller than or equal to the others).
2.2) In each of these windows, each nucleotide is counted: cA, cC, cG, and cT, respectively.
Window | cA | cC | cG | cT
W1 | cA1 | cC1 | cG1 | cT1
W2 | cA2 | cC2 | cG2 | cT2
W3 | cA3 | cC3 | cG3 | cT3
... | ... | ... | ... | ...
W(n-1) | cA(n-1) | cC(n-1) | cG(n-1) | cT(n-1)
Wn | cAn | cCn | cGn | cTn
- Example: Mycoplasma genitalium G37 complete genome (L43967.1, 580074 bp; available for download as a compressed text file), cut into windows of 1000 nucleotides.
Position of the window center (nt) | cA | cC | cG | cT
500 | 453 | 93 | 86 | 368
1500 | 400 | 120 | 133 | 347
2500 | 374 | 122 | 164 | 340
3500 | 345 | 145 | 200 | 310
... | ... | ... | ... | ...
578500 | 313 | 138 | 141 | 408
579500 | 318 | 149 | 145 | 388
580037 | 33 | 8 | 4 | 29
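Steps 2.1 and 2.2 can be sketched as follows. The function name `window_counts` is hypothetical, and taking the window center as the window start plus half the window length is an assumption inferred from the example values (500, 1500, ..., 580037). The toy input is the first 25 nucleotides of the Figure 1 sequence, not genome data.

```python
from collections import Counter

def window_counts(sequence, window_size):
    """Cut the sequence into consecutive windows and count A, C, G, T in each.

    Returns one (center, cA, cC, cG, cT) tuple per window. The last window
    may be shorter than the others.
    """
    rows = []
    for start in range(0, len(sequence), window_size):
        window = sequence[start:start + window_size].upper()
        counts = Counter(window)
        # Assumption: center = window start + half the window length.
        center = start + len(window) // 2
        rows.append((center, counts["A"], counts["C"], counts["G"], counts["T"]))
    return rows

# Toy input: 25 nucleotides cut into windows of 10 (the last one holds only 5).
rows = window_counts("GTCTGGTGTCTGGAGTTCCTGGGTC", 10)
for row in rows:
    print(row)
```

With a 580074-nucleotide sequence and a window size of 1000, this convention gives 581 windows, the last one centered at 580037 and holding 74 nucleotides, as in the example table.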
2.3) Two calculations are performed for each window: xi and yi are determined.
Window | cA | cC | cG | cT | xi | yi
W1 | cA1 | cC1 | cG1 | cT1 | x1 = cT1 - cA1 | y1 = cG1 - cC1
W2 | cA2 | cC2 | cG2 | cT2 | x2 = cT2 - cA2 | y2 = cG2 - cC2
... | ... | ... | ... | ... | ... | ...
Wn | cAn | cCn | cGn | cTn | xn = cTn - cAn | yn = cGn - cCn
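As a worked illustration of step 2.3, the sketch below applies x = cT - cA and y = cG - cC to the first four windows of the Mycoplasma genitalium example. The function name `window_steps` is hypothetical.

```python
# (center, cA, cC, cG, cT) for the first four 1000-nucleotide windows
# of the Mycoplasma genitalium example.
rows = [
    (500, 453, 93, 86, 368),
    (1500, 400, 120, 133, 347),
    (2500, 374, 122, 164, 340),
    (3500, 345, 145, 200, 310),
]

def window_steps(rows):
    """Return one (x_i, y_i) pair per window: x_i = cT - cA, y_i = cG - cC."""
    return [(cT - cA, cG - cC) for _center, cA, cC, cG, cT in rows]

steps = window_steps(rows)
print(steps)  # [(-85, -7), (-53, 13), (-34, 42), (-35, 55)]
```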
2.4) A cumulative curve is calculated: Xi and Yi are determined.
Window | xi | yi | Xi | Yi
W1 ... | x1 = cT1 - cA1 | y1 = cG1 - cC1 | X1 = sum(x1 to x1) | Y1 = sum(y1 to y1)
W2 ... | x2 = cT2 - cA2 | y2 = cG2 - cC2 | X2 = sum(x1 to x2) | Y2 = sum(y1 to y2)
... | ... | ... | ... | ...
Wn ... | xn = cTn - cAn | yn = cGn - cCn | Xn = sum(x1 to xn) | Yn = sum(y1 to yn)
2.5) The cumulative curve is drawn by plotting the points in the order of the data, from (X1, Y1) to (Xn, Yn), with Xi on the horizontal axis and Yi on the vertical axis.
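Steps 2.4 and 2.5 amount to a running sum over the windows. The sketch below is one possible way to compute it, using the x and y values of the first four windows of the Mycoplasma genitalium example. The function name `cumulative_walk` is hypothetical, and the plotting itself is left to any spreadsheet or graphing tool.

```python
from itertools import accumulate

# Per-window values (x_i, y_i) for the first four windows of the
# Mycoplasma genitalium example, with x_i = cT - cA and y_i = cG - cC.
steps = [(-85, -7), (-53, 13), (-34, 42), (-35, 55)]

def cumulative_walk(steps):
    """Return the points (X_i, Y_i), where X_i = x_1 + ... + x_i and
    Y_i = y_1 + ... + y_i, in window order."""
    xs = accumulate(x for x, _ in steps)
    ys = accumulate(y for _, y in steps)
    return list(zip(xs, ys))

points = cumulative_walk(steps)
print(points)  # [(-85, -7), (-138, 6), (-172, 48), (-207, 103)]
```

Joining these points in order gives the windowed DNA walk.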
2.6) Following this description, the DNA walks on our graphs, generated by the "nucleotide by nucleotide" method, are labeled as follows:
- TmAc vs GmCc, meaning that x is the cumulated number of Ts minus the number of As, and y is the cumulated number of Gs minus the number of Cs.
Lobry chose the following assignment: T, G, A, and C correspond to the E, S, W, and N directions, respectively. Lobry's outputs are therefore mirror images of ours, reflected across the X axis. Compare the DNA walk of Borrelia burgdorferi in Lobry's drawing system and in ours.
Figure 3: DNA walk of Borrelia burgdorferi. Panels: Lobry's system; our system.
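The relation between the two drawing systems can be checked on a short sequence. In the sketch below, the names `OUR_STEPS`, `LOBRY_STEPS`, and `walk` are hypothetical, and the test sequence is simply the first 15 nucleotides of the Figure 1 sequence.

```python
# Our assignment: T = East, A = West, G = North, C = South.
OUR_STEPS = {"T": (1, 0), "A": (-1, 0), "G": (0, 1), "C": (0, -1)}
# Lobry's assignment: T = East, A = West, G = South, C = North.
LOBRY_STEPS = {"T": (1, 0), "A": (-1, 0), "G": (0, -1), "C": (0, 1)}

def walk(sequence, steps):
    """Return the (x, y) points visited under the given direction assignment."""
    x, y = 0, 0
    path = [(x, y)]
    for nucleotide in sequence:
        dx, dy = steps[nucleotide]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

sequence = "GTCTGGTGTCTGGAG"
ours = walk(sequence, OUR_STEPS)
lobry = walk(sequence, LOBRY_STEPS)

# Flipping the sign of y turns one walk into the other.
print(lobry == [(x, -y) for x, y in ours])  # True
```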