Genomic Libraries

Source: Internet, 2013-09-06

Genomic DNA libraries

Size of some genomes and chromosomes:

Comparative Sequence Sizes	(Bases)
(yeast chromosome 3)	350 Thousand
Escherichia coli (bacterium) genome	4.6 Million
Largest yeast chromosome now mapped	5.8 Million
Entire yeast genome (completed 5/96)	15 Million
Smallest human chromosome (Y)	50 Million
Largest human chromosome (1)	250 Million
Entire human genome	3 Billion

The human genome contains approximately 50,000 unique genes within 3-4 billion base pairs of DNA, distributed across 23 pairs of chromosomes.

Fragmentation of genomic DNA for library construction

Restriction endonuclease digestion

A six-cutter (e.g. EcoRI) recognizes a 6 bp site, so on average it will cut once every 4⁶ = 4,096 bp (about 4.1 Kb). Complete digestion of human DNA (3 x 10⁹ bp) with this type of enzyme will therefore result in roughly 7 x 10⁵ unique fragments, i.e. on the order of 1 x 10⁶.

What is the probability of finding a clone within a given library?

The number of clones needed to have a given probability of finding any particular DNA sequence in the library can be calculated from the equation

<center> $N = \frac{\ln(1 - P)}{\ln(1 - f)}$ </center>

where

<center> P is the desired probability </center>

<center> f is the fractional proportion of the genome in a single recombinant </center>

<center> N is the necessary number of recombinants </center>

For example, how large a library (i.e. how many clones) would you need in order to have a 99% probability of finding a desired sequence represented in a library created by digestion with a 6-cutter?

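<center> $N = \frac{\ln(1 - 0.99)}{\ln\left(1 - \frac{4.1 \times 10^{3}}{3 \times 10^{9}}\right)} \approx 3.37 \times 10^{6}$ clones </center>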
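
The figure above can be checked with a few lines of Python. This is only a minimal sketch: the function name `clones_needed` and the constant names are illustrative, and it assumes the values used in the text (a 3 x 10⁹ bp genome, an average six-cutter fragment of 4⁶ = 4,096 bp, and P = 0.99).

```python
import math

def clones_needed(p, insert_bp, genome_bp):
    """Recombinants N needed for probability p: N = ln(1 - p) / ln(1 - f)."""
    f = insert_bp / genome_bp          # fraction of the genome in one clone
    # log1p(-f) computes ln(1 - f) accurately when f is very small
    return math.log(1.0 - p) / math.log1p(-f)

GENOME_BP = 3e9            # human genome size assumed in the text
SIX_CUTTER_BP = 4 ** 6     # average spacing of a 6 bp recognition site

n = clones_needed(0.99, SIX_CUTTER_BP, GENOME_BP)
print(f"about {n:.3g} clones")
```

Raising P or shrinking the insert size both increase N, which is why the text goes on to look for vectors that accept larger inserts.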

Thus, from this type of analysis we can see that we need a technology which will allow us to achieve the following:

Stable insertion of relatively large DNA fragments into our cloning vector
High efficiency of insertion and the ability to handle large numbers of clones

For example, when plating E. coli colonies on a 3" petri plate, the maximum practical density to allow isolation of individual colonies is about 100-200 colonies per plate.
If we were to try to plate our library of 3.37 x 10⁶ clones in this way, we would need about 22,500 plates (assuming roughly 150 colonies per plate).
Not only that, but such large DNA fragments are not well tolerated in typical E. coli cloning vectors such as pBR322.
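
The plate count is simple arithmetic, shown here as a small sketch. The function name `plates_needed` is illustrative, and the densities are the 100-200 colonies per plate quoted above, with 150 taken as a midpoint.

```python
import math

def plates_needed(library_size, colonies_per_plate):
    """Whole plates required to spread a library at a given colony density."""
    return math.ceil(library_size / colonies_per_plate)

LIBRARY_SIZE = 3.37e6   # clones, from the six-cutter example in the text

for density in (100, 150, 200):
    print(density, "colonies/plate ->", plates_needed(LIBRARY_SIZE, density), "plates")
```

Even at the densest practical plating, the library would still need well over ten thousand plates.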

Bacteriophage lambda vectors are commonly used for construction of genomic libraries

Bacteriophage λ is an E. coli phage whose icosahedral phage particle contains the viral genome.

During replication, the phage DNA is produced in a concatemeric form, which is cleaved by appropriate endonucleases to allow packaging of a single genome within the phage capsid.
It was found that internal regions of the phage genome, which were not essential to phage replication, could be removed and replaced with DNA of interest.
This hybrid DNA could be efficiently packaged to form an infective phage.

The advantages of this type of system vs plasmids like pBR322 are:

The phage genome is able to package efficiently with DNA inserts as large as 20 Kb.
Furthermore, the packaged phage are highly infectious and infect E. coli at a much higher efficiency than plasmid transformation methods.
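
To see how much the larger insert size helps, the probability formula N = ln(1 - P) / ln(1 - f) can be re-evaluated for 20 Kb inserts. The sketch below assumes a 3 x 10⁹ bp genome, P = 0.99, and randomly distributed fragments. The function name is illustrative.

```python
import math

def clones_needed(p, insert_bp, genome_bp=3e9):
    """N = ln(1 - p) / ln(1 - f), with f = insert size / genome size."""
    return math.log(1.0 - p) / math.log1p(-insert_bp / genome_bp)

n_plasmid = clones_needed(0.99, 4.1e3)   # ~4.1 Kb six-cutter fragments
n_lambda = clones_needed(0.99, 20e3)     # ~20 Kb inserts in a lambda vector

print(f"4.1 Kb inserts: {n_plasmid:.3g} clones")
print(f"20 Kb inserts:  {n_lambda:.3g} clones")
print(f"reduction:      {n_plasmid / n_lambda:.1f}-fold")
```

Under these assumptions the required library shrinks roughly fivefold, to on the order of 7 x 10⁵ clones.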

Incomplete Digestion of Genomic DNA will allow identification of sequence overlaps

Complete digestion with an endonuclease will result in a library containing no overlapping fragments.

However, incomplete digestion will result in a library containing overlapping fragments.

Thus, the sequence information obtained from one clone will allow the isolation of clones containing neighboring (overlapping) sequence information.
This can allow large contiguous stretches of sequence information to be obtained ("Chromosome Walking").
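
The difference between complete and incomplete digestion can be illustrated with a toy model. The sketch below is not a simulation of real enzyme kinetics: the 30 bp "genome", the cut-site positions, and the function names are all made up for illustration, and incomplete digestion is modelled simply as allowing up to one internal site per fragment to remain uncut.

```python
from itertools import combinations

def digest_fragments(length, sites, max_skipped=0):
    """Return (start, end) fragments; max_skipped internal sites may stay uncut."""
    bounds = [0] + sorted(sites) + [length]
    fragments = []
    for i in range(len(bounds) - 1):
        last = min(i + 1 + max_skipped, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            fragments.append((bounds[i], bounds[j]))
    return fragments

def overlapping_pairs(fragments):
    """All pairs of fragments that share at least one base."""
    return [(a, b) for a, b in combinations(fragments, 2)
            if a[0] < b[1] and b[0] < a[1]]

sites = [4, 9, 15, 22]                       # hypothetical cut positions
complete = digest_fragments(30, sites)       # every site cut
partial = digest_fragments(30, sites, max_skipped=1)

print("complete:", complete, "overlaps:", len(overlapping_pairs(complete)))
print("partial: ", partial, "overlaps:", len(overlapping_pairs(partial)))
```

In the complete digest each fragment ends exactly where the next begins, so no clone can serve as a probe for its neighbor. In the partial digest, fragments such as (0, 9) and (4, 15) share sequence, which is what makes walking from clone to clone possible.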

Probing libraries

Once a library (cDNA or genomic) has been constructed we want to be able to identify clones which contain DNA of interest.

For example, from protein sequence information we can deduce possible stretches of the corresponding DNA sequence (there will however be ambiguity due to the degeneracy of codons).
If we can synthesize an oligonucleotide complementary to our DNA sequence of interest, we can use it to specifically hybridize to the appropriate clone in our library (i.e. to probe our library).

In standard methodologies the oligonucleotide is phosphorylated at the 5' end using radiolabeled [γ-³²P]ATP and T4 polynucleotide kinase.

The probe is then incubated with individual phage plaques which have been fixed onto nitrocellulose and their DNA denatured by treatment with base.
If the plaque contains DNA complementary to the probe sequence, the probe will hybridize.
If the nitrocellulose (containing many individual plaques) is exposed to x-ray film, only those plaques with hybridized probe will show up (as a dark spot).

<center> Note that it is important to keep track of the orientation of the nitrocellulose relative to the x-ray film (usually radioactive ink is used to mark the nitrocellulose orientation). </center>

False positives

If we are designing DNA probes from protein sequence information we will have possible ambiguity in our deduced DNA sequence used for the design of the probe.

Usually 14-24mer oligonucleotides are used as probes. Since each amino acid is encoded by three bases, a 14-24mer probe means we need a stretch of 5-8 amino acids in the polypeptide.
Given the choice, the best amino acid sequences to look for in a polypeptide are those with low codon degeneracy (see above).
Thus, we would look for a short stretch of polypeptide sequence hopefully containing Met or Trp (1 codon each), and with the remaining amino acids comprising either Phe, Tyr, His, Gln, Asn, Lys, Asp, Glu or Cys (2 codons each).
Regions including Leu, Arg or Ser are to be avoided (6 codons each).
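
The selection rule above can be made concrete by counting how many different DNA sequences could encode a given peptide. The following is a sketch only: it uses codon counts from the standard genetic code, one-letter amino acid codes, and a made-up example sequence. The function names `degeneracy` and `best_window` are illustrative.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
CODONS_PER_AA = {
    "M": 1, "W": 1,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "I": 3,
    "V": 4, "P": 4, "T": 4, "A": 4, "G": 4,
    "L": 6, "R": 6, "S": 6,
}

def degeneracy(peptide):
    """Number of distinct coding sequences for a peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)

def best_window(protein, k):
    """Return (index, peptide, degeneracy) of the least degenerate k-residue window."""
    scored = [(degeneracy(protein[i:i + k]), i) for i in range(len(protein) - k + 1)]
    d, i = min(scored)
    return i, protein[i:i + k], d

protein = "LRSMWFYHLL"            # hypothetical sequence, for illustration only
print(best_window(protein, 5))    # least degenerate 5-residue stretch
print(degeneracy("LRS"))          # three 6-codon residues in a row
```

In this example the window Met-Trp-Phe-Tyr-His has only 8 possible coding sequences, while Leu-Arg-Ser alone already has 216.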

During oligonucleotide synthesis multiple bases will be incorporated at ambiguous positions.

Thus our probe will actually be a mixture of oligonucleotides.
The higher the degeneracy, the greater the possibility of "false positives", i.e. clones which hybridize but are unrelated to the actual sequence we want.
Positive clones are sequenced and the deduced amino acid sequence is compared to our polypeptide sequence information to identify correct clones.
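
What "a mixture of oligonucleotides" means can be shown by expanding a degenerate sequence into its members. The sketch below uses the standard IUPAC nucleotide ambiguity codes. The function name `expand` and the example sequences (coding-strand sequences for Met-Trp-Phe and Asn-Gln) are chosen here for illustration.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(degenerate_oligo):
    """List every unambiguous sequence represented by a degenerate oligo."""
    choices = (IUPAC[base] for base in degenerate_oligo)
    return ["".join(bases) for bases in product(*choices)]

print(expand("ATGTGGTTY"))        # Met-Trp-Phe: only the last position varies
print(len(expand("AAYCAR")))      # Asn-Gln: two ambiguous positions
```

Each ambiguous position multiplies the size of the mixture, so only a fraction of the labeled probe matches the true target exactly, while the remaining species are free to hybridize elsewhere.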

Antibodies (Immunoglobulins)

If the particular vector, or phage, used to construct a cDNA library contains a promoter region upstream of the insertion site, we may be able to screen for desired clones by looking for expression of the protein of interest.

In this case, we need an assay which is both sensitive (we will not be producing a lot of protein) and specific (we want to minimize any false positives).
One of the best assays, which is both sensitive and specific, makes use of antibodies.

Antigen, antibody, epitope

One of the defense mechanisms of vertebrates is the ability to distinguish between self and non-self molecules.

Thus, if a foreign molecule (either from another species or sometimes from another individual within a species) invades a vertebrate organism, the immune system functions to learn to identify that molecule.
In future invasions by the same molecule, the organism mounts a defense against it by producing specific antibodies which recognize and bind to the foreign antigen.
When antibodies bind to antigen, certain white blood cells (macrophages and monocytes) recognize the invading body as foreign and respond by destroying it.

Antibodies are 'Y' shaped molecules which contain two identical heavy chains, and two identical light chains.

The stem of the 'Y' comprises the Fc (constant) domain, and the 'arms' of the 'Y' comprise the Fab (variable) domains.
Antigens bind to the complementarity-determining regions (CDRs) located at the ends of the Fab domains.

Antibodies are synthesized by B lymphocytes. Each B lymphocyte is capable of producing a single type of antibody directed against a specific structural determinant, or epitope, on an antigen.

Thus, an immune response to a protein antigen may result in a population of B lymphocytes each producing antibodies which recognize a different structural determinant of the foreign protein.
An epitope may be a contiguous region of 5 or 6 amino acids in the foreign polypeptide, or it may comprise a half dozen or so amino acids that are brought into juxtaposition in the native protein yet are widely spaced in the polypeptide sequence.
Thus, some antibodies will recognize native and denatured forms of a foreign protein equally well, while other antibodies may only recognize one or the other.

If the protein of interest has been purified it can be used to induce an immune response in a host animal.

Typical host animals include mouse, chicken, rabbit, goat, sheep, horse and occasionally, human.
After an initial immunization, followed by one or more booster shots, the B lymphocytes of the host animal may produce antibodies directed against the antigen.
The antibodies can be purified from blood samples withdrawn from the animal. Such preparations of antibodies are said to be polyclonal.
This refers to the fact that the antibodies present are from a collection of different B lymphocytes and thus will recognize a variety of different epitopes on the antigen protein.
The ability to isolate antibodies from blood samples means that the host animal does not need to be destroyed.
Of course, the size of the animal determines how much antibody one can obtain. For example, a rabbit can provide 5 ml of blood every two weeks, a mouse provides significantly less, while a horse can provide quite a bit more.

An antibody isolated from a single B lymphocyte cell population is termed monoclonal.

It recognizes a single epitope on the antigenic protein.
Antibody producing B lymphocytes can be isolated from the spleen or from lymph nodes. However, they have a finite life span in culture, i.e. they will undergo a certain number of cell divisions and then die.
These cells can, however, be fused with immortal (cancerous myeloma) lymphocytes to produce a hybridoma cell.
Such a cell is immortal like the myeloma, and produces a specific antibody from the B lymphocyte. The ability to grow indefinitely in culture allows the isolation of useful amounts of specific monoclonal antibodies.

Sometimes immunizing with the protein of interest is problematic: appropriate amounts of purified material cannot be produced, or the protein is itself toxic at the dosage level necessary to produce an immune response.

If partial sequence information is known, then large amounts of polypeptides representing short fragments of the protein can be synthesized and used to immunize the animal.
Often these polypeptides are covalently attached to a carrier protein (typically serum albumin) to enhance the antigenic response.
Antibodies produced against such peptides will recognize only epitopes within the polypeptide. Thus, even polyclonal antibodies would be quite limited in their epitope recognition.

As with radiolabeled oligonucleotides, antibodies can be used to identify library clones which contain a cDNA of interest. This method would of course rely upon a host vector or phage which contains a promoter upstream from the site of insertion of the cDNA.

Antibodies can be used to screen viral plaques or plasmid colonies which have been bound to nitrocellulose.
Bound antibodies can be identified using radiolabeled protein A (which binds to immunoglobulins) or via a second antibody (which, like protein A, can recognize general immunoglobulins) which has a dye or dye-releasing enzyme covalently attached.