
Detecting the Signatures of Adaptive Evolution in Protein‐Coding Genes





The field of molecular evolution, which includes genome evolution, is devoted to finding variation within and between groups of organisms and explaining the processes responsible for generating this variation. Many DNA changes are believed to have little to no functional effect, and their evolution is best explained by a neutral process. Thus, a central task is to discover which changes had positive fitness consequences and were subject to Darwinian natural selection during the course of evolution. Due to the size and complexity of modern molecular datasets, the field has come to rely extensively on statistical modeling techniques to meet this analytical challenge. For DNA sequences that encode proteins, one of the most powerful approaches is to employ a statistical model of codon evolution. This unit provides a general introduction to the practice of modeling codon evolution using the statistical framework of maximum likelihood. Four real‐data analysis activities are used to illustrate the principles of parameter estimation, robustness, hypothesis testing, and site classification. Each activity includes an explicit analytical protocol based on programs provided by the Phylogenetic Analysis by Maximum Likelihood (PAML) package. Curr. Protoc. Mol. Biol. 101:19.1.1–19.1.21. © 2013 by John Wiley & Sons, Inc.

Keywords: molecular evolution; protein evolution; selection pressure; codon models; maximum likelihood
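
Hypothesis testing in this framework is typically carried out with a likelihood ratio test (LRT) between nested codon models, as in the LRTs named in the caption of Figure 19.1.6. The following is a minimal sketch in Python, not part of PAML, assuming the usual chi‐square approximation for the test statistic. The function names and the log‐likelihood values in the usage lines are hypothetical and do not come from the unit's datasets.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!,
    which is exact when df is a positive even integer (e.g., 2 or 4).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, approximate p-value) for two nested models.

    lnl_null and lnl_alt are maximized log likelihoods of the null and
    alternative models; df is the difference in free parameters.
    """
    # Optimizer noise can make the statistic slightly negative; clamp at 0.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log likelihoods, for illustration only.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")
```

The closed form avoids a dependency on a statistics library but only handles even degrees of freedom, which covers the df = 2 and df = 4 cases mentioned in the figure caption.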

Table of Contents

  • Introduction
  • Codon Modeling Activities Using the CODEML Program
  • Concluding Remarks
  • Literature Cited
  • Figures
  • Tables
  •   Figure 19.1.1 Modeling the intensity of natural selection via a codon model. (A) The ω ratio (dN/dS) is a parameter used to measure the direction and intensity of natural selection pressure acting on a protein. (B) Codon models specify the probability of substitution between the sense codons within a protein sequence, which depends on the value of the ω parameter. Given an explicit codon model, such as Goldman and Yang (), the value of the ω parameter can be estimated from a dataset via the method of maximum likelihood. Other model parameters are the transition/transversion ratio (κ) and the frequency of the jth codon (πj). MHC, major histocompatibility complex.
  •   Figure 19.1.2 Likelihood of the GstD1 sequences from Drosophila simulans and Drosophila melanogaster as a function of both the ω and κ parameters of a codon model. The values of ω and κ that maximize the likelihood of the data (–756.57) are 0.067 and 2.53 respectively.
  •   Figure 19.1.3 Example of the plain‐text representation of (A) a multiple sequence alignment in PAML format, and (B) the phylogenetic relationships for five lineages of vertebrates. A tree diagram is provided for clarity but should not be included in the plain‐text file. The alignment comprises the first 20 codons (60 nucleotides) of the Ldh‐A (lactate dehydrogenase A) gene.
  •   Figure 19.1.4 The log likelihood (ℓ) for the gamma globin genes from a chimpanzee (Pan troglodytes) and a gibbon (Hylobates lar) as a function of the ω parameter of codon model M0. The maximum likelihood estimate of ω is the value that maximizes the likelihood function (–568.58); for these data it is 0.1623.
  •   Figure 19.1.5 Phylogeny for 12 PR gene sequences. Branch lengths are not to scale. The one‐ratio model (H0) assumes the same intensity of selection pressure over all branches. The H1 model is based on the hypothesis that a family‐wide shift in selection (both the blue‐ and green‐absorbing PRs) occurred following the evolution of blue‐absorbing PRs. H1 assumes that selection intensity prior to this event (ωG0) differs from selection intensity after this event (ωG1 = ωB1 = ωB2). H2 models a shift in selection pressure only within the blue‐absorbing PRs. Hence, selection intensity is uniform for all blue‐absorbing PRs (ωB1 = ωB2) and differs from all green‐absorbing PRs (ωG0 = ωG1). H3 is based on an episodic model of functional divergence where altered selection occurred only in the branch associated with the evolution of blue‐absorbing PRs (ωB1), with all other branches being subject to the ancestral levels of selection intensity (ωG0 = ωG1 = ωB2).
  •   Figure 19.1.6 Graphical representation of the ω distributions of codon models M0, M1a, M2a, M7, and M8. The nested relationships between M0 vs. M3, M1a vs. M2a, and M7 vs. M8 permit three different likelihood ratio tests (LRTs). The LRT of M0 vs. M3 is a test for variable selection pressure among sites and has df = 4 when M3 has three discrete categories for ω. The LRTs of M1a vs. M2a and of M7 vs. M8 are alternative tests for a fraction of sites subject to positive selection. Each of these LRTs has df = 2.
  •   Figure 19.1.7 Posterior probability of ω0, ω1, and ω2 for every site in an alignment of cell‐surface proteins from Flavobacterium psychrophilum. The posterior probabilities are conditioned on the MLEs for ω under codon model M3. At each site the posterior probabilities sum to 1. Site 1 is an example of codons inferred to have evolved under strong purifying selection; this site has a high posterior probability of ω0 = 0.015 (posterior probability = 0.98), and a low posterior probability of ω1 = 0.64 (0.02) and ω2 = 3.58 (0.0). Site 65 is an example of codons having evolved under positive selection; this site has a high posterior probability of ω2 = 3.58 (0.998). Note that the data at some sites provide a less conclusive signal (e.g., site 38).
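
The site classification summarized in the caption of Figure 19.1.7 rests on Bayes' rule: the posterior probability that a site belongs to an ω class is proportional to the estimated proportion of sites in that class times the likelihood of the site's data under that class. The following is a minimal sketch in Python, not PAML's implementation. The function name, the class proportions, and the per‐site likelihoods are hypothetical values chosen only to show the arithmetic.

```python
def site_posteriors(proportions, site_likelihoods):
    """Posterior probability of each omega class at a single site.

    proportions: estimated fraction of sites in each class (the prior).
    site_likelihoods: likelihood of the site's data under each class,
        evaluated at the maximum likelihood estimates.
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("need one likelihood per omega class")
    # Numerator of Bayes' rule for every class.
    joint = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site has zero likelihood under every class")
    # Normalize so that the posteriors at this site sum to 1.
    return [j / total for j in joint]


# Hypothetical inputs for three classes (omega0, omega1, omega2).
proportions = [0.6, 0.3, 0.1]
site_likelihoods = [1e-4, 2e-4, 9e-4]
posteriors = site_posteriors(proportions, site_likelihoods)
best_class = max(range(len(posteriors)), key=posteriors.__getitem__)
print(posteriors, "most probable class index:", best_class)
```

Because the MLEs are treated as fixed, uncertainty in the parameter estimates is not reflected in these posteriors, which is one reason some sites give a less conclusive signal than others.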
Literature Cited

Supplementary Materials

Codon usage bias in amy2 gene of D. melanogaster and D. pseudoobscura: Supplementary_FigureS1.pptx

Supporting files for Activities 1 through 4 of the codon modeling activities using the CODEML program: Supporting_files_for_Activities.zip

