Detecting the Signatures of Adaptive Evolution in Protein‐Coding Genes

Internet, 2013-12-31

Abstract

The field of molecular evolution, which includes genome evolution, is devoted to finding variation within and between groups of organisms and explaining the processes responsible for generating this variation. Many DNA changes are believed to have little to no functional effect, and a neutral process will best explain their evolution. Thus, a central task is to discover which changes had positive fitness consequences and were subject to Darwinian natural selection during the course of evolution. Due to the size and complexity of modern molecular datasets, the field has come to rely extensively on statistical modeling techniques to meet this analytical challenge. For DNA sequences that encode proteins, one of the most powerful approaches is to employ a statistical model of codon evolution. This unit provides a general introduction to the practice of modeling codon evolution using the statistical framework of maximum likelihood. Four real‐data analysis activities are used to illustrate the principles of parameter estimation, robustness, hypothesis testing, and site classification. Each activity includes an explicit analytical protocol based on programs provided by the Phylogenetic Analysis by Maximum Likelihood (PAML) package. Curr. Protoc. Mol. Biol. 101:19.1.1‐19.1.21. © 2013 by John Wiley & Sons, Inc.

Keywords: molecular evolution; protein evolution; selection pressure; codon models; maximum likelihood

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Codon Modeling Activities Using the CODEML Program
Concluding Remarks
Literature Cited
Figures
Tables

Materials

Figures

Figure 19.1.1 Modeling the intensity of natural selection via a codon model. (A) The ω ratio (d_N/d_S) is a parameter used to measure the direction and intensity of natural selection pressure acting on a protein. (B) Codon models specify the probability of substitution between the sense codons within a protein sequence, which depends on the value of the ω parameter. Given an explicit codon model, such as that of Goldman and Yang (1994), the value of the ω parameter can be estimated from a dataset via the method of maximum likelihood. Other model parameters are the transition/transversion ratio (κ) and the frequency of the jth codon (π_j). MHC, major histocompatibility complex.
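
The way ω, κ, and π_j enter a codon model can be illustrated with a minimal sketch of a substitution rate matrix. It assumes the simplified form commonly used for Goldman and Yang‐style models: the rate from codon i to codon j is zero unless the two codons differ at exactly one position, and is otherwise π_j multiplied by κ for a transition and by ω for a nonsynonymous change. The function names, the equal codon frequencies used by default, and the absence of any rescaling of the matrix are assumptions made for illustration; this is not the CODEML implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SENSE = [codon for codon in CODE if CODE[codon] != "*"]  # the 61 sense codons

PURINES = {"A", "G"}


def is_transition(x, y):
    """True for a purine<->purine or pyrimidine<->pyrimidine change (x != y)."""
    return (x in PURINES) == (y in PURINES)


def rate(ci, cj, omega, kappa, pi_j):
    """Instantaneous rate from codon ci to codon cj (unscaled)."""
    diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
    if len(diffs) != 1:
        return 0.0  # identical codons or multi-nucleotide changes
    a, b = diffs[0]
    r = pi_j
    if is_transition(a, b):
        r *= kappa  # transitions are weighted by kappa
    if CODE[ci] != CODE[cj]:
        r *= omega  # nonsynonymous changes are weighted by omega
    return r


def build_q(omega, kappa, pi=None):
    """Return the 61 x 61 rate matrix as nested lists; each row sums to zero."""
    n = len(SENSE)
    if pi is None:
        pi = {c: 1.0 / n for c in SENSE}  # assumption: equal codon frequencies
    q = [[0.0 if i == j else rate(ci, cj, omega, kappa, pi[cj])
          for j, cj in enumerate(SENSE)]
         for i, ci in enumerate(SENSE)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal balances the off-diagonal rates
    return q
```

For example, `build_q(0.067, 2.53)` builds the matrix for the ω and κ values reported for the GstD1 data, under the equal‐frequency assumption stated above.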

Figure 19.1.2 Log likelihood of the GstD1 sequences from Drosophila simulans and Drosophila melanogaster as a function of both the ω and κ parameters of a codon model. The values of ω and κ that maximize the log likelihood of the data (–756.57) are 0.067 and 2.53, respectively.

Figure 19.1.3 Example of the plain‐text representation of (A) a multiple sequence alignment in PAML format, and (B) the phylogenetic relationships for five lineages of vertebrates. A tree diagram is provided for clarity but should not be included in the plain‐text file. The alignment comprises the first 20 codons (60 nucleotides) of the Ldh‐A (lactate dehydrogenase A) gene.
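
As a rough illustration of the plain‐text alignment layout, the sketch below writes aligned codon sequences in a sequential style: a header line giving the number of sequences and the alignment length, followed by one line per sequence with the name separated from the sequence by two spaces. The function name and the sequence names are hypothetical, and the exact layout accepted by a given PAML version should be checked against the PAML documentation.

```python
def to_paml_alignment(seqs):
    """Format a dict of name -> aligned codon sequence as sequential plain text."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    if n_sites % 3 != 0:
        raise ValueError("codon alignments need a length divisible by 3")
    # Header: number of sequences, then number of nucleotide sites.
    lines = [f" {len(seqs)} {n_sites}"]
    for name, seq in seqs.items():
        # Two spaces separate the sequence name from the sequence itself.
        lines.append(f"{name}  {seq}")
    return "\n".join(lines) + "\n"
```

Calling `to_paml_alignment({"seqA": "ATGGCC", "seqB": "ATGGCA"})` returns a three‐line string that can be saved to a file; the tree in panel B would be kept in a separate plain‐text file in parenthetical (Newick) notation.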

Figure 19.1.4 The log likelihood (ℓ) for the gamma globin genes from a chimpanzee (Pan troglodytes) and a gibbon (Hylobates lar) as a function of the ω parameter of codon model M0. The maximum likelihood estimate of ω is the value that maximizes the log likelihood function (ℓ = –568.58); for these data it is 0.1623.

Figure 19.1.5 Phylogeny for 12 PR gene sequences. Branch lengths are not to scale. The one‐ratio model (H₀) assumes the same intensity of selection pressure over all branches. The H₁ model is based on the hypothesis that a family‐wide shift in selection (both the blue‐ and green‐absorbing PRs) occurred following the evolution of blue‐absorbing PRs. H₁ assumes that selection intensity prior to this event (ω_G0) differs from selection intensity after this event (ω_G1 = ω_B1 = ω_B2). H₂ models a shift in selection pressure only within the blue‐absorbing PRs. Hence, selection intensity is uniform for all blue‐absorbing PRs (ω_B1 = ω_B2) and differs from all green‐absorbing PRs (ω_G0 = ω_G1). H₃ is based on an episodic model of functional divergence where altered selection occurred only in the branch associated with the evolution of blue‐absorbing PRs (ω_B1), with all other branches being subject to the ancestral levels of selection intensity (ω_G0 = ω_G1 = ω_B2).

Figure 19.1.6 Graphical representation of the ω distributions of codon models M0, M1a, M2a, M7, and M8. The nested relationships between M0 vs. M3, M1a vs. M2a, and M7 vs. M8 permit three different likelihood ratio tests (LRTs). The LRT of M0 vs. M3 is a test for variable selection pressure among sites and has df = 4 when M3 has three discrete categories for ω. The LRTs of M1a vs. M2a and of M7 vs. M8 are alternative tests for a fraction of sites subject to positive selection. Each of these LRTs has df = 2.
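
The likelihood ratio tests named in this caption can be sketched in a few lines: the statistic is twice the difference in maximized log likelihood between the alternative and the null model, compared against a χ² distribution with the stated degrees of freedom. The sketch below uses the closed‐form upper tail of the χ² distribution for even df, which covers the df = 2 and df = 4 cases mentioned here. The function names are illustrative, and any log likelihood values passed in would come from separate model fits.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2*(lnL_alt - lnL_null) and its chi-square P value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    # A slightly negative statistic can arise from incomplete optimization;
    # it is treated as zero when computing the P value.
    return stat, chi2_sf_even_df(max(stat, 0.0), df)
```

With hypothetical fits, `likelihood_ratio_test(-1000.0, -995.0, 2)` gives a statistic of 10.0 and a P value of about 0.0067.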

Figure 19.1.7 Posterior probability of ω₀, ω₁, and ω₂ for every site in an alignment of cell‐surface proteins from Flavobacterium psychrophilum. The posterior probabilities are conditioned on the MLEs for ω under codon model M3. At each site the posterior probabilities sum to 1. Site 1 is an example of codons inferred to have evolved under strong purifying selection; this site has a high posterior probability of ω₀ = 0.015 (posterior probability = 0.98), and a low posterior probability of ω₁ = 0.64 (0.02) and ω₂ = 3.58 (0.0). Site 65 is an example of codons having evolved under positive selection; this site has a high posterior probability of ω₂ = 3.58 (0.998). Note that the data at some sites provide a less conclusive signal (e.g., site 38).
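
The site classification described in this caption follows from Bayes' rule: the posterior probability that a site belongs to an ω category is the estimated proportion of sites in that category multiplied by the likelihood of the site's data under that category, divided by the sum of the same product over all categories. The sketch below shows only this final step. The function name and the numbers in the usage note are hypothetical; in practice the proportions and per‐category site likelihoods would come from the fitted model.

```python
def site_posteriors(proportions, site_likelihoods):
    """Posterior probability of each omega category at a single site.

    proportions      -- estimated fraction of sites in each category (sums to 1)
    site_likelihoods -- likelihood of this site's data under each category
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("need one likelihood per category")
    # Numerator of Bayes' rule for each category: prior weight x likelihood.
    joint = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site likelihoods must not all be zero")
    # Normalizing makes the posteriors at this site sum to 1.
    return [j / total for j in joint]
```

For instance, with made‐up proportions `[0.5, 0.3, 0.2]` and site likelihoods `[1e-4, 1e-5, 1e-6]`, the first category receives most of the posterior weight, which corresponds to the pattern described for site 1.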

Literature Cited

	Anisimova, M. and Kosiol, C. 2009. Investigating protein‐coding sequence evolution with probabilistic codon substitution models. Mol. Biol. Evol. 26:255‐271.
	Anisimova, M. and Liberles, D. 2012. Detecting and understanding natural selection. In Codon Evolution: Mechanisms and Models (G. Cannarozzi and A. Schneider, eds.) Oxford University Press, New York.
	Anisimova, M., Bielawski, J.P., and Yang, Z. 2001. Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol. 18:1585‐1592.
	Anisimova, M., Bielawski, J.P., and Yang, Z. 2002. Accuracy and power of Bayesian prediction of amino acid sites under positive selection. Mol. Biol. Evol. 19:950‐958.
	Aris‐Brosou, S. and Bielawski, J.P. 2006. Large‐scale analyses of synonymous substitution rates can be sensitive to assumptions about the process of mutation. Gene 378:58‐64.
	Bao, L., Gu, H., Dunn, K.A., and Bielawski, J.P. 2007. Methods for selecting fixed‐effect models for heterogeneous codon evolution, with comments on their application to gene and genome data. BMC Evol. Biol. 7:S5.
	Bao, L., Gu, H., Dunn, K.A., and Bielawski, J.P. 2008. Likelihood Based Clustering (LiBaC) for Codon Models, a method for grouping sites according to similarities in the underlying process of evolution. Mol. Biol. Evol. 25:1995‐2007.
	Bielawski, J.P. and Yang, Z. 2001. The role of selection in the evolution of the DAZ gene family. Mol. Biol. Evol. 18:523‐529.
	Bielawski, J.P. and Yang, Z. 2004. A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. J. Mol. Evol. 59:121‐132.
	Bielawski, J.P. and Yang, Z. 2005. Maximum likelihood methods for detecting adaptive protein evolution. In Statistical Methods in Molecular Evolution (R. Nielsen, ed.) pp. 103‐124. Springer‐Verlag, New York.
	Bielawski, J.P., Dunn, K.A., Sabehi, G., and Béjà, O. 2004. Darwinian adaptation of proteorhodopsin to different light intensities in the marine environment. Proc. Natl. Acad. Sci. U.S.A. 101:14824‐14829.
	DeLong, E.F. and Béjà, O. 2010. The light‐driven proton pump proteorhodopsin enhances bacterial survival during tough times. PLoS Biol. 8:e1000359.
	Dutheil, J.Y., Galtier, N., Romiguier, J., Douzery, E.J., Ranwez, V., and Boussau, B. 2012. Efficient selection of branch‐specific models of sequence evolution. Mol. Biol. Evol. 29:1861‐1874.
	Field, S.F., Bulina, M.Y., Kelmanson, I.V., Bielawski, J.P., and Matz, M.V. 2006. Adaptive evolution of multicolored fluorescent proteins in reef‐building corals. J. Mol. Evol. 62:332‐339.
	Goldman, N. and Yang, Z. 1994. A codon based model of nucleotide substitution for protein‐coding DNA sequences. Mol. Biol. Evol. 11:725‐736.
	Guindon, S., Rodrigo, A.G., Dyer, K.A., and Huelsenbeck, J.P. 2004. Modeling the site‐specific variation of selection patterns along lineages. Proc. Natl. Acad. Sci. U.S.A. 101:12957‐12962.
	Jiggins, F.M., Hurst, G.D.D., and Yang, Z. 2002. Host‐symbiont conflicts: Positive selection on the outer membrane protein of parasite but not mutualistic Rickettsiaceae. Mol. Biol. Evol. 19:1341‐1349.
	Kelley, J.L. and Swanson, W.J. 2008. Dietary change and adaptive evolution of enamelin in humans and among primates. Genetics 178:1595‐1603.
	Kosakovsky Pond, S.L. and Frost, S.D. 2005a. A genetic algorithm approach to detecting lineage‐specific variation in selection pressure. Mol. Biol. Evol. 22:478‐485.
	Kosakovsky Pond, S.L. and Frost, S.D. 2005b. Not so different after all: A comparison of methods for detecting amino acid sites under selection. Mol. Biol. Evol. 22:1208‐1222.
	Kosakovsky Pond, S.L. and Muse, S.V. 2005. HyPhy: Hypothesis testing using phylogenies. In Statistical Methods in Molecular Evolution (R. Nielsen, ed.) pp. 125‐181. Springer‐Verlag, New York.
	Kosakovsky Pond, S.L., Murrell, B., Fourment, M., Frost, S.D., Delport, W., and Scheffler, K. 2011. A random effects branch‐site model for detecting episodic diversifying selection. Mol. Biol. Evol. 28:3033‐3043.
	Muse, S.V. and Gaut, B.S. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with applications to the chloroplast genome. Mol. Biol. Evol. 11:715‐725.
	Pawitan, Y. 2001. In all likelihood: Statistical modeling and inference using likelihood. Clarendon Press, Oxford.
	Rodrigue, N., Lartillot, N., and Philippe, H. 2008. Bayesian comparisons of codon substitution models. Genetics 180:1579‐1591.
	Stamatakis, A. 2006. RAxML‐VI‐HPC: Maximum likelihood‐based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688‐2690.
	Swofford, D.L. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
	Yang, Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568‐573.
	Yang, Z. 2006. Computational Molecular Evolution. Oxford University Press, Oxford.
	Yang, Z. 2007. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24:1586‐1591.
	Yang, Z. and Bielawski, J.P. 2000. Statistical methods for detecting molecular adaptation. TREE 15:496‐503.
	Yang, Z. and dos Reis, M. 2011. Statistical properties of the branch‐site test of positive selection. Mol. Biol. Evol. 28:1217‐1228.
	Yang, Z. and Nielsen, R. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32‐43.
	Yang, Z. and Nielsen, R. 2002. Codon‐substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908‐917.
	Yang, Z. and Swanson, W.J. 2002. Codon‐substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol. Biol. Evol. 19:49‐57.
	Yang, Z., Nielsen, R., Goldman, N., and Pedersen, A.M.K. 2000. Codon‐substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431‐449.
	Yap, V.B., Lindsay, H., Easteal, S., and Huttley, G. 2010. Estimates of the effect of natural selection on protein‐coding content. Mol. Biol. Evol. 27:726‐734.
	Zhang, J., Nielsen, R., and Yang, Z. 2005. Evaluation of an improved branch‐site likelihood method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22:2472‐2479.
	Zwickl, D.J. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. dissertation, The University of Texas at Austin.

Supplementary Materials

Codon usage bias in amy2 gene of D. melanogaster and D. pseudoobscura: Supplementary_FigureS1.pptx

Supporting files for codon modeling Activities 1 through 4 using the CODEML program: Supporting_files_for_Activities.zip
