A Bayesian Mutation–Selection Framework for Detecting Site-Specific Adaptive Evolution in Protein-Coding Genes

Molecular Biology and Evolution ◽

10.1093/molbev/msaa265 ◽

2020 ◽

Author(s):

Nicolas Rodrigue ◽

Thibault Latrille ◽

Nicolas Lartillot

Keyword(s):

Adaptive Evolution ◽

Real Data ◽

Data Sets ◽

Protein Coding ◽

Site Specific ◽

Protein Coding Genes ◽

Codon Substitution ◽

Selection Framework ◽

DNA Alignment ◽

The Impact

Abstract: In recent years, codon substitution models based on the mutation–selection principle have been extended for the purpose of detecting signatures of adaptive evolution in protein-coding genes. However, the approaches used to date have either focused on detecting global signals of adaptive regimes, across the entire gene, or on contexts where experimentally derived, site-specific amino acid fitness profiles are available. Here, we present a Bayesian site-heterogeneous mutation–selection framework for site-specific detection of adaptive substitution regimes given a protein-coding DNA alignment. We offer implementations, briefly present simulation results, and apply the approach to a few real data sets. Our analyses suggest that the new approach shows greater sensitivity than traditional methods. However, more study is required to assess the impact of potential model violations on the method, and to gain a greater empirical sense of its behavior on a broader range of real data sets. We propose an outline of such a research program.

Erratum to: A Bayesian mutation-selection framework for detecting site-specific adaptive evolution in protein-coding genes

Molecular Biology and Evolution ◽

10.1093/molbev/msab135 ◽

2021 ◽

Author(s):

Nicolas Rodrigue ◽

Thibault Latrille ◽

Nicolas Lartillot

Keyword(s):

Adaptive Evolution ◽

Protein Coding ◽

Site Specific ◽

Protein Coding Genes ◽

Selection Framework

How to calculate the non-synonymous to synonymous rate ratio of protein-coding genes under the Fisher–Wright mutation–selection framework

Biology Letters ◽

10.1098/rsbl.2014.1031 ◽

2015 ◽

Vol 11 (4) ◽

pp. 20141031 ◽

Cited By ~ 12

Author(s):

Mario dos Reis

Keyword(s):

First Principles ◽

Rate Ratio ◽

Real Data ◽

Synonymous Substitution ◽

Chloroplast Gene ◽

Synonymous Substitution Rate ◽

Protein Coding ◽

Protein Coding Genes ◽

Selection Framework ◽

Insight Into

First principles of population genetics are used to obtain formulae relating the non-synonymous to synonymous substitution rate ratio to the selection coefficients acting at codon sites in protein-coding genes. Two theoretical cases are discussed and two examples from real data (a chloroplast gene and a virus polymerase) are given. The formulae give much insight into the dynamics of non-synonymous substitutions and may inform the development of methods to detect adaptive evolution.
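As a rough illustration of the kind of relationship such formulae build on, the sketch below computes the fixation rate of a mutation relative to a neutral one from its scaled selection coefficient S, using the standard weak-selection approximation S / (1 - exp(-S)). This is textbook population genetics rather than a reproduction of the paper's formulae, which the abstract does not state. The function name and the scaling convention for S are assumptions made for the example.

```python
import math

def relative_fixation_rate(S: float) -> float:
    """Fixation rate relative to a neutral mutation: S / (1 - exp(-S)).

    S is a scaled selection coefficient (assumed convention for this sketch).
    """
    if abs(S) < 1e-8:
        # The limit as S -> 0 is 1 (neutral); a first-order expansion avoids 0/0.
        return 1.0 + S / 2.0
    if S < -700.0:
        # exp(-S) would overflow; the rate is effectively zero here.
        return 0.0
    # -expm1(-S) equals 1 - exp(-S), computed accurately for small |S|.
    return S / -math.expm1(-S)

for S in (-2.0, 0.0, 2.0):
    print(S, relative_fixation_rate(S))
```

Under this approximation, advantageous mutations (S > 0) fix faster than neutral ones and deleterious mutations (S < 0) fix more slowly, which is why an excess of nonsynonymous substitutions can signal adaptation.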

Codon-Substitution Models for Heterogeneous Selection Pressure at Amino Acid Sites

Genetics ◽

10.1093/genetics/155.1.431 ◽

2000 ◽

Vol 155 (1) ◽

pp. 431-449 ◽

Cited By ~ 41

Author(s):

Ziheng Yang ◽

Rasmus Nielsen ◽

Nick Goldman ◽

Anne-Mette Krabbe Pedersen

Keyword(s):

Amino Acid ◽

Positive Selection ◽

Selective Pressure ◽

Acid Sites ◽

Data Sets ◽

Protein Coding ◽

Important Indicator ◽

Diversifying Selection ◽

Codon Substitution ◽

Neutral Mutations

Abstract: Comparison of relative fixation rates of synonymous (silent) and nonsynonymous (amino acid-altering) mutations provides a means for understanding the mechanisms of molecular sequence evolution. The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important indicator of selective pressure at the protein level, with ω = 1 meaning neutral mutations, ω < 1 purifying selection, and ω > 1 diversifying positive selection. Amino acid sites in a protein are expected to be under different selective pressures and have different underlying ω ratios. We develop models that account for heterogeneous ω ratios among amino acid sites and apply them to phylogenetic analyses of protein-coding DNA sequences. These models are useful for testing for adaptive molecular evolution and identifying amino acid sites under diversifying selection. Ten data sets of genes from nuclear, mitochondrial, and viral genomes are analyzed to estimate the distributions of ω among sites. In all data sets analyzed, the selective pressure indicated by the ω ratio is found to be highly heterogeneous among sites. Previously unsuspected Darwinian selection is detected in several genes in which the average ω ratio across sites is <1, but in which some sites are clearly under diversifying selection with ω > 1. Genes undergoing positive selection include the β-globin gene from vertebrates, mitochondrial protein-coding genes from hominoids, the hemagglutinin (HA) gene from human influenza virus A, and HIV-1 env, vif, and pol genes. Tests for the presence of positively selected sites and their subsequent identification appear quite robust to the specific distributional form assumed for ω and can be achieved using any of several models we implement. However, we encountered difficulties in estimating the precise distribution of ω among sites from real data sets.
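The interpretation of ω given in the abstract can be written down as a small helper. This is only a sketch of the three-way reading of a point estimate. The function name and tolerance are assumptions for the example, and the models described in the abstract estimate distributions of ω by likelihood rather than by comparing a single ratio to 1.

```python
import math

def classify_omega(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Label the selective regime implied by omega = dN/dS.

    The threshold reading follows the abstract: omega = 1 neutral,
    omega < 1 purifying, omega > 1 diversifying.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if math.isclose(omega, 1.0, abs_tol=tol):
        return "neutral"
    return "purifying" if omega < 1.0 else "diversifying"

print(classify_omega(0.2, 0.4))
print(classify_omega(0.9, 0.3))
```

A gene-wide average ω below 1 does not rule out individual sites with ω > 1, which is the motivation for the site-heterogeneous models the abstract describes.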

Comparative analysis of chloroplast genomes for five Dicliptera species (Acanthaceae): molecular structure, phylogenetic relationships, and adaptive evolution

PeerJ ◽

10.7717/peerj.8450 ◽

2020 ◽

Vol 8 ◽

pp. e8450 ◽

Cited By ~ 2

Author(s):

Sunan Huang ◽

Xuejun Ge ◽

Asunción Cano ◽

Betty Gaby Millán Salazar ◽

Yunfei Deng

Keyword(s):

Adaptive Evolution ◽

Phylogenetic Relationships ◽

Single Copy ◽

rRNA Genes ◽

tRNA Genes ◽

Evolutionary Analysis ◽

Protein Coding ◽

Variable Regions ◽

Protein Coding Genes ◽

Chloroplast Genomes

The genus Dicliptera (Justicieae, Acanthaceae) consists of approximately 150 species distributed throughout the tropical and subtropical regions of the world. Newly obtained chloroplast genomes (cp genomes) are reported for five species of Dicliptera (D. acuminata, D. peruviana, D. montana, D. ruiziana and D. mucronata) in this study. These cp genomes have circular structures of 150,689–150,811 bp and exhibit quadripartite organizations made up of a large single copy region (LSC, 82,796–82,919 bp), a small single copy region (SSC, 17,084–17,092 bp), and a pair of inverted repeat regions (IRs, 25,401–25,408 bp). Guanine-Cytosine (GC) content makes up 37.9%–38.0% of the total content. The complete cp genomes contain 114 unique genes, including 80 protein-coding genes, 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes. Comparative analyses of nucleotide variability (Pi) reveal the five most variable regions (trnY-GUA-trnE-UUC, trnG-GCC, psbZ-trnG-GCC, petN-psbM, and rps4-trnL-UUA), which may be used as molecular markers in future taxonomic identification and phylogenetic analyses of Dicliptera. A total of 55–58 simple sequence repeats (SSRs) and 229 long repeats were identified in the cp genomes of the five Dicliptera species. Phylogenetic analysis identified a close relationship between D. ruiziana and D. montana, followed by D. acuminata, D. peruviana, and D. mucronata. Evolutionary analysis of orthologous protein-coding genes within the family Acanthaceae revealed only one gene, ycf15, to be under positive selection, which may contribute to future studies of its adaptive evolution. The completed genomes are useful for future research on species identification, phylogenetic relationships, and the adaptive evolution of the Dicliptera species.

A Comparison of Scoring Metrics for Predicting the Next Navigation Step with Markov Model-Based Systems

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622010003956 ◽

2010 ◽

Vol 09 (04) ◽

pp. 547-573 ◽

Cited By ~ 4

Author(s):

José Borges ◽

Mark Levene

Keyword(s):

Markov Model ◽

Prediction Accuracy ◽

Prediction Models ◽

Markov Models ◽

Real Data ◽

Absolute Error ◽

Brier Score ◽

Data Sets ◽

Extensive Evaluation ◽

The Impact

The problem of predicting the next request during a user's navigation session has been extensively studied. In this context, higher-order Markov models have been widely used to model navigation sessions and to predict the next navigation step, while prediction accuracy has been mainly evaluated with the hit and miss score. We claim that this score, although useful, is not sufficient for evaluating next link prediction models with the aim of finding a sufficient order of the model, the size of a recommendation set, and assessing the impact of unexpected events on the prediction accuracy. Herein, we make use of a variable length Markov model to compare the usefulness of three alternatives to the hit and miss score: the Mean Absolute Error, the Ignorance Score, and the Brier score. We present an extensive evaluation of the methods on real data sets and a comprehensive comparison of the scoring methods.
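To make the contrast between these scores concrete, the sketch below evaluates a single next-link prediction with the hit and miss score, the Brier score, and the Ignorance Score, using their common textbook definitions. The paper's exact formulations are not given in the abstract and may differ. The Mean Absolute Error is omitted for that reason. The function names and the example probabilities are hypothetical.

```python
import math

def hit_and_miss(probs: dict, observed: str) -> int:
    """1 if the most probable link is the one actually followed, else 0."""
    return int(max(probs, key=probs.get) == observed)

def brier_score(probs: dict, observed: str) -> float:
    """Sum of squared differences between predicted probabilities and the outcome."""
    score = sum((p - (1.0 if link == observed else 0.0)) ** 2
                for link, p in probs.items())
    if observed not in probs:
        # An unpredicted link has implicit probability 0 and outcome 1.
        score += 1.0
    return score

def ignorance_score(probs: dict, observed: str) -> float:
    """Negative base-2 log of the probability assigned to the observed link."""
    p = probs.get(observed, 0.0)
    return math.inf if p == 0.0 else -math.log2(p)

# Hypothetical prediction over three candidate links.
prediction = {"a": 0.5, "b": 0.3, "c": 0.2}
for name, fn in (("hit", hit_and_miss), ("brier", brier_score), ("ignorance", ignorance_score)):
    print(name, fn(prediction, "a"))
```

Unlike the hit and miss score, the two probabilistic scores change when the model's confidence changes even if its top-ranked link stays the same, and the Ignorance Score becomes infinite for an event the model considered impossible.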

Detecting the Signatures of Adaptive Evolution in Protein-Coding Genes

Current Protocols in Molecular Biology ◽

10.1002/0471142727.mb1901s101 ◽

2013 ◽

Vol 101 (1) ◽

pp. 19.1.1-19.1.21 ◽

Cited By ~ 4

Author(s):

Joseph P. Bielawski

Keyword(s):

Adaptive Evolution ◽

Protein Coding ◽

Protein Coding Genes

Plants regenerated from tissue culture contain stable epigenome changes in rice

eLife ◽

10.7554/elife.00354 ◽

2013 ◽

Vol 2 ◽

Cited By ~ 131

Author(s):

Hume Stroud ◽

Bo Ding ◽

Stacey A Simon ◽

Suhua Feng ◽

Maria Bellizzi ◽

...

Keyword(s):

Tissue Culture ◽

Phenotypic Variability ◽

Whole Genome ◽

Single Nucleotide ◽

Protein Coding ◽

Protein Coding Genes ◽

Regenerated Plants ◽

Nucleotide Resolution ◽

The Impact ◽

Single Nucleotide Resolution

Most transgenic crops are produced through tissue culture. The impact of utilizing such methods on the plant epigenome is poorly understood. Here we generated whole-genome, single-nucleotide resolution maps of DNA methylation in several regenerated rice lines. We found that all tested regenerated plants had significant losses of methylation compared to non-regenerated plants. Loss of methylation was largely stable across generations, and certain sites in the genome were particularly susceptible to loss of methylation. Loss of methylation at promoters was associated with deregulated expression of protein-coding genes. Analyses of callus and untransformed plants regenerated from callus indicated that loss of methylation is stochastically induced at the tissue culture step. These changes in methylation may explain a component of somaclonal variation, a phenomenon in which plants derived from tissue culture manifest phenotypic variability.

Preliminary assessment of adaptive evolution of mitochondrial protein coding genes in darters (Percidae: Etheostomatinae)

F1000Research ◽

10.12688/f1000research.17552.2 ◽

2019 ◽

Vol 8 ◽

pp. 464 ◽

Cited By ~ 1

Author(s):

Leos G. Kral ◽

Sara Watson

Keyword(s):

Positive Selection ◽

Adaptive Evolution ◽

Gene Evolution ◽

Mitochondrial Gene ◽

Mitochondrial Protein ◽

Preliminary Assessment ◽

Protein Coding ◽

Protein Coding Genes ◽

Show Evidence ◽

Selection Of

Background: Mitochondrial DNA of vertebrates contains genes for 13 proteins involved in oxidative phosphorylation. Some of these genes have been shown to undergo adaptive evolution in a variety of species. This study examines all mitochondrial protein coding genes in 11 darter species to determine if any of these genes show evidence of positive selection. Methods: The mitogenomes from four darter species were sequenced and annotated. Mitogenome sequences for another seven species were obtained from GenBank. Alignments of each of the protein coding genes were subjected to codon-based identification of positive selection by Selecton, MEME and FEL. Results: Evidence of positive selection was obtained for six of the genes by at least one of the methods. CYTB was identified as having evolved under positive selection by all three methods at the same codon location. Conclusions: Given the evidence for positive selection of mitochondrial protein coding genes in darters, a more extensive analysis of mitochondrial gene evolution in all the extant darter species is warranted.

The complete plastid genomes of four species from Brassicales

10.1101/458000 ◽

2018 ◽

Author(s):

Weixue Mu ◽

Ting Yang ◽

Xin Liu

Keyword(s):

Single Copy ◽

Data Sets ◽

Base Pairs ◽

Typical Structure ◽

Protein Coding ◽

Protein Coding Genes ◽

Phylogenetic Studies ◽

Plastid Genomes ◽

Cleome Rutidosperma ◽

Small Single Copy

Abstract: Brassicales is a diverse angiosperm order with about 4,700 recognized species. Here, we assembled and described the complete plastid genomes from four species of Brassicales: Capparis urophylla F.Chun (Capparaceae), Carica papaya L. (Caricaceae), Cleome rutidosperma DC. (Cleomaceae), and Moringa oleifera Lam. (Moringaceae), including two plastid genomes newly assembled for two families (Capparaceae and Moringaceae). The four plastid genomes are 159,680 base pairs on average in length and encode 78 protein-coding genes. Each genome has the typical structure of a Large Single-Copy (LSC) region and a Small Single-Copy (SSC) region separated by two Inverted Repeat (IR) regions. We performed maximum-likelihood (ML) phylogenetic analysis using three different data sets of 66 protein-coding genes (ntAll, ntNo3rd and AA). Our phylogenetic results from the different data sets are congruent, and are consistent with previous phylogenetic studies of Brassicales.

A codon model of nucleotide substitution with selection on synonymous codon usage

10.1101/007849 ◽

2014 ◽

Author(s):

Laura Kubatko ◽

Premal Shah ◽

Radu Herbei ◽

Michael Gilchrist

Keyword(s):

Protein Production ◽

Synonymous Codon ◽

Error Rates ◽

Selection Model ◽

Substitution Model ◽

Protein Coding ◽

New Model ◽

Protein Coding Genes ◽

Codon Substitution ◽

Yeast Genes

The quality of phylogenetic inference made from protein-coding genes depends, in part, on the realism with which the codon substitution process is modeled. Here we propose a new mechanistic model that combines the standard M0 substitution model of Yang (1997) with a simplified model from Gilchrist (2007) that includes selection on synonymous substitutions as a function of codon-specific nonsense error rates. We tested the newly proposed model by applying it to 104 protein-coding genes in brewer's yeast, and compared the fit of the new model to the standard M0 model and to the mutation-selection model of Yang and Nielsen (2008) using the AIC. Our new model provided significantly better fit in approximately 85% of the cases considered for the basic M0 model and in approximately 25% of the cases for the M0 model with estimated codon frequencies, but only in a few cases when the mutation-selection model was considered. However, our model includes a parameter that can be interpreted as a measure of the rate of protein production, and the estimates of this parameter were highly correlated with an independent measure of protein production for the yeast genes considered here. Finally, we found that in some cases the new model led to the preference of a different phylogeny for a subset of the genes considered, indicating that substitution model choice may have an impact on the estimated phylogeny.
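The model comparison mentioned in the abstract relies on the AIC, defined as 2k - 2 ln L for a model with k free parameters and maximized likelihood L. The sketch below shows how such a comparison could be carried out. The helper names, the log-likelihoods, and the parameter counts are hypothetical and are not taken from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def best_by_aic(fits: dict) -> str:
    """fits maps model name -> (maximized log-likelihood, number of free parameters)."""
    return min(fits, key=lambda name: aic(*fits[name]))

# Hypothetical values for illustration only; not results from the paper.
fits = {"M0": (-1000.0, 5), "new_model": (-990.0, 7)}
for name, (lnl, k) in fits.items():
    print(name, aic(lnl, k))
print("preferred:", best_by_aic(fits))
```

Because each extra parameter costs 2 AIC units, a richer model is preferred only when it raises the log-likelihood by more than one unit per added parameter.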
