scholarly journals A Machine Learning Method for Detecting Autocorrelation of Evolutionary Rates in Large Phylogenies

2019 ◽  
Vol 36 (4) ◽  
pp. 811-824 ◽  
Author(s):  
Qiqing Tao ◽  
Koichiro Tamura ◽  
Fabia U. Battistuzzi ◽  
Sudhir Kumar

Abstract New species arise from pre-existing species and inherit similar genomes and environments. This predicts greater similarity of the tempo of molecular evolution between direct ancestors and descendants, resulting in autocorrelation of evolutionary rates in the tree of life. Surprisingly, molecular sequence data have not confirmed this expectation, possibly because available methods lack the power to detect autocorrelated rates. Here, we present a machine learning method, CorrTest, to detect the presence of rate autocorrelation in large phylogenies. CorrTest is computationally efficient and performs better than the available state-of-the-art method. Application of CorrTest reveals extensive rate autocorrelation in DNA and amino acid sequence evolution of mammals, birds, insects, metazoans, plants, fungi, parasitic protozoans, and prokaryotes. Therefore, rate autocorrelation is a common phenomenon throughout the tree of life. These findings suggest concordance between molecular and nonmolecular evolutionary patterns, and they will foster unbiased and precise dating of the tree of life.

2018 ◽  
Author(s):  
Qiqing Tao ◽  
Koichiro Tamura ◽  
Fabia Battistuzzi ◽  
Sudhir Kumar

AbstractNew species arise from pre-existing species and inherit similar genomes and environments. This predicts greater similarity of mutation rates and the tempo of molecular evolution between direct ancestors and descendants, resulting in autocorrelation of evolutionary rates within lineages in the tree of life. Surprisingly, molecular sequence data have not confirmed this expectation, possibly because available methods lack power to detect autocorrelated rates. Here we present a machine learning method to detect the presence evolutionary rate autocorrelation in large phylogenies. The new method is computationally efficient and performs better than the available state-of-the-art methods. Application of the new method reveals extensive rate autocorrelation in DNA and amino acid sequence evolution of mammals, birds, insects, metazoans, plants, fungi, and prokaryotes. Therefore, rate autocorrelation is a common phenomenon throughout the tree of life. These findings suggest concordance between molecular and non-molecular evolutionary patterns and will foster unbiased and precise dating of the tree of life.


2020 ◽  
Vol 20 (4) ◽  
pp. 410-436
Author(s):  
Sarah E Heaps ◽  
Tom MW Nye ◽  
Richard J Boys ◽  
Tom A Williams ◽  
Svetlana Cherlin ◽  
...  

Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees relating species. Along branches, sequence evolution is modelled using a continuous-time Markov process characterized by an instantaneous rate matrix. Early models assumed the same rate matrix governed substitutions at all sites of the alignment, ignoring variation in evolutionary pressures. Substantial improvements in phylogenetic inference and model fit were achieved by augmenting these models with multiplicative random effects that describe the result of variation in selective constraints and allow sites to evolve at different rates which linearly scale a baseline rate matrix. Motivated by this pioneering work, we consider an extension using a quadratic, rather than linear, transformation. The resulting models allow for variation in the selective coefficients of different types of point mutation at a site in addition to variation in selective constraints. We derive properties of the extended models. For certain non-stationary processes, the extension gives a model that allows variation in sequence composition, both across sites and taxa. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. Our quadratic models are applied to alignments spanning the tree of life and compared with site-homogeneous and linear models.


2017 ◽  
Author(s):  
Stephen M Crotty ◽  
Bui Quang Minh ◽  
Nigel G Bean ◽  
Barbara R Holland ◽  
Jonathan Tuke ◽  
...  

AbstractMolecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (http://www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths and substitution model parameters from heterotachously-evolved sequences. We develop a model selection algorithm based on simulation results, and investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic dataset composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a dataset composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to infer a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model is able to offer unique biological insights when applied to empirical data.


Author(s):  
Stephen M Crotty ◽  
Bui Quang Minh ◽  
Nigel G Bean ◽  
Barbara R Holland ◽  
Jonathan Tuke ◽  
...  

Abstract Molecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (http://www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths, and substitution model parameters from heterotachously evolved sequences. We investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic data set composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a data set composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to elucidate a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model offers unique biological insights when applied to empirical data.


2019 ◽  
Author(s):  
Hironori Takemoto ◽  
Tsubasa Goto ◽  
Yuya Hagihara ◽  
Sayaka Hamanaka ◽  
Tatsuya Kitamura ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document