Contrast-FEL—A Test for Differences in Selective Pressures at Individual Sites among Clades and Sets of Branches

Author(s):  
Sergei L Kosakovsky Pond ◽  
Sadie R Wisotsky ◽  
Ananias Escalante ◽  
Brittany Rife Magalis ◽  
Steven Weaver

Abstract A number of evolutionary hypotheses can be tested by comparing selective pressures among sets of branches in a phylogenetic tree. When the question of interest is to identify specific sites within genes that may be evolving differently, a common approach is to perform separate analyses on subsets of sequences and compare parameter estimates in a post hoc fashion. This approach is statistically suboptimal and not always applicable. Here, we develop a simple extension of a popular fixed effects likelihood method in the context of codon-based evolutionary phylogenetic maximum likelihood testing, Contrast-FEL. It is suitable for identifying individual alignment sites where any among the K≥2 sets of branches in a phylogenetic tree have detectably different ω ratios, indicative of different selective regimes. Using extensive simulations, we show that Contrast-FEL delivers good power, exceeding 90% for sufficiently large differences, while maintaining tight control over false positive rates, when the model is correctly specified. We conclude by applying Contrast-FEL to data from five previously published studies spanning a diverse range of organisms and focusing on different evolutionary questions.
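The site-level comparison described in this abstract can be illustrated with a generic likelihood ratio test. The sketch below is not the authors' implementation. It assumes that, for one alignment site, a log-likelihood is already available under a null model (one shared ω for all branch sets) and under an alternative (a separate ω for each of the K sets), and that the test statistic is referred to a chi-square distribution with K − 1 degrees of freedom, the usual asymptotic choice for nested models. The function names `chi2_sf` and `site_lrt` and the numeric inputs are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2 start the recurrence.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def site_lrt(loglik_null: float, loglik_alt: float, n_branch_sets: int):
    """Likelihood ratio statistic and p-value for one site (assumed K - 1 degrees of freedom)."""
    if n_branch_sets < 2:
        raise ValueError("at least two branch sets are needed")
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return stat, chi2_sf(stat, n_branch_sets - 1)


# Hypothetical log-likelihoods for a single site and K = 2 branch sets.
statistic, p_value = site_lrt(loglik_null=-12.40, loglik_alt=-9.10, n_branch_sets=2)
print(f"LRT = {statistic:.2f}, p = {p_value:.4f}")
```

In a real analysis the two log-likelihoods would come from fitting codon models at each site, and the resulting p-values would typically need a multiple-testing correction across sites.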

Author(s):  
Sergei L. Kosakovsky Pond ◽  
Sadie R Wisotsky ◽  
Ananias Escalante ◽  
Brittany Rife Magalis ◽  
Steven Weaver

Abstract A number of evolutionary hypotheses can be tested by comparing selective pressures among sets of branches in a phylogenetic tree. When the question of interest is to identify specific sites within genes that may be evolving differently, a common approach is to perform separate analyses on subsets of sequences, and compare parameter estimates in a post hoc fashion. This approach is statistically suboptimal, and not always applicable. Here, we develop a simple extension of a popular fixed effects likelihood method in the context of codon-based evolutionary phylogenetic maximum likelihood testing, Contrast-FEL. It is suitable for identifying individual alignment sites where any among the K ≥ 2 sets of branches in a phylogenetic tree have detectably different dN/dS ratios, indicative of different selective regimes. Using extensive simulations, we show that Contrast-FEL delivers good power, exceeding 90% for sufficiently large differences, while maintaining tight control over false positive rates. We conclude by applying Contrast-FEL to data from five previously published studies spanning a diverse range of organisms and focusing on different evolutionary questions.


2016 ◽  
Author(s):  
Rui J. Costa ◽  
Hilde Wilkinson-Herbots

Abstract The isolation-with-migration (IM) model is commonly used to make inferences about gene flow during speciation, using polymorphism data. However, Becquet and Przeworski (2009) report that the parameter estimates obtained by fitting the IM model are very sensitive to the model's assumptions (including the assumption of constant gene flow until the present). This paper is concerned with the isolation-with-initial-migration (IIM) model of Wilkinson-Herbots (2012), which drops precisely this assumption. In the IIM model, one ancestral population divides into two descendant subpopulations, between which there is an initial period of gene flow and a subsequent period of isolation. We derive a very fast method of fitting an extended version of the IIM model, which also allows for asymmetric gene flow and unequal population sizes. This is a maximum-likelihood method, applicable to data on the number of segregating sites between pairs of DNA sequences from a large number of independent loci. In addition to obtaining parameter estimates, our method can also be used to distinguish between alternative models representing different evolutionary scenarios, by means of likelihood ratio tests. We illustrate the procedure on pairs of Drosophila sequences from approximately 30,000 loci. The computing time needed to fit the most complex version of the model to this data set is only a couple of minutes. The R code to fit the IIM model can be found in the supplementary files of this paper.


2016 ◽  
Vol 1 (1) ◽  
pp. 1-12 ◽  
Author(s):  
Basant K. Tiwary

Background/Aims: A recent duplication of the gene encoding SLIT-ROBO Rho GTPase-activating protein 2 (SRGAP2) in the primate lineage has been proposed to be associated with the human-specific extraordinary development of intelligence. There is no report regarding the role of the SRGAP2 gene in the expression of neural traits indicating intelligence in mammals. Methods: A phylogenetic tree of the SRGAP2 gene from 11 mammals was reconstructed using MrBayes. The evolution of neural traits along the branches of the phylogenetic tree was modeled in BayesTraits, and the dN/dS ratio (i.e. the ratio between the number of nonsynonymous substitutions per nonsynonymous site and the number of synonymous substitutions per synonymous site) was estimated using the codon-based maximum likelihood method (CODEML) in PAML (phylogenetic analysis by maximum likelihood). Results: Two neural traits, namely brain mass and the number of cortical neurons, showed statistical dependency on the underlying evolutionary history of the SRGAP2 gene in mammals. A significant positive correlation between the increase in cortical neurons and the rate of nucleotide substitutions in the SRGAP2 gene was observed, together with a significant negative correlation between the increase in cortical neurons and the rate of nonsynonymous substitutions in the gene. The SRGAP2 gene appears to be under intense purifying selection in all mammalian lineages, consistent with stringent functional constraint. Conclusion: This work indicates a key role of the SRGAP2 gene in the rapid expansion of neurons in the brain cortex, thereby facilitating the evolution of remarkable intelligence in mammals.


2017 ◽  
Author(s):  
Sebastian Duchene ◽  
David Duchene ◽  
Jemma Geoghegan ◽  
Zoe Anne Dyson ◽  
Jane Hawkey ◽  
...  

Background: Recent developments in sequencing technologies make it possible to obtain genome sequences from a large number of isolates in a very short time. Bayesian phylogenetic approaches can take advantage of these data by simultaneously inferring the phylogenetic tree, evolutionary timescale, and demographic parameters (such as population growth rates), while naturally integrating uncertainty in all parameters. Despite their desirable properties, Bayesian approaches can be computationally intensive, hindering their use for outbreak investigations involving genome data for a large number of pathogen isolates. An alternative to using full Bayesian inference is to use a hybrid approach, where the phylogenetic tree and evolutionary timescale are estimated first using maximum likelihood. Under this hybrid approach, demographic parameters are inferred from estimated trees instead of the sequence data, using maximum likelihood, Bayesian inference, or approximate Bayesian computation. This can vastly reduce the computational burden, but has the disadvantage of ignoring the uncertainty in the phylogenetic tree and evolutionary timescale. Results: We compared the performance of a fully Bayesian and a hybrid method by analysing six whole-genome SNP data sets from a range of bacteria and simulations. The estimates from the two methods were very similar, suggesting that the hybrid method is a valid alternative for very large datasets. However, we also found that congruence between these methods is contingent on the presence of strong temporal structure in the data (i.e. clocklike behaviour), which is typically verified using a date-randomisation test in a Bayesian framework. To reduce the computational burden of this Bayesian test, we implemented a date-randomisation test using a rapid maximum likelihood method, which has similar performance to its Bayesian counterpart.
Conclusions: Hybrid approaches can produce reliable inferences of evolutionary timescales and phylodynamic parameters in a fraction of the time required for fully Bayesian analyses. As such, they are a valuable alternative in outbreak studies involving a large number of isolates.
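The idea behind a date-randomisation test can be sketched in a few lines. The example below is a simplified stand-in, not the maximum likelihood method the authors implemented: it uses the slope of a root-to-tip regression (genetic distance against sampling date) as a crude rate estimate, shuffles the sampling dates among the tips, and asks how often a shuffled data set yields a rate at least as large as the observed one. The function names, the number of replicates and the input data are all hypothetical.

```python
import random


def regression_slope(dates, distances):
    """Least-squares slope of root-to-tip distance on sampling date (a crude rate proxy)."""
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    return sxy / sxx


def date_randomisation(dates, distances, n_replicates=200, seed=1):
    """Return the observed rate and the share of date-shuffled replicates that match or exceed it."""
    rng = random.Random(seed)
    observed = regression_slope(dates, distances)
    shuffled = list(dates)
    exceed = 0
    for _ in range(n_replicates):
        rng.shuffle(shuffled)  # break the link between tips and their sampling dates
        if regression_slope(shuffled, distances) >= observed:
            exceed += 1
    # Add-one correction so the permutation p-value is never exactly zero.
    return observed, (exceed + 1) / (n_replicates + 1)


# Hypothetical data: 20 isolates sampled yearly, accumulating 0.001 substitutions/site/year.
sampling_dates = [2000 + i for i in range(20)]
root_to_tip = [0.001 * i for i in range(20)]
rate, p_value = date_randomisation(sampling_dates, root_to_tip)
print(f"rate proxy = {rate:.4g}, permutation p = {p_value:.4f}")
```

A small permutation p-value suggests that the temporal signal is unlikely to arise by chance. Published versions of the test usually compare the real rate estimate with the range or credible intervals of the estimates from the randomised data sets.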


2012 ◽  
Author(s):  
Fadhilah Y. ◽  
Zalina Md. ◽  
Nguyen V–T–V. ◽  
Suhaila S. ◽  
Zulkifli Y.

In determining the best-fit model for the hourly rainfall amounts at twelve stations in the Wilayah Persekutuan, four distributions were used, namely the Exponential, Gamma, Weibull and Mixed-Exponential. Parameters for each distribution were estimated using the maximum likelihood method. The best-fit model was chosen based upon the minimum error produced by the goodness-of-fit tests used in this study. The tests were further supported by the exceedance probability plot. The Mixed-Exponential was found to be the most appropriate distribution for describing the hourly rainfall amounts. The parameter estimates for the Mixed-Exponential distribution imply that most of the recorded hourly rainfall amount came from heavy rainfall, even though light rainfall occurred more frequently. Key words: Hourly rainfall amount, goodness-of-fit test, exceedance probability, maximum likelihood
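The fit-then-compare workflow in this abstract can be sketched for two of the four candidate distributions. The example below is illustrative only and runs on synthetic data rather than the study's rainfall records. It fits the Exponential and Weibull distributions by maximum likelihood and ranks them by the Kolmogorov-Smirnov distance, which is an assumed stand-in for the goodness-of-fit tests used in the study (the abstract does not name them). All function names are hypothetical, and the Weibull shape is assumed to lie between 0.05 and 20.

```python
import math
import random


def fit_exponential(data):
    """Maximum likelihood estimate of the exponential mean (the sample mean)."""
    return sum(data) / len(data)


def fit_weibull(data, lo=0.05, hi=20.0, iters=100):
    """Maximum likelihood (shape, scale) for strictly positive data; shape found by bisection."""
    mean_log = sum(math.log(x) for x in data) / len(data)

    def score(k):
        # Profile score for the shape; it increases monotonically in k.
        num = sum(x ** k * math.log(x) for x in data)
        den = sum(x ** k for x in data)
        return num / den - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in data) / len(data)) ** (1.0 / shape)
    return shape, scale


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


# Synthetic "hourly amounts": Weibull with scale 2.0 and shape 0.7 (hypothetical values).
rng = random.Random(0)
amounts = [rng.weibullvariate(2.0, 0.7) for _ in range(2000)]

mean = fit_exponential(amounts)
shape, scale = fit_weibull(amounts)
errors = {
    "Exponential": ks_statistic(amounts, lambda x: 1.0 - math.exp(-x / mean)),
    "Weibull": ks_statistic(amounts, lambda x: 1.0 - math.exp(-((x / scale) ** shape))),
}
best = min(errors, key=errors.get)
print(errors, "best:", best)
```

Extending the comparison to the Gamma and Mixed-Exponential candidates would follow the same pattern, with a numerical optimiser or an EM step for their likelihoods.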


2021 ◽  
Vol 4 (4) ◽  
pp. 155-165
Author(s):  
Aminu Suleiman Mohammed ◽  
Badamasi Abba ◽  
Abubakar G. Musa

To model the behaviour of some lifetime data sets adequately, a generalization, extension or modification of classical distributions is required. In this paper, we introduce a new generalization of the exponential distribution, called the generalized odd generalized exponential-exponential distribution. The proposed distribution can model lifetime data with different failure rates, including increasing, decreasing, unimodal, bathtub, and decreasing-increasing-decreasing failure rates. Various properties of the model are derived, such as the quantile function, moments, mean deviations, Renyi entropy, and order statistics. We provide an approximation for the values of the mean, variance, skewness, kurtosis, and mean deviations using Monte Carlo simulation experiments. Estimation of the distribution parameters is performed using the maximum likelihood method, and Monte Carlo simulation experiments are used to assess the estimation method. The method of maximum likelihood is shown to provide promising parameter estimates and hence can be adopted in practice for estimating the parameters of the distribution. An application to real and simulated datasets indicates that the new model fits better than the other compared distributions.


2015 ◽  
Vol 38 (2) ◽  
pp. 453-466 ◽  
Author(s):  
Hugo S. Salinas ◽  
Yuri A. Iriarte ◽  
Heleno Bolfarine

In this paper we introduce a new distribution for modeling positive data with high kurtosis. This distribution can be seen as an extension of the exponentiated Rayleigh distribution. The extension is built as the quotient of two independent random variables: an exponentiated Rayleigh variable in the numerator and a Beta(q,1) variable in the denominator, with q > 0. The result is called the slashed exponentiated Rayleigh random variable. There is evidence that the distribution of this new variable is more flexible in modeling kurtosis than the exponentiated Rayleigh distribution. The properties of this distribution are studied and the parameter estimates are calculated using the maximum likelihood method. An application with real data reveals good performance of this new distribution.
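The quotient construction described in this abstract lends itself to a short simulation sketch. The code below is an illustration under stated assumptions, not the authors' code: it assumes the exponentiated Rayleigh CDF is parameterised as F(y) = (1 − exp(−y²/(2σ²)))^α, which may differ from the paper's parameterisation, and it draws the Beta(q,1) denominator as V^(1/q) with V uniform on (0,1]. The function names and parameter values are hypothetical.

```python
import math
import random


def exp_rayleigh_quantile(u, alpha, sigma):
    """Inverse of the assumed CDF F(y) = (1 - exp(-y^2 / (2 sigma^2)))^alpha, for 0 <= u < 1."""
    tail = max(1.0 - u ** (1.0 / alpha), 1e-300)  # guard against log(0) from rounding
    return sigma * math.sqrt(max(0.0, -2.0 * math.log(tail)))


def sample_slashed_exp_rayleigh(n, alpha, sigma, q, seed=0):
    """Draw n values of Y / B with Y exponentiated Rayleigh and B ~ Beta(q, 1), independent."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        y = exp_rayleigh_quantile(rng.random(), alpha, sigma)
        # If V is uniform on (0, 1], then V^(1/q) has CDF b^q, i.e. Beta(q, 1).
        b = (1.0 - rng.random()) ** (1.0 / q)
        draws.append(y / b)
    return draws


# Smaller q puts more mass of the denominator near zero, which thickens the right tail.
heavy = sample_slashed_exp_rayleigh(5000, alpha=2.0, sigma=1.0, q=1.0)
light = sample_slashed_exp_rayleigh(5000, alpha=2.0, sigma=1.0, q=5.0)
print(max(heavy), max(light))
```

Because both calls share the same seed, each draw with q = 1 is at least as large as the matching draw with q = 5, which makes the effect of q on the tail easy to inspect.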

