Synonymous Site-to-Site Substitution Rate Variation Dramatically Inflates False Positive Rates of Selection Analyses: Ignore at Your Own Peril

2020, Vol 37 (8), pp. 2430-2439
Author(s): Sadie R Wisotsky, Sergei L Kosakovsky Pond, Stephen D Shank, Spencer V Muse

Abstract Most molecular evolutionary studies of natural selection maintain the decades-old assumption that synonymous substitution rate variation (SRV) across sites within genes occurs at levels that are either nonexistent or negligible. However, numerous studies challenge this assumption from a biological perspective and show that SRV is comparable in magnitude to that of nonsynonymous substitution rate variation. We evaluated the impact of this assumption on methods for inferring selection at the molecular level by incorporating SRV into an existing method (BUSTED) for detecting signatures of episodic diversifying selection in genes. Using simulated data we found that failing to account for even moderate levels of SRV in selection testing is likely to produce intolerably high false positive rates. To evaluate the effect of the SRV assumption on actual inferences we compared results of tests with and without the assumption in an empirical analysis of over 13,000 Euteleostomi (bony vertebrate) gene alignments from the Selectome database. This exercise reveals that close to 50% of positive results (i.e., evidence for selection) in empirical analyses disappear when SRV is modeled as part of the statistical analysis and are thus candidates for being false positives. The results from this work add to a growing literature establishing that tests of selection are much more sensitive to certain model assumptions than previously believed.
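A toy calculation shows why ignoring SRV can make sites look positively selected. It is not BUSTED or its SRV extension, and every parameter below is a hypothetical choice. Assume that every site is under purifying selection with the same true ω = 0.5, that synonymous rates vary across sites following a mean-1 gamma distribution, and that each site's nonsynonymous rate scales with its own synonymous rate. If each site's nonsynonymous rate is divided by the gene-wide mean synonymous rate, as a constant-dS model implicitly does, a noticeable fraction of sites appear to have ω > 1. If each site's own synonymous rate is used instead, none do.

```python
import random


def naive_vs_site_aware(n_sites=10_000, omega=0.5, shape=0.5, seed=1):
    """Toy model (hypothetical parameters): return the fraction of sites whose
    apparent omega exceeds 1 when
    (a) dN is divided by the gene-wide mean synonymous rate, and
    (b) dN is divided by the site's own synonymous rate."""
    rng = random.Random(seed)
    # Site-specific synonymous rates: gamma with shape k and scale 1/k (mean 1).
    alphas = [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_sites)]
    # Every site is under purifying selection: beta = omega * alpha with omega < 1.
    betas = [omega * a for a in alphas]
    mean_alpha = sum(alphas) / n_sites
    # (a) Constant-dS assumption: one synonymous rate for the whole gene.
    naive = [b / mean_alpha for b in betas]
    # (b) SRV modeled: each site is compared with its own synonymous rate.
    site_aware = [b / a for a, b in zip(alphas, betas)]
    frac_naive = sum(w > 1.0 for w in naive) / n_sites
    frac_site_aware = sum(w > 1.0 for w in site_aware) / n_sites
    return frac_naive, frac_site_aware


frac_naive, frac_site_aware = naive_vs_site_aware()
print(f"apparent omega > 1, constant dS:      {frac_naive:.3f}")
print(f"apparent omega > 1, site-specific dS: {frac_site_aware:.3f}")
```

The toy model ignores sampling error, so the site-aware fraction is exactly zero. Real analyses estimate both rates with noise inside a likelihood framework such as BUSTED.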

2004, Vol 78 (18), pp. 9782-9789
Author(s): Javier Fernandez, Deborah Taylor, Duncan R. Morhardt, Kathleen Mihalik, Montserrat Puig, ...

ABSTRACT Two chimpanzees, 1535 and 1536, became persistently infected following inoculation with RNA transcripts from cDNA clones of hepatitis C virus (HCV). Analysis of the HCV genomes from both animals showed an accumulation of amino acid substitutions over time. The appearance of substitutions in the envelope genes was associated with increased antienvelope antibody titers. However, extensive mutations were not incorporated into hypervariable region 1 (HVR1). The ratio of the nonsynonymous to the synonymous substitution rate was compared at various time points to analyze selective pressure. The highest level of selective pressure occurred during the acute phase and decreased as the infection continued. The nonsynonymous substitution rate was initially higher than the synonymous substitution rate but decreased over time, from 3.3 × 10⁻³ (chimpanzee 1535) and 3.2 × 10⁻³ (chimpanzee 1536) substitutions/site/year at week 26 to 1.4 × 10⁻³ (chimpanzee 1535) and 1.7 × 10⁻³ (chimpanzee 1536) at week 216, while the synonymous substitution rate remained steady at ∼1 × 10⁻³ substitutions/site/year. Analysis of PCR products using single-stranded conformational polymorphism indicated a low level of heterogeneity in the viral genome. The results of these studies confirm that the persistence of infection is not solely due to changes in HVR1 or heterogeneity, and that the majority of variants observed in natural infections could not arise simply through mutation during the time period over which most humans and chimpanzees are observed. These data also indicate that immune pressure and selection continue throughout the chronic phase.
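The rates quoted above convert directly into dN/dS ratios. The calculation below divides each reported nonsynonymous rate by a synonymous rate of 1 × 10⁻³ substitutions/site/year. The abstract gives that synonymous rate only as approximate, so the ratios are rough figures.

```python
# Rates quoted in the abstract, in substitutions/site/year.
SYNONYMOUS_RATE = 1e-3  # reported as steady at about 1 x 10^-3

NONSYNONYMOUS_RATES = {
    # (chimpanzee, week): nonsynonymous rate
    ("1535", 26): 3.3e-3,
    ("1536", 26): 3.2e-3,
    ("1535", 216): 1.4e-3,
    ("1536", 216): 1.7e-3,
}


def dn_ds(dn: float, ds: float = SYNONYMOUS_RATE) -> float:
    """Ratio of the nonsynonymous to the synonymous substitution rate."""
    return dn / ds


for (animal, week), dn in NONSYNONYMOUS_RATES.items():
    print(f"chimpanzee {animal}, week {week}: dN/dS ~ {dn_ds(dn):.1f}")
```

The ratios drop from about 3.3 and 3.2 at week 26 to about 1.4 and 1.7 at week 216.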


2005, Vol 35 (3), pp. 411-419
Author(s): Ying Li, Michael Wallis, Ya-ping Zhang

Divergence of proteins in signaling pathways requires ligand and receptor coevolution to maintain or improve binding affinity and/or specificity. In this paper we show a clear case of coevolution between the prolactin (PRL) gene and its receptor (prolactin receptor, PRLR) in mammals. First, we observed episodic evolution of the extracellular and intracellular domains of the PRLR, which closely matches the pattern seen in PRL. Using Pearson’s correlation coefficient, we demonstrated correlated evolution both between PRL and its receptor and between the two domains of the PRLR. We compared the ratio of the nonsynonymous to the synonymous substitution rate (ω = dN/dS) on each branch of the star phylogeny of mammalian PRLRs, separately for the extracellular domain (ECD) and the transmembrane domain/intracellular domain (TMD/ICD). The ω ratio was lower for the ECD than for the TMD/ICD along the branches leading to pig, dog and rabbit. It was higher for the ECD than for the TMD/ICD on the branches leading to primates, rodents and ruminants, where bursts of rapid evolution were observed. These observations are best explained by coevolution between PRL and its receptor and between the two domains of the PRLR.
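The test for correlated evolution described above relies on Pearson’s correlation coefficient, which is simple to compute. The abstract does not say which per-branch quantities were correlated, so the function below is a generic sketch. `x` and `y` stand for hypothetical paired per-branch estimates, for example values for PRL and for PRLR on the same set of branches.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two paired sequences
    (e.g. hypothetical per-branch estimates for PRL and PRLR)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and variance terms (the common 1/(n-1) factor cancels).
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` returns 1.0 and `pearson_r([1, 2, 3], [3, 2, 1])` returns -1.0.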


2017
Author(s): Iakov I. Davydov, Nicolas Salamin, Marc Robinson-Rechavi

Abstract There are numerous sources of variation in the rate of synonymous substitutions within genes, such as direct selection on the nucleotide sequence or mutation rate variation. Yet scans for positive selection rely on codon models that assume an effectively neutral synonymous substitution rate, constant across the sites of each gene. Here we perform a large-scale comparison of approaches that incorporate codon substitution rate variation and propose our own simple yet effective modification of existing models. We find strong effects of substitution rate variation on positive selection inference. More than 70% of the genes detected by the classical branch-site model are presumably false positives caused by the incorrect assumption of a uniform synonymous substitution rate. We propose a new model that is strongly favored by the data while remaining computationally tractable. With the new model we can capture signatures of nucleotide-level selection acting on translation initiation and on splicing sites within the coding region. Finally, we show that rate variation is highest in highly recombining regions, and we propose that recombination and mutation rate variation, such as a high CpG mutation rate, are the two main sources of nucleotide rate variation. Although we detect fewer genes under positive selection in Drosophila than we do without rate variation, the genes we detect carry a stronger signal of adaptation of dynein, which could be associated with Wolbachia infection. We provide software to perform positive selection analysis using the new model.
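A common way for substitution models to let rates vary among sites is to scale each site's rates by one of a few categories drawn from a discretized gamma distribution with mean 1. The abstract does not say how the proposed model parameterizes rate variation, so this is a generic illustration and not the authors' model. The sketch approximates the equal-probability "mean" discretization by Monte Carlo sampling instead of computing exact gamma quantiles. `shape`, `k`, `n_draws`, and `seed` are illustrative parameters.

```python
import random


def discrete_gamma_rates(shape, k=4, n_draws=200_000, seed=0):
    """Monte Carlo approximation to a k-category, equal-probability discrete
    gamma: each category's rate is the mean of its slice of a mean-1 gamma."""
    rng = random.Random(seed)
    # Gamma with shape a and scale 1/a has mean 1.
    draws = sorted(rng.gammavariate(shape, 1.0 / shape) for _ in range(n_draws))
    size = n_draws // k  # any remainder draws are simply dropped
    rates = [sum(draws[i * size:(i + 1) * size]) / size for i in range(k)]
    # Rescale so the category rates average exactly 1, as in the continuous model.
    scale = sum(rates) / k
    return [r / scale for r in rates]
```

Smaller `shape` values spread the category rates further apart, which corresponds to stronger rate heterogeneity among sites.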


2013, Vol 182 (4), pp. 494-513
Author(s): Stacy O. Scholle, Rolf J. F. Ypma, Alun L. Lloyd, Katia Koelle

2009, Vol 9 (1), pp. 260
Author(s): Daniel B Sloan, Bengt Oxelman, Anja Rautenberg, Douglas R Taylor

2008, Vol 25 (8), pp. 1778-1787
Author(s): J. L. Knies, K. K. Dang, T. J. Vision, N. G. Hoffman, R. Swanstrom, ...

2018
Author(s): Xuhua Xia

Abstract The gamma distribution has been used to fit substitution rate variation over sites. One simple method to estimate the shape parameter of the gamma distribution is to 1) reconstruct a phylogenetic tree and the ancestral states of internal nodes, 2) perform pairwise comparisons between the nodes on each side of each branch to count the number of “observed” substitutions at each site, and apply a correction for multiple hits to derive the estimated number of substitutions at each site, and 3) fit the site-specific substitution data to a gamma distribution to obtain the shape parameter α. This method is fast, but its accuracy depends heavily on the accuracy of the estimated site-specific numbers of substitutions. The existing method has three shortcomings. First, it uses the Poisson correction, which is inadequate for almost all nucleotide sequences. Second, it estimates the number of substitutions at each site independently, without making use of information from all sites. Third, the program implementing the method has never been made publicly available. I have implemented in DAMBE software a new method based on the F84 substitution model with simultaneous estimation that uses information from all sites in estimating the number of substitutions at each site. DAMBE is freely available at http://dambe.bio.uottawa.ca
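Step 3 of the method outlined above, which fits site-specific substitution counts to a gamma distribution, can be sketched with a simple method-of-moments estimator. This is not the F84-based procedure implemented in DAMBE. The sketch assumes that the count at each site is Poisson-distributed given a gamma-distributed site rate. The counts then follow a negative binomial with variance m + m²/α, which gives α̂ = m²/(s² − m) for sample mean m and sample variance s². The counts are used as given, with no multiple-hit correction.

```python
import statistics


def gamma_shape_from_counts(counts):
    """Method-of-moments estimate of the gamma shape parameter alpha from
    per-site substitution counts, assuming Poisson counts given a
    gamma-distributed site rate (negative binomial overall)."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)  # sample variance (n - 1 denominator)
    if v <= m:
        # Without overdispersion relative to Poisson, alpha is not estimable.
        raise ValueError("no overdispersion: variance must exceed the mean")
    return m * m / (v - m)
```

For example, `gamma_shape_from_counts([0, 0, 1, 2, 5])` returns about 0.95 (m = 1.6, s² = 4.3). Smaller estimates indicate stronger rate variation among sites.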


Heredity, 1997, Vol 78 (1), pp. 21-31
Author(s): Dan Fieldhouse, Fariborz Yazdani, G Brian Golding
