GEMME: a simple and fast global epistatic model predicting mutational effects

2019 ◽  
Author(s):  
Elodie Laine ◽  
Yasaman Karami ◽  
Alessandra Carbone

Abstract The systematic and accurate description of protein mutational landscapes is a question of utmost importance in biology, bioengineering and medicine. Recent progress has been achieved by leveraging the increasing wealth of genomic data and by modeling inter-site dependencies within biological sequences. However, state-of-the-art methods require numerous highly variable sequences and remain time-consuming. Here, we present GEMME (www.lcqb.upmc.fr/GEMME), a method that overcomes these limitations by explicitly modeling the evolutionary history of natural sequences. This allows accounting for all positions in a sequence when estimating the effect of a given mutation. Assessed against 41 experimental high-throughput mutational scans, GEMME overall performs similarly to or better than existing methods and runs faster by several orders of magnitude. It greatly improves predictions for viral sequences and, more generally, for very conserved families. It uses only a few biologically meaningful and interpretable parameters, while existing methods work with hundreds of thousands of parameters.

2019 ◽  
Vol 36 (11) ◽  
pp. 2604-2619 ◽  
Author(s):  
Elodie Laine ◽  
Yasaman Karami ◽  
Alessandra Carbone

Abstract The systematic and accurate description of protein mutational landscapes is a question of utmost importance in biology, bioengineering, and medicine. Recent progress has been achieved by leveraging the increasing wealth of genomic data and by modeling intersite dependencies within biological sequences. However, state-of-the-art methods remain time-consuming. Here, we present Global Epistatic Model for predicting Mutational Effects (GEMME) (www.lcqb.upmc.fr/GEMME), an original and fast method that predicts mutational outcomes by explicitly modeling the evolutionary history of natural sequences. This allows accounting for all positions in a sequence when estimating the effect of a given mutation. GEMME uses only a few biologically meaningful and interpretable parameters. Assessed against 50 high- and low-throughput mutational experiments, it overall performs similarly to or better than existing methods. It accurately predicts the mutational landscapes of a wide range of protein families, including viral ones and, more generally, highly conserved families. Given an input alignment, it generates the full mutational landscape of a protein in a matter of minutes. It is freely available as a package and a webserver at www.lcqb.upmc.fr/GEMME/.


Viruses ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 737
Author(s):  
Issiaka Bagayoko ◽  
Marcos Giovanni Celli ◽  
Gustavo Romay ◽  
Nils Poulicard ◽  
Agnès Pinel-Galzi ◽  
...  

The rice stripe necrosis virus (RSNV) has been reported to infect rice in several countries in Africa and South America, but limited genomic data are currently publicly available. Here, eleven RSNV genomes were entirely sequenced, including the first corpus of RSNV genomes of African isolates. The genetic variability was differently distributed along the two genomic segments. The segment RNA1, within which clusters of polymorphisms were identified, showed a higher nucleotide variability than did the beet necrotic yellow vein virus (BNYVV) RNA1 segment. The diversity patterns of both viruses were similar in the RNA2 segment, except for an in-frame insertion of 243 nucleotides located in the RSNV tgbp1 gene. Recombination events were detected in the RNA1 and RNA2 segments, in particular in the two most divergent RSNV isolates from Colombia and Sierra Leone. In contrast to BNYVV, the RSNV molecular diversity had a geographical structure with two main RSNV lineages distributed in America and in Africa. Our data on the genetic diversity of RSNV revealed unexpected differences with BNYVV, suggesting a complex evolutionary history of the genus Benyvirus.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Sven D. Schrinner ◽  
Rebecca Serra Mari ◽  
Jana Ebler ◽  
Mikko Rautiainen ◽  
Lancelot Seillier ◽  
...  

Abstract Resolving genomes at haplotype level is crucial for understanding the evolutionary history of polyploid species and for designing advanced breeding strategies. Polyploid phasing still presents considerable challenges, especially in regions of collapsing haplotypes. We present WhatsHap polyphase, a novel two-stage approach that addresses these challenges by (i) clustering reads and (ii) threading the haplotypes through the clusters. Our method outperforms the state-of-the-art in terms of phasing quality. Using a real tetraploid potato dataset, we demonstrate how to assemble local genomic regions of interest at the haplotype level. Our algorithm is implemented as part of the widely used open source tool WhatsHap.


Diversity ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 70 ◽  
Author(s):  
Juan C. Garcia-R ◽  
Emily Moriarty Lemmon ◽  
Alan R. Lemmon ◽  
Nigel French

The integration of state-of-the-art molecular techniques and analyses, together with a broad taxonomic sampling, can provide new insights into bird interrelationships and divergence. Despite their evolutionary significance, the relationships among several rail lineages remain unresolved, as does the general timescale of rail evolution. Here, we disentangle the deep phylogenetic structure of rails using anchored phylogenomics. We analysed a set of 393 loci from 63 species, representing approximately 40% of the extant familial diversity. Our phylogenomic analyses reconstruct the phylogeny of rails and robustly infer several previously contentious relationships. Concatenated maximum likelihood and coalescent species-tree approaches recover identical topologies with strong node support. The results are concordant with previous phylogenetic studies using small DNA datasets, but they also provide additional resolution. Our dating analysis provides contrasting divergence times using fossils and Bayesian and non-Bayesian approaches. Our study refines the evolutionary history of rails, offering a foundation for future evolutionary studies of birds.


2006 ◽  
Vol 24 (1) ◽  
pp. 146-158 ◽  
Author(s):  
O Thalmann ◽  
A Fischer ◽  
F Lankester ◽  
S Paabo ◽  
L Vigilant

2021 ◽  
Vol 118 (31) ◽  
pp. e2107434118
Author(s):  
Peter R. Grant ◽  
B. Rosemary Grant

Many species of plants, animals, and microorganisms exchange genes well after the point of evolutionary divergence at which taxonomists recognize them as species. Genomes contain signatures of past gene exchange and, in some cases, they reveal a legacy of lineages that no longer exist. But genomic data are not available for many organisms, and particularly problematic for reconstructing and interpreting evolutionary history are communities that have been depleted by extinctions. For these, morphology may substitute for genes, as exemplified by the history of Darwin’s finches on the Galápagos islands of Floreana and San Cristóbal. Darwin and companions collected seven specimens of a uniquely large form of Geospiza magnirostris in 1835. The populations became extinct in the next few decades, partly due to destruction of Opuntia cactus by introduced goats, whereas Geospiza fortis has persisted to the present. We used measurements of large samples of G. fortis collected for museums in the period 1891 to 1906 to test for unusually large variances and skewed distributions of beak and body size resulting from introgression. We found strong evidence of hybridization on Floreana but not on San Cristóbal. The skew is in the direction of the absent G. magnirostris. We estimate introgression influenced 6% of the frequency distribution that was eroded by selection after G. magnirostris became extinct on these islands. The genetic residuum of an extinct species in an extant one has implications for its future evolution, as well as for a conservation program of reintroductions in extinction-depleted communities.
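The variance and skew test described in this abstract can be illustrated with a minimal sketch. It assumes the simplest textbook estimators (unbiased sample variance and the moment-based skewness coefficient); the abstract does not say which estimators or significance tests the authors used, and the measurements below are invented purely for illustration.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5.

    Positive values indicate a longer tail towards large measurements,
    the direction expected if genes from a larger species had entered
    the population.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three measurements")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("all measurements are identical")
    return m3 / m2 ** 1.5


# Hypothetical beak depths in mm (not data from the study).
beak_depths = [10.0, 10.0, 11.0, 11.0, 12.0, 16.0]
print(sample_variance(beak_depths), skewness(beak_depths))
```

A real analysis would compare these statistics against a reference population and attach a significance test; the sketch only shows the quantities being compared.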


2019 ◽  
Author(s):  
Sebastian Deorowicz

Abstract Motivation: The amount of genomic data that needs to be stored is huge. Therefore, it is not surprising that a lot of work has been done in the field of specialized data compression of FASTQ files. The existing algorithms are, however, still imperfect and the best tools produce quite large archives. Results: We present FQSqueezer, a novel compression algorithm for sequencing data able to process single- and paired-end reads of variable lengths. It is based on the ideas from the famous prediction by partial matching and dynamic Markov coder algorithms known from the general-purpose-compressors world. The compression ratios are often tens of percent better than offered by the state-of-the-art tools. Availability and Implementation: https://github.com/refresh-bio/[email protected] Supplementary information: Supplementary data are available at publisher’s Web site.
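The idea behind context-based prediction can be illustrated with a small sketch. This is not FQSqueezer's algorithm: it assumes a fixed-order adaptive context model with add-one smoothing, whereas prediction by partial matching proper falls back to shorter contexts through escape symbols. The function name and the `order` parameter are illustrative choices, and the result is the ideal code length an arithmetic coder would approach, not an actual compressed file.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def estimated_bits(seq, order=2):
    """Ideal code length in bits of `seq` under an adaptive order-k model.

    Each base is predicted from the `order` preceding bases using the
    counts seen so far (add-one smoothing), then the counts are updated.
    """
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    bits = 0.0
    for i, sym in enumerate(seq):
        if sym not in ALPHABET:
            raise ValueError(f"unsupported symbol {sym!r} at position {i}")
        context = seq[max(0, i - order):i]
        table = counts[context]
        total = sum(table.values())
        probability = (table[sym] + 1) / (total + len(ALPHABET))
        bits -= math.log2(probability)
        table[sym] += 1  # learn from the symbol just coded
    return bits


# A repetitive read costs fewer bits than the 2 bits/base baseline.
read = "AC" * 8
print(estimated_bits(read), 2 * len(read))
```

The more predictable the read is from its recent context, the fewer bits the model assigns to it, which is the effect that context-mixing compressors exploit.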


Author(s):  
Weijian Guo ◽  
Di Sun ◽  
Yang Cao ◽  
Linlin Xiao ◽  
Xin Huang ◽  
...  

Abstract Recently diverged taxa are often characterized by high rates of hybridization, which can complicate phylogenetic reconstruction. For this reason, the phylogenetic relationships and evolutionary history of dolphins are still not very well resolved; the question of whether the genera Tursiops and Stenella are monophyletic is especially controversial. Here, we performed re-sequencing of six dolphin genomes and combined them with eight previously published dolphin SRA datasets and six whole-genome datasets to investigate the phylogenetic relationships of dolphins and test the monophyly hypothesis of Tursiops and Stenella. Phylogenetic reconstruction with the maximum likelihood and Bayesian methods of concatenated loci, as well as with coalescence analyses of sliding window trees, produced a concordant and well-supported tree. Our studies support the non-monophyletic status of Tursiops and Stenella because the species referred to these genera do not form exclusive monophyletic clades. This suggests that the current taxonomy of both genera might not reflect their evolutionary history and may underestimate their diversity. A four-taxon D-statistic (ABBA-BABA) test, five-taxon DFOIL test, and tree-based PhyloNet analyses all showed extensive gene flow across dolphin species, which could explain the instability in resolving the phylogenetic relationships of oceanic dolphins with different and limited markers. This study could be a good case to demonstrate how genomic data can reveal complex speciation and phylogeny in rapidly radiating animal groups.
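The four-taxon D-statistic mentioned above can be sketched in a few lines. The example assumes one aligned haploid sequence per taxon, arranged as (((P1, P2), P3), outgroup), and counts site patterns directly; genome-scale analyses typically work with allele frequencies and assess significance with a block jackknife, which this sketch omits. The function name and the toy sequences are illustrative, not taken from the study.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA).

    The outgroup allele is treated as ancestral (A). Returns None when
    no ABBA or BABA sites are present.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        alleles = {a, b, c, o}
        if not alleles <= valid or len(alleles) != 2:
            continue  # skip gaps, ambiguity codes and non-biallelic sites
        if a == o and b == c:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c:
            baba += 1  # P1 and P3 share the derived allele
    total = abba + baba
    if total == 0:
        return None
    return (abba - baba) / total


# Toy alignment: two ABBA sites, one BABA site, one invariant site.
print(d_statistic("AACA", "TGAA", "TGCA", "AAAA"))
```

A D value near zero is what incomplete lineage sorting alone predicts; a consistent excess of ABBA or BABA sites points to gene flow between P3 and one of the two sister taxa.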


2019 ◽  
Author(s):  
Sayaka Miura ◽  
Koichiro Tamura ◽  
Sergei L. Kosakovsky Pond ◽  
Louise A. Huuki ◽  
Jessica Priest ◽  
...  

ABSTRACT Pathogen timetrees are phylogenies scaled to time. They reveal the temporal history of a pathogen's spread through populations as captured in the evolutionary history of strains. These timetrees are inferred by using molecular sequences of pathogenic strains sampled at different times. That is, temporally sampled sequences enable the inference of sequence divergence times. Here, we present a new approach (RelTime with Dated Tips [RTDT]) to estimating pathogen timetrees based on the relative rate framework underlying the RelTime approach. RTDT does not require many of the priors demanded by Bayesian approaches, and it has light computing requirements. We found RTDT to be accurate on simulated datasets evolved under a variety of branch rate models. Interestingly, we found two non-Bayesian methods (RTDT and Least Squares Dating [LSD]) to perform similarly to or better than the Bayesian approaches available in the BEAST and MCMCTree programs. The RTDT method was found to generally outperform all other methods for phylogenies with autocorrelated evolutionary rates. In analyses of empirical datasets, RTDT produced dates that were similar to those from Bayesian analyses. The speed and accuracy of the new method, as compared to the alternatives, make it appealing for analyzing growing datasets of pathogenic strains. Cross-platform MEGA X software, freely available from http://www.megasoftware.net, now contains the new method for use through a friendly graphical user interface and in high-throughput settings.
AUTHOR SUMMARY Pathogen timetrees trace the origins and evolutionary histories of strains in populations, hosts, and outbreaks. The tips of these molecular phylogenies often contain sampling time information because the sequences were generally obtained at different times during the disease outbreaks and propagation. We have developed a new method for inferring timetrees for phylogenies with tip dates, which improves on widely used Bayesian methods (e.g., BEAST) in computational efficiency and does not require prior specification of population parameters, branch rate model, or clock model. We performed extensive computer simulations and found that RTDT performed better than the other methods for the estimation of divergence times at deep nodes in phylogenies where evolutionary rates were autocorrelated. The new method is available in the cross-platform MEGA software package, which provides a graphical user interface and allows use via a command line in scripting and high-throughput analysis (www.megasoftware.net).


2018 ◽  
Vol 93 (3) ◽  
Author(s):  
Satoshi Kawato ◽  
Aiko Shitara ◽  
Yuanyuan Wang ◽  
Reiko Nozaki ◽  
Hidehiro Kondo ◽  
...  

ABSTRACT White spot syndrome virus (WSSV) is a crustacean-infecting, double-stranded DNA virus and is the most serious viral pathogen in the global shrimp industry. WSSV is the sole recognized member of the family Nimaviridae, and the lack of genomic data on other nimaviruses has obscured the evolutionary history of WSSV. Here, we investigated the evolutionary history of WSSV by characterizing WSSV relatives hidden in host genomic data. We surveyed 14 host crustacean genomes and identified five novel nimaviral genomes. Comparative genomic analysis of Nimaviridae identified 28 “core genes” that are ubiquitously conserved in Nimaviridae; unexpected conservation of 13 uncharacterized proteins highlighted yet-unknown essential functions underlying the nimavirus replication cycle. The ancestral Nimaviridae gene set contained five baculoviral per os infectivity factor homologs and a sulfhydryl oxidase homolog, suggesting a shared phylogenetic origin of Nimaviridae and insect-associated double-stranded DNA viruses. Moreover, we show that novel gene acquisition and subsequent amplification reinforced the unique accessory gene repertoire of WSSV. Expansion of unique envelope protein and nonstructural virulence-associated genes may have been the key genomic event that made WSSV such a deadly pathogen.
IMPORTANCE WSSV is the deadliest viral pathogen threatening global shrimp aquaculture. The evolutionary history of WSSV has remained a mystery, because few WSSV relatives, or nimaviruses, had been reported. Our aim was to trace the history of WSSV using the genomes of novel nimaviruses hidden in host genome data. We demonstrate that WSSV emerged from a diverse family of crustacean-infecting large DNA viruses. By comparing the genomes of WSSV and its relatives, we show that WSSV possesses an expanded set of unique host-virus interaction-related genes. This extensive gene gain may have been the key genomic event that made WSSV such a deadly pathogen. Moreover, conservation of insect-infecting virus protein homologs suggests a common phylogenetic origin of crustacean-infecting Nimaviridae and other insect-infecting DNA viruses. Our work redefines the previously poorly characterized crustacean virus family and reveals the ancient genomic events that preordained the emergence of a devastating shrimp pathogen.

