Evolinc: a comparative transcriptomics and genomics pipeline for quickly identifying sequence conserved lincRNAs for functional analysis

2017 ◽  
Author(s):  
Andrew D. L. Nelson ◽  
Upendra K. Devisetty ◽  
Kyle Palos ◽  
Asher K. Haug-Baltzell ◽  
Eric Lyons ◽  
...  

Abstract Long intergenic non-coding RNAs (lincRNAs) are an abundant and functionally diverse class of eukaryotic transcripts. Reported lincRNA repertoires in mammals vary, but are commonly in the thousands to tens of thousands of transcripts, covering ~90% of the genome. In addition to elucidating function, there is particular interest in understanding the origin and evolution of lincRNAs. Aside from mammals, lincRNA populations have been sparsely sampled, precluding evolutionary analyses focused on lincRNA emergence and persistence. Here we present Evolinc, a two-module pipeline designed to facilitate lincRNA discovery and characterize aspects of lincRNA evolution. The first module (Evolinc-I) is a lincRNA identification workflow that also facilitates downstream differential expression analysis and genome browser visualization of identified lincRNAs. The second module (Evolinc-II) is a genomic and transcriptomic comparative analyses workflow that determines the phylogenetic depth to which a lincRNA locus is conserved within a user-defined group of related species. Evolinc-II builds families of homologous lincRNA loci, aligns constituent sequences, infers gene trees, and then uses gene tree / species tree reconciliation to reconstruct evolutionary processes such as gain, loss, or duplication of the locus. Here we demonstrate that Evolinc-I is agnostic to target organism by validating against previously annotated Arabidopsis and human lincRNA data. Using Evolinc-II, we examine ways in which conservation can rapidly be used to winnow down large lincRNA datasets to a small set of candidates for functional analysis. Finally, we show how Evolinc-II can be used to recover the evolutionary history of a known lincRNA, the human telomerase RNA (TERC). The analyses revealed unexpected duplication events as well as the loss and subsequent acquisition of a novel TERC locus in the lineage leading to mice and rats. The Evolinc pipeline is currently integrated in CyVerse’s Discovery Environment and is free to use by researchers.

2017 ◽  
Vol 66 (6) ◽  
pp. 1007-1018 ◽  
Author(s):  
Gregg W C Thomas ◽  
S Hussain Ather ◽  
Matthew W Hahn

Abstract Polyploidy can have a huge impact on the evolution of species, and it is a common occurrence, especially in plants. The two types of polyploids—autopolyploids and allopolyploids—differ in the level of divergence between the genes that are brought together in the new polyploid lineage. Because allopolyploids are formed via hybridization, the homoeologous copies of genes within them are at least as divergent as orthologs in the parental species that came together to form them. This means that common methods for estimating the parental lineages of allopolyploidy events are not accurate, and can lead to incorrect inferences about the number of gene duplications and losses. Here, we have adapted an algorithm for topology-based gene-tree reconciliation to work with multi-labeled trees (MUL-trees). By definition, MUL-trees have some tips with identical labels, which makes them a natural representation of the genomes of polyploids. Using this new reconciliation algorithm we can: accurately place allopolyploidy events on a phylogeny, identify the parental lineages that hybridized to form allopolyploids, distinguish between allo-, auto-, and (in most cases) no polyploidy, and correctly count the number of duplications and losses in a set of gene trees. We validate our method using gene trees simulated with and without polyploidy, and revisit the history of polyploidy in data from the clades including both baker’s yeast and bread wheat. Our re-analysis of the yeast data confirms the allopolyploid origin and parental lineages previously identified for this group. The method presented here should find wide use in the growing number of genomes from species with a history of polyploidy. [Polyploidy; reconciliation; whole-genome duplication.]
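To make the idea of topology-based gene-tree reconciliation concrete, here is a minimal sketch of the classic LCA-mapping approach for counting duplications against an ordinary (singly labeled) species tree. It is not the MUL-tree algorithm described in the abstract. The nested-tuple tree encoding, the convention that gene-tree tips are labeled directly with species names, and the function names are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding: a leaf is a species name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of species labels found below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def all_clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every node in the species tree."""
    clades = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            clades.extend(all_clades(child))
    return clades


def lca_map(species: FrozenSet[str], clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Map a set of species to the smallest species-tree clade containing it."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Count gene-tree nodes that map to the same species-tree node as a child."""
    clades = all_clades(species_tree)

    def visit(node: Tree) -> Tuple[FrozenSet[str], int]:
        if isinstance(node, str):
            return lca_map(frozenset([node]), clades), 0
        left_map, left_dups = visit(node[0])
        right_map, right_dups = visit(node[1])
        here = lca_map(leaf_set(node), clades)
        # Under LCA mapping, a node is a duplication if it maps to the
        # same species-tree node as at least one of its children.
        is_dup = here == left_map or here == right_map
        return here, left_dups + right_dups + int(is_dup)

    return visit(gene_tree)[1]


species = (("A", "B"), "C")
print(count_duplications((("A", "B"), "C"), species))                 # congruent tree
print(count_duplications(((("A", "B"), "C"), ("A", "C")), species))   # extra copies
```

In a MUL-tree setting, a species label can appear at more than one tip of the species tree, so the mapping of a gene-tree tip is no longer unique. The sketch above does not attempt to handle that case.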


2016 ◽  
Author(s):  
Gregg W.C. Thomas ◽  
S. Hussain Ather ◽  
Matthew W. Hahn

Abstract Polyploidy can have a huge impact on the evolution of species, and it is a common occurrence, especially in plants. The two types of polyploids - autopolyploids and allopolyploids - differ in the level of divergence between the genes that are brought together in the new polyploid lineage. Because allopolyploids are formed via hybridization, the homoeologous copies of genes within them are at least as divergent as orthologs in the parental species that came together to form them. This means that common methods for estimating the parental lineages of allopolyploidy events are not accurate, and can lead to incorrect inferences about the number of gene duplications and losses. Here, we have adapted an algorithm for topology-based gene-tree reconciliation to work with multi-labeled trees (MUL-trees). By definition, MUL-trees have some tips with identical labels, which makes them a natural representation of the genomes of polyploids. Using this new reconciliation algorithm we can: accurately place allopolyploidy events on a phylogeny, identify the parental lineages that hybridized to form allopolyploids, distinguish between allo-, auto-, and (in most cases) no polyploidy, and correctly count the number of duplications and losses in a set of gene trees. We validate our method using gene trees simulated with and without polyploidy, and revisit the history of polyploidy in data from the clades including both baker’s yeast and bread wheat. Our re-analysis of the yeast data confirms the allopolyploid origin and parental lineages previously identified for this group. The method presented here should find wide use in the growing number of genomes from species with a history of polyploidy.


2020 ◽  
Author(s):  
Michael J. Sanderson ◽  
Michelle M. McMahon ◽  
Mike Steel

Abstract Terraces in phylogenetic tree space are sets of trees with identical optimality scores for a given data set, arising from missing data. These were first described for multilocus phylogenetic data sets in the context of maximum parsimony inference and maximum likelihood inference under certain model assumptions. Here we show how the mathematical properties that lead to terraces extend to gene tree - species tree problems in which the gene trees are incomplete. Inference of species trees from either sets of gene family trees subject to duplication and loss, or allele trees subject to incomplete lineage sorting, can exhibit terraces in their solution space. First, we show conditions that lead to a new kind of terrace, which stems from subtree operations that appear in reconciliation problems for incomplete trees. Then we characterize when terraces of both types can occur when the optimality criterion for tree search is based on duplication, loss or deep coalescence scores. Finally, we examine the impact of assumptions about the causes of losses: whether they are due to imperfect sampling or true evolutionary deletion.


2016 ◽  
Author(s):  
Anil S. Thanki ◽  
Nicola Soranzo ◽  
Javier Herrero ◽  
Wilfried Haerty ◽  
Robert P. Davey

Abstract Background: Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterisation enables the identification of syntenic blocks, which can then be visualised with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. Findings: We present Aequatus, a standalone web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualisations. It relies on pre-calculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfils the visualisation aspects of Aequatus, available within the Galaxy web platform as a visualisation plugin, which can be used to visualise gene trees generated by the GeneSeqToFamily workflow. Availability: Aequatus is an open-source tool freely available to download under the MIT license at https://github.com/TGAC/Aequatus. A demo server is available at http://aequatus.earlham.ac.uk/. A publicly available instance of the GeneSeqToFamily workflow to generate gene tree information and visualise it using Aequatus is available on the Galaxy EU server at https://[email protected] and [email protected]


Author(s):  
Dan Vanderpool ◽  
Bui Quang Minh ◽  
Robert Lanfear ◽  
Daniel Hughes ◽  
Shwetha Murali ◽  
...  

Abstract Our understanding of the evolutionary history of primates is undergoing continual revision due to ongoing genome sequencing efforts. Bolstered by growing fossil evidence, these data have led to increased acceptance of once controversial hypotheses regarding phylogenetic relationships, hybridization and introgression, and the biogeographical history of primate groups. Among these findings is a pattern of recent introgression between species within all major primate groups examined to date, though little is known about introgression deeper in time. To address this and other phylogenetic questions, here we present new reference genome assemblies for three Old World Monkey species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). We combine these data with 23 additional primate genomes to estimate both the species tree and individual gene trees using thousands of loci. While our species tree is largely consistent with previous phylogenetic hypotheses, the gene trees reveal high levels of genealogical discordance associated with multiple primate radiations. We use strongly asymmetric patterns of gene tree discordance around specific branches to identify multiple instances of introgression between ancestral primate lineages. In addition, we exploit recent fossil evidence to perform fossil-calibrated molecular dating analyses across the tree. Taken together, our genome-wide data help to resolve multiple contentious sets of relationships among primates, while also providing insight into the biological processes and technical artifacts that led to the disagreements in the first place.


PLoS Genetics ◽  
2021 ◽  
Vol 17 (2) ◽  
pp. e1009095
Author(s):  
Thomas C. Nelson ◽  
Angela M. Stathos ◽  
Daniel D. Vanderpool ◽  
Findley R. Finseth ◽  
Yao-wu Yuan ◽  
...  

Inferences about past processes of adaptation and speciation require a gene-scale and genome-wide understanding of the evolutionary history of diverging taxa. In this study, we use genome-wide capture of nuclear gene sequences, plus skimming of organellar sequences, to investigate the phylogenomics of monkeyflowers in Mimulus section Erythranthe (27 accessions from seven species). Taxa within Erythranthe, particularly the parapatric and putatively sister species M. lewisii (bee-pollinated) and M. cardinalis (hummingbird-pollinated), have been a model system for investigating the ecological genetics of speciation and adaptation for over five decades. Across >8000 nuclear loci, multiple methods resolve a predominant species tree in which M. cardinalis groups with other hummingbird-pollinated taxa (37% of gene trees), rather than being sister to M. lewisii (32% of gene trees). We independently corroborate a single evolution of hummingbird pollination syndrome in Erythranthe by demonstrating functional redundancy in genetic complementation tests of floral traits in hybrids; together, these analyses overturn a textbook case of pollination-syndrome convergence. Strong asymmetries in allele-sharing (Patterson’s D-statistic and related tests) indicate that gene-tree discordance reflects ancient and recent introgression rather than incomplete lineage sorting. Consistent with abundant introgression blurring the history of divergence, low-recombination and adaptation-associated regions support the new species tree, while high-recombination regions generate phylogenetic evidence for sister status for M. lewisii and M. cardinalis. Population-level sampling of core taxa also revealed two instances of chloroplast capture, with Sierran M. lewisii and Southern Californian M. parishii each carrying organelle genomes nested within respective sympatric M. cardinalis clades. A recent organellar transfer from M. cardinalis, an outcrosser where selfish cytonuclear dynamics are more likely, may account for the unexpected cytoplasmic male sterility effects of selfer M. parishii organelles in hybrids with M. lewisii. Overall, our phylogenomic results reveal extensive reticulation throughout the evolutionary history of a classic monkeyflower radiation, suggesting that natural selection (re-)assembles and maintains species-diagnostic traits and barriers in the face of gene flow. Our findings further underline the challenges, even in reproductively isolated species, in distinguishing re-use of adaptive alleles from true convergence and emphasize the value of a phylogenomic framework for reconstructing the evolutionary genetics of adaptation and speciation.
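For readers unfamiliar with Patterson's D-statistic, the sketch below shows the standard ABBA-BABA calculation for four taxa ordered (P1, P2, P3, outgroup). It is a minimal illustration, not the authors' analysis: the toy sequences and function names are hypothetical, gaps and ambiguous bases are not handled, and significance testing (commonly done with a block jackknife) is omitted.

```python
from typing import Tuple


def count_abba_baba(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int]:
    """Count ABBA and BABA site patterns in four aligned sequences."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if a == o and b == c and a != b:
            abba += 1  # P2 and P3 share the non-outgroup allele
        elif a == c and b == o and a != b:
            baba += 1  # P1 and P3 share the non-outgroup allele
    return abba, baba


def patterson_d(n_abba: int, n_baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); near 0 is expected without gene flow."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (n_abba - n_baba) / total


# Toy alignment (hypothetical): three ABBA sites, one BABA site, one invariant site.
abba, baba = count_abba_baba("AAAGA", "GGGAA", "GGGGA", "AAAAA")
print(abba, baba, patterson_d(abba, baba))
```

Under incomplete lineage sorting alone, the two patterns are expected in equal numbers. A D value that differs significantly from zero therefore points to excess allele sharing between P3 and one of the two sister taxa.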


PLoS Biology ◽  
2020 ◽  
Vol 18 (12) ◽  
pp. e3000954
Author(s):  
Dan Vanderpool ◽  
Bui Quang Minh ◽  
Robert Lanfear ◽  
Daniel Hughes ◽  
Shwetha Murali ◽  
...  

Our understanding of the evolutionary history of primates is undergoing continual revision due to ongoing genome sequencing efforts. Bolstered by growing fossil evidence, these data have led to increased acceptance of once controversial hypotheses regarding phylogenetic relationships, hybridization and introgression, and the biogeographical history of primate groups. Among these findings is a pattern of recent introgression between species within all major primate groups examined to date, though little is known about introgression deeper in time. To address this and other phylogenetic questions, here, we present new reference genome assemblies for 3 Old World monkey (OWM) species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). We combine these data with 23 additional primate genomes to estimate both the species tree and individual gene trees using thousands of loci. While our species tree is largely consistent with previous phylogenetic hypotheses, the gene trees reveal high levels of genealogical discordance associated with multiple primate radiations. We use strongly asymmetric patterns of gene tree discordance around specific branches to identify multiple instances of introgression between ancestral primate lineages. In addition, we exploit recent fossil evidence to perform fossil-calibrated molecular dating analyses across the tree. Taken together, our genome-wide data help to resolve multiple contentious sets of relationships among primates, while also providing insight into the biological processes and technical artifacts that led to the disagreements in the first place.
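The reasoning behind using asymmetric gene tree discordance can be sketched as follows. Around a given branch, incomplete lineage sorting alone is expected to produce the two minor (discordant) topologies at equal frequencies, so a strong imbalance suggests introgression. The abstract does not specify the statistic or test used, so the normalized difference and exact binomial test below are assumptions chosen for illustration, as are the function name and example counts.

```python
from math import comb
from typing import Tuple


def minor_topology_asymmetry(n_minor1: int, n_minor2: int) -> Tuple[float, float]:
    """Return (delta, p_value) for the counts of the two minor topologies.

    delta is the signed normalized difference (n_minor1 - n_minor2) / n.
    p_value is a two-sided exact binomial test against equal frequencies.
    """
    n = n_minor1 + n_minor2
    if n == 0:
        raise ValueError("no discordant gene trees to compare")
    delta = (n_minor1 - n_minor2) / n
    k = max(n_minor1, n_minor2)
    # Upper-tail probability of seeing k or more trees of one topology
    # when each discordant tree picks either topology with probability 0.5.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * upper_tail)
    return delta, p_value


# Hypothetical counts: 8 gene trees with one minor topology, 2 with the other.
print(minor_topology_asymmetry(8, 2))
```

With genome-scale data, gene trees from linked loci are not independent, so a real analysis would need to account for linkage (for example by resampling) rather than rely on this simple test.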


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Anne J. Villacastin ◽  
Keeley S. Adams ◽  
Rin Boonjue ◽  
Paul J. Rushton ◽  
Mira Han ◽  
...  

Abstract WRKY transcription factors play key roles in stress responses, growth, and development. We previously reported on the evolution of WRKYs from unicellular green algae to land plants. To address recent evolutionary events, we studied three domesticated and eight wild species in the genus Oryza, an ideal model due to its long history of domestication, economic importance, and central role as a model system. We identified a prevalence of Group III WRKYs despite differences in breeding of cultivated and wild species. WRKY genes of the same group tend to cluster together, suggesting recent, multiple duplication events. Duplications followed by divergence may result in neofunctionalization of co-expressed WRKY genes that finely tune the regulation of target genes in the same metabolic or response pathway. WRKY genes have undergone recent rearrangements to form novel genes. Group Ib WRKYs, unique to AA genome type Oryza species, are derived from Group III genes dating back 6.76 million years. Gene tree reconciliation analysis with the species tree revealed details of duplication and loss events in the 11 genomes. Selection analysis on single-copy orthologs reveals the highly conserved nature of the WRKY domain and clusters of fast-evolving sites under strong positive selection pressure. In addition, the numbers of single-copy orthologs under positive and negative selection are almost evenly split. Our results provide valuable insights into the preservation and diversification of an important gene family under strong selective pressure for biotechnological improvements of the world’s most valued food crop.


2004 ◽  
Vol 72 (9) ◽  
pp. 5253-5261 ◽  
Author(s):  
Kwan Soo Ko ◽  
Jong-Wan Kim ◽  
Jong-Man Kim ◽  
Wonyong Kim ◽  
Sang-in Chung ◽  
...  

ABSTRACT The population structure of the Bacillus cereus group (52 strains of B. anthracis, B. cereus, and B. thuringiensis) was investigated by sequencing seven gene fragments (rpoB, gyrB, pycA, mdh, mbl, mutS, and plcR). Most of the strains were classifiable into two large subgroups in six housekeeping gene trees but not in the plcR tree. In addition, several consistent clusters were identified, which were unrelated to species distinction. Moreover, interrelationships among these clusters were incongruent in each gene tree. The incongruence length difference test and split decomposition analyses also showed incongruences between genes, suggesting horizontal gene transfer. The plcR gene was observed to have characteristics that differed from those of the other genes in terms of phylogenetic topology and pattern of sequence diversity. Thus, we suggest that the evolutionary history of the PlcR regulon differs from those of the other chromosomal genes and that recombination of the plcR gene may be frequent. The homogeneity of B. anthracis, which is depicted as an independent lineage in phylogenetic trees, is suggested to be of recent origin or to be due to the narrow taxonomic definition of species.


2020 ◽  
Author(s):  
Manuel Lafond ◽  
Marc Hellmuth

Abstract Background: The history of gene families (which are equivalent to event-labeled gene trees) can to some extent be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are "biologically feasible", which is the case if one can find a species tree with which the gene tree can be reconciled in a time-consistent way. Results: In this contribution, we consider event-labeled gene trees that contain speciations, duplications as well as horizontal gene transfer (HGT) and we assume that the species tree is unknown. Although many problems become NP-hard as soon as HGT and time-consistency are involved, we show, in contrast, that the problem of finding a time-consistent species tree for a given event-labeled gene tree can be solved in polynomial time. We provide a cubic-time algorithm to decide whether a "time-consistent" species tree for a given event-labeled gene tree exists and, in the affirmative case, to construct the species tree within the same time complexity.

