Aequatus: An open-source homology browser

Mapping Intimacies

10.1101/055632

2016

Cited By ~ 1

Author(s):

Anil S. Thanki

Nicola Soranzo

Javier Herrero

Wilfried Haerty

Robert P. Davey

Keyword(s):

Open Source

Structural Changes

Gene Tree

Purifying Selection

Gene Families

Gene Trees

Ancestral Gene

Link Type

The Galaxy

Duplication Events

Abstract

Background: Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterisation enables the identification of syntenic blocks, which can then be visualised with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes.

Findings: We present Aequatus, a standalone web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualisations. It relies on pre-calculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfils the visualisation aspects of Aequatus; it is available within the Galaxy web platform as a visualisation plugin and can be used to visualise gene trees generated by the GeneSeqToFamily workflow.

Availability: Aequatus is an open-source tool freely available to download under the MIT license at https://github.com/TGAC/Aequatus. A demo server is available at http://aequatus.earlham.ac.uk/. A publicly available instance of the GeneSeqToFamily workflow to generate gene tree information and visualise it using Aequatus is available on the Galaxy EU server at https://[email protected] and [email protected]
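The core question Aequatus answers, whether ancestral exon boundaries are conserved across a gene family, can be illustrated by projecting each gene's exon boundaries onto the columns of a shared alignment. The Python sketch below is not Aequatus code (Aequatus.js is JavaScript and reads Ensembl Compara/Core data); it assumes a gapped nucleotide alignment with '-' as the gap character and positive exon lengths given in ungapped residues, and the function names `boundary_columns` and `shared_boundaries` are hypothetical.

```python
from typing import Dict, List, Set


def boundary_columns(aligned_seq: str, exon_lengths: List[int]) -> Set[int]:
    """Alignment columns (0-based) holding the last residue of each internal exon.

    exon_lengths are positive, in ungapped residues, and must sum to the number
    of non-gap characters in aligned_seq; gaps are written as '-'.
    """
    # Ungapped residue counts at which each exon except the last one ends.
    ends: List[int] = []
    total = 0
    for length in exon_lengths[:-1]:
        total += length
        ends.append(total)

    columns: Set[int] = set()
    targets = iter(ends)
    target = next(targets, None)
    residues_seen = 0
    for col, char in enumerate(aligned_seq):
        if char == "-":
            continue  # gap columns carry no residue of this gene
        residues_seen += 1
        while target is not None and residues_seen == target:
            columns.add(col)
            target = next(targets, None)
    return columns


def shared_boundaries(alignment: Dict[str, str], exons: Dict[str, List[int]]) -> Set[int]:
    """Columns at which every gene in the alignment has an exon boundary."""
    per_gene = [boundary_columns(seq, exons[name]) for name, seq in alignment.items()]
    return set.intersection(*per_gene) if per_gene else set()


if __name__ == "__main__":
    alignment = {"geneA": "ATGAAA---CCCTTT", "geneB": "ATGAAAGGGCCCTTT"}
    exons = {"geneA": [6, 6], "geneB": [6, 3, 6]}
    print(shared_boundaries(alignment, exons))  # prints {5}
```

In the toy example, geneB has an extra internal exon that geneA lacks, so only the first boundary (column 5) is shared by both genes.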

GeneSeqToFamily: the Ensembl Compara GeneTrees pipeline as a Galaxy workflow

10.1101/096529

2016

Author(s):

Anil S. Thanki

Nicola Soranzo

Wilfried Haerty

Robert P. Davey

Keyword(s):

Gene Duplication

Structural Changes

Gene Families

Vital Role

Software Project

Command Line

Gene Trees

Ancestral Gene

The Galaxy

Duplication Events

Abstract

Background: Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. Various tools, such as MSOAR, OrthoMCL and HomoloGene, are available to identify gene families and visualise syntenic information between species, providing an overview of the evolution of syntenic regions at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families.

Findings: A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via the command line. Therefore, we have converted the command-line Ensembl Compara GeneTrees pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow.

Conclusions: GeneSeqToFamily represents the Ensembl Compara pipeline as a set of interconnected Galaxy tools, so they can be run interactively within Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualise the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.
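Before alignments and trees can be built, a GeneTrees-style workflow has to group input sequences into gene families. One simple way to do this is single-linkage clustering of pairwise similarity hits, sketched below with a union-find structure. This is an illustration under assumptions, not the clustering tool GeneSeqToFamily actually wraps: hits are assumed to be (query, subject, e-value) tuples from an all-against-all similarity search, and the `max_evalue` cut-off and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.size: Dict[str, int] = {}

    def add(self, x: str) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x: str) -> str:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_families(genes: Iterable[str],
                     hits: Iterable[Tuple[str, str, float]],
                     max_evalue: float = 1e-10) -> List[Set[str]]:
    """Single-linkage families: genes joined by any hit with e-value <= max_evalue.

    Genes without a qualifying hit end up as singleton families.
    """
    uf = UnionFind()
    for gene in genes:
        uf.add(gene)
    for query, subject, evalue in hits:
        if evalue <= max_evalue:
            uf.add(query)
            uf.add(subject)
            uf.union(query, subject)
    families: Dict[str, Set[str]] = {}
    for gene in uf.parent:
        families.setdefault(uf.find(gene), set()).add(gene)
    # Largest families first; ties broken by member names for a stable order.
    return sorted(families.values(), key=lambda fam: (-len(fam), sorted(fam)))
```

Single linkage is the simplest choice and is prone to chaining unrelated genes through intermediate hits, which is one reason production pipelines use more careful clustering.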

SaGePhy: an improved phylogenetic simulation framework for gene and subgene evolution

Bioinformatics

10.1093/bioinformatics/btz081

2019

Vol 35 (18)

pp. 3496-3498

Cited By ~ 3

Author(s):

Soumya Kundu

Mukul S Bansal

Keyword(s):

Open Source

Gene Tree

Protein Domain

Supplementary Information

Death Process

Gene Trees

Species Trees

Simulation Framework

Probabilistic Sampling

Domain Level

Abstract

Summary: SaGePhy is a software package for improved phylogenetic simulation of gene and subgene evolution. SaGePhy can be used to generate species trees, gene trees and subgene or (protein) domain trees using a probabilistic birth–death process that allows for gene and subgene duplication, horizontal gene and subgene transfer and gene and subgene loss. SaGePhy implements a range of important features not found in other phylogenetic simulation frameworks/software. These include (i) simulation of subgene or domain level evolution inside one or more gene trees, (ii) simultaneous simulation of both additive and replacing horizontal gene/subgene transfers and (iii) probabilistic sampling of species tree and gene tree nodes, respectively, for gene- and domain-family birth. SaGePhy is open-source, platform-independent and written in Java and Python.

Availability and implementation: Executables, source code (open-source under the revised BSD license) and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/sagephy/.

Supplementary information: Supplementary data are available at Bioinformatics online.
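The birth-death process that SaGePhy builds on can be illustrated, for gene duplication and loss only, with a Gillespie-style simulation of copy number along a single branch. This is a hypothetical sketch rather than SaGePhy's implementation: it ignores horizontal transfer, subgene/domain evolution and species-tree structure, and the function name and rate parameters are assumptions.

```python
import random
from typing import List, Tuple


def simulate_branch(n_copies: int, length: float, dup_rate: float, loss_rate: float,
                    rng: random.Random) -> Tuple[int, List[Tuple[float, str]]]:
    """Evolve gene copy number along one branch under a linear birth-death process.

    Each copy independently duplicates at rate dup_rate and is lost at rate
    loss_rate. Returns the copy number at the end of the branch and a log of
    (time, event) pairs.
    """
    t = 0.0
    events: List[Tuple[float, str]] = []
    per_copy_rate = dup_rate + loss_rate
    while n_copies > 0 and per_copy_rate > 0:
        # Waiting time to the next event among all current copies.
        t += rng.expovariate(n_copies * per_copy_rate)
        if t > length:
            break
        # The event is a duplication with probability dup_rate / (dup_rate + loss_rate).
        if rng.random() < dup_rate / per_copy_rate:
            n_copies += 1
            events.append((t, "duplication"))
        else:
            n_copies -= 1
            events.append((t, "loss"))
    return n_copies, events


if __name__ == "__main__":
    rng = random.Random(7)
    final, log = simulate_branch(1, 2.0, dup_rate=0.6, loss_rate=0.4, rng=rng)
    print(final, log)
```

Because copies act independently, the total event rate scales with the current copy number; when the duplication and loss rates are equal, the expected copy number stays constant along the branch.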

Exact Solutions for Species Tree Inference from Discordant Gene Trees

Journal of Bioinformatics and Computational Biology

10.1142/s0219720013420055

2013

Vol 11 (05)

pp. 1342005

Cited By ~ 16

Author(s):

Wen-Chieh Chang

Paweł Górecki

Oliver Eulenstein

Keyword(s):

Exact Solutions

Gene Tree

Simulated Data

Gene Families

Species Tree

Data Sets

Gene Trees

Species Trees

Worst Case

Phylogenetic analysis has to overcome the grand challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well-studied approaches to this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss and deep coalescence, or the classic Robinson-Foulds (RF) supertree problem, which does not rely on any biological model. Although these problems have the potential to yield credible species trees, they are NP-hard. Therefore, they are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better assess the quality of the resulting species trees that best fit the given gene trees, we also compute the worst-case species trees, their numbers and their optimization scores for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.
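The gene duplication variant of GTP scores a candidate species tree by the number of duplications implied when each gene tree is reconciled with it through the least common ancestor (LCA) mapping. The sketch below computes that count for one gene tree and one species tree; summing it over all gene trees gives the duplication cost that the GTP duplication problem minimises. It does not reproduce the paper's dynamic programming search over species trees. Trees are assumed to be nested tuples with string leaves, gene-tree leaves are labelled with their species, and all names are hypothetical.

```python
from typing import FrozenSet, List, Tuple, Union

# A rooted binary tree: a leaf label, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of all nodes, listed in post-order (the root's set comes last)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = clusters(tree[0]), clusters(tree[1])
    return left + right + [left[-1] | right[-1]]


def lca_cluster(species_clusters: List[FrozenSet[str]], species: FrozenSet[str]) -> FrozenSet[str]:
    """Smallest species-tree cluster containing all given species (their LCA).

    Raises ValueError if a species is missing from the species tree.
    """
    return min((c for c in species_clusters if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Duplications implied by the LCA mapping of gene_tree into species_tree.

    A gene node is a duplication if it maps to the same species node as one
    of its children.
    """
    species_clusters = clusters(species_tree)

    def walk(node: Tree) -> Tuple[FrozenSet[str], FrozenSet[str], int]:
        # Returns (species below node, species node it maps to, duplications in subtree).
        if isinstance(node, str):
            leaf = frozenset([node])
            return leaf, lca_cluster(species_clusters, leaf), 0
        left_sp, left_map, left_dups = walk(node[0])
        right_sp, right_map, right_dups = walk(node[1])
        below = left_sp | right_sp
        mapped = lca_cluster(species_clusters, below)
        is_dup = mapped == left_map or mapped == right_map
        return below, mapped, left_dups + right_dups + int(is_dup)

    return walk(gene_tree)[2]


if __name__ == "__main__":
    species = (("A", "B"), "C")
    print(count_duplications((("A", "C"), "B"), species))  # prints 1
```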

Population genomics of two closely related anhydrobiotic midges reveals differences in adaptation to extreme desiccation

10.1101/2020.08.19.255828

2020

Author(s):

N.M. Shaykhutdinov

G.V. Klink

S.K. Garushyants

O.S. Kozlova

A.V. Cherkasov

...

Keyword(s):

Population Genomics

Expression Profiles

Gene Families

Genomic Region

Ancestral Gene

New Genes

Protective Genes

Duplication Events

Polypedilum Vanderplanki

The Sleeping Chironomid

Abstract

The sleeping chironomid Polypedilum vanderplanki is capable of anhydrobiosis, a striking example of adaptation to extreme desiccation. Tolerance to complete desiccation in this species is associated with the emergence of multiple paralogs of protective genes. One of the gene families that is highly expressed under anhydrobiosis and involved in this process encodes protein-L-isoaspartate (D-aspartate) O-methyltransferases (PIMTs). Recently, a closely related anhydrobiotic midge from Malawi, P. pembai, was discovered; it tolerates complete desiccation similarly to P. vanderplanki but, owing to differences in ecology, experiences more frequent desiccation-rehydration cycles. Here, we sequenced and assembled the genome of P. pembai and performed a population genomics analysis of several populations of P. vanderplanki and a population of P. pembai. We observe positive selection and radical changes in the genetic architecture of the PIMT locus between the two species, including multiple duplication events in the P. pembai lineage. In particular, PIMT-4, the most highly expressed of these PIMTs, is present in six copies in P. pembai; these copies differ in expression profiles, suggesting possible sub- or neofunctionalization. The nucleotide diversity (π) of the genomic region carrying these new genes is decreased in P. pembai, but not in the orthologous region carrying the ancestral gene in P. vanderplanki, providing evidence for a selective sweep associated with post-duplication adaptation in the former. Overall, our results suggest extensive, recent and likely ongoing adaptation of the mechanisms of anhydrobiosis.
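Nucleotide diversity (π) is the average number of pairwise differences per site among sampled sequences; a local reduction of π is a classic signature of a selective sweep. The sketch below computes π for a set of aligned haplotypes and along sliding windows. It assumes equal-length sequences without gaps or missing data and is not the authors' pipeline; the function names and window parameters are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Average pairwise differences per site (Nei's pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Total number of mismatching sites summed over all unordered pairs.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def windowed_pi(seqs: Sequence[str], window: int, step: int) -> List[Tuple[int, float]]:
    """Pi in windows of `window` sites every `step` sites, keyed by window start."""
    length = len(seqs[0])
    return [(start, nucleotide_diversity([s[start:start + window] for s in seqs]))
            for start in range(0, length - window + 1, step)]


if __name__ == "__main__":
    haplotypes = ["AAAA", "AAAT", "AATT"]
    print(nucleotide_diversity(haplotypes))
    print(windowed_pi(haplotypes, window=2, step=2))
```

Comparing windowed π between a duplicated region and its orthologous region in a related species is the kind of contrast the abstract describes.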

Species Tree Estimation from Genome-wide Data with Guenomu

10.1101/023861

2015

Author(s):

Leonardo de Oliveira Martins

David Posada

Keyword(s):

Incomplete Lineage Sorting

Gene Tree

Gene Families

Species Tree

Gene Trees

Species Trees

Lineage Sorting

Multiple Sources

Reconstruction Methods

Tree Topologies

The history of particular genes and that of the species that carry them can differ for various reasons. In particular, gene trees and species trees can truly differ due to well-known evolutionary processes such as gene duplication and loss, lateral gene transfer or incomplete lineage sorting. Different species tree reconstruction methods have been developed to take this incongruence into account; they can be broadly divided into supertree and supermatrix approaches. Here, we introduce a new Bayesian hierarchical model that we have recently developed and implemented in the program Guenomu, which considers multiple sources of gene tree/species tree disagreement. Guenomu takes as input the posterior distributions of unrooted gene tree topologies for multiple gene families in order to estimate the posterior distribution of rooted species tree topologies.

Genome-Wide Identification, Evolution, and Expression Analysis of TPS and TPP Gene Families in Brachypodium distachyon

Plants

10.3390/plants8100362

2019

Vol 8 (10)

pp. 362

Cited By ~ 3

Author(s):

Song Wang

Kai Ouyang

Kai Wang

Keyword(s):

Stress Responses

Brachypodium Distachyon

Interaction Network

Purifying Selection

Gene Interaction

Gene Families

Gene Interaction Network

Genome Wide

Duplication Events

Gene Structures

Trehalose biosynthesis enzyme homologues in plants comprise two families, trehalose-6-phosphate synthases (TPSs) and trehalose-6-phosphate phosphatases (TPPs). Both families participate in trehalose synthesis and in a variety of stress-resistance processes. Here, nine BdTPS and ten BdTPP genes were identified in the Brachypodium distachyon genome, and all genes were classified into three classes. The Class I and Class II members differed substantially in gene structures, conserved motifs and protein sequence identities, implying varied gene functions. Gene duplication analysis showed that one BdTPS gene pair and four BdTPP gene pairs were formed by duplication events. The Ka/Ks (non-synonymous/synonymous substitution rate) ratio was less than 1, suggesting purifying selection in these gene families. Prediction of cis-elements and gene interaction networks indicated that many family members may be involved in stress responses. Quantitative real-time reverse transcription PCR (qRT-PCR) results further indicated that most BdTPSs responded to at least one stress or abscisic acid (ABA) treatment, whereas over half of the BdTPPs were downregulated after stress treatment, implying that BdTPSs play a more important role in stress responses than BdTPPs. This work provides a foundation for the genome-wide identification of the B. distachyon TPS–TPP gene families and a framework for further studies of these gene families in abiotic stress responses.
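The Ka/Ks argument for purifying selection can be made concrete with a simplified Nei-Gojobori (1986) style estimate. The sketch below is not the tool the authors used (the abstract does not name one); it assumes aligned, gap-free coding sequences of equal length over A/C/G/T with stop codons removed, uses the standard genetic code, counts changes to stop codons as non-synonymous, and skips codons that differ at more than one position instead of averaging over mutational pathways as the full method does. Function names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, Tuple

BASES = "TCAG"
# NCBI standard genetic code (translation table 1); codons in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Sum over the three positions of the fraction of single-base changes that
    keep the encoded amino acid (changes to stop codons count as non-synonymous)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; returns infinity when p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1: str, seq2: str) -> Tuple[float, float]:
    """Simplified Nei-Gojobori (Ka, Ks) for two aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned codon by codon")
    syn_sites = syn_diffs = nonsyn_diffs = 0.0
    codons = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff > 1:
            continue  # simplification: multi-hit codons are skipped
        codons += 1
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        if n_diff == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = 3 * codons - syn_sites
    if syn_sites == 0 or nonsyn_sites == 0:
        raise ValueError("no comparable synonymous or non-synonymous sites")
    return jukes_cantor(nonsyn_diffs / nonsyn_sites), jukes_cantor(syn_diffs / syn_sites)


if __name__ == "__main__":
    ka, ks = ka_ks("CTGAAAGCT", "CTCAAAGCT")  # one synonymous Leu codon change
    print(ka, ks)
```

A ratio `ka / ks` below 1 (when `ks > 0`) is the pattern the abstract interprets as purifying selection; real analyses would use a dedicated implementation handling multiple hits and alignment gaps.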

SimPhy: Phylogenomic Simulation of Gene, Locus and Species Trees

10.1101/021709

2015

Cited By ~ 2

Author(s):

Diego Mallo

Leonardo de Oliveira Martins

David Posada

Keyword(s):

Incomplete Lineage Sorting

A Priori

Gene Tree

Gene Families

Rate Variation

Gene Trees

Species Trees

Lineage Sorting

Sequence Alignments

Large Trees

We present SimPhy, a fast and flexible program for simulating multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer (all three of which can lead to species tree/gene tree discordance), as well as gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific), which can be fixed or sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and can simulate partitioned nucleotide, codon and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible. We validate SimPhy's output using theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can help in understanding interactions among different evolutionary processes by conducting a simulation study that characterizes the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, pre-compiled executables, a detailed manual and example cases.
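The incomplete lineage sorting component that SimPhy models can be illustrated in its simplest form: three species, one lineage each, neutral coalescent, constant population sizes. The sketch below samples gene-tree topologies for the species tree ((A,B),C) and compares them with the analytical expectation. It is a hypothetical illustration, not SimPhy code, and omits duplication, loss, transfer, gene conversion and sequence simulation; time is measured in coalescent units in which a pair of lineages coalesces at rate 1.

```python
import math
import random
from collections import Counter

TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def simulate_triplet_gene_trees(internal_branch: float, n_loci: int, seed: int = 0) -> Counter:
    """Sample gene-tree topologies for species tree ((A,B),C) under the neutral
    multispecies coalescent with one lineage per species.

    internal_branch is the length of the ancestral (A,B) branch in coalescent units.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_loci):
        if rng.expovariate(1.0) < internal_branch:
            # The A and B lineages coalesce before reaching the root population.
            counts[TOPOLOGIES[0]] += 1
        else:
            # All three lineages reach the root population, where the first
            # coalescence joins a uniformly random pair.
            counts[rng.choice(TOPOLOGIES)] += 1
    return counts


def expected_concordance(internal_branch: float) -> float:
    """Analytical frequency of the gene tree matching the species tree."""
    return 1 - (2 / 3) * math.exp(-internal_branch)


if __name__ == "__main__":
    n = 100_000
    counts = simulate_triplet_gene_trees(1.0, n, seed=1)
    for topology in TOPOLOGIES:
        print(topology, counts[topology] / n)
    print("expected concordant:", round(expected_concordance(1.0), 4))
```

Short internal branches make discordant topologies common, which is why ILS has to be modelled when gene families are simulated along a species tree.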

Identification and Evolution Analysis of the JAZ Gene Family in Maize

10.21203/rs.3.rs-144271/v1

2021

Author(s):

Yang Han

Dawn Luthe

Keyword(s):

Gene Expression

Gene Family

Phylogenetic Analyses

Purifying Selection

Gene Families

Defense Responses

Maize Genome

Synteny Analysis

Genetic Redundancy

Duplication Events

Abstract

Background: Jasmonates (JAs) are important for plants to coordinate growth, reproduction and defense responses. In JA signaling, jasmonate ZIM-domain (JAZ) proteins serve as master regulators at the initial stage of herbivore attack. Although JAZ genes have been discovered in many plant species, little in-depth characterization of JAZ gene expression has been reported in the agronomically important crop maize (Zea mays L.).

Results: In this study, 16 JAZ genes were identified in the maize genome and classified. Phylogenetic analyses were performed using deduced protein sequences from maize, rice, sorghum, Brachypodium and Arabidopsis; a total of six clades were proposed, and conservation, such as similar exon/intron structures, was observed within each group. Synteny analysis across four monocots indicated that these JAZ gene families had a common ancestor and that duplication events in the maize genome, including genome-wide duplication (GWD), transposon and/or tandem duplication, may have driven the expansion of the JAZ gene family. Strong purifying selection acted on all JAZ genes except those in group 4, which were under neutral selection. Further, we cloned three paralogous JAZ gene pairs from two maize inbreds differing in JA levels and insect resistance, and gene polymorphisms were observed between the two inbreds.

Conclusions: Here we analyzed the composition and evolution of JAZ genes in maize together with three other monocot plants. Extensive phylogenetic and synteny analyses revealed the expansion and selective fate of maize JAZ genes. This is the first study comparing the difference between two inbreds, and we propose that genotype-specific JAZ gene expression may be present in maize plants. Since genetic redundancy in the JAZ gene family hampers our understanding of its role in responses to specific elicitors, we hope this research will help to elucidate defensive responses in plants.
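Separating tandem duplicates from other duplication modes can start from gene positions. The sketch below labels paralog pairs as tandem when both genes lie on the same chromosome within a small number of gene ranks of each other; all other pairs are left as 'dispersed', since calling them genome-wide or transposon-mediated duplicates requires synteny evidence. The rank-distance criterion, the `max_distance` default and the function name are hypothetical assumptions, not the criteria used in the study.

```python
from typing import Dict, List, Tuple

# (chromosome, rank of the gene along that chromosome)
GenePosition = Tuple[str, int]


def classify_paralog_pairs(positions: Dict[str, GenePosition],
                           pairs: List[Tuple[str, str]],
                           max_distance: int = 2) -> Dict[Tuple[str, str], str]:
    """Label each paralog pair 'tandem' or 'dispersed'.

    A pair is 'tandem' when both genes are on the same chromosome and their
    gene ranks differ by at most max_distance (1 means directly adjacent).
    """
    labels: Dict[Tuple[str, str], str] = {}
    for a, b in pairs:
        (chrom_a, rank_a), (chrom_b, rank_b) = positions[a], positions[b]
        close = chrom_a == chrom_b and abs(rank_a - rank_b) <= max_distance
        labels[(a, b)] = "tandem" if close else "dispersed"
    return labels


if __name__ == "__main__":
    positions = {"g1": ("chr1", 10), "g2": ("chr1", 11), "g3": ("chr4", 200)}
    print(classify_paralog_pairs(positions, [("g1", "g2"), ("g1", "g3")]))
```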

Evolinc: a comparative transcriptomics and genomics pipeline for quickly identifying sequence conserved lincRNAs for functional analysis

10.1101/110148

2017

Author(s):

Andrew D. L. Nelson

Upendra K. Devisetty

Kyle Palos

Asher K. Haug-Baltzell

Eric Lyons

...

Keyword(s):

Functional Analysis

Differential Expression Analysis

Gene Tree

Gene Trees

Tree Reconciliation

Origin And Evolution

Duplication Events

History Of

Human Telomerase

Gain Loss

Abstract

Long intergenic non-coding RNAs (lincRNAs) are an abundant and functionally diverse class of eukaryotic transcripts. Reported lincRNA repertoires in mammals vary, but are commonly in the thousands to tens of thousands of transcripts, covering ~90% of the genome. In addition to elucidating function, there is particular interest in understanding the origin and evolution of lincRNAs. Aside from mammals, lincRNA populations have been sparsely sampled, precluding evolutionary analyses focused on lincRNA emergence and persistence. Here we present Evolinc, a two-module pipeline designed to facilitate lincRNA discovery and characterize aspects of lincRNA evolution. The first module (Evolinc-I) is a lincRNA identification workflow that also facilitates downstream differential expression analysis and genome browser visualization of identified lincRNAs. The second module (Evolinc-II) is a comparative genomic and transcriptomic analysis workflow that determines the phylogenetic depth to which a lincRNA locus is conserved within a user-defined group of related species. Evolinc-II builds families of homologous lincRNA loci, aligns constituent sequences, infers gene trees, and then uses gene tree/species tree reconciliation to reconstruct evolutionary processes such as gain, loss or duplication of the locus. Here we demonstrate that Evolinc-I is agnostic to the target organism by validating it against previously annotated Arabidopsis and human lincRNA data. Using Evolinc-II, we examine ways in which conservation can be used to rapidly winnow down large lincRNA datasets to a small set of candidates for functional analysis. Finally, we show how Evolinc-II can be used to recover the evolutionary history of a known lincRNA, the human telomerase RNA (TERC). The analyses revealed unexpected duplication events as well as the loss and subsequent acquisition of a novel TERC locus in the lineage leading to mice and rats.
The Evolinc pipeline is currently integrated in CyVerse’s Discovery Environment and is free to use by researchers.
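The phylogenetic depth to which Evolinc-II reports a locus as conserved can be thought of as the most recent common ancestor, in the species tree, of all species in which a homologous locus is found. The sketch below computes that node from a child-to-parent map. The tree encoding, node names and function names are hypothetical, and the sketch takes homolog presence as given rather than reproducing Evolinc-II's homology search and reconciliation steps.

```python
from typing import Dict, Iterable, List


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """The node followed by its ancestors, ending at the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def conservation_depth(species_with_homolog: Iterable[str], parent: Dict[str, str]) -> str:
    """Most recent common ancestor of all species in which the locus was found."""
    species = list(species_with_homolog)
    if not species:
        raise ValueError("no species with a homolog")
    common = path_to_root(species[0], parent)
    for sp in species[1:]:
        ancestors = set(path_to_root(sp, parent))
        # Filtering preserves the leaf-to-root order, so common[0] stays the most recent.
        common = [n for n in common if n in ancestors]
    return common[0]


if __name__ == "__main__":
    parent = {"mouse": "anc_mouse_rat", "rat": "anc_mouse_rat",
              "anc_mouse_rat": "anc_human_rodent", "human": "anc_human_rodent"}
    print(conservation_depth(["human", "rat"], parent))  # prints anc_human_rodent
```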

Asymmetric Distribution of Gene Trees Can Arise under Purifying Selection If Differences in Population Size Exist

Molecular Biology and Evolution

10.1093/molbev/msz232

2019

Vol 37 (3)

pp. 881-892

Cited By ~ 5

Author(s):

Chong He

Dan Liang

Peng Zhang

Keyword(s):

Population Size

Incomplete Lineage Sorting

Random Mating

Gene Tree

Purifying Selection

Species Tree

Asymmetric Distribution

Gene Trees

Tree Distribution

The Impact

Abstract

Incomplete lineage sorting (ILS) is an important factor that causes gene tree discordance. For gene trees of three species, under neutrality, random mating and the absence of interspecific gene flow, ILS creates a symmetric distribution of gene trees: the gene tree that agrees with the species tree has the highest frequency, and the two discordant trees are equally frequent. If the neutral condition is violated, the impact of ILS may change, altering the gene tree distribution. Here, we show that under purifying selection, even assuming that the fitness effect of mutations is constant throughout the species tree, asymmetric distributions of gene trees will arise if population sizes differ among species, contrary to the expectation under neutrality. In extreme cases, one of the discordant trees rather than the concordant tree becomes the most frequent gene tree. In addition, we found that in a real case, the position of Scandentia relative to Primates and Glires, the symmetry of the gene tree distribution can be influenced by the strength of purifying selection. In current phylogenetic inference, the impact of purifying selection on the gene tree distribution is rarely considered. This study highlights the necessity of considering this impact.
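For reference, the symmetric neutral expectation described above is the standard three-taxon result of the multispecies coalescent (one sampled lineage per species, random mating, no gene flow). For the species tree ((A,B),C), with T the length of the internal (A,B) ancestral branch in coalescent units (generations divided by 2N for a diploid ancestral population of constant size N):

```latex
\begin{aligned}
P\bigl(((A,B),C)\bigr) &= 1 - \tfrac{2}{3}e^{-T},\\
P\bigl(((A,C),B)\bigr) &= P\bigl(((B,C),A)\bigr) = \tfrac{1}{3}e^{-T}.
\end{aligned}
```

Because 1 - (2/3)e^{-T} > (1/3)e^{-T} for every T > 0, the concordant tree is always the most frequent under neutrality and the two discordant trees are equally frequent; the study's result is that purifying selection combined with unequal population sizes can break this symmetry.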
