Phylogenetic tree inference from local gene content

2015 ◽  
Author(s):  
Galina Glazko ◽  
Michael Gensheimer ◽  
Arcady Mushegian

Abstract Background: Complete genome sequences provide many new characters suitable for studying phylogenetic relationships. The limitations of the single sequence-based phylogenetic reconstruction prompted the efforts to build trees based on genome-wide properties, such as the fraction of shared orthologous genes or conservation of adjoining gene pairs. Gene content-based phylogenies, however, have their own biases: most notably, differential losses and horizontal transfers of genes interfere with phylogenetic signal, each in their own way, and special measures need to be taken to eliminate these types of noise. Results: We expand the repertoire of genome-wide traits available for phylogeny building, by developing a practical approach for measuring local gene conservation in two genomes. We counted the number of orthologous genes shared by chromosomal neighborhoods (“bins”), and built the phylogeny of 63 prokaryotic genomes on this basis. The tree correctly resolved all well-established clades, and also suggested the monophyly of firmicutes, which tend to be split in other genome-based trees. Conclusions: Our measure of local gene order conservation extracts strong phylogenetic signal. This new measure appears to be substantially resistant to the observed instances of gene loss and horizontal transfer, two evolutionary forces which can cause systematic biases in the genome-based phylogenies.
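The abstract does not give the details of the bin-based measure, so the following is only a minimal sketch of the general idea of counting orthologs shared between chromosomal neighborhoods. It assumes each genome is an ordered list of ortholog-group identifiers with no duplicates, that bins are consecutive and non-overlapping, and that each bin is scored against its best-matching bin in the other genome. The function names, the bin size, and the normalization are hypothetical and are not taken from the paper.

```python
def make_bins(genome, bin_size):
    """Split an ordered list of ortholog IDs into consecutive, non-overlapping bins (as sets)."""
    return [set(genome[i:i + bin_size]) for i in range(0, len(genome), bin_size)]


def local_sharing_score(genome_a, genome_b, bin_size=3):
    """Fraction of genes in genome_a that fall in the best-matching bin of genome_b.

    Hypothetical scoring rule: for every bin of genome_a, take the largest number
    of ortholog IDs it shares with any single bin of genome_b, then sum these
    counts and divide by the number of genes in genome_a.
    """
    bins_a = make_bins(genome_a, bin_size)
    bins_b = make_bins(genome_b, bin_size)
    if not bins_a or not bins_b:
        return 0.0
    best_matches = [max(len(a & b) for b in bins_b) for a in bins_a]
    return sum(best_matches) / len(genome_a)


# Toy genomes: letters stand in for ortholog-group identifiers.
same_order = local_sharing_score(list("ABCDEF"), list("ABCDEF"))
shuffled = local_sharing_score(list("ABCDEF"), list("ADBECF"))
```

In this toy setup, identical gene orders score 1.0 and rearrangements lower the score; a pairwise matrix of such scores could then be converted into distances for a standard distance-based tree method.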

2013 ◽  
Vol 42 (4) ◽  
pp. 2391-2404 ◽  
Author(s):  
Anton Shifman ◽  
Noga Ninyo ◽  
Uri Gophna ◽  
Sagi Snir

Abstract The evolutionary history of all life forms is usually represented as a vertical tree-like process. In prokaryotes, however, the vertical signal is partly obscured by the massive influence of horizontal gene transfer (HGT). The HGT creates widespread discordance between evolutionary histories of different genes as genomes become mosaics of gene histories. Thus, the Tree of Life (TOL) has been questioned as an appropriate representation of the evolution of prokaryotes. Nevertheless a common hypothesis is that prokaryotic evolution is primarily tree-like, and a routine effort is made to place new isolates in their appropriate location in the TOL. Moreover, it appears desirable to exploit non–tree-like evolutionary processes for the task of microbial classification. In this work, we present a novel technique that builds on the straightforward observation that gene order conservation (‘synteny’) decreases in time as a result of gene mobility. This is particularly true in prokaryotes, mainly due to HGT. Using a ‘synteny index’ (SI) that measures the average synteny between a pair of genomes, we developed the phylogenetic reconstruction tool ‘Phylo SI’. Phylo SI offers several attractive properties such as easy bootstrapping, high sensitivity in cases where phylogenetic signal is weak and computational efficiency. Phylo SI was tested both on simulated data and on two bacterial data sets and compared with two well-established phylogenetic methods. Phylo SI is particularly efficient on short evolutionary distances where synteny footprints remain detectable, whereas the nucleotide substitution signal is too weak for reliable sequence-based phylogenetic reconstruction. The method is publicly available at http://research.haifa.ac.il/ssagi/software/PhyloSI.zip.
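As a rough illustration of what an average synteny measure between two genomes might look like, the sketch below compares the gene neighborhood of every shared gene in both genomes. The abstract does not spell out the definition, so several choices here are assumptions: genomes are treated as circular ordered lists of unique gene identifiers, a neighborhood is the k genes on each side of a gene, and the per-gene score is the shared neighbor count divided by 2k. The function names are hypothetical, and this is not the Phylo SI implementation.

```python
def neighborhood(genome, index, k):
    """Set of the k genes on each side of position `index` in a circular genome."""
    n = len(genome)
    return {genome[(index + d) % n] for d in range(-k, k + 1) if d != 0}


def synteny_index(genome_a, genome_b, k=2):
    """Average fraction of neighbors that each shared gene keeps in both genomes.

    Assumes gene identifiers are unique within a genome and that each genome
    has more than 2*k genes, so a neighborhood contains 2*k distinct genes.
    """
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    shared = pos_a.keys() & pos_b.keys()
    if not shared:
        return 0.0
    total = 0.0
    for gene in shared:
        neighbors_a = neighborhood(genome_a, pos_a[gene], k)
        neighbors_b = neighborhood(genome_b, pos_b[gene], k)
        total += len(neighbors_a & neighbors_b) / (2 * k)
    return total / len(shared)


# Toy genomes: letters stand in for gene identifiers.
conserved = synteny_index(list("ABCDEF"), list("ABCDEF"), k=2)
scrambled = synteny_index(list("ABCDEF"), list("ACEBDF"), k=1)
```

Under these assumptions a fully conserved gene order gives an index of 1.0, and the index drops as genes move, which matches the abstract's observation that synteny decreases with gene mobility.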


2019 ◽  
Author(s):  
Daniel Fernández Marchán ◽  
Marta Novo ◽  
Nuria Sánchez ◽  
Jorge Domínguez ◽  
Darío J. Díaz Cosín ◽  
...  

Abstract Uncovering the genetic and evolutionary basis of cryptic speciation is a major focus of evolutionary biology. Next Generation Sequencing (NGS) allows the identification of genome-wide local adaptation signatures, but has rarely been applied to cryptic complexes - particularly in the soil milieu - as is the case with integrative taxonomy. The earthworm genus Carpetania, comprising six previously suggested putative cryptic lineages, is a promising model to study the evolutionary phenomena shaping cryptic speciation in soil-dwelling lineages. Genotyping-By-Sequencing (GBS) was used to provide genome-wide information about genetic variability between seventeen populations, and geometric morphometrics analyses of genital chaetae were performed to investigate unexplored cryptic morphological evolution. Genomic analyses revealed the existence of three cryptic species, with half of the previously-identified potential cryptic lineages clustering within them. Local adaptation was detected in more than 800 genes putatively involved in a plethora of biological functions (most notably reproduction, metabolism, immunological response and morphogenesis). Several genes with selection signatures showed shared mutations for each of the cryptic species, and genes under selection were enriched in functions related to regulation of transcription, including SNPs located in UTR regions. Finally, geometric morphometrics approaches partially confirmed the phylogenetic signal of relevant morphological characters such as genital chaetae. Our study therefore unveils that local adaptation and regulatory divergence are key evolutionary forces orchestrating genome evolution in soil fauna.


Author(s):  
Robert Literman ◽  
Rachel Schwartz

Abstract Many evolutionary relationships remain controversial despite whole-genome sequencing data. These controversies arise in part due to challenges associated with accurately modeling the complex phylogenetic signal coming from genomic regions experiencing distinct evolutionary forces. Here we examine how different regions of the genome support or contradict well-established hypotheses among three mammal groups using millions of orthologous parsimony-informative biallelic sites [PIBS] distributed across primate, rodent, and Pecora genomes. We compared PIBS concordance percentages among locus types (e.g. coding sequences, introns, intergenic regions), and contrasted PIBS utility over evolutionary timescales. Sites derived from noncoding sequences provided more data and proportionally more concordant sites compared with those from coding sequences [CDS] in all clades. CDS PIBS were also predominant drivers of tree incongruence in two cases of topological conflict. PIBS derived from most locus types provided surprisingly consistent support for splitting events spread across the timescales we examined, although we find evidence that CDS and intronic PIBS may, respectively and to a limited degree, inform disproportionately about older and younger splits. In this era of accessible whole genome sequence data, these results (1) suggest benefits to more intentionally focusing on noncoding loci as robust data for tree inference, and (2) reinforce the importance of accurate modeling, especially when using CDS data.
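To make the notion of a parsimony-informative biallelic site concrete, the sketch below scans a small alignment for columns with exactly two alleles, each carried by at least two taxa, which is the standard meaning of "parsimony-informative" restricted to two states. It assumes the sequences are already aligned to equal length and, as a simplification of this example only, skips any column containing a gap or ambiguity code. The function names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter


def is_pibs(column):
    """True if a column has exactly two alleles, each present in at least two taxa."""
    # Simplifying assumption: columns with gaps or ambiguity codes are skipped.
    if any(base not in "ACGT" for base in column):
        return False
    counts = Counter(column)
    return len(counts) == 2 and min(counts.values()) >= 2


def find_pibs(alignment):
    """Return the 0-based column indices of parsimony-informative biallelic sites."""
    return [i for i, column in enumerate(zip(*alignment)) if is_pibs(column)]


# Toy alignment of four taxa, one string per taxon.
alignment = [
    "AACA",
    "AACA",
    "ATCG",
    "ATGT",
]
sites = find_pibs(alignment)
```

In the toy alignment only the second column qualifies: the first is invariant, the third has a singleton allele, and the fourth has three alleles.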


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Nathaniel D. Anderson ◽  
Yael Babichev ◽  
Fabio Fuligni ◽  
Federico Comitani ◽  
Mehdi Layeghifard ◽  
...  

Abstract Leiomyosarcomas (LMS) are genetically heterogeneous tumors differentiating along smooth muscle lines. Currently, LMS treatment is not informed by molecular subtyping and is associated with highly variable survival. While disease site continues to dictate clinical management, the contribution of genetic factors to LMS subtype, origins, and timing is unknown. Here we analyze 70 genomes and 130 transcriptomes of LMS, including multiple tumor regions and paired metastases. Molecular profiling highlights the very early origins of LMS. We uncover three specific subtypes of LMS that likely develop from distinct lineages of smooth muscle cells. Of these, dedifferentiated LMS with high immune infiltration and tumors primarily of gynecological origin harbor genomic dystrophin deletions and/or loss of dystrophin expression, acquire the highest burden of genomic mutation, and are associated with worse survival. Homologous recombination defects lead to genome-wide mutational signatures, and a corresponding sensitivity to PARP trappers and other DNA damage response inhibitors, suggesting a promising therapeutic strategy for LMS. Finally, by phylogenetic reconstruction, we present evidence that clones seeding lethal metastases arise decades prior to LMS diagnosis.


2015 ◽  
Author(s):  
Sanaa Afroz Ahmed ◽  
Chien-Chi Lo ◽  
Po-E Li ◽  
Karen W Davenport ◽  
Patrick S.G. Chain

Next-generation sequencing is increasingly being used to examine closely related organisms. However, while genome-wide single nucleotide polymorphisms (SNPs) provide an excellent resource for phylogenetic reconstruction, to date evolutionary analyses have been performed using different ad hoc methods that are not often widely applicable across different projects. To facilitate the construction of robust phylogenies, we have developed a method for genome-wide identification/characterization of SNPs from sequencing reads and genome assemblies. Our phylogenetic and molecular evolutionary (PhaME) analysis software is unique in its ability to take reads and draft/complete genome(s) as input, derive core genome alignments, identify SNPs, construct phylogenies and perform evolutionary analyses. Several examples using genomes and read datasets for bacterial, eukaryotic and viral lineages demonstrate the broad and robust functionality of PhaME. Furthermore, the ability to incorporate raw metagenomic reads from clinical samples with suspected infectious agents shows promise for the rapid phylogenetic characterization of pathogens within complex samples.


2020 ◽  
Vol 48 (5) ◽  
pp. 2544-2563 ◽  
Author(s):  
Pilar Menendez-Gil ◽  
Carlos J Caballero ◽  
Arancha Catalan-Moreno ◽  
Naiara Irurzun ◽  
Inigo Barrio-Hernandez ◽  
...  

Abstract The evolution of gene expression regulation has contributed to species differentiation. The 3′ untranslated regions (3′UTRs) of mRNAs include regulatory elements that modulate gene expression; however, our knowledge of their implications in the divergence of bacterial species is currently limited. In this study, we performed genome-wide comparative analyses of mRNAs encoding orthologous proteins from the genus Staphylococcus and found that mRNA conservation was lost mostly downstream of the coding sequence (CDS), indicating the presence of high sequence diversity in the 3′UTRs of orthologous genes. Transcriptomic mapping of different staphylococcal species confirmed that 3′UTRs were also variable in length. We constructed chimeric mRNAs carrying the 3′UTR of orthologous genes and demonstrated that 3′UTR sequence variations affect protein production. This suggested that species-specific functional 3′UTRs might be specifically selected during evolution. 3′UTR variations may occur through different processes, including gene rearrangements, local nucleotide changes, and the transposition of insertion sequences. By extending the conservation analyses to specific 3′UTRs, as well as the entire set of Escherichia coli and Bacillus subtilis mRNAs, we showed that 3′UTR variability is widespread in bacteria. In summary, our work unveils an evolutionary bias within 3′UTRs that results in species-specific non-coding sequences that may contribute to bacterial diversity.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Shulin Zhang ◽  
Zailong Tian ◽  
Haipeng Li ◽  
Yutao Guo ◽  
Yanqi Zhang ◽  
...  

Abstract Background: F-box proteins are substrate-recognition components of the Skp1-Rbx1-Cul1-F-box protein (SCF) ubiquitin ligases. By selectively targeting the key regulatory proteins or enzymes for ubiquitination and 26S proteasome mediated degradation, F-box proteins play diverse roles in plant growth/development and in the responses of plants to both environmental and endogenous signals. Studies of F-box proteins from the model plant Arabidopsis and from many additional plant species have demonstrated that they belong to a super gene family, and function across almost all aspects of the plant life cycle. However, systematic exploration of F-box family genes in the important fiber crop cotton (Gossypium hirsutum) has not been previously performed. The genome-wide analysis of the cotton F-box gene family is now possible thanks to the completion of several cotton genome sequencing projects. Results: In the current study, we first conducted a genome-wide investigation of cotton F-box family genes by reference to the published F-box protein sequences from other plant species. 592 F-box protein encoding genes were identified in the Gossypium hirsutum acc. TM-1 genome and, subsequently, we were able to present their gene structures, chromosomal locations, and syntenic relationships with their parent species. In addition, duplication modes analysis showed that cotton F-box genes were distributed across 26 chromosomes, with the maximum number of genes being detected on chromosome 5. Although the WGD (whole-genome duplication) mode seems to play a dominant role during the cotton F-box gene expansion process, other duplication modes including TD (tandem duplication), PD (proximal duplication), and TRD (transposed duplication) also contribute significantly to the evolutionary expansion of cotton F-box genes. Collectively, these bioinformatic analyses suggest possible evolutionary forces underlying F-box gene diversification. Additionally, we also conducted analyses of gene ontology, and expression profiles in silico, allowing identification of F-box gene members potentially involved in hormone signal transduction. Conclusion: The results of this study provide first insights into the Gossypium hirsutum F-box gene family, which lays the foundation for future studies of functionality, particularly those involving F-box protein family members that play a role in hormone signal transduction.


2015 ◽  
Vol 2 (9) ◽  
pp. 150156 ◽  
Author(s):  
Georgia Tsagkogeorga ◽  
Michael R. McGowen ◽  
Kalina T. J. Davies ◽  
Simon Jarman ◽  
Andrea Polanowski ◽  
...  

Recent studies have reported multiple cases of molecular adaptation in cetaceans related to their aquatic abilities. However, none of these has included the hippopotamus, precluding an understanding of whether molecular adaptations in cetaceans occurred before or after they split from their semi-aquatic sister taxa. Here, we obtained new transcriptomes from the hippopotamus and humpback whale, and analysed these together with available data from eight other cetaceans. We identified more than 11 000 orthologous genes and compiled a genome-wide dataset of 6845 coding DNA sequences among 23 mammals, to our knowledge the largest phylogenomic dataset to date for cetaceans. We found positive selection in nine genes on the branch leading to the common ancestor of hippopotamus and whales, and 461 genes in cetaceans compared to 64 in hippopotamus. Functional annotation revealed adaptations in diverse processes, including lipid metabolism, hypoxia, muscle and brain function. By combining these findings with data on protein–protein interactions, we found evidence suggesting clustering among gene products relating to nervous and muscular systems in cetaceans. We found little support for shared ancestral adaptations in the two taxa; most molecular adaptations in extant cetaceans occurred after their split with hippopotamids.

