Inferring synteny between genome assemblies: a systematic evaluation

2017 ◽  
Author(s):  
Dang Liu ◽  
Martin Hunt ◽  
Isheng. J. Tsai

Abstract Identification of synteny between genomes of closely related species is an important aspect of comparative genomics. However, it is unknown to what extent draft assemblies lead to errors in such analysis. To investigate this, we fragmented genome assemblies of model nematodes to various extents and conducted synteny identification and downstream analysis. We first show that synteny between species can be underestimated by up to 40% and find disagreements between popular tools that infer synteny blocks. This inconsistency and further demonstration of erroneous gene ontology enrichment tests throw into question the robustness of previous synteny analysis when gold standard genome sequences remain limited. In addition, determining the true evolutionary relationship is compromised by assembly improvement using a reference guided approach with a closely related species. Annotation quality, however, has minimal effect on synteny if the assembled genome is highly contiguous. Our results highlight the need for gold standard genome assemblies for synteny identification and accurate downstream analysis.

Author summary Genome assemblies across all domains of life are currently produced routinely. Initial analysis of any new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species. The availability of human and mouse genomes paved the way for algorithm development in large-scale synteny mapping, which eventually became an integral part of comparative genomics. Synteny analysis is regularly performed on assembled sequences that are fragmented, neglecting the fact that most methods were developed using complete genomes. Here, we systematically evaluate this interplay by inferring synteny in genome assemblies with different degrees of contiguation.
As expected, our investigation reveals that assembly quality can drastically affect synteny analysis, from the initial synteny identification to downstream analysis. Importantly, we found that improving a fragmented assembly using synteny with the genome of a related species can be dangerous, as this a priori assumes a potentially false evolutionary relationship between the species. The results presented here re-emphasize the importance of gold standard genomes to the science community, and should be achieved given the current progress in sequencing technology.

mSystems ◽  
2017 ◽  
Vol 2 (4) ◽  
Author(s):  
Sander Wuyts ◽  
Stijn Wittouck ◽  
Ilke De Boeck ◽  
Camille N. Allonsius ◽  
Edoardo Pasolli ◽  
...  

ABSTRACT Although the genotypic and phenotypic properties of the Lactobacillus casei group have been studied extensively, the taxonomic structure has been the subject of debate for a long time. Here, we performed a large-scale comparative analysis by using 183 publicly available genomes supplemented with a Lactobacillus strain isolated from the human upper respiratory tract. On the basis of this analysis, we identified inconsistencies in the taxonomy and reclassified all of the genomes according to their most closely related type strains. This led to the identification of a catalase-encoding gene in all 10 L. casei sensu stricto strains, making it the first described catalase-positive species in the Lactobacillus genus. Moreover, we found that 6 of 10 L. casei genomes contained a SecA2/SecY2 gene cluster with two putative glycosylated surface adhesin proteins. Altogether, our results highlight current inconsistencies in the taxonomy of the L. casei group and reveal new clade-associated functional features.

IMPORTANCE The closely related species of the Lactobacillus casei group are extensively studied because of their applications in food fermentations and as probiotics. Our results show that many strains in this group are incorrectly classified and that reclassifying them to their most closely related species type strain improves the functional predictive power of their taxonomy. In addition, our findings may spark increased interest in the L. casei species. We find that after reclassification, only 10 genomes remain classified as L. casei. These strains show some interesting properties. First, they all appear to be catalase positive. This suggests that they have increased oxidative stress resistance. Second, we isolated an L. casei strain from the human upper respiratory tract and discovered that it and multiple other L. casei strains harbor one or even two large, glycosylated putative surface adhesins. This might inspire further exploration of this species as a potential probiotic organism.


Zootaxa ◽  
2021 ◽  
Vol 5051 (1) ◽  
pp. 443-486
Author(s):  
ANNABEL MATHISKE ◽  
DAVID THISTLE ◽  
HENDRIK GHEERARDYN ◽  
GRITTA VEIT-KÖHLER

The large-scale dispersal of deep-sea harpacticoid copepods is an increasing focus for ecological studies. A fundamental prerequisite for monitoring and explaining their geographical distribution is precise descriptions of their morphology. Four new, closely related species of the family Paramesochridae (Copepoda, Harpacticoida) were found in the deep sea of the Pacific (San Diego Trough and off Chile), the Atlantic Ocean (Porcupine Abyssal Plain and Angola Basin), and the Atlantic and Indian Ocean sectors of the Southern Ocean (Weddell Sea and off Crozet Island). The discovery of Emertonia berndi sp. nov., E. hessleri sp. nov., E. ilse sp. nov., and E. serrata sp. nov. increases the number of known deep-sea species in this genus to ten. The new species are placed in Emertonia Wilson, 1932 because of their one-segmented endopods on the second and third swimming legs. The presence of a two-segmented endopod on the fourth swimming leg allocates them to the “andeep-group” within this genus. The four species can be distinguished from their congeners by the strongly serrated spines on the exopods of their swimming legs and an outwardly directed flexible seta on the exopod of the fifth leg. It is conceivable that these two specific characters evolved only once in the genus Emertonia. Their apparently cosmopolitan distribution covers thousands of kilometres and spans all major oceans. This biogeographical pattern may be explained by resuspension events followed by passive transport by benthic currents. Discrepancies in their dispersal ranges may be a result of changing geological and oceanographic boundaries.


Author(s):  
Qiaolian Yi ◽  
Meng Xiao ◽  
Xin Fan ◽  
Ge Zhang ◽  
Yang Yang ◽  
...  

Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) has been accepted as a rapid, accurate, and less labor-intensive method in the identification of microorganisms in clinical laboratories. However, there are limited data on systematic evaluation of its effectiveness in the identification of phylogenetically closely-related yeast species. In this study, we evaluated two commercially available MALDI-TOF systems, Autof MS 1000 and Vitek MS, for the identification of yeasts within closely-related species complexes. A total of 1,228 yeast isolates, representing 14 different species of five species complexes, including 479 of Candida parapsilosis complex, 323 of Candida albicans complex, 95 of Candida glabrata complex, 16 of Candida haemulonii complex (including two Candida auris), and 315 of Cryptococcus neoformans complex, collected under the National China Hospital Invasive Fungal Surveillance Net (CHIF-NET) program, were studied. Autof MS 1000 and Vitek MS systems correctly identified 99.2% and 89.2% of the isolates, with major error rates of 0.4% versus 1.6% and minor error rates of 0.1% versus 3.5%, respectively. The proportion of isolates accurately identified by Autof MS 1000 and Vitek MS for each yeast complex, respectively, was as follows: C. albicans complex, 99.4% vs 96.3%; C. parapsilosis complex, 99.0% vs 79.1%; C. glabrata complex, 98.9% vs 94.7%; C. haemulonii complex, 100% vs 93.8%; and C. neoformans, 99.4% vs 95.2%. Overall, Autof MS 1000 exhibited good capacity in yeast identification while Vitek MS had lower identification accuracy, especially in the identification of less common species within phylogenetically closely-related species complexes.


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 218
Author(s):  
Dwueng-Chwuan Jhwueng ◽  
Chih-Ping Wang

Regression analysis using line equations has been broadly applied in studying the evolutionary relationship between the response trait and its covariates. However, traits among closely related species in nature are highly diverse, and nonlinear relationships between traits have been frequently observed. By treating the evolution of quantitative traits along a phylogenetic tree as a set of continuous stochastic variables, statistical models for describing the dynamics of the optimum of the response trait and its covariates are built herein. Analytical representations for the response trait variables, as well as their optima among a group of related species, are derived. Due to the models’ lack of tractable likelihood, a procedure that implements the Approximate Bayesian Computation (ABC) technique is applied for statistical inference. Simulation results show that the new models perform well, with the posterior means of the parameters close to the true parameters. Empirical analysis supports the new models when analyzing the trait relationship among kangaroo species.
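For readers unfamiliar with ABC, the following is a minimal sketch of the generic rejection form of the technique, not the authors' phylogenetic trait models or their procedure. It assumes a toy problem (estimating the mean of a normal distribution with known standard deviation 1), a uniform prior, the sample mean as summary statistic, and a tolerance `epsilon`; all function and parameter names here are hypothetical.

```python
import random
import statistics

def simulate(theta, n, rng):
    """Toy simulator (assumption): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def abc_rejection(observed, n_proposals, epsilon, rng):
    """Keep prior draws whose simulated summary lands within epsilon of the observed one."""
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_proposals):
        theta = rng.uniform(-5.0, 5.0)  # draw from a uniform prior (assumption)
        sim_summary = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(sim_summary - obs_summary) < epsilon:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # pretend the true parameter is 2.0
accepted = abc_rejection(observed, n_proposals=5000, epsilon=0.1, rng=rng)
posterior_mean = statistics.fmean(accepted)
observed_mean = statistics.fmean(observed)
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is the property that makes ABC usable when the likelihood is intractable. A smaller `epsilon` gives a closer approximation at the cost of fewer accepted draws.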


2021 ◽  
Author(s):  
Rory J Craig ◽  
Ahmed R Hasan ◽  
Rob W Ness ◽  
Peter D Keightley

Abstract Despite its role as a reference organism in the plant sciences, the green alga Chlamydomonas reinhardtii entirely lacks genomic resources from closely related species. We present highly contiguous and well-annotated genome assemblies for three unicellular C. reinhardtii relatives: Chlamydomonas incerta, Chlamydomonas schloesseri, and the more distantly related Edaphochlamys debaryana. The three Chlamydomonas genomes are highly syntenous with similar gene contents, although the 129.2 Mb C. incerta and 130.2 Mb C. schloesseri assemblies are more repeat-rich than the 111.1 Mb C. reinhardtii genome. We identify the major centromeric repeat in C. reinhardtii as a LINE transposable element homologous to Zepp (the centromeric repeat in Coccomyxa subellipsoidea) and infer that centromere locations and structure are likely conserved in C. incerta and C. schloesseri. We report extensive rearrangements, but limited gene turnover, between the minus mating type loci of these Chlamydomonas species. We produce an eight-species core-Reinhardtinia whole-genome alignment, which we use to identify several hundred false positive and missing genes in the C. reinhardtii annotation and >260,000 evolutionarily conserved elements in the C. reinhardtii genome. In summary, these resources will enable comparative genomics analyses for C. reinhardtii, significantly extending the analytical toolkit for this emerging model system.


2018 ◽  
Author(s):  
Swati Parekh ◽  
Beate Vieth ◽  
Christoph Ziegenhain ◽  
Wolfgang Enard ◽  
Ines Hellmann

Abstract With the growing appreciation for the role of regulatory differences in evolution, researchers need to reliably quantify expression levels within and among species. However, for non-model organisms genome assemblies and annotations are often not available or have inferior quality, biasing the inference of expression changes to an unknown extent. Here, we explore the possibility to map RNA-seq reads from diverged species to one high quality reference genome. As a test case, we used a small primate phylogeny ranging from Human to Marmoset spanning 12% nucleotide divergence. To distinguish the effect of sequence divergence and genome quality, we used in silico evolved genomes and existing genomes to simulate RNA-seq reads. These were then mapped to the genome of origin (self-mapping) as well as to one common reference (cross-mapping) to infer the quantification biases. We find that the bias due to cross-mapping is small for the closely related great apes (≤ 4% divergence), and preferable to self-mapping given current genome qualities. For closely related species, cross-mapping provides easy access, high power and a well controlled false discovery rate for both the analysis of intra-species expression differences and the detection of relative differences between species. If divergence increases, so that a substantial fraction of reads exceeds the limits of the mapper used, we find that gene-specific corrections and effect-size cutoffs can limit the bias before self-mapping becomes unavoidable. In summary, for the first time we systematically quantify biases in cross-species RNA-seq studies, providing guidance to best practices for these important evolutionary studies.


2019 ◽  
Vol 7 (1) ◽  
pp. 41-64 ◽  
Author(s):  
Joel Armstrong ◽  
Ian T. Fiddes ◽  
Mark Diekhans ◽  
Benedict Paten

Rapidly improving sequencing technology coupled with computational developments in sequence assembly are making reference-quality genome assembly economical. Hundreds of vertebrate genome assemblies are now publicly available, and projects are being proposed to sequence thousands of additional species in the next few years. Such dense sampling of the tree of life should give an unprecedented new understanding of evolution and allow a detailed determination of the events that led to the wealth of biodiversity around us. To gain this knowledge, these new genomes must be compared through genome alignment (at the sequence level) and comparative annotation (at the gene level). However, different alignment and annotation methods have different characteristics; before starting a comparative genomics analysis, it is important to understand the nature of, and biases and limitations inherent in, the chosen methods. This review is intended to act as a technical but high-level overview of the field that should provide this understanding. We briefly survey the state of the genome alignment and comparative annotation fields and potential future directions for these fields in a new, large-scale era of comparative genomics.


2020 ◽  
Author(s):  
Rory J. Craig ◽  
Ahmed R. Hasan ◽  
Rob W. Ness ◽  
Peter D. Keightley

Abstract Despite its fundamental role as a model organism in plant sciences, the green alga Chlamydomonas reinhardtii entirely lacks genomic resources for any closely related species, obstructing its development as a study system in several fields. We present highly contiguous and well-annotated genome assemblies for the two closest known relatives of the species, Chlamydomonas incerta and Chlamydomonas schloesseri, and a third more distantly related species, Edaphochlamys debaryana. We find the three Chlamydomonas genomes to be highly syntenous with similar gene contents, although the 129.2 Mb C. incerta and 130.2 Mb C. schloesseri assemblies are more repeat-rich than the 111.1 Mb C. reinhardtii genome. We identify the major centromeric repeat in C. reinhardtii as an L1 LINE transposable element homologous to Zepp (the centromeric repeat in Coccomyxa subellipsoidea) and infer that centromere locations and structure are likely conserved in C. incerta and C. schloesseri. We report extensive rearrangements, but limited gene turnover, between the minus mating-type loci of the Chlamydomonas species, potentially representing the early stages of mating-type haplotype reformation. We produce an 8-species whole-genome alignment of unicellular and multicellular volvocine algae and identify evolutionarily conserved elements in the C. reinhardtii genome. We find that short introns (<~100 bp) are extensively overlapped by conserved elements, and likely represent an important functional class of regulatory sequence in C. reinhardtii. In summary, these novel resources enable comparative genomics analyses to be performed for C. reinhardtii, significantly developing the analytical toolkit for this important model system.


2017 ◽  
Author(s):  
Efrat Rapoport ◽  
Moran Neuhof

Abstract Background: The effective detection and comparison of orthologues is crucial for answering many questions in comparative genomics, phylogenetics and evolutionary biology. One of the most common methods for discovering orthologues is widely known as ‘Reciprocal Blast’. While this method is simple when comparing only two genomes, performing a large-scale comparison of Multiple Genes across Multiple Taxa becomes a labor-intensive and inefficient task. The low efficiency of this complicated process limits the scope and breadth of questions that would otherwise benefit from this powerful method. Findings: Here we present RecBlast, an intuitive and easy-to-use pipeline that enables fast and easy discovery of orthologues along and across the evolutionary tree. RecBlast is capable of running heavy, large-scale and complex Reciprocal Blast comparisons across multiple genes and multiple taxa, in a completely automatic way. RecBlast is available as a cloud-based web server, which includes an easy-to-use user interface, implemented using cloud computing and an elastic and scalable server architecture. RecBlast is also available as powerful standalone software supporting multi-processing for large datasets, and as a cloud image which can be easily deployed on the Amazon Web Services cloud. We also include sample results spanning 448 human genes, which illustrate the potential of RecBlast in detecting orthologues and in highlighting patterns and trends across multiple taxa. Conclusions: RecBlast provides fast, inexpensive and valuable insight into trends and phenomena across distant phyla, and provides data, visualizations and directions for downstream analysis. RecBlast's fully automatic pipeline provides a new and intuitive discovery platform for researchers from any domain in biology who are interested in evolution, comparative genomics and phylogenetics, regardless of their computational skills.
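The core idea behind a reciprocal best-hit comparison can be sketched in a few lines. This is an illustrative toy, not RecBlast's implementation or interface: it assumes the similarity searches have already been run in both directions and that their results are available as in-memory dictionaries mapping each query to a list of (subject, score) pairs. The gene identifiers, scores, and function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# query id -> [(subject id, similarity score), ...]; higher score means a better hit
Hits = Dict[str, List[Tuple[str, float]]]

def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (ties broken by subject id)."""
    best = {}
    for query, candidates in hits.items():
        if candidates:
            subject, _score = max(candidates, key=lambda c: (c[1], c[0]))
            best[query] = subject
    return best

def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)

# Made-up search results for two genomes, A and B.
a_vs_b = {
    "geneA1": [("geneB1", 250.0), ("geneB2", 90.0)],
    "geneA2": [("geneB2", 180.0)],
    "geneA3": [("geneB1", 120.0)],
}
b_vs_a = {
    "geneB1": [("geneA1", 245.0), ("geneA3", 118.0)],
    "geneB2": [("geneA2", 175.0)],
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy data `geneA3` is excluded: its best hit is `geneB1`, but `geneB1` hits `geneA1` more strongly in the reverse direction, so the pair is not reciprocal. Repeating this check for many genes against many taxa is the part that becomes laborious by hand.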

