Anomalous phylogenetic behavior of ribosomal proteins in metagenome assembled genomes

2019 ◽  
Author(s):  
Sriram G. Garg ◽  
Nils Kapust ◽  
Weili Lin ◽  
Fernando D. K. Tria ◽  
Shijulal Nelson-Sathi ◽  
...  

Summary Metagenomic studies have claimed the existence of novel lineages with unprecedented properties never before observed in prokaryotes. Such lineages include Asgard archaea [1–3], which are purported to represent archaea with eukaryotic cell complexity, and the Candidate Phyla Radiation (CPR), a novel domain level taxon erected solely on the basis of metagenomic data [4]. However, it has escaped the attention of most biologists that these metagenomic sequences are not assembled into genomes by sequence overlap, as for cultured archaea and bacteria. Instead, short contigs are sorted into computer files by a process called binning in which they receive taxonomic assignment on the basis of sequence properties like GC content, dinucleotide frequencies, and stoichiometric co-occurrence across samples. Consequently, they are not genome sequences as we know them, reflecting the gene content of real organisms. Rather they are metagenome assembled genomes (MAGs). Debates that Asgard data are contaminated with individual eukaryotic sequences [5–7] are overshadowed by the more pressing issue that no evidence exists to indicate that any sequences in binned Asgard MAGs actually stem from the same chromosome, as opposed to simply stemming from the same environment. Here we show that Asgard and CPR MAGs fail spectacularly to meet the most basic phylogenetic criterion [8] fulfilled by genome sequences of all cultured prokaryotes investigated to date: the ribosomal proteins of Asgard and CPR MAGs do not share common evolutionary histories. Their phylogenetic behavior is anomalous to a degree never observed with genomes of real organisms. CPR and Asgard MAGs are binning artefacts, assembled from environments where up to 90% of the DNA is from dead cells [9–12]. Asgard and CPR MAGs are unnatural constructs, genome-like patchworks of genes that have been stitched together into computer files by binning.
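
The sequence properties named in the summary above (GC content and dinucleotide frequencies) can be illustrated with a minimal Python sketch. This is not the binning software used in any of the cited studies; the function names and the example contig are hypothetical, and real binners also use abundance across samples and typically longer k-mers. It only shows what a per-contig composition feature vector might look like.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0  # no unambiguous bases, e.g. a run of N
    return (counts["G"] + counts["C"]) / acgt


def dinucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each of the 16 dinucleotides (overlapping windows)."""
    seq = seq.upper()
    keys = ["".join(pair) for pair in product("ACGT", repeat=2)]
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Windows containing ambiguous bases are ignored in the denominator.
    total = sum(counts[key] for key in keys)
    return {key: (counts[key] / total if total else 0.0) for key in keys}


def composition_features(seq: str) -> list[float]:
    """Hypothetical feature vector: GC content followed by 16 dinucleotide frequencies."""
    return [gc_content(seq)] + list(dinucleotide_frequencies(seq).values())


# Hypothetical contig, for illustration only.
contig = "ATGCGCGATATCGCGGCTA"
features = composition_features(contig)
print(len(features))
```

Contigs with similar feature vectors would be grouped together by a clustering step, which is why composition alone cannot show that two contigs come from the same chromosome.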

Author(s):  
Sriram G Garg ◽  
Nils Kapust ◽  
Weili Lin ◽  
Michael Knopp ◽  
Fernando D K Tria ◽  
...  

Abstract Metagenomic studies permit the exploration of microbial diversity in a defined habitat and binning procedures enable phylogenomic analyses, taxon description and even phenotypic characterizations in the absence of morphological evidence. Such lineages include asgard archaea, which were initially reported to represent archaea with eukaryotic cell complexity, although the first images of such an archaeon show simple cells with prokaryotic characteristics. However, these metagenome-assembled genomes (MAGs) might suffer from data quality problems not encountered in sequences from cultured organisms due to two common analytical procedures of bioinformatics: assembly of metagenomic sequences and binning of assembled sequences on the basis of innate sequence properties and abundance across samples. Consequently, genomic sequences of distantly related taxa, or domains, can in principle be assigned to the same MAG and result in chimeric sequences. The impacts of low-quality or chimeric MAGs on phylogenomic and metabolic prediction remain unknown. Debates that asgard archaeal data are contaminated with eukaryotic sequences are overshadowed by the lack of evidence indicating that individual asgard MAGs stem from the same chromosome. Here we show that universal proteins including ribosomal proteins of asgard archaeal MAGs fail to meet the basic phylogenetic criterion fulfilled by genome sequences of cultured archaea investigated to date: these proteins do not share common evolutionary histories to the same extent as pure culture genomes (PCGs) do, pointing to a chimeric nature of asgard archaeal MAGs. Our analysis suggests that some asgard archaeal MAGs represent unnatural constructs, genome-like patchworks of genes resulting from assembly and/or the binning process.


2018 ◽  
Author(s):  
Peiqi Meng ◽  
Chang Lu ◽  
Xinzhe Lou ◽  
Qian Zhang ◽  
Peizeng Jia ◽  
...  

Abstract Several studies have documented the diversity and potential pathogenic associations of organisms in the human oral cavity. Although much progress has been made in understanding the complex bacterial community inhabiting the human oral cavity, our understanding of some microorganisms is less resolved due to a variety of reasons. One such little-understood group is the candidate phyla radiation (CPR), which is a recently identified, but highly abundant group of ultrasmall bacteria with reduced genomes and unusual ribosomes. Here, we present a computational protocol for the detection of CPR organisms from metagenomic data. Our approach relies on a self-constructed dataset comprising published CPR genomic sequences as a filter to identify CPR sequences from metagenomic sequencing data. After assembly and functional prediction, the taxonomic affiliation of CPR contigs can be identified through phylogenetic analysis with publicly available 16S rRNA gene and ribosomal proteins, in addition to sequence similarity analyses (e.g., average nucleotide identity calculations and contig mapping). Using this protocol, we reconstructed two draft genomes of organisms within the TM7 superphylum that had genome sizes of 0.594 Mb and 0.678 Mb. Among the predicted functional genes of the constructed genomes, a high percentage were related to signal transduction, cell motility, and cell envelope biogenesis, which could contribute to cellular morphological changes in response to environmental cues.

Importance: The candidate phyla radiation (CPR) bacterial group is a recently identified, but highly diverse and abundant group of ultrasmall bacteria exhibiting reduced genomes and limited metabolic capacities. A number of studies have reported their potential pathogenic associations in multiple mucosal diseases including periodontitis, halitosis, and inflammatory bowel disease. However, CPR organisms are difficult to cultivate and are difficult to detect with PCR-based methods due to divergent genetic sequences. Thus, our understanding of CPR has lagged behind that of other bacterial components. Here, we used metagenomic approaches to overcome these previous barriers to CPR identification, and established a computational protocol for detection of CPR organisms from metagenomic samples. The protocol described herein holds great promise for better understanding the potential biological functioning of CPR. Moreover, the pipeline could be applied to other organisms that are difficult to cultivate.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Qiyun Zhu ◽  
Uyen Mai ◽  
Wayne Pfeiffer ◽  
Stefan Janssen ◽  
Francesco Asnicar ◽  
...  

Abstract Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes. Here we build a reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, using multiple strategies. Our trees indicate remarkably closer evolutionary proximity between Archaea and Bacteria than previous estimates that were limited to fewer “core” genes, such as the ribosomal proteins. The robustness of the results was tested with respect to several variables, including taxon and site sampling, amino acid substitution heterogeneity and saturation, non-vertical evolution, and the impact of exclusion of candidate phyla radiation (CPR) taxa. Our results provide an updated view of domain-level relationships.


2021 ◽  
Vol 53 (4) ◽  
Author(s):  
Jean N. Hakizimana ◽  
Jean B. Ntirandekura ◽  
Clara Yona ◽  
Lionel Nyabongo ◽  
Gladson Kamwendo ◽  
...  

Abstract Several African swine fever (ASF) outbreaks in domestic pigs have been reported in Burundi and Malawi and whole-genome sequences of circulating outbreak viruses in these countries are limited. In the present study, complete genome sequences of ASF viruses (ASFV) that caused the 2018 outbreak in Burundi (BUR/18/Rutana) and the 2019 outbreak in Malawi (MAL/19/Karonga) were produced using the Illumina next-generation sequencing (NGS) platform and compared with other previously described ASFV complete genomes. The complete nucleotide sequences of BUR/18/Rutana and MAL/19/Karonga were 176,564 and 183,325 base pairs long with GC content of 38.62 and 38.48%, respectively. The MAL/19/Karonga virus had a total of 186 open reading frames (ORFs) while the BUR/18/Rutana strain had 151 ORFs. After comparative genomic analysis, the MAL/19/Karonga virus showed greater than 99% nucleotide identity with other complete nucleotide sequences of p72 genotype II viruses previously described in Tanzania, Europe and Asia including the Georgia 2007/1 isolate. The Burundian ASFV BUR/18/Rutana exhibited 98.95 to 99.34% nucleotide identity with genotype X ASFV previously described in Kenya and in the Democratic Republic of the Congo (DRC). The serotyping results classified the BUR/18/Rutana and MAL/19/Karonga ASFV strains in serogroups 7 and 8, respectively. The results of this study provide insight into the genetic structure and antigenic diversity of ASFV strains circulating in Burundi and Malawi. This is important in order to understand the transmission dynamics and genetic evolution of ASFV in eastern Africa, with an ultimate goal of designing an efficient risk management strategy against ASF transboundary spread.


2019 ◽  
Author(s):  
H. Soon Gweon ◽  
Liam P. Shaw ◽  
Jeremy Swann ◽  
Nicola De Maio ◽  
Manal AbuOun ◽  
...  

Abstract

Background: Shotgun metagenomics is increasingly used to characterise microbial communities, particularly for the investigation of antimicrobial resistance (AMR) in different animal and environmental contexts. There are many different approaches for inferring the taxonomic composition and AMR gene content of complex community samples from shotgun metagenomic data, but there has been little work establishing the optimum sequencing depth, data processing and analysis methods for these samples. In this study we used shotgun metagenomics and sequencing of cultured isolates from the same samples to address these issues. We sampled three potential environmental AMR gene reservoirs (pig caeca, river sediment, effluent) and sequenced samples with shotgun metagenomics at high depth (∼200 million reads per sample). Alongside this, we cultured single-colony isolates of Enterobacteriaceae from the same samples and used hybrid sequencing (short- and long-reads) to create high-quality assemblies for comparison to the metagenomic data. To automate data processing, we developed an open-source software pipeline, ‘ResPipe’.

Results: Taxonomic profiling was much more stable to sequencing depth than AMR gene content. 1 million reads per sample was sufficient to achieve <1% dissimilarity to the full taxonomic composition. However, at least 80 million reads per sample were required to recover the full richness of different AMR gene families present in the sample, and additional allelic diversity of AMR genes was still being discovered in effluent at 200 million reads per sample. Normalising the number of reads mapping to AMR genes using gene length and an exogenous spike of Thermus thermophilus DNA substantially changed the estimated gene abundance distributions. While the majority of genomic content from cultured isolates from effluent was recoverable using shotgun metagenomics, this was not the case for pig caeca or river sediment.

Conclusions: Sequencing depth and profiling method can critically affect the profiling of polymicrobial animal and environmental samples with shotgun metagenomics. Both sequencing of cultured isolates and shotgun metagenomics can recover substantial diversity that is not identified using the other methods. Particular consideration is required when inferring AMR gene content or presence by mapping metagenomic reads to a database. ResPipe, the open-source software pipeline we have developed, is freely available (https://gitlab.com/hsgweon/ResPipe).
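
The normalisation mentioned in the results above (read counts adjusted by gene length and by an exogenous spike) can be sketched in Python as follows. The abstract does not give the formula, so this is one plausible form under stated assumptions, not the calculation implemented in ResPipe: the function name, the ratio-of-read-densities formula, and all numbers are hypothetical.

```python
def spike_normalised_abundance(
    gene_reads: int,
    gene_length_bp: int,
    spike_reads: int,
    spike_length_bp: int,
) -> float:
    """Reads per base for a gene, divided by reads per base for the spike-in.

    Assumed formula (not taken from the source):
        (gene_reads / gene_length_bp) / (spike_reads / spike_length_bp)
    """
    if gene_length_bp <= 0 or spike_length_bp <= 0:
        raise ValueError("lengths must be positive")
    if spike_reads <= 0:
        raise ValueError("spike must have at least one mapped read")
    gene_density = gene_reads / gene_length_bp      # corrects for gene length
    spike_density = spike_reads / spike_length_bp   # corrects for sequencing yield
    return gene_density / spike_density


# Hypothetical counts: 100 reads on a 1 kb gene, 2,000 reads on a 2 Mb spike genome.
print(spike_normalised_abundance(100, 1_000, 2_000, 2_000_000))
```

Dividing by length keeps long genes from appearing more abundant simply because they attract more reads, and dividing by the spike density makes samples sequenced to different depths comparable.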


2017 ◽  
Vol 33 (2) ◽  
pp. 119-130
Author(s):  
Vinh Van Le ◽  
Hoai Van Tran ◽  
Hieu Ngoc Duong ◽  
Giang Xuan Bui ◽  
Lang Van Tran

Metagenomics is a powerful approach to studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is to identify the origin of reads, referred to as taxonomic assignment. Because each metagenomic project has to analyze large-scale datasets, metagenomic assignment is highly computation-intensive. This study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL, which aims to deal with the computational challenge. The proposed algorithm is evaluated with both simulated and real datasets on a high performance computing system. Experimental results demonstrate that the algorithm is able to achieve good performance and utilize resources of the system efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.


2021 ◽  
Author(s):  
Alexander L. Jaffe ◽  
Christine He ◽  
Ray Keren ◽  
Luis E. Valentin-Alvarado ◽  
Patrick Munk ◽  
...  

Abstract Candidate Phyla Radiation (CPR) bacteria are small, likely episymbiotic organisms found across Earth’s ecosystems. Despite their prevalence, the distribution of CPR lineages across habitats and the genomic signatures of transitions amongst these habitats remain unclear. Here, we expand the genome inventory for Absconditabacteria (SR1), Gracilibacteria, and Saccharibacteria (TM7), CPR bacteria known to occur in both animal-associated and environmental microbiomes, and investigate variation in gene content with habitat of origin. By overlaying phylogeny with habitat information, we show that bacteria from these three lineages have undergone multiple transitions from environmental habitats into animal microbiomes. Based on co-occurrence analyses of hundreds of metagenomes, we extend the prior suggestion that certain Saccharibacteria have broad bacterial host ranges and constrain possible host relationships for Absconditabacteria and Gracilibacteria. Full-proteome analyses show that animal-associated Saccharibacteria have smaller gene repertoires than their environmental counterparts and are enriched in numerous protein families, including those likely functioning in amino acid metabolism, phage defense, and detoxification of peroxide. In contrast, some freshwater Saccharibacteria encode a putative rhodopsin. For protein families exhibiting the clearest patterns of differential habitat distribution, we compared protein and species phylogenies to estimate the incidence of lateral gene transfer and genomic loss occurring over the species tree. These analyses suggest that habitat transitions were likely not accompanied by large transfer or loss events, but rather were associated with continuous proteome remodeling. Thus, we speculate that CPR habitat transitions were driven largely by availability of suitable host taxa, and were reinforced by acquisition and loss of some capacities.

Importance: Studying the genetic differences between related microorganisms from different environment types can indicate factors associated with their movement among habitats. This is particularly interesting for bacteria from the Candidate Phyla Radiation because their minimal metabolic capabilities require symbiotic associations with microbial hosts. We found that shifts of Absconditabacteria, Gracilibacteria, and Saccharibacteria between environmental ecosystems and mammalian mouths/guts probably did not involve major episodes of gene gain and loss; rather, gradual genomic change likely followed habitat migration. The results inform our understanding of how little-known microorganisms establish in the human microbiota where they may ultimately impact health.


mBio ◽  
2018 ◽  
Vol 9 (3) ◽  
Author(s):  
Yan Wang ◽  
Matt Stata ◽  
Wei Wang ◽  
Jason E. Stajich ◽  
Merlin M. White ◽  
...  

Abstract Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target the digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects.

Importance: Insect guts harbor various microbes that are important for host digestion, immune response, and disease dispersal in certain cases. Bacteria, which are among the primary endosymbionts, have been studied extensively. However, fungi, which are also frequently encountered, are poorly known with respect to their biology within the insect guts. To understand the genomic features and related biology, we produced the whole-genome sequences of nine gut commensal fungi from disease-bearing insects (black flies, midges, and mosquitoes). The results show that insect gut fungi tend to have low GC content across their genomes. By comparing these commensals with entomopathogenic and free-living fungi that have available genome sequences, we found a universal core gene toolbox that is unique and thus potentially important for the insect-fungus symbiosis. This comparative work also uncovered different host invasion strategies employed by insect pathogens and commensals, as well as a model system to study ancient fungal genome duplication within the gut of insects.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Syed Razaul Haq ◽  
Sabeen Survery ◽  
Fredrik Hurtig ◽  
Ann-Christin Lindås ◽  
Celestine N. Chi

Abstract The origin of the eukaryotic cell is an unsettled scientific question. The Asgard superphylum has emerged as a compelling target for studying eukaryogenesis due to the previously unseen diversity of eukaryotic signature proteins. However, our knowledge about these proteins is still relegated to metagenomic data and very little is known about their structural properties. Additionally, it is still unclear if these proteins are functionally homologous to their eukaryotic counterparts. Here, we expressed, purified and structurally characterized profilin from Heimdallarchaeota in the Asgard superphylum. The structural analysis shows that while this profilin possesses similar secondary structural elements as eukaryotic profilin, it contains additional secondary structural elements that could be critical for its function and an indication of divergent evolution.

