Large-scale analyses of human microbiomes reveal thousands of small, novel genes and their predicted functions

2018 ◽  
Author(s):  
Hila Sberro ◽  
Nicholas Greenfield ◽  
Georgios Pavlopoulos ◽  
Nikos Kyrpides ◽  
Ami S. Bhatt

Abstract Small proteins likely abound in prokaryotes, and may mediate much of the communication that occurs between organisms within a microbiome and their host. Unfortunately, small proteins are traditionally overlooked in biology, in part due to the computational and experimental difficulties in detecting them. To systematically identify novel small proteins, we carried out a large comparative genomics study on 1,773 HMP human-associated metagenomes from four different body sites (mouth, gut, skin and vagina). We describe more than four thousand conserved protein families, the majority of which are novel; ~30% of these protein families are predicted to be secreted or transmembrane. Over 90% of the small protein families have no known domain, and almost half are not represented in reference genomes, emphasizing the incompleteness of knowledge in this space. Our analysis exposes putative novel ‘housekeeping’ small protein families, including a potential novel ribosomally associated protein, as well as ‘mammalian-specific’ or ‘human-specific’ protein families. By analyzing the genomic neighborhood of small genes, we pinpoint a subset of families that are potentially associated with defense against bacteriophage. Finally, we identify families that may be subject to horizontal transfer and are thus potentially involved in adaptation of bacteria to the changing human environment. Our study suggests that small proteins are highly abundant and that those of the human microbiome, in particular, may perform diverse functions that have not been previously reported.

2020 ◽  
Author(s):  
Matthew G. Durrant ◽  
Ami S. Bhatt

Abstract Recent work performed by Sberro et al. (2019) revealed a vast unexplored space of small proteins existing within the human microbiome. At present, these small open reading frames (smORFs) are unannotated in existing reference genomes, and standard genome annotation tools are not able to accurately predict them. In this study, we introduce an annotation tool named SmORFinder that predicts small proteins based on those identified by Sberro et al. This tool combines profile Hidden Markov models (pHMMs) of each small protein family and deep learning models that may better generalize to smORF families not seen in the training set. We find that combining predictions of both pHMM and deep learning models leads to more precise smORF predictions and that these predicted smORFs are enriched for Ribo-Seq or MetaRibo-Seq translation signals. Feature importance analysis reveals that the deep learning models learned to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codons in a way that strongly corresponds to the codon synonyms found in the codon table. We perform a core genome analysis of 26 bacterial species and identify many core smORFs of unknown function. We pre-compute small protein annotations for thousands of RefSeq isolate genomes and HMP metagenomes, and we make these data available through a web portal along with other useful tools for small protein annotation and analysis. The systematic identification and annotation of these important small proteins will help researchers to expand our understanding of this exciting field of biology.
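To make the notion of a smORF candidate concrete, the following is a minimal sketch of how short open reading frames could be enumerated on the forward strand of a DNA sequence before any model scores them. It is not SmORFinder's method: the function name `find_small_orfs`, the ATG-only start codon, and the 5 to 50 codon length bounds are all assumptions chosen for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, max_codons=50, min_codons=5):
    """Return (start, end, orf) tuples for forward-strand ATG..stop ORFs.

    Length bounds count codons excluding the stop codon. The bounds and the
    ATG-only start are illustrative assumptions, not values from the abstract.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        n_codons = (j - i) // 3
                        if min_codons <= n_codons <= max_codons:
                            hits.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this ORF's stop codon
                        break
            i += 3
    return hits


example = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
orfs = find_small_orfs(example)
```

A real annotation pipeline would also scan the reverse complement and alternative start codons; candidates found this way would then be passed to family-level models such as the pHMMs and deep learning classifiers the abstract describes.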


Genetics ◽  
1974 ◽  
Vol 76 (2) ◽  
pp. 289-299
Author(s):  
Margaret McCarron ◽  
William Gelbart ◽  
Arthur Chovnick

ABSTRACT A convenient method is described for the intracistronic mapping of genetic sites responsible for electrophoretic variation of a specific protein in Drosophila melanogaster. A number of wild-type isoalleles of the rosy locus have been isolated which are associated with the production of electrophoretically distinguishable xanthine dehydrogenases. Large-scale recombination experiments were carried out involving null enzyme mutants induced on electrophoretically distinct wild-type isoalleles, the genetic basis for which is followed as a nonselective marker in the cross. Additionally, a large-scale recombination experiment was carried out involving null enzyme rosy mutants induced on the same wild-type isoallele. Examination of the electrophoretic character of crossover and convertant products recovered from the latter experiment revealed that all exhibited the same parental electrophoretic character. In addition to documenting the stability of the xanthine dehydrogenase electrophoretic character, this observation argues against a special mutagenesis hypothesis to explain conversions resulting from allele recombination studies.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gongchao Jing ◽  
Yufeng Zhang ◽  
Wenzhi Cui ◽  
Lu Liu ◽  
Jian Xu ◽  
...  

Abstract Background Due to their much lower costs in experiment and computation than metagenomic whole-genome sequencing (WGS), 16S rRNA gene amplicons have been widely used for predicting the functional profiles of microbiomes, via software tools such as PICRUSt 2. However, due to potential PCR bias and gene profile variation among phylogenetically related genomes, functional profiles predicted from 16S amplicons may deviate from WGS-derived ones, resulting in misleading results. Results Here we present Meta-Apo, which greatly reduces or even eliminates such deviation, and thus yields much more consistent diversity patterns between the two approaches. Tests of Meta-Apo on > 5000 16S-rRNA amplicon human microbiome samples from 4 body sites showed that the deviation between the two strategies is significantly reduced by using only 15 WGS-amplicon training sample pairs. Moreover, Meta-Apo enables cross-platform functional comparison between WGS and amplicon samples, thus greatly improving 16S-based microbiome diagnosis; e.g., the accuracy of gingivitis diagnosis via 16S-derived functional profiles was elevated from 65 to 95% by WGS-based classification. Therefore, with the low cost of 16S-amplicon sequencing, Meta-Apo can produce a reliable, high-resolution view of microbiome function equivalent to that offered by shotgun WGS. Conclusions This suggests that large-scale, function-oriented microbiome sequencing projects can probably benefit from the lower cost of the 16S-amplicon strategy, without sacrificing the precision in functional reconstruction that otherwise requires WGS. An optimized C++ implementation of Meta-Apo is available on GitHub (https://github.com/qibebt-bioinfo/meta-apo) under a GNU GPL license. It takes the functional profiles of a few paired WGS:16S-amplicon samples as training, and outputs the calibrated functional profiles for the much larger number of 16S-amplicon samples.
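The train-then-calibrate workflow described in the last sentence can be illustrated with a deliberately simple stand-in. The sketch below assumes a per-function scaling ratio learned from paired samples; this is a hypothetical model chosen for clarity, not Meta-Apo's actual algorithm, and the function names `fit_scaling` and `calibrate` are invented here.

```python
import numpy as np


def fit_scaling(amplicon_train, wgs_train, eps=1e-9):
    """Learn one scale factor per function from paired profiles.

    Both inputs are (samples x functions) arrays of relative abundances,
    with row k of each array coming from the same biological sample.
    """
    return (wgs_train.sum(axis=0) + eps) / (amplicon_train.sum(axis=0) + eps)


def calibrate(amplicon_profiles, scale):
    """Rescale amplicon-derived profiles, then renormalize each sample to sum to 1."""
    scaled = amplicon_profiles * scale
    return scaled / scaled.sum(axis=1, keepdims=True)


# Toy paired training data (hypothetical values): two samples, two functions.
amplicon_train = np.array([[0.5, 0.5], [0.5, 0.5]])
wgs_train = np.array([[0.25, 0.75], [0.25, 0.75]])

scale = fit_scaling(amplicon_train, wgs_train)
calibrated = calibrate(np.array([[0.5, 0.5]]), scale)
```

In this toy case the amplicon profile underestimates the second function relative to WGS, and the learned factors shift the new sample toward the WGS-like proportions.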


2020 ◽  
Vol 36 (10) ◽  
pp. 3011-3017 ◽  
Author(s):  
Olga Mineeva ◽  
Mateo Rojas-Carulla ◽  
Ruth E Ley ◽  
Bernhard Schölkopf ◽  
Nicholas D Youngblut

Abstract Motivation Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability and implementation DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Yuanying Peng ◽  
Honghai Yan ◽  
Laichun Guo ◽  
Cao Deng ◽  
Lipeng Kang ◽  
...  

Abstract Common oat (Avena sativa) is one of the most important cereal crops, serving as a valuable source of forage and human food. While reference genomes of many important crops have been generated, such work in oat has lagged behind, primarily owing to its large, repeat-rich, polyploid genome. By using Oxford Nanopore ultralong sequencing and Hi-C technologies, we have generated the first reference-quality genome assembly of hulless common oat with a contig N50 of 93 Mb. We also assembled the genomes of diploid and tetraploid Avena ancestors, which enabled us to identify oat subgenomes, large-scale structural rearrangements, and preferential gene loss in the C subgenome after hexaploidization. Phylogenomic analyses of cereal crops indicated that the oat lineage descended before wheat, offering oat as a unique window into the early evolution of polyploid plants. The origin and evolution of hexaploid oat are deduced from whole-genome sequencing, plastid genome and transcriptome assemblies of numerous Avena species. The high-quality reference genomes of Avena species with different ploidies and the studies of their polyploidization history will facilitate the full use of crop gene resources and provide a reference for the molecular mechanisms underlying the polyploidization of higher plants, helping us to overcome food security challenges.


2013 ◽  
Vol 5 (1) ◽  
pp. 185-232
Author(s):  
Tahnee Lisa Prior

Abstract We often mistakenly assume that institutional design will remain effective indefinitely. Complex long-term environmental challenges illuminate the disparity between institutions and state boundaries. While globalization has challenged monocentrism, we must look beyond traditional measures and design resilient governance systems, such as polycentric governance, that combine trust and local expertise in small-scale governance with the governance capacity of large-scale systems. These harness globalization’s benefits and provide solutions for the effects of ecosystem changes. This work examines the lessons – benefits, challenges, limitations, and unanswered questions – that may be learned from polycentric governance in the case of Persistent Organic Pollutants (POPs) in the Arctic, where a polycentric political system has developed as a result of a mismatch in environmental, jurisdictional, and temporal scales. Section One examines characteristics of polycentricity, focusing on actors, multilevel governance, degree of formality, and the nature of interactions. Section Two concentrates on the tools utilized. Section Three applies the outlined framework. Finally, Section Four examines three lessons that global environmental governance may learn from the case study: (1) Peak organizations are effective tools for managing polycentricity, allowing for the inclusion of non-state actors, such as indigenous peoples organizations (2) and epistemic communities (3), in bridging the human-environment nexus.


Author(s):  
Diana Liverman ◽  
Brent Yarnal

The human–environment condition has emerged as one of the central issues of the new millennium, especially as it has become apparent that human activity is transforming nature at a global scale in both systemic and cumulative ways. Originating with concerns about potential climate warming, the global environmental change agenda rapidly enlarged to include changes in structure and function of the earth’s natural systems, notably those systems critical for life, and the policy implications of these changes, especially focused on the coupled human–environment system. Recognition of the unprecedented pace, magnitude, and spatial scale of global change, and of the pivotal role of humankind in creating and responding to it, has led to the emergence of a worldwide, interdisciplinary effort to understand the human dimensions of global change. The term “global change” now encompasses a range of research issues including those relating to economic, political, and cultural globalization, but in this chapter we limit our focus to global environmental change and to the field that has become formally known as the human dimensions of global (or global environmental) change. We also focus mainly on the work of geographers rather than attempting to review the whole human dimensions research community. Intellectually, geography is well positioned to contribute to global environmental change research (Liverman 1999). The large-scale human transformation of the planet through activities such as agriculture, deforestation, water diversion, fossil fuel use, and urbanization, and the impacts of these on living conditions through changes in, for example, climate and biodiversity, has highlighted the importance of scholarship that analyzes the human–environmental relationship and can inform policy. 
Geography is one of the few disciplines that has historically claimed human–environment relationships as a definitional component of itself (Glacken 1967; Marsh 1864) and has fostered a belief in and reward system for engaging integrative approaches to problem solving (Golledge 2002; Turner 2002). Moreover, global environmental change is intimately spatial and draws upon geography-led remote sensing and geographic information science (Liverman et al. 1998). Geographers anticipated the emergence of current global change concerns (Thomas et al. 1956; Burton et al. 1978) and were seminal in the development of the multidisciplinary programs of study into the human dimensions of global change.


mSystems ◽  
2019 ◽  
Vol 4 (1) ◽  
pp. e00010-19
Author(s):  
Sigal Leviatan ◽  
Eran Segal

ABSTRACT Shotgun sequencing of samples taken from the human microbiome often reveals only partial mapping of the sequenced metagenomic reads to existing reference genomes. Such partial mappability indicates that many genomes are missing in our reference genome set. This is particularly true for non-Western populations and for samples that do not originate from the gut. Pasolli et al. (E. Pasolli, F. Asnicar, S. Manara, M. Zolfo, et al., Cell, 2019, https://doi.org/10.1016/j.cell.2019.01.001) perform a grand effort to expand the reference set, and to better classify its members, revealing a wider pangenome of existing species as well as identifying new species of previously unknown taxonomic branches.


2019 ◽  
Vol 36 (2) ◽  
pp. 356-363 ◽  
Author(s):  
Terry Ma ◽  
Di Xiao ◽  
Xin Xing

Abstract Motivation Metagenomics studies microbial genomes in an ecosystem such as the gastrointestinal tract of a human. Identification of novel microbial species and quantification of their distributional variations among different samples that are sequenced using next-generation-sequencing technology hold the key to the success of most metagenomic studies. To achieve these goals, we propose a simple yet powerful metagenomic binning method, MetaBMF. The method does not require prior knowledge of reference genomes and produces highly accurate results, even at a strain level. Thus, it can be broadly used to identify disease-related microbial organisms that are not well-studied. Results Mathematically, we count the number of mapped reads on each assembled genomic fragment across different samples as our input matrix and propose a scalable stratified angle regression algorithm to factorize this count matrix into a product of a binary matrix and a nonnegative matrix. The binary matrix can be used to separate microbial species and the nonnegative matrix quantifies the species distributions in different samples. In simulation and empirical studies, we demonstrate that MetaBMF has a high binning accuracy. It can bin DNA fragments accurately not only at a species level but also at a strain level. As shown in our example, we can accurately identify the Shiga-toxigenic Escherichia coli O104:H4 strain which led to the 2011 German E. coli outbreak. Our efforts in these areas should lead to (i) fundamental advances in metagenomic binning, (ii) development and refinement of technology for the rapid identification and quantification of microbial distributions and (iii) finding of potential probiotics or reliable pathogenic bacterial strains. Availability and implementation The software is available at https://github.com/didi10384/MetaBMF.
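The factorization described above can be written as X ≈ BW, where X is the fragment-by-sample count matrix, B is a binary fragment-by-species membership matrix, and W is a nonnegative species-by-sample abundance matrix. The toy sketch below only illustrates this model structure with made-up numbers. It does not implement the stratified angle regression algorithm from the abstract: B is taken as known and W is recovered by ordinary least squares.

```python
import numpy as np

# Toy model: 6 assembled fragments, 2 species, 4 samples (all values hypothetical).
B = np.array([[1, 0],
              [1, 0],
              [1, 0],
              [0, 1],
              [0, 1],
              [0, 1]])                     # binary membership: fragment -> species
W = np.array([[10.0, 0.0, 5.0, 2.0],
              [1.0, 8.0, 0.0, 4.0]])       # nonnegative abundance: species x sample

X = B @ W                                  # noiseless count matrix: fragments x samples

# Given B, estimate W by least squares and clip to keep it nonnegative.
# This stands in for the real algorithm, which must estimate B and W jointly.
W_hat, *_ = np.linalg.lstsq(B.astype(float), X, rcond=None)
W_hat = np.clip(W_hat, 0.0, None)

# The bin (species) of each fragment is the column holding its 1 in B.
bins = B.argmax(axis=1)
```

Fragments from the same species share a coverage pattern across samples (identical rows of X in this noiseless case), which is the signal that lets a binary B separate them.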

