REPARATION: Ribosome Profiling Assisted (Re-)Annotation of Bacterial Genomes

2017 ◽  
Author(s):  
Elvis Ndah ◽  
Veronique Jonckheere ◽  
Adam Giess ◽  
Eivind Valen ◽  
Gerben Menschaert ◽  
...  

ABSTRACT Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence context and often underestimate the complexity of the proteome. We developed REPARATION (RibosomE Profiling Assisted (Re-)AnnotaTION), a de novo algorithm that takes advantage of experimental protein translation evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation. REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs, including variants of previously annotated ORFs. Our predictions were supported by matching mass spectrometry (MS) proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to perform de novo ORF delineation in bacterial genomes irrespective of the sequence context of the reading frame.
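The step "evaluates all possible ORFs in the genome" can be pictured with a minimal sketch. This is not the REPARATION implementation: the start codon set (ATG, GTG, TTG), the `min_codons` cutoff and all function names are assumptions made for illustration, and the Ribo-seq scoring and growth-curve thresholding described in the abstract are not shown. The sketch only enumerates every start-to-stop candidate in the six reading frames.

```python
from typing import Iterator, NamedTuple

# Assumed codon sets for illustration; the abstract does not list them.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # 0-based offset on the scanned strand
    end: int     # exclusive offset, stop codon included
    seq: str     # nucleotide sequence from start codon through stop codon


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq: str, strand: str, min_codons: int) -> Iterator[Orf]:
    """Yield every start-to-stop candidate in the three frames of one strand."""
    for frame in range(3):
        starts = []  # in-frame start codons seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in START_CODONS:
                starts.append(i)
            elif codon in STOP_CODONS:
                # Each pending start shares this stop, giving nested candidates.
                for s in starts:
                    if (i - s) // 3 >= min_codons:
                        yield Orf(strand, s, i + 3, seq[s:i + 3])
                starts = []


def candidate_orfs(genome: str, min_codons: int = 10) -> Iterator[Orf]:
    """Enumerate candidate ORFs on both strands (six reading frames)."""
    genome = genome.upper()
    yield from orfs_on_strand(genome, "+", min_codons)
    yield from orfs_on_strand(reverse_complement(genome), "-", min_codons)
```

Because every in-frame start upstream of a stop is kept, the output contains nested candidates that share a stop codon. Choosing among them is where translation evidence would come in under a scheme like the one the abstract describes.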

2020 ◽  
Author(s):  
Lei Li ◽  
Yanjie Chao

ABSTRACT Small proteins shorter than 50 amino acids have long been overlooked. A number of small proteins have been identified in several model bacteria using experimental approaches and assigned important functions in diverse cellular processes. The recent development of ribosome profiling technologies has allowed genome-wide identification of small proteins and small ORFs (smORFs), but our incomplete understanding of small proteins hinders de novo computational prediction of smORFs in non-model bacterial species. Here, we have identified several sequence features of smORFs through a systematic analysis of all known small proteins in E. coli, among which the translation initiation rate is the strongest determinant. By integrating these features into a support vector machine learning model, we have developed a novel algorithm, sPepFinder, that can predict conserved smORFs in bacterial genomes with a high accuracy of 92.8%. De novo prediction in E. coli has revealed several novel smORFs with evidence of translation supported by ribosome profiling. Further application of sPepFinder to 549 bacterial species has led to the identification of > 100,000 novel smORFs, many of which are conserved at the amino acid and nucleotide levels under purifying selection. Overall, we have established sPepFinder as a valuable tool to identify novel smORFs in both model and non-model bacterial organisms, and provided a large resource of small proteins for functional characterization.


2016 ◽  
Author(s):  
Jorge Ruiz-Orera ◽  
Pol Verdaguer-Grau ◽  
José Luis Villanueva-Cañas ◽  
Xavier Messeguer ◽  
M Mar Albà

Abstract There is accumulating evidence that some genes have originated de novo from previously non-coding genomic sequences. However, the processes underlying de novo gene birth are still enigmatic. In particular, the appearance of a new functional protein seems highly improbable unless there is already a pool of neutrally evolving peptides that can at some point acquire new functions. Here we show for the first time that such peptides not only exist but are prevalent among the translation products of mouse genes that lack homologues in rat and human. The data suggest that the translation of these peptides is due to the chance occurrence of open reading frames with a favorable codon composition. Our approach combines ribosome profiling experiments, proteomics data and non-synonymous and synonymous nucleotide polymorphism analysis. We propose that effectively neutral processes involving the expression of thousands of transcripts all the way down to proteins provide a basis for de novo gene evolution.


mSystems ◽  
2020 ◽  
Vol 5 (5) ◽  
Author(s):  
Patrick Willems ◽  
Igor Fijalkowski ◽  
Petra Van Damme

ABSTRACT Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When this strategy is applied, an enhanced proteome depth is achieved, as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by reanalyzing public Deinococcus radiodurans data sets. Taken together, our results show that systematic reanalysis using available prokaryotic (proteome) data sets holds great promise to assist in experimentally based genome annotation. IMPORTANCE Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that by advanced (re)analysis of omics data, a higher proteome coverage and sensitive detection of unannotated ORFs can be achieved, which can be exploited for conditional bacterial genome (re)annotation, which is especially relevant in view of annotating the wealth of sequenced prokaryotic genomes obtained in recent years.


Author(s):  
Patrick Willems ◽  
Igor Fijalkowski ◽  
Petra Van Damme

ABSTRACT Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribo-seq and proteomic data of Salmonella Typhimurium to identify unannotated proteins or alternative protein forms arising from alternative translation initiation (i.e. N-terminal proteoforms). This data analysis encompasses the searching of co-fragmenting peptides and post-processing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When applying this strategy, an enhanced proteome depth is achieved, as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by re-analyzing public Deinococcus radiodurans datasets. Taken together, systematic re-analysis using available prokaryotic (proteome) datasets holds great promise to assist in experimentally based genome annotation.


Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Hannes Petruschke ◽  
Christian Schori ◽  
Sebastian Canzler ◽  
Sarah Riesbeck ◽  
Anja Poehlein ◽  
...  

Abstract
Background The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx) consisting of eight bacterial species with a particular focus on the discovery of novel small proteins with less than 100 amino acids (= sProteins), some of which may contribute to shape the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities.
Results We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches including two earlier optimized sProtein enrichment strategies were applied to specifically increase the chances for novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx.
Conclusions We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins on the functionality of bacterial communities such as those of the human intestinal tract.


Author(s):  
Carolin Wiechers ◽  
Mangge Zou ◽  
Eric Galvez ◽  
Michael Beckstette ◽  
Maria Ebel ◽  
...  

Abstract Intestinal Foxp3+ regulatory T cell (Treg) subsets are crucial players in tolerance to microbiota-derived and food-borne antigens, and compelling evidence suggests that the intestinal microbiota modulates their generation, functional specialization, and maintenance. Selected bacterial species and microbiota-derived metabolites, such as short-chain fatty acids (SCFAs), have been reported to promote Treg homeostasis in the intestinal lamina propria. Furthermore, gut-draining mesenteric lymph nodes (mLNs) are particularly efficient sites for the generation of peripherally induced Tregs (pTregs). Despite this knowledge, the direct role of the microbiota and their metabolites in the early stages of pTreg induction within mLNs is not fully elucidated. Here, using an adoptive transfer-based pTreg induction system, we demonstrate that neither transfer of a dysbiotic microbiota nor dietary SCFA supplementation modulated the pTreg induction capacity of mLNs. Even mice housed under germ-free (GF) conditions displayed equivalent pTreg induction within mLNs. Further molecular characterization of these de novo induced pTregs from mLNs by dissection of their transcriptomes and accessible chromatin regions revealed that the microbiota indeed has a limited impact and does not contribute to the initialization of the Treg-specific epigenetic landscape. Overall, our data suggest that the microbiota is dispensable for the early stages of pTreg induction within mLNs.


mSystems ◽  
2020 ◽  
Vol 5 (1) ◽  
Author(s):  
Matthew R. Olm ◽  
Alexander Crits-Christoph ◽  
Spencer Diamond ◽  
Adi Lavy ◽  
Paula B. Matheus Carnevali ◽  
...  

ABSTRACT Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination. IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. 
Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.
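To make the 95% ANI species threshold concrete, here is a minimal sketch of grouping genomes into species-level clusters from precomputed pairwise ANI values. It is not the method used in the study: the single-linkage rule, the function name `cluster_by_ani` and the input layout (a dictionary keyed by genome-name pairs, with ANI as a fraction) are assumptions for illustration, and computing ANI itself is out of scope.

```python
from typing import Dict, List, Tuple


def cluster_by_ani(
    genomes: List[str],
    ani: Dict[Tuple[str, str], float],
    threshold: float = 0.95,
) -> List[List[str]]:
    """Single-linkage clustering: link two genomes when ANI >= threshold."""
    parent = {g: g for g in genomes}

    def find(x: str) -> str:
        # Walk up to the cluster representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical ANI values, not taken from the study.
example = cluster_by_ani(
    ["A", "B", "C", "D"],
    {("A", "B"): 0.97, ("B", "C"): 0.96, ("A", "C"): 0.94, ("C", "D"): 0.80},
)
```

With single linkage, A and C end up in the same cluster through B even though their direct ANI is below the threshold. A stricter rule such as complete linkage would split them, so the choice of linkage matters most for pairs that sit close to the cutoff.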


2014 ◽  
Vol 7 (1) ◽  
pp. 484 ◽  
Author(s):  
Basil Xavier ◽  
Julia Sabirova ◽  
Moons Pieter ◽  
Jean-Pierre Hernalsteens ◽  
Henri de Greve ◽  
...  
