REPARATION: Ribosome Profiling Assisted (Re-)Annotation of Bacterial genomes

sPepFinder expedites genome-wide identification of small proteins in bacteria

10.1101/2020.05.05.079178 ◽

2020 ◽

Author(s):

Lei Li ◽

Yanjie Chao

Keyword(s):

De Novo ◽

Bacterial Species ◽

Computational Prediction ◽

Ribosome Profiling ◽

Support Vector ◽

Initiation Rate ◽

E Coli ◽

Small Proteins ◽

Genome Wide ◽

A Genome

ABSTRACT Small proteins shorter than 50 amino acids have been long overlooked. A number of small proteins have been identified in several model bacteria using experimental approaches and assigned important functions in diverse cellular processes. The recent development of ribosome profiling technologies has allowed a genome-wide identification of small proteins and small ORFs (smORFs), but our incomplete understanding of small proteins hinders de novo computational prediction of smORFs in non-model bacterial species. Here, we have identified several sequence features for smORFs by a systematic analysis of all the known small proteins in E. coli, among which the translation initiation rate is the strongest determinant. By integrating these features into a support vector machine learning model, we have developed a novel sPepFinder algorithm that can predict conserved smORFs in bacterial genomes with a high accuracy of 92.8%. De novo prediction in E. coli has revealed several novel smORFs with evidence of translation supported by ribosome profiling. Further application of sPepFinder in 549 bacterial species has led to the identification of > 100,000 novel smORFs, many of which are conserved at the amino acid and nucleotide levels under purifying selection. Overall, we have established sPepFinder as a valuable tool to identify novel smORFs in both model and non-model bacterial organisms, and provided a large resource of small proteins for functional characterizations.

Evidence for functional and non-functional classes of peptides translated from long non-coding RNAs

10.1101/064915 ◽

2016 ◽

Cited By ~ 3

Author(s):

Jorge Ruiz-Orera ◽

Pol Verdaguer-Grau ◽

José Luis Villanueva-Cañas ◽

Xavier Messeguer ◽

M Mar Albà

Keyword(s):

De Novo ◽

Gene Evolution ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Polymorphism Analysis ◽

Proteomics Data ◽

Functional Protein ◽

Codon Composition ◽

Functional Classes ◽

Chance Occurrence

Abstract There is accumulating evidence that some genes have originated de novo from previously non-coding genomic sequences. However, the processes underlying de novo gene birth are still enigmatic. In particular, the appearance of a new functional protein seems highly improbable unless there is already a pool of neutrally evolving peptides that can at some point acquire new functions. Here we show for the first time that such peptides do not only exist but that they are prevalent among the translation products of mouse genes that lack homologues in rat and human. The data suggests that the translation of these peptides is due to the chance occurrence of open reading frames with a favorable codon composition. Our approach combines ribosome profiling experiments, proteomics data and non-synonymous and synonymous nucleotide polymorphism analysis. We propose that effectively neutral processes involving the expression of thousands of transcripts all the way down to proteins provide a basis for de novo gene evolution.

Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage

mSystems ◽

10.1128/msystems.00833-20 ◽

2020 ◽

Vol 5 (5) ◽

Author(s):

Patrick Willems ◽

Igor Fijalkowski ◽

Petra Van Damme

Keyword(s):

Genome Annotation ◽

Deinococcus Radiodurans ◽

Gene Annotation ◽

Bacterial Genome ◽

Prokaryotic Genome ◽

Ribosome Profiling ◽

Great Promise ◽

Data Sets ◽

Proteome Coverage ◽

Content Type

ABSTRACT Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When this strategy is applied, an enhanced proteome depth is achieved, as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by reanalyzing public Deinococcus radiodurans data sets. Taken together, our results show that systematic reanalysis using available prokaryotic (proteome) data sets holds great promise to assist in experimentally based genome annotation. IMPORTANCE Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that by advanced (re)analysis of omics data, a higher proteome coverage and sensitive detection of unannotated ORFs can be achieved, which can be exploited for conditional bacterial genome (re)annotation, which is especially relevant in view of annotating the wealth of sequenced prokaryotic genomes obtained in recent years.

Lost and found: re-searching and re-scoring proteomics data aids the discovery of bacterial proteins and improves proteome coverage

10.1101/2019.12.18.881375 ◽

2019 ◽

Cited By ~ 3

Author(s):

Patrick Willems ◽

Igor Fijalkowski ◽

Petra Van Damme

Keyword(s):

Genome Annotation ◽

Deinococcus Radiodurans ◽

Gene Annotation ◽

Prokaryotic Genome ◽

Great Promise ◽

Proteomics Data ◽

General Applicability ◽

Proteomic Data ◽

Genome Complexity ◽

Alternative Protein

ABSTRACT Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribo-seq and proteomic data of Salmonella Typhimurium to identify unannotated proteins or alternative protein forms raised upon alternative translation initiation (i.e. N-terminal proteoforms). This data analysis encompasses the searching of co-fragmenting peptides and post-processing with extended peptide-to-spectrum quality features including comparison to predicted fragment ion intensities. When applying this strategy, an enhanced proteome-depth is achieved as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by re-analyzing public Deinococcus radiodurans datasets. Taken together, systematic re-analysis using available prokaryotic (proteome) datasets holds great promise to assist in experimentally-based genome annotation.

Faculty Opinions recommendation of Efficient de novo assembly of single-cell bacterial genomes from short-read data sets.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13296960.14657061 ◽

2011 ◽

Author(s):

Steven Salzberg

Keyword(s):

Single Cell ◽

De Novo Assembly ◽

De Novo ◽

Data Sets ◽

Bacterial Genomes ◽

Short Read

Faculty Opinions recommendation of Prokka: rapid prokaryotic genome annotation.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718312026.793492408 ◽

2014 ◽

Author(s):

Stephen Turner

Keyword(s):

Genome Annotation ◽

Prokaryotic Genome

Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome

Microbiome ◽

10.1186/s40168-020-00981-z ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Hannes Petruschke ◽

Christian Schori ◽

Sebastian Canzler ◽

Sarah Riesbeck ◽

Anja Poehlein ◽

...

Keyword(s):

Microbial Communities ◽

Intestinal Microbiota ◽

De Novo ◽

Bacterial Species ◽

Intestinal Microbiome ◽

Single Strain ◽

Small Proteins ◽

Human Intestinal Microbiota ◽

Wide Range

Abstract Background The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx) consisting of eight bacterial species with a particular focus on the discovery of novel small proteins with less than 100 amino acids (= sProteins), some of which may contribute to shape the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities. Results We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches including two earlier optimized sProtein enrichment strategies were applied to specifically increase the chances for novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx. 
Conclusions We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins on the functionality of bacterial communities such as those of the human intestinal tract.

The microbiota is dispensable for the early stages of peripheral regulatory T cell induction within mesenteric lymph nodes

Cellular and Molecular Immunology ◽

10.1038/s41423-021-00647-2 ◽

2021 ◽

Author(s):

Carolin Wiechers ◽

Mangge Zou ◽

Eric Galvez ◽

Michael Beckstette ◽

Maria Ebel ◽

...

Keyword(s):

T Cell ◽

Lymph Nodes ◽

Regulatory T Cell ◽

De Novo ◽

Bacterial Species ◽

Short Chain Fatty Acids ◽

Mesenteric Lymph Nodes ◽

Direct Role ◽

Early Stages ◽

Mesenteric Lymph

Abstract Intestinal Foxp3+ regulatory T cell (Treg) subsets are crucial players in tolerance to microbiota-derived and food-borne antigens, and compelling evidence suggests that the intestinal microbiota modulates their generation, functional specialization, and maintenance. Selected bacterial species and microbiota-derived metabolites, such as short-chain fatty acids (SCFAs), have been reported to promote Treg homeostasis in the intestinal lamina propria. Furthermore, gut-draining mesenteric lymph nodes (mLNs) are particularly efficient sites for the generation of peripherally induced Tregs (pTregs). Despite this knowledge, the direct role of the microbiota and their metabolites in the early stages of pTreg induction within mLNs is not fully elucidated. Here, using an adoptive transfer-based pTreg induction system, we demonstrate that neither transfer of a dysbiotic microbiota nor dietary SCFA supplementation modulated the pTreg induction capacity of mLNs. Even mice housed under germ-free (GF) conditions displayed equivalent pTreg induction within mLNs. Further molecular characterization of these de novo induced pTregs from mLNs by dissection of their transcriptomes and accessible chromatin regions revealed that the microbiota indeed has a limited impact and does not contribute to the initialization of the Treg-specific epigenetic landscape. Overall, our data suggest that the microbiota is dispensable for the early stages of pTreg induction within mLNs.

Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries

mSystems ◽

10.1128/msystems.00731-19 ◽

2020 ◽

Vol 5 (1) ◽

Cited By ~ 14

Author(s):

Matthew R. Olm ◽

Alexander Crits-Christoph ◽

Spencer Diamond ◽

Adi Lavy ◽

Paula B. Matheus Carnevali ◽

...

Keyword(s):

Bacterial Diversity ◽

Ribosomal Proteins ◽

Large Scale ◽

Bacterial Species ◽

Bacterial Genome ◽

16S Rrna Genes ◽

Rrna Genes ◽

Species Discrimination ◽

Bacterial Genomes ◽

Discrimination Power

ABSTRACT Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination. IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. 
Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.

Employing whole genome mapping for optimal de novo assembly of bacterial genomes

BMC Research Notes ◽

10.1186/1756-0500-7-484 ◽

2014 ◽

Vol 7 (1) ◽

pp. 484 ◽

Cited By ~ 9

Author(s):

Basil Xavier ◽

Julia Sabirova ◽

Moons Pieter ◽

Jean-Pierre Hernalsteens ◽

Henri de Greve ◽

...

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Genome Mapping ◽

Whole Genome ◽

Bacterial Genomes ◽

Whole Genome Mapping
