Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage

ABSTRACT Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribo-seq and proteomic data of Salmonella Typhimurium to identify unannotated proteins or alternative protein forms arising from alternative translation initiation (i.e., N-terminal proteoforms). This data analysis encompasses the searching of co-fragmenting peptides and post-processing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When applying this strategy, an enhanced proteome depth is achieved as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by re-analyzing public Deinococcus radiodurans datasets. Taken together, systematic re-analysis using available prokaryotic (proteome) datasets holds great promise to assist in experimentally-based genome annotation.


Structured RNA Contaminants in Bacterial Ribo-Seq

mSphere ◽

10.1128/msphere.00855-20 ◽

2020 ◽

Vol 5 (5) ◽

Author(s):

Brayon J. Fremin ◽

Ami S. Bhatt

Keyword(s):

Rna Structure ◽

Large Scale ◽

Partial Information ◽

Ribosome Profiling ◽

Micrococcal Nuclease ◽

Data Sets ◽

Rna Structures ◽

Content Type ◽

Sequencing Library

ABSTRACT Ribosome profiling (Ribo-Seq) is a powerful method to study translation in bacteria. However, Ribo-Seq signal can be observed across RNAs that one would not expect to be bound by ribosomes. For example, Escherichia coli Ribo-Seq libraries also capture reads from most noncoding RNAs (ncRNAs). While some of these ncRNAs may overlap coding regions, this alone does not explain the majority of observed signal across ncRNAs. These fragments of ncRNAs in Ribo-Seq data pass all size selection steps of the Ribo-Seq protocol and survive hours of micrococcal nuclease (MNase) treatment. In this work, we specifically focus on Ribo-Seq signal across ncRNAs and provide evidence to suggest that RNA structure, as opposed to ribosome binding, protects them from degradation and allows them to persist in the Ribo-Seq sequencing library preparation. By inspecting these “contaminant reads” in bacterial Ribo-Seq, we show that data previously disregarded in bacterial Ribo-Seq experiments may, in fact, be used to gain partial information regarding the in vivo secondary structure of ncRNAs. IMPORTANCE Structured ncRNAs are pivotal mediators of bioregulation in bacteria, and their functions are often reliant on their specific structures. Here, we first inspect Ribo-Seq reads across noncoding regions, identifying contaminant reads in these libraries. We observe that contaminant reads in bacterial Ribo-Seq experiments that are often disregarded, in fact, strongly overlap with structured regions of ncRNAs. We then perform several bioinformatic analyses to determine why these contaminant reads may persist in Ribo-Seq libraries. Finally, we highlight some structured RNA contaminants in Ribo-Seq and support the hypothesis that structures in the RNA protect them from MNase digestion. We conclude that researchers should be cautious when interpreting Ribo-Seq signal as coding without considering signal distribution. 
These findings also may enable us to partially resolve RNA structures, identify novel structured RNAs, and elucidate RNA structure-function relationships in bacteria at a large scale and in vivo through the reanalysis of existing Ribo-Seq data sets.


Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes

mSystems ◽

10.1128/msystems.00190-20 ◽

2020 ◽

Vol 5 (4) ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Open Source ◽

Genome Analysis ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Data Sets ◽

Bacterial Genomes ◽

Data Set ◽

Content Type

ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia. IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. 
Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.


Bacterial riboproteogenomics: the era of N-terminal proteoform existence revealed

FEMS Microbiology Reviews ◽

10.1093/femsre/fuaa013 ◽

2020 ◽

Vol 44 (4) ◽

pp. 418-431 ◽

Cited By ~ 2

Author(s):

Daria Fijalkowska ◽

Igor Fijalkowski ◽

Patrick Willems ◽

Petra Van Damme

Keyword(s):

Gene Annotation ◽

Single Gene ◽

Bacterial Genome ◽

Functional Characterization ◽

Ribosome Profiling ◽

Molecular Forms ◽

Protein Diversity ◽

New Methodologies ◽

Genome Annotations ◽

Genomic Regions

ABSTRACT With the rapid increase in the number of sequenced prokaryotic genomes, relying on automated gene annotation became a necessity. Multiple lines of evidence, however, suggest that current bacterial genome annotations may contain inconsistencies and are incomplete, even for so-called well-annotated genomes. We here discuss underexplored sources of protein diversity and new methodologies for high-throughput genome reannotation. The expression of multiple molecular forms of proteins (proteoforms) from a single gene, particularly driven by alternative translation initiation, is gaining interest as a prominent contributor to bacterial protein diversity. In consequence, riboproteogenomic pipelines were proposed to comprehensively capture proteoform expression in prokaryotes by the complementary use of (positional) proteomics and the direct readout of translated genomic regions using ribosome profiling. To complement these discoveries, tailored strategies are required for the functional characterization of newly discovered bacterial proteoforms.


REPARATION: Ribosome Profiling Assisted (Re-)Annotation of Bacterial genomes

10.1101/113530 ◽

2017 ◽

Cited By ~ 1

Author(s):

Elvis Ndah ◽

Veronique Jonckheere ◽

Adam Giess ◽

Eivind Valen ◽

Gerben Menschaert ◽

...

Keyword(s):

Genome Annotation ◽

De Novo ◽

Bacterial Species ◽

Prokaryotic Genome ◽

Protein Translation ◽

Ribosome Profiling ◽

Proteomics Data ◽

Bacterial Genomes ◽

Sequence Context ◽

Automated Methods

ABSTRACT Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence context and often underestimate the complexity of the proteome. We developed REPARATION (RibosomeE Profiling Assisted (Re-)AnnotaTION), a de novo algorithm that takes advantage of experimental protein translation evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation. REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs. Our predictions were supported by matching mass spectrometry (MS) proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to perform de novo ORF delineation in bacterial genomes irrespective of the sequence context of the reading frame.


Competence beyond Genes: Filling in the Details of the Pneumococcal Competence Transcriptome by a Systems Approach

Journal of Bacteriology ◽

10.1128/jb.00238-19 ◽

2019 ◽

Vol 201 (13) ◽

Cited By ~ 2

Author(s):

Malcolm E. Winkler ◽

Donald A. Morrison

Keyword(s):

Genome Annotation ◽

Systems Approach ◽

Opportunistic Pathogen ◽

Data Sets ◽

Dna Uptake ◽

Central Process ◽

Natural Competence ◽

Content Type ◽

Temporal Regulation ◽

Component Gene

ABSTRACT DNA uptake by natural competence is a central process underlying the genetic plasticity, biology, and virulence of the human respiratory opportunistic pathogen Streptococcus pneumoniae. A study reported in this issue (J. Slager, R. Aprianto, and J.-W. Veening, J. Bacteriol. 201:e00780-18, https://doi.org/10.1128/JB.00780-18) combined deep-genome annotation and high-resolution transcriptome analyses to considerably extend the previous model of temporal regulation of competence at the operon and component gene levels. That extended study also provides a playbook for updating, refining, and extending genomic data sets and making them publicly available.


Complete Genome Sequence of Listeria monocytogenes DFPST0073, Isolated from Imported Mexican Soft Cheese

Genome Announcements ◽

10.1128/genomea.00496-18 ◽

2018 ◽

Vol 6 (23) ◽

Author(s):

Joelle K. Salazar ◽

Lauren J. Gonsalves ◽

Kristin M. Schill ◽

Maria Sanchez Leon ◽

Nathan Anderson ◽

...

Keyword(s):

Listeria Monocytogenes ◽

Complete Genome Sequence ◽

Genome Annotation ◽

Prokaryotic Genome ◽

Illumina Miseq ◽

Annotation Pipeline ◽

Content Type ◽

Illumina Miseq Platform ◽

Soft Cheese ◽

Miseq Platform

ABSTRACT The genome of Listeria monocytogenes strain DFPST0073, isolated from imported fresh Mexican soft cheese in 2003, was sequenced using the Illumina MiSeq platform. Reads were assembled using SPAdes, and genome annotation was performed using the NCBI Prokaryotic Genome Annotation Pipeline.


Reading between the Lines: Utilizing RNA-Seq Data for Global Analysis of sRNAs in Staphylococcus aureus

mSphere ◽

10.1128/msphere.00439-20 ◽

2020 ◽

Vol 5 (4) ◽

Author(s):

Hailee M. Sorensen ◽

Rebecca A. Keogh ◽

Marcus A. Wittekind ◽

Andrew R. Caillet ◽

Richard E. Wiemels ◽

...

Keyword(s):

Staphylococcus Aureus ◽

Genome Annotation ◽

Small Rnas ◽

Regulatory Elements ◽

Data Sets ◽

Rna Seq ◽

Annotation File ◽

Data Set ◽

Content Type

ABSTRACT Regulatory small RNAs (sRNAs) are known to play important roles in the Gram-positive bacterial pathogen Staphylococcus aureus; however, their existence is often overlooked, primarily because sRNA genes are absent from genome annotation files. Consequently, transcriptome sequencing (RNA-Seq)-based experimental approaches, performed using standard genome annotation files as a reference, have likely overlooked data for sRNAs. Previously, we created an updated S. aureus genome annotation file, which included annotations for 303 known sRNAs in USA300. Here, we utilized this updated reference file to reexamine publicly available RNA-Seq data sets in an attempt to recover lost information on sRNA expression, stability, and potential to encode peptides. First, we used transcriptomic data from 22 studies to identify how the expression of 303 sRNAs changed under 64 different experimental conditions. Next, we used RNA-Seq data from an RNA stability assay to identify highly stable/unstable sRNAs. We went on to reanalyze a ribosome profiling (Ribo-seq) data set to identify sRNAs that have the potential to encode peptides and to experimentally confirm the presence of three of these peptides in the USA300 background. Interestingly, one of these sRNAs/peptides, encoded at the tsr37 locus, influences the ability of S. aureus cells to autoaggregate. Finally, we reexamined two recently published in vivo RNA-Seq data sets, from the cystic fibrosis (CF) lung and a murine vaginal colonization study, and identified 29 sRNAs that may play a role in vivo. Collectively, these results can help inform future studies of these important regulatory elements in S. aureus and highlight the need for ongoing curating and updating of genome annotation files. IMPORTANCE Regulatory small RNAs (sRNAs) are a class of RNA molecules that are produced in bacterial cells but that typically do not encode proteins. Instead, they perform a variety of critical functions within the cell as RNA. 
Most bacterial genomes do not include annotations for sRNA genes, and any type of analysis that is performed using a bacterial genome as a reference will therefore overlook data for sRNAs. In this study, we reexamined hundreds of previously generated S. aureus RNA-Seq data sets and reanalyzed them to generate data for sRNAs. To do so, we utilized an updated S. aureus genome annotation file, previously generated by our group, which contains annotations for 303 sRNAs. The data generated (which were previously discarded) shed new light on sRNAs in S. aureus, most of which are unstudied, and highlight certain sRNAs that are likely to play important roles in the cell.


Evidence for Numerous Embedded Antisense Overlapping Genes in Diverse E. coli Strains

10.1101/2020.11.18.388249 ◽

2020 ◽

Author(s):

Barbara Zehentner ◽

Zachary Ardern ◽

Michaela Kreitmeier ◽

Siegfried Scherer ◽

Klaus Neuhaus

Keyword(s):

Genome Annotation ◽

Evolutionary Biology ◽

Bacterial Genome ◽

Ribosome Profiling ◽

Open Reading Frames ◽

E Coli ◽

Double Stranded Dna ◽

New Research ◽

K 12 ◽

Reading Frames

SUMMARY The genetic code allows six reading frames at a double-stranded DNA locus, and many open reading frames (ORFs) overlap extensively with ORFs of annotated genes (e.g., at least 30 bp or having an embedded ORF). Currently, bacterial genome annotation systematically discards embedded overlapping ORFs of genes (OLGs) due to an assumed information-content constraint, and, consequently, very few OLGs are known. Here we use strand-specific RNAseq and ribosome profiling, detecting about 200 embedded or partially overlapping ORFs of gene candidates in the pathogen E. coli O157:H7 EDL933. These are typically short, many of them show clear promoter motifs as determined by Cappable-seq, indistinguishable from those of annotated genes, and are expressed at a low level. We could express most of them as stable proteins, and 49 displayed a potential phenotype. Ribosome profiling analyses in three other E. coli strains predicted between 84 and 190 embedded antisense OLGs per strain except in E. coli K-12, which is an atypical lab strain. We also found evidence of homology to annotated genes for 100 to 300 OLGs per E. coli strain investigated. Based on this evidence we suggest that bacterial OLGs deserve attention with respect to genome annotation and coding complexity of bacterial genomes. Such sequences may constitute an important coding reserve, opening up new research in genetics and evolutionary biology.


Stereotactic radiation treatment planning and follow-up studies involving fused multimodality imaging

Journal of Neurosurgery ◽

10.3171/sup.2004.101.supplement3.0326 ◽

2004 ◽

Vol 101 (Supplement3) ◽

pp. 326-333 ◽

Cited By ~ 7

Author(s):

Klaus D. Hamm ◽

Gunnar Surber ◽

Michael Schmücking ◽

Reinhard E. Wurm ◽

Rene Aschenbach ◽

...

Keyword(s):

Image Fusion ◽

Treatment Planning ◽

Radiation Treatment ◽

Data Sets ◽

Slice Thickness ◽

Radiation Treatment Planning ◽

Content Type ◽

Follow Up Studies ◽

Fine Print

Object. Innovative new software solutions may enable image fusion to produce the desired data superposition for precise target definition and follow-up studies in radiosurgery/stereotactic radiotherapy in patients with intracranial lesions. The aim is to integrate the anatomical and functional information completely into the radiation treatment planning and to achieve an exact comparison for follow-up examinations. Special conditions and advantages of BrainLAB's fully automatic image fusion system are evaluated and described for this purpose. Methods. In 458 patients, the radiation treatment planning and some follow-up studies were performed using an automatic image fusion technique involving the use of different imaging modalities. Each fusion was visually checked and corrected as necessary. The computerized tomography (CT) scans for radiation treatment planning (slice thickness 1.25 mm), as well as stereotactic angiography for arteriovenous malformations, were acquired using head fixation with stereotactic arc or, in the case of stereotactic radiotherapy, with a relocatable stereotactic mask. Different magnetic resonance (MR) imaging sequences (T1, T2, and fluid-attenuated inversion-recovery images) and positron emission tomography (PET) scans were obtained without head fixation. Fusion results and the effects on radiation treatment planning and follow-up studies were analyzed. The precision level of the results of the automatic fusion depended primarily on the image quality, especially the slice thickness and the field homogeneity when using MR images, as well as on patient movement during data acquisition. Fully automated image fusion of different MR, CT, and PET studies was performed for each patient. Only in a few cases was it necessary to correct the fusion manually after visual evaluation. These corrections were minor and did not materially affect treatment planning. 
High-quality fusion of thin slices of a region of interest with a complete head data set could be performed easily. The target volume for radiation treatment planning could be accurately delineated using multimodal information provided by CT, MR, angiography, and PET studies. The fusion of follow-up image data sets yielded results that could be successfully compared and quantitatively evaluated. Conclusions. Depending on the quality of the originally acquired image, automated image fusion can be a very valuable tool, allowing for fast (∼ 1–2 minute) and precise fusion of all relevant data sets. Fused multimodality imaging improves the target volume definition for radiation treatment planning. High-quality follow-up image data sets should be acquired for image fusion to provide exactly comparable slices and volumetric results that will contribute to quality control.
