Operon Prediction for Sequenced Bacterial Genomes without Experimental Information

Nicholas H. Bergman; Karla D. Passalacqua; Philip C. Hanna; Zhaohui S. Qin

doi:10.1128/aem.01686-06

Applied and Environmental Microbiology ◽

10.1128/aem.01686-06 ◽

2006 ◽

Vol 73 (3) ◽

pp. 846-854 ◽

Cited By ~ 25

Author(s):

Nicholas H. Bergman ◽

Karla D. Passalacqua ◽

Philip C. Hanna ◽

Zhaohui S. Qin

Keyword(s):

Bacterial Genome ◽

Experimental Information ◽

Prediction Algorithm ◽

Comparative Genomic ◽

Small Subset ◽

Bacterial Genomes ◽

Functional Relationships ◽

Operon Prediction ◽

Wide Range ◽

Generic Set

ABSTRACT Various computational approaches have been proposed for operon prediction, but most algorithms rely on experimental or functional data that are only available for a small subset of sequenced genomes. In this study, we explored the possibility of using phylogenetic information to aid in operon prediction, and we constructed a Bayesian hidden Markov model that incorporates comparative genomic data with traditional predictors, such as intergenic distances. The prediction algorithm performs as well as the best previously reported method, with several significant advantages. It uses fewer data sources and so it is easier to implement, and the method is more broadly applicable than previous methods—it can be applied to essentially every gene in any sequenced bacterial genome. Furthermore, we show that near-optimal performance is easily reached with a generic set of comparative genomes and does not depend on a specific relationship between the subject genome and the comparative set. We applied the algorithm to the Bacillus anthracis genome and found that it successfully predicted all previously verified B. anthracis operons. To further test its performance, we chose a predicted operon (BA1489-92) containing several genes with little apparent functional relatedness and tested their cotranscriptional nature. Experimental evidence shows that these genes are cotranscribed, and the data have interesting implications for B. anthracis biology. Overall, our findings show that this algorithm is capable of highly sensitive and accurate operon prediction in a wide range of bacterial genomes and that these predictions can lead to the rapid discovery of new functional relationships among genes.
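The general idea of decoding operon structure with a hidden Markov model over intergenic distances can be sketched as follows. This is a minimal illustration only, not the authors' Bayesian model: it uses no comparative genomic data, and the two states, the 50 bp cutoff, and every probability are hypothetical values chosen for the example.

```python
import math

# All parameters below are hypothetical, chosen only for illustration.
STATES = ("same_operon", "boundary")
START = {"same_operon": 0.5, "boundary": 0.5}
TRANS = {
    "same_operon": {"same_operon": 0.7, "boundary": 0.3},
    "boundary": {"same_operon": 0.5, "boundary": 0.5},
}
# Probability of seeing a short or long intergenic gap in each state.
EMIT = {
    "same_operon": {"short": 0.9, "long": 0.1},
    "boundary": {"short": 0.2, "long": 0.8},
}


def bin_distance(distance_bp, cutoff_bp=50):
    """Discretize an intergenic distance (assumed cutoff of 50 bp)."""
    return "short" if distance_bp < cutoff_bp else "long"


def viterbi(distances):
    """Return the most likely state for each adjacent gene pair."""
    obs = [bin_distance(d) for d in distances]
    if not obs:
        return []
    # Log-probabilities avoid underflow on long gene lists.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[best_prev] + math.log(TRANS[best_prev][s]) + math.log(EMIT[s][o])
            )
            pointers[s] = best_prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Intergenic distances (bp) between five consecutive same-strand genes.
print(viterbi([10, 5, 300, 12]))
```

With these assumed parameters, the long 300 bp gap is labeled as an operon boundary while the short gaps are labeled as within-operon pairs.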

Bactopia: a flexible pipeline for complete analysis of bacterial genomes

10.1101/2020.02.28.969394 ◽

2020 ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Standard Procedure ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Bacterial Genomes ◽

Analysis Pipeline ◽

Genomic Analyses ◽

Conserved Genes ◽

Downstream Analysis

Abstract Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a dataset setup step (Bactopia Datasets; BaDs) where a series of customizable datasets are created for the species of interest; the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly and several other functions based on the available datasets and outputs the processed data to a structured directory format; and a series of Bactopia Tools (BaTs) that perform specific post-processing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on L. crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to thousands that allows for great flexibility in choosing comparison datasets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.

A Universal, Genomewide GuideFinder for CRISPR/Cas9 Targeting in Microbial Genomes

mSphere ◽

10.1128/msphere.00086-20 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Michelle Spoto ◽

Changhui Guan ◽

Elizabeth Fleming ◽

Julia Oh

Keyword(s):

Gene Function ◽

Large Scale ◽

Essential Gene ◽

Bacterial Species ◽

Bacterial Genome ◽

Model Organisms ◽

Design Parameters ◽

Bacterial Genomes ◽

Wide Range ◽

User Friendly

ABSTRACT The CRISPR/Cas system has significant potential to facilitate gene editing in a variety of bacterial species. CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) represent modifications of the CRISPR/Cas9 system utilizing a catalytically inactive Cas9 protein for transcription repression and activation, respectively. While CRISPRi and CRISPRa have tremendous potential to systematically investigate gene function in bacteria, few programs are specifically tailored to identify guides in draft bacterial genomes genomewide. Furthermore, few programs offer open-source code with flexible design parameters for bacterial targeting. To address these limitations, we created GuideFinder, a customizable, user-friendly program that can design guides for any annotated bacterial genome. GuideFinder designs guides from NGG protospacer-adjacent motif (PAM) sites for any number of genes by the use of an annotated genome and FASTA file input by the user. Guides are filtered according to user-defined design parameters and removed if they contain any off-target matches. Iteration with lowered parameter thresholds allows the program to design guides for genes that did not produce guides with the more stringent parameters, one of several features unique to GuideFinder. GuideFinder can also identify paired guides for targeting multiplicity, whose validity we tested experimentally. GuideFinder has been tested on a variety of diverse bacterial genomes, finding guides for 95% of genes on average. Moreover, guides designed by the program are functionally useful—focusing on CRISPRi as a potential application—as demonstrated by essential gene knockdown in two staphylococcal species. Through the large-scale generation of guides, this open-access software will improve accessibility to CRISPR/Cas studies of a variety of bacterial species. 
IMPORTANCE With the explosion in our understanding of human and environmental microbial diversity, corresponding efforts to understand gene function in these organisms are strongly needed. CRISPR/Cas9 technology has revolutionized interrogation of gene function in a wide variety of model organisms. Efficient CRISPR guide design is required for systematic gene targeting. However, existing tools are not adapted for the broad needs of microbial targeting, which include extraordinary species and subspecies genetic diversity, the overwhelming majority of which is characterized by draft genomes. In addition, flexibility in guide design parameters is important to consider the wide range of factors that can affect guide efficacy, many of which can be species and strain specific. We designed GuideFinder, a customizable, user-friendly program that addresses the limitations of existing software and that can design guides for any annotated bacterial genome with numerous features that facilitate guide design in a wide variety of microorganisms.
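The core step described in the abstract, locating candidate guides next to NGG PAM sites, can be sketched in a few lines. This is a simplified illustration and not GuideFinder's code: the function name `find_ngg_guides`, the default 20-nt guide length, and the forward-strand-only scan are assumptions made for the example, and no off-target filtering or design-parameter checks are performed.

```python
import re


def find_ngg_guides(seq, guide_len=20):
    """Return (start, guide, pam) for each NGG PAM on the forward strand.

    The guide is the `guide_len` bases immediately 5' of the PAM.
    Hypothetical helper for illustration; reverse-strand sites and
    off-target checks are intentionally left out.
    """
    seq = seq.upper()
    guides = []
    # A lookahead lets overlapping PAM sites all be reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= guide_len:  # need a full-length guide upstream
            guide = seq[pam_start - guide_len:pam_start]
            guides.append((pam_start - guide_len, guide, match.group(1)))
    return guides


# Toy sequence: 20 bases followed by a TGG PAM.
print(find_ngg_guides("A" * 20 + "TGG"))
```

A fuller design would also scan the reverse complement and discard guides with matches elsewhere in the genome, as the abstract describes.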

Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes

mSystems ◽

10.1128/msystems.00190-20 ◽

2020 ◽

Vol 5 (4) ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Open Source ◽

Genome Analysis ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Data Sets ◽

Bacterial Genomes ◽

Data Set ◽

Content Type

ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia. IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. 
Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.

Comparison of the sequencing bias of currently available library preparation kits for Illumina sequencing of bacterial genomes and metagenomes

DNA Research ◽

10.1093/dnares/dsz017 ◽

2019 ◽

Vol 26 (5) ◽

pp. 391-398 ◽

Cited By ~ 15

Author(s):

Mitsuhiko P Sato ◽

Yoshitoshi Ogura ◽

Keiji Nakamura ◽

Ruriko Nishida ◽

Yasuhiro Gotoh ◽

...

Keyword(s):

Illumina Sequencing ◽

Bacterial Genome ◽

Gc Content ◽

Throughput Capacity ◽

Metagenomic Data ◽

Library Preparation ◽

Bacterial Genomes ◽

Sequencing Bias ◽

Wide Range ◽

Metagenome Sequencing

Abstract In bacterial genome and metagenome sequencing, Illumina sequencers are most frequently used due to their high throughput capacity, and multiple library preparation kits have been developed for Illumina platforms. Here, we systematically analysed and compared the sequencing bias generated by currently available library preparation kits for Illumina sequencing. Our analyses revealed that a strong sequencing bias is introduced in low-GC regions by the Nextera XT kit. The level of bias introduced is dependent on the level of GC content; stronger bias is generated as the GC content decreases. Other analysed kits did not introduce this strong sequencing bias. The GC content-associated sequencing bias introduced by Nextera XT was more remarkable in metagenome sequencing of a mock bacterial community and seriously affected estimation of the relative abundance of low-GC species. The results of our analyses highlight the importance of selecting proper library preparation kits according to the purposes and targets of sequencing, particularly in metagenome sequencing, where a wide range of microbial species with various degrees of GC content is present. Our data also indicate that special attention should be paid to which library preparation kit was used when analysing and interpreting publicly available metagenomic data.
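Because the reported bias depends on GC content, a first step in checking one's own data is to compute GC content along a genome. The sketch below shows one simple way to do that; the function names and the non-overlapping window scheme are assumptions for illustration and are not taken from the study.

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in `seq`."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def windowed_gc(seq, window=1000, step=1000):
    """Return (start, gc_fraction) for each full window along `seq`."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy example: a low-GC block followed by a high-GC block.
print(windowed_gc("AAAAGGGG", window=4, step=4))
```

Pairing each window's GC fraction with its mean read depth would then show whether coverage drops in low-GC regions for a given library preparation kit.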

OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes

BioMed Research International ◽

10.1155/2015/318217 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Kashish Chetal ◽

Sarath Chandra Janga

Keyword(s):

User Interface ◽

Operon Structure ◽

Rna Seq ◽

Bacterial Genomes ◽

Substantial Fraction ◽

Experimental Conditions ◽

Transcriptomic Data ◽

Operon Prediction ◽

Wide Range ◽

Prokaryotic Genomes

Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons—codirectionally organized genes in prokaryotic genomes with the presence of a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. User interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.

Serodiagnosis and Bacterial Genome of Helicobacter pylori Infection

Toxins ◽

10.3390/toxins13070467 ◽

2021 ◽

Vol 13 (7) ◽

pp. 467

Author(s):

Aina Ichihara ◽

Hinako Ojima ◽

Kazuyoshi Gotoh ◽

Osamu Matsushita ◽

Susumu Take ◽

...

Keyword(s):

Helicobacter Pylori ◽

Antibody Titer ◽

Bacterial Genome ◽

Serum Antibody ◽

Gene Mutations ◽

Bacterial Genomes ◽

Western Blots ◽

A Genome ◽

Vaca Gene ◽

H Pylori

The infection caused by Helicobacter pylori is associated with several diseases, including gastric cancer. Several methods for the diagnosis of H. pylori infection exist, including endoscopy, the urea breath test, and the fecal antigen test, which is the serum antibody titer test that is often used since it is a simple and highly sensitive test. In this context, this study aims to find the association between different antibody reactivities and the organization of bacterial genomes. Next-generation sequences were performed to determine the genome sequences of four strains of antigens with different reactivity. The search was performed on the common genes, with the homology analysis conducted using a genome ring and dot plot analysis. The two antigens of the highly reactive strains showed a high gene homology, and Western blots for CagA and VacA also showed high expression levels of proteins. In the poorly responsive antigen strains, it was found that the inversion occurred around the vacA gene in the genome. The structure of bacterial genomes might contribute to the poor reactivity exhibited by the antibodies of patients. In the future, an accurate serodiagnosis could be performed by using a strain with few gene mutations of the antigen used for the antibody titer test of H. pylori.

SCAPP: an algorithm for improved plasmid assembly in metagenomes

Microbiome ◽

10.1186/s40168-021-01068-z ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

David Pellow ◽

Alvah Zorea ◽

Maraike Probst ◽

Ori Furman ◽

Arik Segal ◽

...

Keyword(s):

Bacterial Species ◽

Bacterial Genome ◽

Biological Knowledge ◽

Assessment Procedure ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Human Gut ◽

Double Stranded Dna ◽

Wide Range ◽

Python Package

Abstract Background Metagenomic sequencing has led to the identification and assembly of many new bacterial genome sequences. These bacteria often contain plasmids: usually small, circular double-stranded DNA molecules that may transfer across bacterial species and confer antibiotic resistance. These plasmids are generally less studied and understood than their bacterial hosts. Part of the reason for this is insufficient computational tools enabling the analysis of plasmids in metagenomic samples. Results We developed SCAPP (Sequence Contents-Aware Plasmid Peeler)—an algorithm and tool to assemble plasmid sequences from metagenomic sequencing. SCAPP builds on some key ideas from the Recycler algorithm while improving plasmid assemblies by integrating biological knowledge about plasmids. We compared the performance of SCAPP to Recycler and metaplasmidSPAdes on simulated metagenomes, real human gut microbiome samples, and a human gut plasmidome dataset that we generated. We also created plasmidome and metagenome data from the same cow rumen sample and used the parallel sequencing data to create a novel assessment procedure. Overall, SCAPP outperformed Recycler and metaplasmidSPAdes across this wide range of datasets. Conclusions SCAPP is an easy to use Python package that enables the assembly of full plasmid sequences from metagenomic samples. It outperformed existing metagenomic plasmid assemblers in most cases and assembled novel and clinically relevant plasmids in samples we generated such as a human gut plasmidome. SCAPP is open-source software available from: https://github.com/Shamir-Lab/SCAPP.

Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries

mSystems ◽

10.1128/msystems.00731-19 ◽

2020 ◽

Vol 5 (1) ◽

Cited By ~ 14

Author(s):

Matthew R. Olm ◽

Alexander Crits-Christoph ◽

Spencer Diamond ◽

Adi Lavy ◽

Paula B. Matheus Carnevali ◽

...

Keyword(s):

Bacterial Diversity ◽

Ribosomal Proteins ◽

Large Scale ◽

Bacterial Species ◽

Bacterial Genome ◽

16S Rrna Genes ◽

Rrna Genes ◽

Species Discrimination ◽

Bacterial Genomes ◽

Discrimination Power

ABSTRACT Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination. IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. 
Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.

Insights into mammalian biology from the wild house mouse Mus musculus

eLife ◽

10.7554/elife.05959 ◽

2015 ◽

Vol 4 ◽

Cited By ~ 45

Author(s):

Megan Phifer-Rixey ◽

Michael W Nachman

Keyword(s):

House Mouse ◽

Mus Musculus ◽

Genetic Model ◽

Inbred Strains ◽

Mendelian Inheritance ◽

Model Organisms ◽

House Mice ◽

Small Subset ◽

Wild Mice ◽

Wide Range

The house mouse, Mus musculus, was established in the early 1900s as one of the first genetic model organisms owing to its short generation time, comparatively large litters, ease of husbandry, and visible phenotypic variants. For these reasons and because they are mammals, house mice are well suited to serve as models for human phenotypes and disease. House mice in the wild consist of at least three distinct subspecies and harbor extensive genetic and phenotypic variation both within and between these subspecies. Wild mice have been used to study a wide range of biological processes, including immunity, cancer, male sterility, adaptive evolution, and non-Mendelian inheritance. Despite the extensive variation that exists among wild mice, classical laboratory strains are derived from a limited set of founders and thus contain only a small subset of this variation. Continued efforts to study wild house mice and to create new inbred strains from wild populations have the potential to strengthen house mice as a model system.

Peripartum and neonatal factors associated with the persistence of neonatal brachial plexus palsy at 1 year: a review of 382 cases

Journal of Neurosurgery Pediatrics ◽

10.3171/2015.10.peds15543 ◽

2016 ◽

Vol 17 (5) ◽

pp. 618-624 ◽

Cited By ~ 9

Author(s):

Thomas J. Wilson ◽

Kate W. C. Chang ◽

Suneet P. Chauhan ◽

Lynda J. S. Yang

Keyword(s):

Brachial Plexus ◽

Prediction Algorithm ◽

University Of Michigan ◽

Brachial Plexus Palsy ◽

Factors Associated ◽

Neonatal Brachial Plexus Palsy ◽

Active Range Of Motion ◽

Wide Range ◽

Nerve Surgery ◽

The University

OBJECTIVE Neonatal brachial plexus palsy (NBPP) occurs due to the stretching of the nerves of the brachial plexus before, during, or after delivery. NBPP can resolve spontaneously or become persistent. To determine if nerve surgery is indicated, predicting recovery is necessary but difficult. Historical attempts explored the association of recovery with only clinical and electrodiagnostic examinations. However, no data exist regarding the neonatal and peripartum factors associated with NBPP persistence. METHODS This retrospective cohort study involved all NBPP patients at the University of Michigan between 2005 and 2015. Peripartum and neonatal factors were assessed for their association with persistent NBPP at 1 year, as defined as the presence of musculoskeletal contractures or an active range of motion that deviated from normal by > 10° (shoulder, elbow, hand, and finger ranges of motion were recorded). Standard statistical methods were used. RESULTS Of 382 children with NBPP, 85% had persistent NBPP at 1 year. A wide range of neonatal and peripartum factors was explored. We found that cephalic presentation, induction or augmentation of labor, birth weight > 9 lbs, and the presence of Horner syndrome all significantly increased the odds of persistence at 1 year, while cesarean delivery and Narakas Grade I to II injury significantly reduced the odds of persistence. CONCLUSIONS Peripartum/neonatal factors were identified that significantly altered the odds of having persistent NBPP at 1 year. Combining these peripartum/neonatal factors with previously published clinical examination findings associated with persistence should allow the development of a prediction algorithm. The implementation of this algorithm may allow the earlier recognition of those cases likely to persist and thus enable earlier intervention, which may improve surgical outcomes.
