Tychus: a whole genome sequencing pipeline for assembly, annotation and phylogenetics of bacterial genomes

10.1101/283101 ◽

2018 ◽

Cited By ~ 1

Author(s):

Christopher Dean ◽

Noelle Noyes ◽

Steven Lakin ◽

Pablo Rovira-Sanz ◽

Xiang Yang ◽

...

Keyword(s):

Open Source ◽

Bacterial Genome ◽

Whole Genome Sequence ◽

Whole Genome ◽

Bacterial Genomes ◽

High Confidence ◽

Comprehensive Description ◽

Variant Discovery ◽

Large Numbers ◽

Virtualization Technology

Abstract. Summary: Tychus is a tool that allows researchers to perform massively parallel whole genome sequence (WGS) analysis with the goal of producing a high-confidence and comprehensive description of the bacterial genome. Key features of the Tychus pipeline include the assembly, annotation, alignment, variant discovery and phylogenetic inference of large numbers of WGS isolates in parallel using open-source bioinformatics tools and virtualization technology. All prerequisite tools and dependencies come packaged together in a single suite that can be easily downloaded and installed on Linux and Mac operating systems. Availability: Tychus is freely available as an open-source package under the MIT license, and can be downloaded via GitHub (https://github.com/Abdo-Lab/Tychus). Contact: [email protected]

Reads2Resistome: An adaptable and high-throughput whole-genome sequencing pipeline for bacterial resistome characterization

10.1101/2020.05.18.102715 ◽

2020 ◽

Author(s):

Reed Woyda ◽

Adelumola Oladeinde ◽

Zaid Abdo

Keyword(s):

Antibiotic Resistance ◽

Bacterial Isolate ◽

Bacterial Genome ◽

Hybrid Approach ◽

Antibiotic Resistance Genes ◽

Bacterial Genomes ◽

Analysis Pipeline ◽

Comprehensive Description ◽

Short Read Sequencing ◽

Sequencing Technologies

Abstract. Summary: The bacterial resistome is the collection of all the antibiotic resistance genes, virulence genes, and other resistance elements within a bacterial isolate genome, including plasmids and bacteriophage regions. Accurately characterizing the resistome is crucial for prevention and mitigation of emerging antibiotic resistance threats to animal and human health. Reads2Resistome is a tool that allows researchers to assemble and annotate bacterial genomes using long or short read sequencing technologies, or both in a hybrid approach. Using a massively parallel analysis pipeline, Reads2Resistome performs assembly, annotation and resistome characterization with the goal of producing an accurate and comprehensive description of a bacterial genome and its resistome contents. Key features of the Reads2Resistome pipeline include quality control of input sequencing reads, genome assembly, genome annotation, resistome characterization and alignment. All prerequisite dependencies come packaged together in a single suite that can easily be downloaded and run on Linux and Mac operating systems. Availability: Reads2Resistome is freely available as an open-source package under the MIT license, and can be downloaded via GitHub (https://github.com/BioRRW/Reads2Resistome).

Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes

mSystems ◽

10.1128/msystems.00190-20 ◽

2020 ◽

Vol 5 (4) ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Open Source ◽

Genome Analysis ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Data Sets ◽

Bacterial Genomes ◽

Data Set ◽

Content Type

ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia. IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. 
Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.

Whole Genome Sequence, Variant Discovery and Annotation in Mapuche-Huilliche Native South Americans

Scientific Reports ◽

10.1038/s41598-019-39391-z ◽

2019 ◽

Vol 9 (1) ◽

Author(s):

Elena A. Vidal ◽

Tomás C. Moyano ◽

Bernabé I. Bustos ◽

Eduardo Pérez-Palma ◽

Carol Moraga ◽

...

Keyword(s):

Genome Sequence ◽

Whole Genome Sequence ◽

Sequence Variant ◽

Whole Genome ◽

Variant Discovery ◽

South Americans

Inconsistent PCR detection of Shiga toxin-producing Escherichia coli: Insights from whole genome sequence analyses

PLoS ONE ◽

10.1371/journal.pone.0257168 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0257168

Author(s):

Vinicius Silva Castro ◽

Rodrigo Ortega Polo ◽

Eduardo Eustáquio de Souza Figueiredo ◽

Emmanuel Wihkochombom Bumunange ◽

Tim McAllister ◽

...

Keyword(s):

Escherichia Coli ◽

Shiga Toxin ◽

Bacterial Genome ◽

Disease Outbreaks ◽

Illumina Miseq ◽

Feedlot Cattle ◽

Pcr Detection ◽

Whole Genome Sequence ◽

Whole Genome ◽

The Stability

Shiga toxin-producing Escherichia coli (STEC) have been linked to food-borne disease outbreaks. As PCR is routinely used to screen foods for STEC, it is important that factors leading to inconsistent detection of STEC by PCR are understood. This study used whole genome sequencing (WGS) to investigate causes of inconsistent PCR detection of stx1, stx2, and serogroup-specific genes. Fifty strains isolated from Alberta feedlot cattle in three different studies were selected for having inconsistent or consistent detection of stx and serogroup by PCR. All isolates were initially classified as STEC by PCR. Sequencing was performed using Illumina MiSeq® with libraries prepared using Nextera XT. Virtual PCRs were performed using Geneious and bacteriophage content was determined using PHASTER. Sequencing coverage ranged from 47 to 102x, averaging 74x, with sequences deposited in the NCBI database. Eleven strains were confirmed by WGS as STEC having complete stxA and stxB subunits. However, truncated stx fragments occurred in twenty-two other isolates, some having multiple stx fragments in the genome. Isolates with complete stx by WGS had consistent stx1 and stx2 detection by PCR, although one that also had a stx2 fragment showed inconsistent stx2 PCR results. For all STEC and 18/39 non-STEC, serogroups determined by PCR agreed with those determined by WGS. An additional three WGS serotypes were inconclusive and two isolates were Citrobacter spp. Results demonstrate that stx fragments associated with stx-carrying bacteriophages in the E. coli genome may contribute to inconsistent detection of stx1 and stx2 by PCR. Fourteen isolates had integrated stx bacteriophage but lacked complete or fragmentary stx, possibly due to partial bacteriophage excision after sub-cultivation or other unclear mechanisms. The majority of STEC isolates (7/11) did not have identifiable bacteriophage DNA in the contig(s) where stx was located, likely increasing the stability of stx in the bacterial genome and its detection by PCR.
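
Coverage figures such as the 47 to 102x range reported above are commonly estimated as total sequenced bases divided by genome length. The following is a minimal sketch of that arithmetic only; the function name and all input values are hypothetical and are not taken from the study.

```python
def mean_coverage(num_reads, read_length_bp, genome_size_bp):
    """Estimate mean depth as total sequenced bases divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return num_reads * read_length_bp / genome_size_bp


# Hypothetical values for illustration only (not from the study).
depth = mean_coverage(num_reads=1_500_000, read_length_bp=250, genome_size_bp=5_000_000)
print(f"estimated mean coverage: {depth:.0f}x")
```

This estimate ignores read trimming, unmapped reads and uneven depth across the genome, so it is an approximation rather than a replacement for depth computed from an alignment.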

The In Silico Genotyper (ISG): an open-source pipeline to rapidly identify and annotate nucleotide variants for comparative genomics applications

10.1101/015578 ◽

2015 ◽

Cited By ~ 20

Author(s):

Jason W Sahl ◽

Stephen M Beckstrom-Sternberg ◽

James Babic-Sternberg ◽

John D Gillece ◽

Crystal M Hepp ◽

...

Keyword(s):

Comparative Genomics ◽

Open Source ◽

In Silico ◽

Sequence Data ◽

Source Code ◽

Whole Genome Sequence ◽

Nucleotide Polymorphisms ◽

Bacterial Genomes ◽

Single Nucleotide ◽

General Public License

The identification and annotation of nucleotide variants, including insertions/deletions and single nucleotide polymorphisms (SNPs), from whole genome sequence data is important for studies of bacterial evolution, comparative genomics, and phylogeography. The in silico Genotyper (ISG) represents a parallel, tested, open source tool that can perform these functions and scales well to thousands of bacterial genomes. ISG is written in Java and requires MUMmer (Delcher, et al., 2003), BWA (Li and Durbin, 2009), and GATK (McKenna, et al., 2010) for full functionality. The source code and compiled binaries are freely available from https://github.com/TGenNorth/ISGPipeline under a GNU General Public License. Benchmark comparisons demonstrate that ISG is faster and more flexible than comparable tools.

Nuclease pre-treatment increases efficiency of whole genome sequencing of Influenza B virus in respiratory specimens

Asia Pacific Journal of Molecular Biology and Biotechnology ◽

10.35118/apjmbb.2020.028.1.01 ◽

2020 ◽

pp. 1-13

Author(s):

Wudtichai Manasatienkij ◽

Piyawan Chinnawirotpisan ◽

Weerayuth Kittichotirat ◽

Sriluck Simasathien ◽

Louis R. Macareo ◽

...

Keyword(s):

Human Genome ◽

Viral Genome ◽

Bacterial Genome ◽

Whole Genome Sequence ◽

Influenza B Virus ◽

Influenza B ◽

Whole Genome ◽

Genome Sequences ◽

B Virus ◽

Respiratory Specimens

The use of next generation sequencing (NGS) directly on respiratory specimens to obtain viral whole genome sequence (WGS) enhances the capability for rapid and unbiased viral characterization. One of the challenges of using NGS directly on influenza-like illness (ILI) respiratory specimens is the higher proportion of host and bacterial genomic material compared to viral genetic material, which reduces the likelihood of obtaining complete viral genome sequences. This study aims to evaluate nuclease pretreatments prior to sequencing of influenza B virus directly from ILI respiratory specimens. Sequence data were mapped to human, bacterial and influenza B viral genomes. In the absence of any nuclease pretreatment, sequence reads identified as Haemophilus influenzae, Haemophilus parainfluenzae, Neisseria meningitidis and Veillonella parvula were the most prominent genetic materials in respiratory specimens. Filtration followed by nuclease treatment reduced bacterial sequence reads at least 70-fold in all 4 tested samples, supporting the direct application of NGS to ILI respiratory specimens. Although the pretreatment methods significantly reduced human genome sequences, the remaining human sequences, especially human rRNA, still affect the number and proportion of viral sequence reads.

BacWGSTdb 2.0: a one-stop repository for bacterial whole-genome sequence typing and source tracking

Nucleic Acids Research ◽

10.1093/nar/gkaa821 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D644-D650 ◽

Cited By ~ 3

Author(s):

Ye Feng ◽

Shengmei Zou ◽

Hangfei Chen ◽

Yunsong Yu ◽

Zhi Ruan

Keyword(s):

Genome Sequence ◽

Genomic Sequence ◽

Sequence Data ◽

Bacterial Genome ◽

Whole Genome Sequence ◽

Source Tracking ◽

Resistant Bacteria ◽

Computational Techniques ◽

Whole Genome ◽

One Stop

Abstract An increasing prevalence of hospital acquired infections and foodborne illnesses caused by pathogenic and multidrug-resistant bacteria has stimulated a pressing need for benchtop computational techniques to rapidly and accurately classify bacteria from genomic sequence data, and based on that, to trace the source of infection. BacWGSTdb (http://bacdb.org/BacWGSTdb) is a free publicly accessible database we have developed for bacterial whole-genome sequence typing and source tracking. This database incorporates extensive resources for bacterial genome sequencing data and the corresponding metadata, combined with specialized bioinformatics tools that enable the systematic characterization of the bacterial isolates recovered from infections. Here, we present BacWGSTdb 2.0, which encompasses several major updates, including (i) the integration of the core genome multi-locus sequence typing (cgMLST) approach, which is highly scalable and appropriate for typing isolates belonging to different lineages; (ii) the addition of a multiple genome analysis module that can process dozens of user uploaded sequences in a batch mode; (iii) a new source tracking module for comparing user uploaded plasmid sequences to those deposited in the public databases; (iv) the number of species encompassed in BacWGSTdb 2.0 has increased from 9 to 20, which represents bacterial pathogens of medical importance; (v) a newly designed, user-friendly interface and a set of visualization tools for providing a convenient platform for users are also included. Overall, the updated BacWGSTdb 2.0 bears great utility in continuing to provide users, including epidemiologists, clinicians and bench scientists, with a one-stop solution to bacterial genome sequence analysis.
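
To illustrate the cgMLST approach mentioned in the abstract: isolates are typically compared by counting core-genome loci at which their allele numbers differ. The sketch below shows that general idea only. It is not BacWGSTdb's implementation, and the locus names, allele numbers and the choice to skip missing loci are assumptions made for illustration.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ, skipping loci missing in either profile."""
    distance = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue  # assumption: uncalled loci are ignored rather than counted
        if allele_a != allele_b:
            distance += 1
    return distance


# Hypothetical cgMLST profiles (locus -> allele number; None means not called).
isolate_1 = {"locus_1": 3, "locus_2": 7, "locus_3": 1, "locus_4": None}
isolate_2 = {"locus_1": 3, "locus_2": 9, "locus_3": 1, "locus_4": 5}
isolate_3 = {"locus_1": 4, "locus_2": 9, "locus_3": 2, "locus_4": 5}

print(allelic_distance(isolate_1, isolate_2))
print(allelic_distance(isolate_2, isolate_3))
```

A small distance between two isolates suggests they are closely related, which is the basis for using such profiles in source tracking. How missing loci are handled is a design choice that varies between tools.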

High confidence copy number variants identified in Holstein dairy cattle from whole genome sequence and genotype array data

Scientific Reports ◽

10.1038/s41598-020-64680-3 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Adrien M. Butty ◽

Tatiane C. S. Chud ◽

Filippo Miglior ◽

Flavio S. Schenkel ◽

Arun Kommadath ◽

...

Keyword(s):

Dairy Cattle ◽

Genome Sequence ◽

Copy Number ◽

Copy Number Variants ◽

Whole Genome Sequence ◽

Whole Genome ◽

High Confidence ◽

Array Data ◽

Genotype Array

Identification of high-confidence somatic mutations in whole genome sequence of formalin-fixed breast cancer specimens

Nucleic Acids Research ◽

10.1093/nar/gks299 ◽

2012 ◽

Vol 40 (14) ◽

pp. e107-e107 ◽

Cited By ~ 64

Author(s):

Shawn E. Yost ◽

Erin N. Smith ◽

Richard B. Schwab ◽

Lei Bao ◽

HyunChul Jung ◽

...

Keyword(s):

Breast Cancer ◽

Genome Sequence ◽

Somatic Mutations ◽

Whole Genome Sequence ◽

Whole Genome ◽

High Confidence ◽

Formalin Fixed

Benchmarking bacterial genome-wide association study (GWAS) methods using simulated genomes and phenotypes

10.1101/795492 ◽

2019 ◽

Author(s):

Morteza M. Saber ◽

Jesse Shapiro

Keyword(s):

Antibiotic Resistance ◽

Population Structure ◽

Bacterial Genome ◽

Elastic Net ◽

Genome Wide Association ◽

Effect Sizes ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Bacterial Genomes ◽

Genome Wide

Abstract: Genome Wide Association Studies (GWASs) have the potential to reveal the genetics of microbial phenotypes such as antibiotic resistance and virulence. Capitalizing on the growing wealth of bacterial sequence data, microbial GWAS methods aim to identify causal genetic variants while ignoring spurious associations. Bacteria reproduce clonally, leading to strong population structure and genome-wide linkage, making it challenging to separate true “hits” (i.e. mutations that cause a phenotype) from non-causal linked mutations. GWAS methods attempt to correct for population structure in different ways, but their performance has not yet been systematically evaluated. Here we developed a bacterial GWAS simulator (BacGWASim) to generate bacterial genomes with varying rates of mutation, recombination, and other evolutionary parameters, along with a subset of causal mutations underlying a phenotype of interest. We assessed the performance (recall and precision) of three widely-used univariate GWAS approaches (cluster-based, dimensionality-reduction, and linear mixed models, implemented in PLINK, pySEER, and GEMMA) and one relatively new whole-genome elastic net model implemented in pySEER, across a range of simulated sample sizes, recombination rates, and causal mutation effect sizes. As expected, all methods performed better with larger sample sizes and effect sizes. The performance of clustering and dimensionality reduction approaches to correct for population structure was considerably variable according to the choice of parameters. Notably, the elastic net whole-genome model was consistently amongst the highest-performing methods and had the highest power in detecting causal variants with both low and high effect sizes. Most methods reached good performance (Recall > 0.75) to identify causal mutations of strong effect size (log Odds Ratio >= 2) with a sample size of 2000 genomes.
However, only elastic nets reached reasonable performance (Recall = 0.35) for detecting markers with weaker effects (log OR ∼1) in smaller samples. Elastic nets also showed superior precision and recall in controlling for genome-wide linkage, relative to univariate models. However, all methods performed relatively poorly on highly clonal (low-recombining) genomes, suggesting room for improvement in method development. These findings show the potential for whole-genome models to improve bacterial GWAS performance. BacGWASim code and simulated data are publicly available to enable further comparisons and benchmarking of new methods.

Author summary: Microbial populations contain measurable phenotypic differences with important clinical and environmental consequences, such as antibiotic resistance, virulence, host preference and transmissibility. A major challenge is to discover the genes and mutations in bacterial genomes that control these phenotypes. Bacterial Genome-Wide Association Studies (GWASs) are a family of methods to statistically associate phenotypes with genotypes, such as point mutations and other variants across the genome. However, compared to sexual organisms such as humans, bacteria reproduce clonally, meaning that causal mutations tend to be strongly linked to other mutations on the same chromosome. This genome-wide linkage makes it challenging to statistically separate causal mutations from non-causal false-positive associations. Several GWAS methods are currently available, but it is not clear which is the most powerful and accurate for bacteria. To systematically evaluate these methods, we developed BacGWASim, a computational pipeline to simulate the evolution of bacterial genomes and phenotypes. Using simulated genomes, we found that GWAS methods varied widely in their performance. In general, causal mutations of strong effect (e.g. those under strong selection for antibiotic resistance) could be easily identified with relatively small sample sizes of around 1000 genomes, but more complex phenotypes controlled by mutations of weaker effect required 3000 genomes or more. We found that a recently-developed GWAS method called elastic net was particularly good at identifying causal mutations in highly clonal populations, with strong linkage between mutations, but there is still room for improvement. The BacGWASim computer code is publicly available to enable further comparisons and benchmarking of new methods.
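
The recall and precision figures quoted in this abstract can be read as follows: recall is the fraction of true causal variants that a method reports, and precision is the fraction of reported variants that are truly causal. The sketch below shows these standard definitions on made-up data. The variant identifiers and the function name are hypothetical, and this is not BacGWASim code.

```python
def precision_recall(called, causal):
    """Return (precision, recall) for called variants against the true causal set."""
    called, causal = set(called), set(causal)
    true_positives = len(called & causal)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(causal) if causal else 0.0
    return precision, recall


# Hypothetical variant identifiers for illustration only.
causal_variants = {"snp_12", "snp_40", "snp_77", "snp_91"}
called_variants = {"snp_12", "snp_40", "snp_55"}

precision, recall = precision_recall(called_variants, causal_variants)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In a simulation setting the causal set is known by construction, which is what makes this kind of benchmarking possible. With real data the true causal variants are usually unknown.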
