BugBuilder - An Automated Microbial Genome Assembly and Analysis Pipeline


10.1101/148783 ◽

2017 ◽

Cited By ~ 4

Author(s):

Abbott J.C.

Keyword(s):

Genome Assembly ◽

Data Science ◽

Microbial Genome ◽

Analysis Pipeline ◽

Microbial Genomes ◽

Science Group ◽

Finishing Processes ◽

Sequence Types ◽

Machine Image ◽

Common Sequence

Abstract Summary: BugBuilder is a framework for hands-free assembly and annotation of microbial genomes. It produces outputs suitable either for database submission or downstream finishing processes. It is configurable to work with most command-line assembly and scaffolding tools, which are selectable at run-time, and supports all common sequence types used in microbial genome assembly. Availability and Implementation: BugBuilder is implemented in Perl and is available under the Artistic License from http://www.imperial.ac.uk/bioinformatics-data-science-group/resources/software/bugbuilder. A virtual machine image is available pre-configured with the relevant freely-redistributable software.


Comparative evaluation of Nanopore polishing tools for microbial genome assembly and polishing strategies for downstream analysis

Scientific Reports ◽

10.1038/s41598-021-00178-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jin Young Lee ◽

Minyoung Kong ◽

Jinjoo Oh ◽

JinSoo Lim ◽

Sung Hee Chung ◽

...

Keyword(s):

Genome Assembly ◽

Gene Prediction ◽

Cost Effective ◽

Microbial Genome ◽

High Quality ◽

Short Read ◽

Microbial Genomes ◽

Base Calling ◽

Long Read ◽

Downstream Analysis

Abstract Assembling high-quality microbial genomes using only cost-effective Nanopore long-read systems such as Flongle is important to accelerate research on the microbial genome, and the most critical point for this is the polishing process. In this study, we performed an evaluation based on BUSCO and Prokka gene prediction in terms of microbial genome assembly for eight state-of-the-art Nanopore polishing tools and combinations available. In the evaluation of individual tools, Homopolish, PEPPER, and Medaka demonstrated better results than others. In combination polishing, the second round Homopolish, and the PEPPER × medaka combination also showed better results than others. However, individual tools and combinations have specific limitations on usage and results. Depending on the target organism and the purpose of the downstream research, it is confirmed that there remain some difficulties in perfectly replacing the hybrid polishing carried out by the addition of a short-read. Nevertheless, through continuous improvement of the protein pores, related base-calling algorithms, and polishing tools based on improved error models, a high-quality microbial genome can be achieved using only Nanopore reads without the production of additional short-read data. The polishing strategy proposed in this study is expected to provide useful information for assembling the microbial genome using only Nanopore reads depending on the target microorganism and the purpose of the research.


Molecular characterization of clinical carbapenem-resistant Enterobacterales from Qatar

European Journal of Clinical Microbiology & Infectious Diseases ◽

10.1007/s10096-021-04185-7 ◽

2021 ◽

Author(s):

Fatma Ben Abid ◽

Clement K. M. Tsui ◽

Yohei Doi ◽

Anand Deshmukh ◽

Christi L. McElheny ◽

...

Keyword(s):

Common Species ◽

Clinical Samples ◽

Whole Genome ◽

E Coli ◽

Carbapenem Resistant ◽

Genes Encoding ◽

Encoding Genes ◽

Sequence Types ◽

Common Sequence

Abstract One hundred forty-nine carbapenem-resistant Enterobacterales from clinical samples obtained between April 2014 and November 2017 were subjected to whole genome sequencing and multi-locus sequence typing. Klebsiella pneumoniae (81, 54.4%) and Escherichia coli (38, 25.5%) were the most common species. Genes encoding metallo-β-lactamases were detected in 68 (45.8%) isolates, and OXA-48-like enzymes in 60 (40.3%). blaNDM-1 (45; 30.2%) and blaOXA-48 (29; 19.5%) were the most frequent. KPC-encoding genes were identified in 5 (3.6%) isolates. Most common sequence types were E. coli ST410 (8; 21.1%) and ST38 (7; 18.4%), and K. pneumoniae ST147 (13; 16%) and ST231 (7; 8.6%).


Reproducible Analysis Pipeline for Data Streams: Open-Source Software to Process Data Collected With Mobile Devices

Frontiers in Digital Health ◽

10.3389/fdgth.2021.769823 ◽

2021 ◽

Vol 3 ◽

Author(s):

Julio Vega ◽

Meng Li ◽

Kwesi Aguillera ◽

Nikunj Goel ◽

Echhit Joshi ◽

...

Keyword(s):

Open Source ◽

Data Streams ◽

Data Science ◽

Wearable Devices ◽

Mobile Sensors ◽

Sensor Data ◽

Process Data ◽

Analysis Pipeline ◽

Ground Truth Data ◽

Reproducible Analysis

Smartphone and wearable devices are widely used in behavioral and clinical research to collect longitudinal data that, along with ground truth data, are used to create models of human behavior. Mobile sensing researchers often program data processing and analysis code from scratch even though many research teams collect data from similar mobile sensors, platforms, and devices. This leads to significant inefficiency in not being able to replicate and build on others' work, inconsistency in quality of code and results, and lack of transparency when code is not shared alongside publications. We provide an overview of Reproducible Analysis Pipeline for Data Streams (RAPIDS), a reproducible pipeline to standardize the preprocessing, feature extraction, analysis, visualization, and reporting of data streams coming from mobile sensors. RAPIDS is formed by a group of R and Python scripts that are executed on top of reproducible virtual environments, orchestrated by a workflow management system, and organized following a consistent file structure for data science projects. We share open source, documented, extensible and tested code to preprocess, extract, and visualize behavioral features from data collected with any Android or iOS smartphone sensing app as well as Fitbit and Empatica wearable devices. RAPIDS allows researchers to process mobile sensor data in a rigorous and reproducible way. This saves time and effort during the data analysis phase of a project and facilitates sharing analysis workflows alongside publications.


Mother machine image analysis with MM3

10.1101/810036 ◽

2019 ◽

Cited By ~ 1

Author(s):

John T. Sauls ◽

Jeremy W. Schroeder ◽

Steven D. Brown ◽

Guillaume Le Treut ◽

Fangwei Si ◽

...

Keyword(s):

Image Analysis ◽

Phase Contrast ◽

Time Lapse ◽

Command Line ◽

Throughput Time ◽

Analysis Pipeline ◽

Machine Experiment ◽

Command Line Tool ◽

Time Lapse Imaging ◽

Machine Image

The mother machine is a microfluidic device for high-throughput time-lapse imaging of microbes. Here, we present MM3, a complete and modular image analysis pipeline. MM3 turns raw mother machine images, both phase contrast and fluorescence, into a data structure containing cells with their measured features. MM3 employs machine learning and non-learning algorithms, and is implemented in Python. MM3 is easy to run as a command line tool with the occasional graphical user interface on a PC or Mac. A typical mother machine experiment can be analyzed within one day. It has been extensively tested, is well documented and publicly available via GitHub.


Complete Genome Sequences of Four Toxigenic Clostridium difficile Clinical Isolates from Patients of the Lower Hudson Valley, New York, USA

Genome Announcements ◽

10.1128/genomea.01537-17 ◽

2018 ◽

Vol 6 (4) ◽

Cited By ~ 1

Author(s):

Changhong Yin ◽

Donald S. Chen ◽

Jian Zhuge ◽

Donna McKenna ◽

Joan Sagurton ◽

...

Keyword(s):

New York ◽

Clostridium Difficile ◽

Complete Genome ◽

Genome Sequences ◽

Circular Chromosome ◽

Hudson Valley ◽

Sequence Types ◽

Common Sequence

Abstract Complete genome sequences of four toxigenic Clostridium difficile isolates from patients in the lower Hudson Valley, New York, USA, were achieved. These isolates represent four common sequence types (ST1, ST2, ST8, and ST42) belonging to two distinct phylogenetic clades. All isolates have a 4.0- to 4.2-Mb circular chromosome, and one carries a phage.


Microbial genome analysis: the COG approach

Briefings in Bioinformatics ◽

10.1093/bib/bbx117 ◽

2017 ◽

Vol 20 (4) ◽

pp. 1063-1070 ◽

Cited By ~ 27

Author(s):

Michael Y Galperin ◽

David M Kristensen ◽

Kira S Makarova ◽

Yuri I Wolf ◽

Eugene V Koonin

Keyword(s):

Genome Analysis ◽

Metabolic Pathways ◽

Genome Annotation ◽

Functional Characterization ◽

Microbial Genome ◽

Functional Categories ◽

Orthologous Genes ◽

Microbial Genomes ◽

The Past

Abstract For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COG have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.


Corrigendum to “Changes in the six most common sequence types of Neisseria gonorrhoeae, including ST4378, identified by surveillance of antimicrobial resistance in northern Taiwan from 2006 to 2013” [J Microbiol Immunol Infect 49 (5) (2016) 708–716]

Journal of Microbiology Immunology and Infection ◽

10.1016/j.jmii.2017.02.001 ◽

2018 ◽

Vol 51 (3) ◽

pp. 422

Author(s):

Ching-Wai Cheng ◽

Lan-Hui Li ◽

Chen-Yi Su ◽

Shu-Ying Li ◽

Muh-Yong Yen

Keyword(s):

Antimicrobial Resistance ◽

Neisseria Gonorrhoeae ◽

Sequence Types ◽

Northern Taiwan ◽

Common Sequence


CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline

BMC Genomics ◽

10.1186/s12864-017-3717-3 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 9

Author(s):

Sonia Agrawal ◽

Cesar Arze ◽

Ricky S. Adkins ◽

Jonathan Crabtree ◽

David Riley ◽

...

Keyword(s):

Sequence Analysis ◽

Genome Sequence ◽

Microbial Genome ◽

Analysis Pipeline ◽

Genome Sequence Analysis


AFLAP: assembly-free linkage analysis pipeline using k-mers from genome sequencing data

Genome Biology ◽

10.1186/s13059-021-02326-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Kyle Fletcher ◽

Lin Zhang ◽

Juliana Gil ◽

Rongkui Han ◽

Keri Cavanaugh ◽

...

Keyword(s):

Linkage Analysis ◽

Genome Sequencing ◽

Genome Assembly ◽

Simulated Data ◽

Genetic Maps ◽

Sequencing Data ◽

Analysis Pipeline ◽

A Genome ◽

Genotype By Sequencing ◽

Genome Assemblies

Abstract Our assembly-free linkage analysis pipeline (AFLAP) identifies segregating markers as k-mers in the raw reads without using a reference genome assembly for calling variants and provides genotype tables for the construction of unbiased, high-density genetic maps without a genome assembly. AFLAP is validated and contrasted to a conventional workflow using simulated data. AFLAP is applied to whole genome sequencing and genotype-by-sequencing data of F1, F2, and recombinant inbred populations of two different plant species, producing genetic maps that are concordant with genome assemblies. The AFLAP-based genetic map for Bremia lactucae enables the production of a chromosome-scale genome assembly.
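
The idea of using k-mers from raw reads as markers can be illustrated with a minimal Python sketch. This is not AFLAP's implementation: the function names, the `min_count` threshold, and the toy reads are all assumptions made for illustration, and the sketch ignores reverse complements and sequencing errors. It counts k-mers in two parents, keeps those found only in one parent, and scores their presence or absence in a progeny sample to form one row of a genotype table.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-length substring across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def parent_specific_kmers(parent_a_reads, parent_b_reads, k, min_count=2):
    """Return k-mers seen at least min_count times in parent A and never in parent B.

    min_count is a hypothetical filter against k-mers created by read errors.
    """
    a = count_kmers(parent_a_reads, k)
    b = count_kmers(parent_b_reads, k)
    return {kmer for kmer, n in a.items() if n >= min_count and kmer not in b}

def genotype(progeny_reads, markers, k):
    """Score each marker k-mer as present (1) or absent (0) in one progeny's reads."""
    seen = count_kmers(progeny_reads, k)
    return {m: int(m in seen) for m in sorted(markers)}

# Toy data (made up): the parents differ at a single base.
parent_a = ["ACGTAC", "ACGTAC"]
parent_b = ["ACGTTC", "ACGTTC"]
markers = parent_specific_kmers(parent_a, parent_b, k=4)
row = genotype(["CGTACC"], markers, k=4)
```

In a real analysis the k-mer size would be much larger and counting would be done with a dedicated k-mer counter; the sketch only shows why no reference assembly is needed to call presence or absence.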


Inner Social Interactions Model of Big Data Impact on Economical Framework

South Asian Journal of Social Studies and Economics ◽

10.9734/sajsse/2018/v1i325800 ◽

2018 ◽

pp. 1-6

Author(s):

A. Alatorre

Keyword(s):

Mathematical Model ◽

Big Data ◽

Social Interaction ◽

Social Interactions ◽

Study Design ◽

Data Science ◽

High Volume ◽

Design Chain ◽

Science Group ◽

Physics Department

Aims / Objectives: This study aims to develop a mathematical model of how big data, together with social interactions, shapes the economic framework, so that key processes for assessing several types of economic frames can be built. Study Design: Chain Phenomena Analysis. Place and Duration of Study: University of Guadalajara, Physics Department, Data Science Group. Results: Model exposition. Conclusion: I show how, over time, the impact of social interaction on the economic framework has grown. Big Data tools for handling the high volume of information from these interactions have become a strong platform for analysing economic indicators, such as those whose repercussions affect financial stock markets. This process is modelled in this article.
