Phandango: an interactive viewer for bacterial population genomics

James Hadfield; Nicholas J Croucher; Richard J Goater; Khalil Abudahab; David M Aanensen; Simon R Harris

doi:10.1093/bioinformatics/btx610

Population Genomics of Bacterial Plant Pathogens

Phytopathology ◽

10.1094/phyto-09-20-0412-rvw ◽

2020 ◽

pp. PHYTO-09-20-041

Author(s):

Christina Straub ◽

Elena Colombi ◽

Honour C. McCann

Keyword(s):

Plant Pathology ◽

Open Access ◽

Genetic Basis ◽

Plant Pathogens ◽

Bacterial Population ◽

Population Genomics ◽

Agricultural Practices ◽

Critical Importance ◽

Transmission Pathways ◽

Open Access Article

Population genomics is transforming our understanding of pathogen biology and evolution, and contributing to the prevention and management of disease in diverse crops. We provide an overview of key methods in bacterial population genomics and describe recent work focusing on three topics of critical importance to plant pathology: (i) resolving pathogen origins and transmission pathways during outbreak events, (ii) identifying the genetic basis of host specificity and virulence, and (iii) understanding how pathogens evolve in response to changing agricultural practices. Copyright © 2020 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.

Bacterial genospecies that are not ecologically coherent: population genomics of Rhizobium leguminosarum

Open Biology ◽

10.1098/rsob.140133 ◽

2015 ◽

Vol 5 (1) ◽

pp. 140133 ◽

Cited By ~ 83

Author(s):

Nitin Kumar ◽

Ganesh Lad ◽

Elisa Giuntini ◽

Maria E. Kaye ◽

Piyachat Udomwong ◽

...

Keyword(s):

Rhizobium Leguminosarum ◽

Bacterial Population ◽

Population Genomics ◽

Bacterial Species ◽

Core Gene ◽

Biological Species ◽

Ecological Groups ◽

Genetic Isolation ◽

Taxonomic Framework ◽

Core Genes

Biological species may remain distinct because of genetic isolation or ecological adaptation, but these two aspects do not always coincide. To establish the nature of the species boundary within a local bacterial population, we characterized a sympatric population of the bacterium Rhizobium leguminosarum by genomic sequencing of 72 isolates. Although all strains have 16S rRNA typical of R. leguminosarum, they fall into five genospecies by the criterion of average nucleotide identity (ANI). Many genes, on plasmids as well as the chromosome, support this division: recombination of core genes has been largely within genospecies. Nevertheless, variation in ecological properties, including symbiotic host range and carbon-source utilization, cuts across these genospecies, so that none of these phenotypes is diagnostic of genospecies. This phenotypic variation is conferred by mobile genes. The genospecies meet the Mayr criteria for biological species in respect of their core genes, but do not correspond to coherent ecological groups, so periodic selection may not be effective in purging variation within them. The population structure is incompatible with traditional ‘polyphasic taxonomy’ that requires bacterial species to have both phylogenetic coherence and distinctive phenotypes. More generally, genomics has revealed that many bacterial species share adaptive modules by horizontal gene transfer, and we envisage a more consistent taxonomic framework that explicitly recognizes this. Significant phenotypes should be recognized as ‘biovars’ within species that are defined by core gene phylogeny.
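
As an illustration of how an ANI criterion can partition isolates into genospecies, the following is a minimal sketch and not the authors' method. It assumes that pairwise ANI values are already available as percentages, that a 95% cutoff is used (a commonly quoted species-level threshold, not a value taken from this abstract), and that groups are formed by single linkage. The function name `cluster_by_ani` and the isolate labels are hypothetical.

```python
from itertools import combinations


def cluster_by_ani(isolates, ani, threshold=95.0):
    """Group isolates by single linkage: two isolates share a group
    whenever a chain of pairs with ANI >= threshold connects them.

    `ani` maps (isolate_a, isolate_b) tuples to percent identity.
    The 95.0 default is an assumed cutoff, not one from the abstract.
    """
    # Union-find: every isolate starts as its own group.
    parent = {name: name for name in isolates}

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        # Accept the pair in either orientation; skip missing pairs.
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical isolates and made-up ANI values, for illustration only.
isolates = ["iso1", "iso2", "iso3", "iso4"]
ani = {
    ("iso1", "iso2"): 98.7,
    ("iso1", "iso3"): 91.2,
    ("iso1", "iso4"): 91.0,
    ("iso2", "iso3"): 90.8,
    ("iso2", "iso4"): 90.5,
    ("iso3", "iso4"): 97.5,
}
print(cluster_by_ani(isolates, ani))
```

With these made-up values the sketch reports two groups, iso1 with iso2 and iso3 with iso4. Single linkage is only one possible choice; a stricter rule such as complete linkage could split groups that are held together by a single high-identity pair.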

Bacterial population genomics and infectious disease diagnostics

Trends in Biotechnology ◽

10.1016/j.tibtech.2010.09.001 ◽

2010 ◽

Vol 28 (12) ◽

pp. 611-618 ◽

Cited By ~ 29

Author(s):

Sandeep J. Joseph ◽

Timothy D. Read

Keyword(s):

Infectious Disease ◽

Bacterial Population ◽

Population Genomics ◽

Disease Diagnostics ◽

Infectious Disease Diagnostics

Emergence and Spread of Antimicrobial Resistance: Recent Insights from Bacterial Population Genomics

Current Topics in Microbiology and Immunology - How to Overcome the Antibiotic Crisis ◽

10.1007/82_2016_505 ◽

2016 ◽

pp. 35-53 ◽

Cited By ~ 2

Author(s):

Ulrich Nübel

Keyword(s):

Antimicrobial Resistance ◽

Bacterial Population ◽

Population Genomics

ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses

10.1101/2020.10.13.336479 ◽

2020 ◽

Author(s):

Natasha Pavlovikj ◽

Joao Carlos Gomes-Neto ◽

Jitender S. Deogun ◽

Andrew K. Benson

Keyword(s):

Case Studies ◽

High Throughput ◽

Bacterial Population ◽

Population Genomics ◽

Bacterial Species ◽

Population Based ◽

Epidemiological Surveillance ◽

Specific Gene ◽

Genotype Frequencies ◽

Population Scale

Whole Genome Sequence (WGS) data from bacterial species is used for a variety of applications, including basic microbiological research, diagnostics, and epidemiological surveillance. The availability of WGS data from hundreds of thousands of individual isolates of individual microbial species poses a tremendous opportunity for discovery and hypothesis-generating research into ecology and evolution of these microorganisms. Scalability and user-friendliness of existing pipelines for population-scale inquiry, however, limit applications of systematic, population-scale approaches. Here, we present ProkEvo, an automated, scalable, and open-source framework for bacterial population genomics analyses using WGS data. ProkEvo was specifically developed to achieve the following goals: 1) Automation and scaling of complex combinations of computational analyses for many thousands of bacterial genomes from inputs of raw Illumina paired-end sequence reads; 2) Use of workflow management systems (WMS) such as Pegasus WMS to ensure reproducibility, scalability, modularity, fault-tolerance, and robust file management throughout the process; 3) Use of high-performance and high-throughput computational platforms; 4) Generation of hierarchical population-based genotypes at different scales of resolution based on combinations of multi-locus and Bayesian statistical approaches for classification; 5) Detection of antimicrobial resistance (AMR) genes, putative virulence factors, and plasmids from curated databases and association with genotypic classifications; and 6) Production of pan-genome annotations and data compilation that can be utilized for downstream analysis. The scalability of ProkEvo was measured with two datasets comprising significantly different numbers of input genomes (one with ~2,400 genomes, and the second with ~23,000 genomes). Depending on the dataset and the computational platform used, the running time of ProkEvo varied from ~3-26 days.
ProkEvo can be used with virtually any bacterial species and the Pegasus WMS facilitates addition or removal of programs from the workflow or modification of options within them. All the dependencies of ProkEvo can be distributed via conda environment or Docker image. To demonstrate versatility of the ProkEvo platform, we performed population-based analyses from available genomes of three distinct pathogenic bacterial species as individual case studies (three serovars of Salmonella enterica, as well as Campylobacter jejuni and Staphylococcus aureus). The specific case studies used reproducible Python and R scripts documented in Jupyter Notebooks and collectively illustrate how hierarchical analyses of population structures, genotype frequencies, and distribution of specific gene functions can be used to generate novel hypotheses about the evolutionary history and ecological characteristics of specific populations of each pathogen. Collectively, our study shows that ProkEvo presents a viable option for scalable, automated analyses of bacterial populations with powerful applications for basic microbiology research, clinical microbiological diagnostics, and epidemiological surveillance.

Phandango: an interactive viewer for bacterial population genomics

10.1101/119545 ◽

2017 ◽

Cited By ~ 13

Author(s):

James Hadfield ◽

Nicholas J. Croucher ◽

Richard J Goater ◽

Khalil Abudahab ◽

David M Aanensen ◽

...

Keyword(s):

Web Application ◽

Large Scale ◽

Bacterial Population ◽

Population Genomics ◽

Genomic Analysis ◽

Evolutionary Analysis ◽

Base Pairs ◽

Web Browser ◽

Link Type ◽

Scale Population

Summary: Fully exploiting the wealth of data in current bacterial population genomics datasets requires synthesising and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application running in a web browser allowing fast exploration of large-scale population genomics datasets combining the output from multiple genomic analysis methods in an intuitive and interactive manner. Availability: Phandango is a web application freely available for use at https://jameshadfield.github.io/phandango and includes a diverse collection of datasets as examples. Source code together with a detailed wiki page is available on GitHub at https://github.com/jameshadfield/[email protected], [email protected]

A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter

Genes ◽

10.3390/genes3020261 ◽

2012 ◽

Vol 3 (2) ◽

pp. 261-277 ◽

Cited By ~ 93

Author(s):

Samuel K. Sheppard ◽

Keith A. Jolley ◽

Martin C. J. Maiden

Keyword(s):

Bacterial Population ◽

Population Genomics ◽

Whole Genome

Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease

Microorganisms ◽

10.3390/microorganisms8040549 ◽

2020 ◽

Vol 8 (4) ◽

pp. 549 ◽

Cited By ~ 2

Author(s):

DJ Darwin R. Bandoy ◽

Bart C. Weimer

Keyword(s):

Machine Learning ◽

Bacterial Population ◽

Genome Wide Association Study ◽

Population Genomics ◽

Virulence Gene ◽

Spatiotemporal Analysis ◽

Nucleotide Polymorphisms ◽

Allelic Variants ◽

Automated Annotation

Highly dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) methods is an appropriate framework for bacterial population genome analysis that yields a list of candidate genes associated with a phenotype, but it provides an unranked measure of importance. Here, we validated a novel framework to define infection mechanism using the combination of GWAS, machine learning, and bacterial population genomics that ranked allelic variants that accurately identified disease. This approach parsed a dataset of 1.2 million single nucleotide polymorphisms (SNPs) and indels that resulted in an importance ranked list of associated alleles of porA in Campylobacter jejuni using spatiotemporal analysis over 30 years. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This framework, termed μPathML, defined intestinal and extraintestinal groups that have differential allelic porA variants that cause abortion. Divergent variants containing indels that defeated automated annotation were rescued using biological context and knowledge that resulted in defining rare, divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled with GWAS and population genomics to simultaneously identify and rank alleles to define their role in infectious disease mechanisms.

Biological machine learning combined with bacterial population genomics reveals common and rare allelic variants of genes to cause disease

10.1101/739540 ◽

2019 ◽

Author(s):

DJ Darwin R. Bandoy ◽

Bart C. Weimer

Keyword(s):

Machine Learning ◽

Bacterial Population ◽

Genome Wide Association Study ◽

Biological Function ◽

Population Genomics ◽

Accurate Estimation ◽

Allelic Variants ◽

Genome Wide ◽

Spatial Locations

Highly dimensional data generated from bacterial whole genome sequencing is providing an unprecedented scale of information that requires appropriate statistical frameworks of analysis to infer biological function from bacterial genomic populations. Application of genome wide association study (GWAS) methods is an emerging approach in bacterial population genomics that yields a list of genes associated with a phenotype, but leaves the relative importance of the candidates in the list undefined. Here, we validate the combination of GWAS, machine learning, and pathogenic bacterial population genomics as a novel scheme to identify SNPs and rank allelic variants to determine associations for accurate estimation of disease phenotype. This approach parsed a dataset of 1.2 million SNPs that resulted in a ranked importance of associated alleles of Campylobacter jejuni porA using multiple spatial locations over a 30-year period. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This approach, termed BioML, defined intestinal and extraintestinal groups that have differential allelic variants that cause abortion. Divergent variants containing indels that defeated gene callers were rescued using biological context and knowledge that resulted in defining rare and divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled to GWAS and population genomics to simultaneously identify and rank alleles to define their role in abortion, and more broadly infectious disease.

Bacterial Population Genomics

Handbook of Statistical Genomics ◽

10.1002/9781119487845.ch36 ◽

2019 ◽

pp. 997-1020 ◽

Cited By ~ 3

Author(s):

Jukka Corander ◽

Nicholas J. Croucher ◽

Simon R. Harris ◽

John A. Lees ◽

Gerry Tonkin‐Hill

Keyword(s):

Bacterial Population ◽

Population Genomics
