Phandango: an interactive viewer for bacterial population genomics

2017 ◽  
Vol 34 (2) ◽  
pp. 292-293 ◽  
Author(s):  
James Hadfield ◽  
Nicholas J Croucher ◽  
Richard J Goater ◽  
Khalil Abudahab ◽  
David M Aanensen ◽  
...  
2020 ◽  
pp. PHYTO-09-20-041
Author(s):  
Christina Straub ◽  
Elena Colombi ◽  
Honour C. McCann

Population genomics is transforming our understanding of pathogen biology and evolution, and contributing to the prevention and management of disease in diverse crops. We provide an overview of key methods in bacterial population genomics and describe recent work focusing on three topics of critical importance to plant pathology: (i) resolving pathogen origins and transmission pathways during outbreak events, (ii) identifying the genetic basis of host specificity and virulence, and (iii) understanding how pathogens evolve in response to changing agricultural practices. Copyright © 2020 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.


Open Biology ◽  
2015 ◽  
Vol 5 (1) ◽  
pp. 140133 ◽  
Author(s):  
Nitin Kumar ◽  
Ganesh Lad ◽  
Elisa Giuntini ◽  
Maria E. Kaye ◽  
Piyachat Udomwong ◽  
...  

Biological species may remain distinct because of genetic isolation or ecological adaptation, but these two aspects do not always coincide. To establish the nature of the species boundary within a local bacterial population, we characterized a sympatric population of the bacterium Rhizobium leguminosarum by genomic sequencing of 72 isolates. Although all strains have 16S rRNA typical of R. leguminosarum, they fall into five genospecies by the criterion of average nucleotide identity (ANI). Many genes, on plasmids as well as the chromosome, support this division: recombination of core genes has been largely within genospecies. Nevertheless, variation in ecological properties, including symbiotic host range and carbon-source utilization, cuts across these genospecies, so that none of these phenotypes is diagnostic of genospecies. This phenotypic variation is conferred by mobile genes. The genospecies meet the Mayr criteria for biological species in respect of their core genes, but do not correspond to coherent ecological groups, so periodic selection may not be effective in purging variation within them. The population structure is incompatible with traditional ‘polyphasic taxonomy’ that requires bacterial species to have both phylogenetic coherence and distinctive phenotypes. More generally, genomics has revealed that many bacterial species share adaptive modules by horizontal gene transfer, and we envisage a more consistent taxonomic framework that explicitly recognizes this. Significant phenotypes should be recognized as ‘biovars’ within species that are defined by core gene phylogeny.
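The genospecies split described above rests on pairwise average nucleotide identity. A minimal sketch of how such a split can be computed, assuming pairwise ANI values have already been calculated and assuming single-linkage grouping at a 95% cut-off (a commonly used species-level threshold that the abstract itself does not state). The isolate names and ANI values are hypothetical, and this is not the authors' own procedure.

```python
from itertools import combinations


def genospecies_clusters(ani, threshold=95.0):
    """Group isolates by single-linkage clustering on pairwise ANI.

    ani: dict mapping frozenset({a, b}) -> ANI percentage for a pair.
    Two isolates are linked when their ANI is >= threshold (assumed 95%
    by default); clusters are the connected components of that graph.
    Pairs missing from `ani` are treated as unlinked.
    """
    isolates = sorted({i for pair in ani for i in pair})
    parent = {i: i for i in isolates}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if ani.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for i in isolates:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical pairwise ANI values (%) for four isolates.
ani = {
    frozenset(("s1", "s2")): 98.7,
    frozenset(("s1", "s3")): 91.2,
    frozenset(("s1", "s4")): 90.8,
    frozenset(("s2", "s3")): 91.0,
    frozenset(("s2", "s4")): 90.9,
    frozenset(("s3", "s4")): 97.5,
}
print(genospecies_clusters(ani))  # [['s1', 's2'], ['s3', 's4']]
```

Single linkage is chosen here only for simplicity; with a lower threshold such as 90% the same four isolates collapse into one group, which illustrates how strongly the resulting "species" count depends on the cut-off.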


2020 ◽  
Author(s):  
Natasha Pavlovikj ◽  
Joao Carlos Gomes-Neto ◽  
Jitender S. Deogun ◽  
Andrew K. Benson

Whole Genome Sequence (WGS) data from bacterial species are used for a variety of applications, ranging from basic microbiological research to diagnostics and epidemiological surveillance. The availability of WGS data from hundreds of thousands of individual isolates of individual microbial species presents a tremendous opportunity for discovery and hypothesis-generating research into the ecology and evolution of these microorganisms. However, the scalability and user-friendliness of existing pipelines for population-scale inquiry limit the application of systematic, population-scale approaches. Here, we present ProkEvo, an automated, scalable, and open-source framework for bacterial population genomics analyses using WGS data. ProkEvo was specifically developed to achieve the following goals: 1) Automation and scaling of complex combinations of computational analyses for many thousands of bacterial genomes from inputs of raw Illumina paired-end sequence reads; 2) Use of workflow management systems (WMS) such as Pegasus WMS to ensure reproducibility, scalability, modularity, fault-tolerance, and robust file management throughout the process; 3) Use of high-performance and high-throughput computational platforms; 4) Generation of hierarchical population-based genotypes at different scales of resolution based on combinations of multi-locus and Bayesian statistical approaches for classification; 5) Detection of antimicrobial resistance (AMR) genes, putative virulence factors, and plasmids from curated databases and association with genotypic classifications; and 6) Production of pan-genome annotations and data compilation that can be utilized for downstream analysis. The scalability of ProkEvo was measured with two datasets comprising significantly different numbers of input genomes (one with ~2,400 genomes, and the second with ~23,000 genomes). Depending on the dataset and the computational platform used, the running time of ProkEvo ranged from ~3 to 26 days.
ProkEvo can be used with virtually any bacterial species, and the Pegasus WMS facilitates addition or removal of programs from the workflow or modification of options within them. All the dependencies of ProkEvo can be distributed via a conda environment or a Docker image. To demonstrate the versatility of the ProkEvo platform, we performed population-based analyses of available genomes from three distinct pathogenic bacterial species as individual case studies (three serovars of Salmonella enterica, as well as Campylobacter jejuni and Staphylococcus aureus). The specific case studies used reproducible Python and R scripts documented in Jupyter Notebooks and collectively illustrate how hierarchical analyses of population structures, genotype frequencies, and distribution of specific gene functions can be used to generate novel hypotheses about the evolutionary history and ecological characteristics of specific populations of each pathogen. Collectively, our study shows that ProkEvo presents a viable option for scalable, automated analyses of bacterial populations with powerful applications for basic microbiology research, clinical microbiological diagnostics, and epidemiological surveillance.
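Hierarchical genotype summaries of the kind described above come down to counting fine-scale genotypes within coarse-scale groups. A minimal sketch under stated assumptions: each isolate is reduced to a hypothetical (coarse group, fine genotype) pair, for instance a Bayesian population cluster paired with a multi-locus sequence type. The group and genotype labels are invented for illustration, and this is not taken from ProkEvo's own notebooks.

```python
from collections import Counter, defaultdict


def nested_frequencies(isolates):
    """Relative frequency of each fine-level genotype within each coarse group.

    isolates: iterable of (coarse_group, fine_genotype) tuples, one per isolate.
    Returns {coarse_group: {fine_genotype: fraction_of_that_group}}.
    """
    counts = defaultdict(Counter)
    for coarse, fine in isolates:
        counts[coarse][fine] += 1

    result = {}
    for coarse, genotype_counts in counts.items():
        total = sum(genotype_counts.values())  # isolates in this coarse group
        result[coarse] = {fine: n / total for fine, n in genotype_counts.items()}
    return result


# Hypothetical isolates: (population cluster, sequence type).
isolates = [("C1", "ST1"), ("C1", "ST1"), ("C1", "ST2"), ("C1", "ST1"), ("C2", "ST5")]
print(nested_frequencies(isolates))
```

Normalising within each coarse group, rather than across the whole dataset, keeps a small cluster's internal structure visible even when one large cluster dominates the sample.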


2017 ◽  
Author(s):  
James Hadfield ◽  
Nicholas J. Croucher ◽  
Richard J Goater ◽  
Khalil Abudahab ◽  
David M Aanensen ◽  
...  

Summary: Fully exploiting the wealth of data in current bacterial population genomics datasets requires synthesising and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application running in a web browser that allows fast exploration of large-scale population genomics datasets, combining the output from multiple genomic analysis methods in an intuitive and interactive manner.
Availability: Phandango is a web application freely available for use at https://jameshadfield.github.io/phandango and includes a diverse collection of datasets as examples. Source code together with a detailed wiki page is available on GitHub at https://github.com/jameshadfield/


Genes ◽  
2012 ◽  
Vol 3 (2) ◽  
pp. 261-277 ◽  
Author(s):  
Samuel K. Sheppard ◽  
Keith A. Jolley ◽  
Martin C. J. Maiden

2020 ◽  
Vol 8 (4) ◽  
pp. 549 ◽  
Author(s):  
DJ Darwin R. Bandoy ◽  
Bart C. Weimer

High-dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) methods is an appropriate framework for bacterial population genome analysis that yields a list of candidate genes associated with a phenotype, but it provides an unranked measure of importance. Here, we validated a novel framework to define infection mechanism using the combination of GWAS, machine learning, and bacterial population genomics that ranked allelic variants that accurately identified disease. This approach parsed a dataset of 1.2 million single nucleotide polymorphisms (SNPs) and indels that resulted in an importance-ranked list of associated alleles of porA in Campylobacter jejuni using spatiotemporal analysis over 30 years. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This framework, termed μPathML, defined intestinal and extraintestinal groups that have differential allelic porA variants that cause abortion. Divergent variants containing indels that defeated automated annotation were rescued using biological context and knowledge that resulted in defining rare, divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled with GWAS and population genomics to simultaneously identify and rank alleles to define their role in infectious disease mechanisms.
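GWAS-style analysis starts from a per-variant test of association with a phenotype, which a machine-learning step can then re-rank. The sketch below shows only that first step, using a plain Pearson chi-square statistic on a 2x2 presence/absence table as a stand-in. It is not μPathML, it does not correct for population structure or clonality (which real bacterial GWAS tools must address), and the variant names and genotype calls are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    a: variant present, phenotype present   b: variant present, phenotype absent
    c: variant absent, phenotype present    d: variant absent, phenotype absent
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin means no association can be measured
    return n * (a * d - b * c) ** 2 / denom


def rank_variants(genotypes, phenotype):
    """Rank variants by chi-square association with a binary phenotype.

    genotypes: {variant_id: [0/1 call per isolate]}
    phenotype: [0/1 per isolate], in the same isolate order.
    Returns [(variant_id, statistic)] sorted from strongest to weakest.
    """
    scores = {}
    for variant, calls in genotypes.items():
        pairs = list(zip(calls, phenotype))
        a = sum(1 for g, p in pairs if g and p)
        b = sum(1 for g, p in pairs if g and not p)
        c = sum(1 for g, p in pairs if not g and p)
        d = sum(1 for g, p in pairs if not g and not p)
        scores[variant] = chi_square_2x2(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical calls for six isolates; the first three show the phenotype.
phenotype = [1, 1, 1, 0, 0, 0]
genotypes = {
    "v1": [1, 1, 1, 0, 0, 0],  # tracks the phenotype exactly
    "v2": [1, 0, 1, 0, 1, 0],  # weakly associated
    "v3": [1, 1, 1, 1, 1, 1],  # present everywhere, uninformative
}
print(rank_variants(genotypes, phenotype))
```

Here v1 ranks first with a statistic of 6.0, while v3 scores 0.0 because a variant present in every isolate cannot separate the two phenotype groups.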


2019 ◽  
Author(s):  
DJ Darwin R. Bandoy ◽  
Bart C. Weimer

High-dimensional data generated from bacterial whole genome sequencing is providing an unprecedented scale of information that requires appropriate statistical analysis frameworks to infer biological function from bacterial genomic populations. Application of genome-wide association study (GWAS) methods is an emerging approach in bacterial population genomics that yields a list of genes associated with a phenotype, with an undefined importance among the candidates in the list. Here, we validate the combination of GWAS, machine learning, and pathogenic bacterial population genomics as a novel scheme to identify SNPs and rank allelic variants to determine associations for accurate estimation of disease phenotype. This approach parsed a dataset of 1.2 million SNPs that resulted in a ranked importance of associated alleles of Campylobacter jejuni porA using multiple spatial locations over a 30-year period. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This approach, termed BioML, defined intestinal and extraintestinal groups that have differential allelic variants that cause abortion. Divergent variants containing indels that defeated gene callers were rescued using biological context and knowledge that resulted in defining rare and divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled with GWAS and population genomics to simultaneously identify and rank alleles to define their role in abortion, and more broadly in infectious disease.


2019 ◽  
pp. 997-1020 ◽  
Author(s):  
Jukka Corander ◽  
Nicholas J. Croucher ◽  
Simon R. Harris ◽  
John A. Lees ◽  
Gerry Tonkin‐Hill
