SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data

2017 ◽  
Author(s):  
Lennard Epping ◽  
Andries J. van Tonder ◽  
Rebecca A. Gladstone ◽  
Stephen D. Bentley ◽  
Andrew J. Page ◽  
...  

ABSTRACT Streptococcus pneumoniae is responsible for 240,000–460,000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a hybrid assembly and mapping approach. We compared SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage in the cps locus region on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98% concordance using a k-mer based method, can process 10,000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 10x. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 license from: https://github.com/sanger-pathogens/seroba
DATA SUMMARY The reference genome Streptococcus pneumoniae ATCC 700669 is available from the National Center for Biotechnology Information (NCBI) with the accession number FM211187. Simulated paired end reads for experiment 2 have been deposited in FigShare (https://doi.org/10.6084/m9.figshare.5086054.v1). Accession numbers for all other experiments are listed in Supplementary Table S1 and Supplementary Table S2. I/We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.
IMPACT STATEMENT This article describes SeroBA, a k-mer based method for predicting the serotypes of Streptococcus pneumoniae from Whole Genome Sequencing (WGS) data.
SeroBA can identify 92 serotypes and 2 subtypes with constant memory usage and low computational costs. We showed that SeroBA is able to reliably predict serotypes at a depth of coverage as low as 10x and is scalable to large datasets.
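
As a rough illustration of what "k-mer based" matching means in general, the sketch below scores each candidate reference sequence by the fraction of its k-mers that also appear in a set of reads, then reports the best-scoring reference. This is a toy under stated assumptions and not SeroBA's actual implementation: the function names, the tiny sequences, the serotype labels and the choice of k=5 are all hypothetical, and the abstract itself describes a hybrid assembly and mapping approach that this sketch does not attempt to reproduce.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_coverage(read_kmers, reference, k):
    """Fraction of the reference's k-mers that also occur in the reads."""
    ref_kmers = kmers(reference, k)
    if not ref_kmers:
        return 0.0
    return len(ref_kmers & read_kmers) / len(ref_kmers)


def best_match(reads, references, k=5):
    """Score every reference by k-mer coverage and return (best_name, scores).

    `references` maps a label (here a hypothetical serotype name) to a
    reference sequence; `reads` is a list of read strings.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    scores = {name: kmer_coverage(read_kmers, seq, k)
              for name, seq in references.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data, invented purely for illustration.
toy_references = {"typeA": "ACGTACGGTCA", "typeB": "TTTTGGGGCCCA"}
toy_reads = ["ACGTACGG", "TACGGTCA"]
toy_best, toy_scores = best_match(toy_reads, toy_references, k=5)
```

Because only sets of k-mers are compared, the cost of scoring depends on the number of distinct k-mers rather than on aligning every read, which is one general reason k-mer methods tend to be cheap. Real tools additionally have to handle sequencing errors, reverse complements and closely related serotypes, none of which this sketch addresses.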

2021 ◽  
Vol 7 (9) ◽  
Author(s):  
Geetha Nagaraj ◽  
Vandana Govindan ◽  
Feroze Ganaie ◽  
V. T. Venkatesha ◽  
Paulina A. Hawkins ◽  
...  

Globally, India has a high burden of pneumococcal disease, and pneumococcal conjugate vaccine (PCV) has been rolled out in different phases across the country since May 2017 in the national infant immunization programme (NIP). To provide a baseline for assessing the impact of the vaccine on circulating pneumococci in India, genetic characterization of pneumococcal isolates detected prior to introduction of PCV would be helpful. Here we present a population genomic study of 480 Streptococcus pneumoniae isolates collected across India and from all age groups before vaccine introduction (2009–2017), including 294 isolates from pneumococcal disease and 186 collected through nasopharyngeal surveys. Population genetic structure, serotype and antimicrobial susceptibility profile were characterized and predicted from whole-genome sequencing data. Our findings revealed high levels of genetic diversity represented by 110 Global Pneumococcal Sequence Clusters (GPSCs) and 54 serotypes. Serotype 19F and GPSC1 (CC320) were the most common serotype and pneumococcal lineage, respectively. Coverage of PCV13 (Pfizer) and 10-valent Pneumosil (Serum Institute of India) serotypes in age groups of ≤2 and 3–5 years were 63–75 % and 60–69 %, respectively. Coverage of PPV23 (Merck) serotypes in age groups of ≥50 years was 62 % (98/158). Among the top five lineages causing disease, GPSC10 (CC230), which ranked second, is the only lineage that expressed both PCV13 (serotypes 3, 6A, 14, 19A and 19F) and non-PCV13 (7B, 13, 10A, 11A, 13, 15B/C, 22F, 24F) serotypes. It exhibited multidrug resistance and was the largest contributor (17 %, 18/103) of non-vaccine types (NVTs) in the disease-causing population. Overall, 42 % (202/480) of isolates were penicillin-resistant (minimum inhibitory concentration ≥0.12 µg ml−1) and 45 % (217/480) were multidrug-resistant. Nine GPSCs (GPSC1, 6, 9, 10, 13, 16, 43, 91, 376) were penicillin-resistant and among them six were multidrug-resistant. 
Pneumococci expressing PCV13 serotypes had a higher prevalence of antibiotic resistance. Sequencing of pneumococcal genomes has significantly improved our understanding of the biology of these bacteria. This study, describing the pneumococcal disease and carriage epidemiology pre-PCV introduction, demonstrates that 60–75 % of pneumococcal serotypes in children ≤5 years are covered by PCV13 and Pneumosil. Vaccination against pneumococci is very likely to reduce antibiotic resistance. A multidrug-resistant pneumococcal lineage, GPSC10 (CC230), is a high-risk clone that could mediate serotype replacement.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Peter Higgins ◽  
Cooper A Grace ◽  
Soon A Lee ◽  
Matthew R Goddard

Abstract Saccharomyces cerevisiae is extensively utilized for commercial fermentation, and is also an important biological model; however, its ecology has only recently begun to be understood. Through the use of whole-genome sequencing, the species has been characterized into a number of distinct subpopulations, defined by geographical ranges and industrial uses. Here, the whole-genome sequences of 104 New Zealand (NZ) S. cerevisiae strains, including 52 novel genomes, are analyzed alongside 450 published sequences derived from various global locations. The impact of S. cerevisiae novel range expansion into NZ was investigated and these analyses reveal the positioning of NZ strains as a subgroup to the predominantly European/wine clade. A number of genomic differences with the European group correlate with range expansion into NZ, including 18 highly enriched single-nucleotide polymorphisms (SNPs) and novel Ty1/2 insertions. While it is not possible to categorically determine if any genetic differences are due to stochastic processes or the operation of natural selection, we suggest that the observation of NZ-specific copy number increases of four sugar transporter genes in the HXT family may reasonably represent an adaptation in the NZ S. cerevisiae subpopulation, and this correlates with the observations of copy number changes during adaptation in small-scale experimental evolution studies.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Weili Cai ◽  
Schyler Nunziata ◽  
John Rascoe ◽  
Michael J. Stulberg

Abstract Huanglongbing (HLB) is a worldwide deadly citrus disease caused by the phloem-limited bacterium ‘Candidatus Liberibacter asiaticus’ (CLas) vectored by Asian citrus psyllids. In order to effectively manage this disease, it is crucial to understand the relationship among the bacterial isolates from different geographical locations. Whole genome sequencing approaches will provide more precise molecular characterization of the diversity among populations. Due to the lack of in vitro culture, obtaining the whole genome sequence of CLas is still a challenge, especially for medium to low titer samples. Hundreds of millions of sequencing reads are needed to get good coverage of CLas from an HLB positive citrus sample. In order to overcome this limitation, we present here a new method, Agilent SureSelect XT HS target enrichment, which can specifically enrich CLas from a metagenomic sample while greatly reducing cost and increasing whole genome coverage of the pathogen. In this study, the CLas genome was successfully sequenced with 99.3% genome coverage and over 72X sequencing coverage from low titer tissue samples (equivalent to 28.52 Cq using Li 16S qPCR). More importantly, this method also effectively captures regions of diversity in the CLas genome, which provides precise molecular characterization of different strains.


2015 ◽  
Vol 117 (suppl_1) ◽  
Author(s):  
Matthew Wheeler ◽  
Daryl Waggott ◽  
Megan Grove ◽  
Frederick Dewey ◽  
Cuiping Pan ◽  
...  

Background: Technological advances have greatly reduced the cost of whole genome sequencing. For single individuals clinical application is apparent, while exome sequencing in tens of thousands of people has allowed a more global view of genetic variation that can inform interpretation of specific variants in individuals. We hypothesized that genome sequencing of patients with monogenic cardiomyopathy would facilitate discovery of genetic modifiers of phenotype. Methods and Results: We identified 48 individuals diagnosed with cardiomyopathy and with putative mutations in MYH7, the gene encoding beta myosin heavy chain. We carried out whole genome sequencing and applied a newly developed analytical pipeline optimized for discovery of genes modifying severity of clinical presentation and outcomes. Using a combination of external priors and rare variant burden tests we scored genes as potential modifiers. There were 96 genes that reached a modifier score of 6 out of 12 or better (9=2, 8=8, 7=17, 6=69). We identified NCKAP1, a gene that regulates actin filament dynamics, and CAMSAP1, a calmodulin-regulated gene that regulates microtubule dynamics, as top scoring modifiers of hypertrophic cardiomyopathy phenotypes (score=9) while LDB2, RYR2, FBN1 and ATP1A2 had modifier scores of 8. Of the top scoring genes, 21 out of 96 were identified as candidates a priori. Our candidate prioritization scheme identified the previously described modifiers of cardiomyopathy phenotype, FHOD3 and MYBPC3, as top scoring genes. We identified structural variants in 21 clinically sequenced cardiomyopathy associated genes, 13 of which were at less than 10% frequency. Copy number variants in ILK and CSRP3 were nominally associated with ejection fraction (p=0.03), while 8 genes showed copy gains (GLA, FKTN, SGCD, TTN, SOS1, ANKRD1, VCL and NEBL). Structural variants were found in CSRP3, MYL3 and TNNC1, all of which have been implicated as causative for HCM. 
Conclusion: Evaluation of the whole genome sequence, even in the case of putatively monogenic disease, leads to important diagnostic and scientific insights not revealed by panel-based sequencing.


Antibiotics ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 1152
Author(s):  
Samuel M. Chekabab ◽  
John R. Lawrence ◽  
Alvin C. Alvarado ◽  
Bernardo Z. Predicala ◽  
Darren R. Korber

In response to new stringent regulations in Canada regarding the use of antibiotics in animal production, many farms have implemented practices to produce animals that are raised without antibiotics (RWA) from birth to slaughter. This study aims to assess the impact of RWA production practices on reducing the actual total on-farm use of antibiotics, the occurrence of pathogens, and the prevalence of antimicrobial resistance (AMR). A 28-month longitudinal surveillance of farms that adopted the RWA program and conventional farms using antibiotics in accordance with the new regulations (non-RWA) was conducted by collecting fecal samples from 6-week-old pigs and composite manure from the barn over six time points and applying whole-genome sequencing (WGS) to assess the prevalence of AMR genes as well as the abundance of pathogens. Analysis of in-barn drug use records confirmed the decreased consumption of antibiotics in RWA barns compared to non-RWA barns. WGS analyses revealed that RWA barns had reduced the frequency of AMR genes in piglet feces and in-barn manure. However, metagenomic analyses showed that RWA barns had a significant increase in the frequency of pathogenic Firmicutes in fecal samples and pathogenic Proteobacteria in barn manure samples.


2018 ◽  
Author(s):  
David R. Greig ◽  
Ulf Schafer ◽  
Sophie Octavia ◽  
Ebony Hunter ◽  
Marie A. Chattaway ◽  
...  

Abstract Epidemiological and microbiological data on Vibrio cholerae isolated between 2004 and 2017 (n=836) and held in the Public Health England culture archive were reviewed. The traditional biochemical species identification and serological typing results were compared with the genome derived species identification and serotype for a sub-set of isolates (n=152). Of the 836 isolates, 750 (89.7%) were from faecal specimens, 206 (24.6%) belonged to serogroup O1 and seven (0.8%) were serogroup O139, and 792 (94.7%) isolates were from patients reporting recent travel abroad, most commonly to India (n=209) and Pakistan (n=104). Of the 152 isolates of V. cholerae speciated by kmer identification, 149 (98.1%) were concordant with the traditional biochemical approach. Traditional serotyping results were 100% concordant with the whole genome sequencing (WGS) analysis for identification of serogroups O1 and O139 and Classical and El Tor biotypes. ctxA was detected in all isolates of V. cholerae O1 El Tor and O139 belonging to sequence type (ST) 69, and in V. cholerae O1 Classical variants belonging to ST73. A phylogeny of isolates belonging to ST69 from UK travellers clustered geographically, with isolates from India and Pakistan located on separate branches. Moving forward, WGS data from UK travellers will contribute to global surveillance programs, and the monitoring of emerging threats to public health and the global dissemination of pathogenic lineages. At the national level, these WGS data will inform the timely reinforcement of direct public health messaging to travellers and mitigate the impact of imported infections and the associated risks to public health.


2015 ◽  
Vol 81 (17) ◽  
pp. 6024-6037 ◽  
Author(s):  
Matthew J. Stasiewicz ◽  
Haley F. Oliver ◽  
Martin Wiedmann ◽  
Henk C. den Bakker

ABSTRACT While the food-borne pathogen Listeria monocytogenes can persist in food associated environments, there are no whole-genome sequence (WGS) based methods to differentiate persistent from sporadic strains. Whole-genome sequencing of 188 isolates from a longitudinal study of L. monocytogenes in retail delis was used to (i) apply single-nucleotide polymorphism (SNP)-based phylogenetics for subtyping of L. monocytogenes, (ii) use SNP counts to differentiate persistent from repeatedly reintroduced strains, and (iii) identify genetic determinants of L. monocytogenes persistence. WGS analysis revealed three prophage regions that explained differences between three pairs of phylogenetically similar populations with pulsed-field gel electrophoresis types that differed by ≤3 bands. WGS-SNP-based phylogenetics found that putatively persistent L. monocytogenes represent SNP patterns (i) unique to a single retail deli, supporting persistence within the deli (11 clades), (ii) unique to a single state, supporting clonal spread within a state (7 clades), or (iii) spanning multiple states (5 clades). Isolates that formed one of 11 deli-specific clades differed by a median of 10 SNPs or fewer. Isolates from 12 putative persistence events had significantly fewer SNPs (median, 2 to 22 SNPs) than between isolates of the same subtype from other delis (median up to 77 SNPs), supporting persistence of the strain. In 13 events, nearly indistinguishable isolates (0 to 1 SNP) were found across multiple delis. No individual genes were enriched among persistent isolates compared to sporadic isolates. Our data show that WGS analysis improves food-borne pathogen subtyping and identification of persistent bacterial pathogens in food associated environments.
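
The comparison of SNP counts described above can be illustrated with a minimal sketch that computes pairwise SNP differences between aligned sequences and takes their median. It is not the authors' pipeline: it assumes the isolates have already been aligned to equal length, it counts only positions where both sequences carry an unambiguous base, and the isolate names and sequences are hypothetical toy data.

```python
from itertools import combinations
from statistics import median


def snp_distance(a, b):
    """Count positions at which two aligned sequences differ.

    Positions where either sequence has a gap or ambiguous base
    (anything other than A, C, G or T) are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x in "ACGT" and y in "ACGT")


def median_pairwise_snps(isolates):
    """Median SNP distance over all pairs of isolates (needs at least two)."""
    distances = [snp_distance(a, b)
                 for a, b in combinations(isolates.values(), 2)]
    return median(distances)


# Hypothetical aligned sequences, invented purely for illustration.
toy_isolates = {
    "deli1_visit1": "ACGTACGT",
    "deli1_visit2": "ACGTACGA",
    "deli1_visit3": "ACGAACGA",
}
toy_median = median_pairwise_snps(toy_isolates)
```

A small median within one location compared with a larger median against isolates of the same subtype from elsewhere is the kind of contrast the abstract reports; deciding what counts as "small" depends on the organism and the study and is not something this sketch settles.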


2018 ◽  
Vol 56 (11) ◽  
Author(s):  
David R. Greig ◽  
Ulf Schaefer ◽  
Sophie Octavia ◽  
Ebony Hunter ◽  
Marie A. Chattaway ◽  
...  

ABSTRACT Epidemiological and microbiological data on Vibrio cholerae strains isolated between April 2004 and March 2018 (n = 836) and held at the Public Health England culture archive were reviewed. The traditional biochemical species identification and serological typing results were compared with the genome-derived species identification and serotype for a subset of isolates (n = 152). Of the 836 isolates, 750 (89.7%) were from a fecal specimen, 206 (24.6%) belonged to serogroup O1, and 7 (0.8%) were serogroup O139; 792 (94.7%) isolates were from patients reporting recent travel abroad, most commonly to India (n = 209) and Pakistan (n = 104). Of the 152 V. cholerae isolates identified by use of kmer, 149 (98.1%) were concordant with those identified using the traditional biochemical approach. Traditional serotyping results were 100% concordant with those of the whole-genome sequencing (WGS) analysis for the identification of serogroups O1 and O139 and classical and El Tor biotypes. ctxA was detected in all isolates of V. cholerae O1 El Tor and O139 belonging to sequence type 69 (ST69) and in V. cholerae O1 classical variants belonging to ST73. A phylogeny of isolates belonging to ST69 from U.K. travelers clustered geographically, with isolates from India and Pakistan located on separate branches. Moving forward, WGS data from U.K. travelers will contribute to global surveillance programs and the monitoring of emerging threats to public health and the global dissemination of pathogenic lineages. At the national level, these WGS data will inform the timely reinforcement of direct public health messaging to travelers and mitigate the impact of imported infections and the associated risks to public health.


Viruses ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 108 ◽  
Author(s):  
Bruno Simon ◽  
Maxime Pichon ◽  
Martine Valette ◽  
Gwendolyne Burfin ◽  
Mathilde Richard ◽  
...  

Influenza viruses cause a remarkable disease burden and significant morbidity and mortality worldwide, and these impacts vary between seasons. To understand the mechanisms associated with these differences, a comprehensive approach is needed to characterize the impact of influenza genomic traits on the burden of disease. During 2016–2017, a year with severe A(H3N2), we sequenced 176 A(H3N2) influenza genomes using next generation sequencing (NGS) for routine surveillance of circulating influenza viruses collected via the French national influenza community-based surveillance network or from patients hospitalized in the intensive care units of the University Hospitals of Lyon, France. Taking into account confounding factors, sequencing and clinical data were used to identify genomic variants and quasispecies associated with influenza severity or vaccine failure. Several amino acid substitutions significantly associated with clinical traits were found, including NA V263I and NS1 K196E which were associated with severity and co-occurred only in viruses from the 3c.2a1 clade. Additionally, we observed that intra-host diversity as a whole and on a specific set of gene segments increased with severity. These results support the use of whole genome sequencing as a tool for the identification of genetic traits associated with severe influenza in the context of influenza surveillance.

