Rapid construction of a whole-genome transposon insertion collection for Shewanella oneidensis by Knockout Sudoku

Michael Baym; Lev Shaket; Isao A. Anzai; Oluwakemi Adesina; Buz Barstow

doi:10.1038/ncomms13270

Rapid construction of a whole-genome transposon insertion collection for Shewanella oneidensis by Knockout Sudoku

Nature Communications ◽

10.1038/ncomms13270 ◽

2016 ◽

Vol 7 (1) ◽

Cited By ~ 19

Author(s):

Michael Baym ◽

Lev Shaket ◽

Isao A. Anzai ◽

Oluwakemi Adesina ◽

Buz Barstow

Keyword(s):

Single Gene ◽

Shewanella Oneidensis ◽

Whole Genome ◽

Sequencing Data ◽

Kinetic Measurements ◽

Data Set ◽

Quinone Reduction ◽

Transposon Insertion ◽

Sequencing Library ◽

Technical Effort

Abstract Whole-genome knockout collections are invaluable for connecting gene sequence to function, yet traditionally, their construction has required an extraordinary technical effort. Here we report a method for the construction and purification of a curated whole-genome collection of single-gene transposon disruption mutants termed Knockout Sudoku. Using simple combinatorial pooling, a highly oversampled collection of mutants is condensed into a next-generation sequencing library in a single day, a 30- to 100-fold improvement over prior methods. The identities of the mutants in the collection are then solved by a probabilistic algorithm that uses internal self-consistency within the sequencing data set, followed by rapid algorithmically guided condensation to a minimal representative set of mutants, validation, and curation. Starting from a progenitor collection of 39,918 mutants, we compile a quality-controlled knockout collection of the electroactive microbe Shewanella oneidensis MR-1 containing representatives for 3,667 genes that is functionally validated by high-throughput kinetic measurements of quinone reduction.
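The combinatorial pooling idea can be illustrated with a minimal sketch. It assumes a four-axis scheme in which every well belongs to one plate-row pool, one plate-column pool, one well-row pool and one well-column pool; the abstract does not state the pooling dimensions, so the axis names, the grid sizes and the functions `pools_for` and `decode` are hypothetical and are not the authors' implementation.

```python
# Illustrative sketch of combinatorial pooling (assumed 4-axis layout, not the
# published Knockout Sudoku code). Plates sit on a hypothetical 20 x 21 grid and
# each plate has 8 x 12 wells.
PLATE_ROWS, PLATE_COLS, WELL_ROWS, WELL_COLS = 20, 21, 8, 12
AXES = ("plate_row", "plate_col", "well_row", "well_col")


def pools_for(address):
    """Return the four pools that receive culture from one well address."""
    return {(axis, index) for axis, index in zip(AXES, address)}


def decode(observed_pools):
    """Recover a well address from the pools in which a mutant was sequenced.

    Returns None when any axis has zero or several hits. Such ambiguous cases
    are where a probabilistic solver would be needed instead of this lookup.
    """
    hits = {axis: set() for axis in AXES}
    for axis, index in observed_pools:
        hits[axis].add(index)
    if any(len(indices) != 1 for indices in hits.values()):
        return None
    return tuple(next(iter(hits[axis])) for axis in AXES)


n_wells = PLATE_ROWS * PLATE_COLS * WELL_ROWS * WELL_COLS
n_pools = PLATE_ROWS + PLATE_COLS + WELL_ROWS + WELL_COLS
```

Under these assumed dimensions, 40,320 wells collapse into 61 pools, which is why a pooled library can be prepared far faster than one library per well. A mutant present in two wells lights up more than one pool on some axis, and `decode` then returns `None` rather than guessing.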


Rapid Construction of a Whole-genome Transposon Insertion Collection for Shewanella oneidensis by Knockout Sudoku

10.1101/044768 ◽

2016 ◽

Author(s):

Michael Baym ◽

Lev Shaket ◽

Isao A. Anzai ◽

Oluwakemi Adesina ◽

Buz Barstow

Keyword(s):

Shewanella Oneidensis ◽

Extracellular Electron Transfer ◽

Whole Genome ◽

Kinetic Measurements ◽

Predictive Algorithm ◽

Transposon Insertion ◽

Sequencing Library ◽

Transposon Mutants ◽

Technical Effort ◽

Combinatorial Pooling

Abstract Whole-genome knockout collections are invaluable for connecting gene sequence to function, yet traditionally they have needed an extraordinary technical effort to construct. Knockout Sudoku is a new method for directing the construction and purification of a curated whole-genome collection of single-gene disruption mutants generated by transposon mutagenesis. Using a simple combinatorial pooling scheme, a highly oversampled collection of transposon mutants can be condensed into a next-generation sequencing library in a single day. The identities of the mutants in the collection are then solved by a predictive algorithm based on Bayesian inference, allowing for rapid curation and validation. Starting from a progenitor collection of 39,918 transposon mutants, we compiled a quality-controlled knockout collection of the electroactive microbe Shewanella oneidensis MR-1 containing representatives for 3,667 genes. High-throughput kinetic measurements on this collection provide a comprehensive view of multiple extracellular electron transfer pathways operating in parallel.


Whole-Genome CRISPR Screening Identifies N-Glycosylation As an Essential Pathway and a Potential Novel Therapeutic Target in CALR-Mutant MPN

Blood ◽

10.1182/blood-2021-149218 ◽

2021 ◽

Vol 138 (Supplement 1) ◽

pp. 58-58

Author(s):

Anna E. Marneth ◽

Jonas S. Jutzi ◽

Angel Guerra-Moreno ◽

Michele Ciboddo ◽

María José Jiménez Santos ◽

...

Keyword(s):

Ex Vivo ◽

Myeloproliferative Neoplasms ◽

Lectin Binding ◽

Single Gene ◽

Whole Genome ◽

Sequencing Data ◽

Therapeutic Implications ◽

Platelet Counts ◽

Secretion Pathways

Abstract Somatic mutations in the ER chaperone calreticulin (CALR) are frequent and disease-initiating in myeloproliferative neoplasms (MPN). Although the mechanism of mutant CALR-induced MPN is known to involve pathogenic binding between mutant CALR and MPL, this insight has not yet been exploited therapeutically. Consequently, a major deficiency is the lack of clonally selective therapeutic agents with curative potential. Hence, we set out to discover and validate unique genetic dependencies for mutant CALR-driven oncogenesis. We first performed a whole-genome CRISPR knockout screen in CALR Δ52 MPL-expressing hematopoietic cells to identify genes that were differentially required for the growth of cytokine-independent, transformed CALR Δ52 cells as compared to control cells. Using gene-set enrichment analyses, we identified the N-glycan biosynthesis, unfolded protein response, and the protein secretion pathways to be amongst the most significantly differentially depleted pathways (FDR q values <0.001, 0.014, and 0.025, respectively) in CALR Δ52 cells. We performed a secondary CRISPR pooled screen focused on significant pathways from the primary screen and confirmed these findings. Strikingly, seven of the top ten hits in both screens were linked to protein N-glycosylation. Four of those genes encode proteins involved in the enzymatic activity of dolichol-phosphate mannose synthase (DPM1, DPM2, DPM3, and MPDU1). This enzyme synthesizes dolichol D-mannosyl phosphate, an essential substrate for protein N-glycosylation. Importantly, these findings from an unbiased whole-genome screen align with prior mechanistic studies demonstrating that both the N-glycosylation sites on MPL and the lectin-binding sites on CALR Δ52 are required for mutant CALR-driven oncogenesis. 
We next performed single-gene CRISPR-Cas9 validation studies and found that DPM2 is required for CALR Δ52-mediated transformation, as demonstrated by increased cell death, reduced p-STAT5 and decreased MPL cell-surface levels when Dpm2 is knocked out. Importantly, cells cultured in cytokine-rich medium were unaffected by DPM2 loss. Upon cytokine withdrawal, a sub-clone of non-edited Dpm2WT CALR Δ52 cells grew out, further demonstrating the requirement for DPM2 for the survival of CALR Δ52 cells. Additionally, we observed a >50% reduction in ex vivo myeloid colony formation of murine CalrΔ52 Dpm2 ko bone marrow (BM) compared with CRISPR-Cas9 non-targeting controls, with non-significant effects on CalrWT BM cells. To enable clinical translation, we performed a pharmacological screen targeting pathways significantly depleted in our CRISPR screens. Screening 70 drugs, we found that the N-glycosylation pathway was the only pathway in which all tested compounds preferentially killed CALR Δ52 transformed cells. We then treated primary Calr Δ52/+ mice with a clinical-grade N-glycosylation inhibitor (N-Gi) and found platelet counts (Sysmex) to be significantly reduced (vehicle 3x10^6/mL, N-Gi 1x10^6/mL after 18 days, p<.0001). Concordantly, the proportion of megakaryocyte erythrocyte progenitors (MEPs) was significantly reduced in CalrΔ52 BM (p=0.03). We next performed competitive BM transplantation assays using CD45.2 UBC-GFP MxCre CalrΔ52 knockin and CD45.1 mice. We found that mice treated with N-Gi had significantly reduced platelet counts (vehicle 1440x10^6/mL, N-Gi 845x10^6/mL, p=0.005) as well as significantly reduced platelet chimerism (vehicle 55%, N-Gi 27%, p<0.001), indicating a distinct vulnerability of CalrΔ52 over WT cells. Finally, we interrogated RNA-sequencing data from primary human MPN platelets. 
We found N-glycosylation-related pathways to be significantly upregulated in CALR-mutated platelets (n = 13) compared to healthy control platelets (n = 21), highlighting the relevance of our findings to human MPN. In summary, using unbiased genetic and focused pharmacological screens, we identified the N-glycan biosynthesis pathway as essential for mutant CALR-driven oncogenesis. Using a pre-clinical MPN model, we found that in vivo inhibition of N-glycosylation normalizes key features of MPN and preferentially targets CalrΔ52 over WT cells. These findings have therapeutic implications through inhibiting N-glycosylation alone or in combination with other agents to advance the development of clonally selective therapeutic approaches in CALR-mutant MPN. AEM and JSJ contributed equally. Disclosures Mullally: Janssen, PharmaEssentia, Constellation and Relay Therapeutics: Consultancy.


Rapid Genotype Refinement for Whole-Genome Sequencing Data using Multi-Variate Normal Distributions

10.1101/031484 ◽

2015 ◽

Author(s):

Rudy Arthur ◽

Jared O'Connell ◽

Ole Schulz-Trieglaff ◽

Anthony J Cox

Keyword(s):

Markov Models ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

High Coverage ◽

Multivariate Gaussian Distribution ◽

Data Set ◽

Normal Distributions ◽

Computationally Expensive ◽

Low Coverage

Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD) based genotype refinement to accurately and cost-effectively infer genotypes in large cohorts of individuals. Most genotype refinement methods are based on hidden Markov models, which are accurate but computationally expensive. We introduce an algorithm that models LD using a simple multivariate Gaussian distribution. The key feature of our algorithm is its speed: it is hundreds of times faster than other methods on the same data set, and its scaling behaviour is linear in the number of samples. We demonstrate the performance of the method on both low-coverage and high-coverage samples.
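To make the multivariate Gaussian idea concrete, here is a minimal two-site sketch of how correlation between sites can sharpen an uncertain genotype dosage. It uses only the textbook conditional distribution of a bivariate normal; the function name `conditional_normal` and all numbers are hypothetical, and this is not a reproduction of the authors' algorithm.

```python
# Sketch: conditioning a bivariate normal. Site 1 is the uncertain site and
# site 2 is a correlated neighbour whose dosage x2 is well supported by reads.
def conditional_normal(mu1, mu2, var1, var2, cov12, x2):
    """Return the mean and variance of site 1 given the observed value at site 2."""
    slope = cov12 / var2                # regression coefficient of site 1 on site 2
    mean = mu1 + slope * (x2 - mu2)     # shift the prior mean towards the evidence
    variance = var1 - slope * cov12     # conditioning can only shrink the variance
    return mean, variance


# Hypothetical panel statistics: both sites have mean dosage 1.0 and variance
# 0.5, with covariance 0.4 (strong LD). The neighbour is observed at dosage 2.0.
refined_mean, refined_var = conditional_normal(1.0, 1.0, 0.5, 0.5, 0.4, 2.0)
```

With these assumed values the refined mean moves from 1.0 to about 1.8 and the variance drops from 0.5 to about 0.18. The update is closed-form linear algebra, which is one plausible reason a Gaussian model can run much faster than a hidden Markov model.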


Genotyping by sequencing can reveal the complex mosaic genomes in gene pools resulting from reticulate evolution: a case study in diploid and polyploid citrus

Annals of Botany ◽

10.1093/aob/mcz029 ◽

2019 ◽

Vol 123 (7) ◽

pp. 1231-1251 ◽

Cited By ~ 8

Author(s):

Dalel Ahmed ◽

Aurore Comte ◽

Franck Curk ◽

Gilles Costantino ◽

François Luro ◽

...

Keyword(s):

Genotyping By Sequencing ◽

Reproductive Behaviour ◽

Reticulate Evolution ◽

Gene Pools ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Genome Complexity ◽

Large Populations ◽

Sequencing Library

Abstract Background and Aims Reticulate evolution, coupled with reproductive features limiting further interspecific recombinations, results in admixed mosaics of large genomic fragments from the ancestral taxa. Whole-genome sequencing (WGS) data are powerful tools to decipher such complex genomes but still too costly to be used for large populations. The aim of this work was to develop an approach to infer phylogenomic structures in diploid, triploid and tetraploid individuals from sequencing data in reduced genome complexity libraries. The approach was applied to the cultivated Citrus gene pool resulting from reticulate evolution involving four ancestral taxa, C. maxima, C. medica, C. micrantha and C. reticulata. Methods A genotyping by sequencing library was established with the restriction enzyme ApeKI applying one base (A) selection. Diagnostic single nucleotide polymorphisms (DSNPs) for the four ancestral taxa were mined in 29 representative varieties. A generic pipeline based on a maximum likelihood analysis of the number of read data was established to infer ancestral contributions along the genome of diploid, triploid and tetraploid individuals. The pipeline was applied to 48 diploid, four triploid and one tetraploid citrus accessions. Key Results Among 43 598 mined SNPs, we identified a set of 15 946 DSNPs covering the whole genome with a distribution similar to that of gene sequences. The set efficiently inferred the phylogenomic karyotype of the 53 analysed accessions, providing patterns for common accessions very close to that previously established using WGS data. The complex phylogenomic karyotypes of 21 cultivated citrus, including bergamot, triploid and tetraploid limes, were revealed for the first time. Conclusions The pipeline, available online, efficiently inferred the phylogenomic structures of diploid, triploid and tetraploid citrus. 
It will be useful for any species whose reproductive behaviour resulted in an interspecific mosaic of large genomic fragments. It can also be used for the first generations of interspecific breeding schemes.
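The maximum likelihood step on read counts can be pictured with a simplified sketch. It assumes a binomial read-sampling model at a single diagnostic SNP and a fixed sequencing error rate; the function `ml_dosage`, the error rate and the read counts are hypothetical, and the published pipeline may differ substantially.

```python
import math


def ml_dosage(diagnostic_reads, total_reads, ploidy, error=0.01):
    """Return the ancestral allele dosage (0..ploidy) with the highest likelihood.

    Assumes reads are sampled binomially with success probability dosage/ploidy,
    blurred by a symmetric per-read error rate so the probability is never 0 or 1.
    """
    best_dose, best_loglik = 0, -math.inf
    for dose in range(ploidy + 1):
        fraction = dose / ploidy
        p = fraction * (1 - error) + (1 - fraction) * error
        # The binomial coefficient is the same for every dose, so it is omitted.
        loglik = (diagnostic_reads * math.log(p)
                  + (total_reads - diagnostic_reads) * math.log(1 - p))
        if loglik > best_loglik:
            best_dose, best_loglik = dose, loglik
    return best_dose


# Hypothetical triploid accession: 10 of 30 reads carry the diagnostic allele.
triploid_dose = ml_dosage(10, 30, ploidy=3)
```

In this example one third of the reads carry the diagnostic allele, so a single copy out of three is the most likely dosage. Repeating the estimate along the genome for each ancestral taxon is one way such a model could yield a mosaic of ancestral contributions.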


Phylogenetic Analysis of Mycobacterium tuberculosis Strains in Wales by Use of Core Genome Multilocus Sequence Typing To Analyze Whole-Genome Sequencing Data

Journal of Clinical Microbiology ◽

10.1128/jcm.02025-18 ◽

2019 ◽

Vol 57 (6) ◽

Cited By ~ 4

Author(s):

R. C. Jones ◽

L. G. Harris ◽

S. Morgan ◽

M. C. Ruddy ◽

M. Perry ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Mycobacterium Tuberculosis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Multilocus Sequence Typing ◽

Core Genome ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Data Set

ABSTRACT An inability to standardize the bioinformatic data produced by whole-genome sequencing (WGS) has been a barrier to its widespread use in tuberculosis phylogenetics. The aim of this study was to carry out a phylogenetic analysis of tuberculosis in Wales, United Kingdom, using Ridom SeqSphere software for core genome multilocus sequence typing (cgMLST) analysis of whole-genome sequencing data. The phylogenetics of tuberculosis in Wales have not previously been studied. Sixty-six Mycobacterium tuberculosis isolates (including 42 outbreak-associated isolates) from south Wales were sequenced using an Illumina platform. Isolates were assigned to principal genetic groups, single nucleotide polymorphism (SNP) cluster groups, lineages, and sublineages using SNP-calling protocols. WGS data were submitted to the Ridom SeqSphere software for cgMLST analysis and analyzed alongside 179 previously lineage-defined isolates. The data set was dominated by the Euro-American lineage, with the sublineage composition being dominated by T, X, and Haarlem family strains. The cgMLST analysis successfully assigned 58 isolates to major lineages, and the results were consistent with those obtained by traditional SNP mapping methods. In addition, the cgMLST scheme was used to resolve an outbreak of tuberculosis occurring in the region. This study supports the use of a cgMLST method for standardized phylogenetic assignment of tuberculosis isolates and for outbreak resolution and provides the first insight into Welsh tuberculosis phylogenetics, identifying the presence of the Haarlem sublineage commonly associated with virulent traits.


Crambled: A Shiny application to enable intuitive resolution of conflicting cellularity estimates

F1000Research ◽

10.12688/f1000research.7453.1 ◽

2015 ◽

Vol 4 ◽

pp. 1407 ◽

Cited By ~ 2

Author(s):

Andy G. Lynch

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome ◽

Sequencing Data ◽

Data Set ◽

Tumour Sample ◽

Copy Numbers ◽

Genome Wide ◽

Insight Into ◽

Shiny Application

It is now commonplace to investigate tumour samples using whole-genome sequencing, and some commonly performed tasks are the estimation of cellularity (or sample purity), the genome-wide profiling of copy numbers, and the assessment of sub-clonal behaviours. Several tools are available to undertake these tasks, but they often give conflicting results – not least because there is often genuine uncertainty due to a lack of model identifiability. Presented here is a tool, "Crambled", that allows for an intuitive visual comparison of the conflicting solutions. Crambled is implemented as a Shiny application within R, and is accompanied by example images from two use cases (one tumour sample with matched normal sequencing, and one standalone cell line example) as well as functions to generate the necessary images from any sequencing data set. Through the use of Crambled, a user may gain insight into why each tool has offered its given solution and, combined with knowledge of the disease being studied, can choose between the competing solutions in an informed manner.


Comparison of three variant callers for human whole genome sequencing

10.1101/461798 ◽

2018 ◽

Author(s):

Anna Supernat ◽

Oskar Valdimar Vidarsson ◽

Vidar M. Steen ◽

Tomasz Stokowy

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Single Gene ◽

Reference Sample ◽

Variant Calling ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Whole Exome ◽

Indel Calling

ABSTRACT Testing of patients with genetics-related disorders is in the process of shifting from single-gene assays to gene panel sequencing, whole-exome sequencing (WES) and whole-genome sequencing (WGS). Since WGS is unquestionably becoming a new foundation for molecular analyses, we decided to compare three currently used tools for variant calling of human whole-genome sequencing data. We tested DeepVariant, a new TensorFlow machine learning-based variant caller, and compared this tool to GATK 4.0 and SpeedSeq, using 30×, 15× and 10× WGS data of the well-known NA12878 DNA reference sample. According to our comparison, the performance on SNV calling was very similar in 30× data, with all three variant callers reaching F-Scores (i.e. harmonic mean of recall and precision) equal to 0.98. In contrast, DeepVariant was more precise in indel calling than GATK and SpeedSeq, as demonstrated by F-Scores of 0.94, 0.90 and 0.84, respectively. We conclude that the DeepVariant tool has great potential and usefulness for analysis of WGS data in medical genetics.
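The F-Score used above is the harmonic mean of precision and recall, which can be computed directly from counts of true positives, false positives and false negatives. The sketch below is a generic illustration; the function name `f_score` and the example counts are hypothetical and are not taken from the study.

```python
def f_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall for a set of variant calls."""
    if true_pos == 0:
        return 0.0  # no correct calls, so both precision and recall are zero
    precision = true_pos / (true_pos + false_pos)  # share of calls that are real
    recall = true_pos / (true_pos + false_neg)     # share of real variants called
    return 2 * precision * recall / (precision + recall)


# Hypothetical caller: 90 correct calls, 10 spurious calls, 10 missed variants.
example = f_score(90, 10, 10)
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a caller cannot reach a high F-Score by trading away recall for precision or the reverse, which makes it a reasonable single number for comparing callers.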


Evaluation of whole-genome DNA methylation sequencing library preparation protocols

Epigenetics & Chromatin ◽

10.1186/s13072-021-00401-y ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Jacob Morrison ◽

Julie M. Koeman ◽

Benjamin K. Johnson ◽

Kelly K. Foy ◽

Ian Beddows ◽

...

Keyword(s):

Dna Methylation ◽

Bioinformatic Analysis ◽

Limiting Factor ◽

Whole Genome ◽

Library Preparation ◽

Sequencing Data ◽

Bioinformatic Pipeline ◽

Sequencing Cost ◽

Sequencing Library ◽

Sequencing Library Preparation

Abstract Background With rapidly dropping sequencing cost, the popularity of whole-genome DNA methylation sequencing has been on the rise. Multiple library preparation protocols currently exist. We have performed 22 whole-genome DNA methylation sequencing experiments on snap frozen human samples, and extensively benchmarked common library preparation protocols for whole-genome DNA methylation sequencing, including three traditional bisulfite-based protocols and a new enzyme-based protocol. In addition, different input DNA quantities were compared for two kits compatible with a reduced starting quantity. We also present bioinformatic analysis pipelines for sequencing data from each of these library types. Results An assortment of metrics were collected for each kit, including raw read statistics, library quality and uniformity metrics, cytosine retention, and CpG beta value consistency between technical replicates. Overall, the NEBNext Enzymatic Methyl-seq and Swift Accel-NGS Methyl-Seq kits performed quantitatively better than the other two protocols. In addition, the NEB and Swift kits performed well at low-input amounts, validating their utility in applications where DNA is the limiting factor. Conclusions The NEBNext Enzymatic Methyl-seq kit appeared to be the best option for whole-genome DNA methylation sequencing of high-quality DNA, closely followed by the Swift kit, which potentially works better for degraded samples. Further, a general bioinformatic pipeline is applicable across the four protocols, with the exception of extra trimming needed for the Swift Biosciences Accel-NGS Methyl-Seq protocol to remove the Adaptase sequence.


Using Mendelian inheritance errors as quality control criteria in whole genome sequencing data set

BMC Proceedings ◽

10.1186/1753-6561-8-s1-s21 ◽

2014 ◽

Vol 8 (S1) ◽

Cited By ~ 9

Author(s):

Valentina V Pilipenko ◽

Hua He ◽

Brad G Kurowski ◽

Eileen S Alexander ◽

Xue Zhang ◽

...

Keyword(s):

Quality Control ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Mendelian Inheritance ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Data Set ◽

Quality Control Criteria ◽

Control Criteria


Evaluation of Whole-Genome DNA Methylation Sequencing Library Preparation Protocols

10.21203/rs.3.rs-249202/v1 ◽

2021 ◽

Author(s):

Jacob Morrison ◽

Julie M. Koeman ◽

Benjamin K. Johnson ◽

Kelly K. Foy ◽

Wanding Zhou ◽

...

Keyword(s):

Dna Methylation ◽

Bioinformatic Analysis ◽

Systematic Evaluation ◽

Whole Genome ◽

Library Preparation ◽

Sequencing Data ◽

Sequencing Cost ◽

Sequencing Library ◽

Fresh Frozen ◽

Sequencing Library Preparation

Abstract Background: With rapidly dropping sequencing cost, the popularity of whole-genome DNA methylation sequencing has been on the rise. Multiple library preparation protocols exist, but a systematic evaluation and benchmarking of their performance against each other is currently lacking. We have performed 22 whole-genome DNA methylation sequencing experiments on fresh frozen human samples, and extensively benchmarked common library preparation protocols for whole-genome DNA methylation sequencing, including three traditional bisulfite-based protocols and a new enzyme-based protocol. Additionally, different input DNA quantities were compared for two kits compatible with a reduced starting quantity. In addition, we also present bioinformatic analysis pipelines for sequencing data from each of these library types. Results: An assortment of metrics were collected for each kit, including raw read statistics, library quality and uniformity metrics, cytosine retention, and CpG beta value consistency between technical replicates. Overall, the NEBNext Enzymatic Methyl-seq kit performed quantitatively better than the other three protocols at two different DNA input amounts. Additionally, the results for the different input amounts were generally consistent across all metrics. Conclusions: Based on these results, we recommend use of the NEBNext Enzymatic Methyl-seq kit for whole-genome DNA methylation sequencing. Further, a general bioinformatic pipeline is applicable across the four protocols, with the exception of extra trimming needed for the Swift Bioscience's Accel-NGS Methyl-Seq protocol to remove the Adaptase sequence.
