Identifying differentially methylated sites in samples with varying tumor purity

Mapping Intimacies ◽

10.1101/248781 ◽

2018 ◽

Author(s):

Antti Häkkinen ◽

Amjad Alkodsi ◽

Chiara Facciotto ◽

Kaiyang Zhang ◽

Katja Kaipio ◽

...

Keyword(s):

Normal Cell ◽

Computational Method ◽

Clinical Samples ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Cell Control ◽

Tumor Purity ◽

Wide Range ◽

Cancer Types ◽

Control Samples

Abstract: DNA methylation aberrations are common in many cancer types. A major challenge hindering comparison of patient-derived samples is that they comprise a heterogeneous collection of cancer and microenvironment cells. We present a computational method that allows comparing cancer methylomes in two or more heterogeneous tumor samples featuring differing, unknown fractions of cancer cells. The method is unique in that it also allows comparison in the absence of normal cell control samples and without prior tumor purity estimates, as these are often unavailable or unreliable in clinical samples. We use simulations and next-generation methylome, RNA, and whole-genome sequencing data from two cancer types to demonstrate that the method is accurate and outperforms alternatives. The results show that our method adapts well to various cancer types and to a wide range of tumor content, and works robustly without a control or with controls derived from various sources. The method is freely available at https://bitbucket.org/anthakki/dmml.
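To make the purity problem in this abstract concrete, the following is a minimal sketch of a linear two-component mixture, in which the bulk methylation level at a site is a purity-weighted average of the cancer and non-cancer levels. This is an illustrative assumption, not the authors' method: the function names and the numbers are hypothetical, and the inversion step needs a known purity and a normal-cell level, which the abstract states are often unavailable.

```python
def mix_methylation(purity, cancer_beta, normal_beta):
    """Expected bulk methylation level under a linear two-component mixture.

    purity      -- fraction of cancer cells in the sample, in [0, 1]
    cancer_beta -- methylation level of the cancer cells at one site
    normal_beta -- methylation level of the non-cancer cells at that site
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    return purity * cancer_beta + (1.0 - purity) * normal_beta


def unmix_methylation(purity, bulk_beta, normal_beta):
    """Invert the mixture for the cancer-cell level, clipped to [0, 1].

    Assumes purity and normal_beta are known, which is exactly the
    information that is often missing in clinical samples.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must lie in (0, 1]")
    cancer_beta = (bulk_beta - (1.0 - purity) * normal_beta) / purity
    return min(1.0, max(0.0, cancer_beta))


# Hypothetical site: the same cancer methylome (0.9) seen at two purities.
bulk_a = mix_methylation(0.8, 0.9, 0.1)  # high-purity sample
bulk_b = mix_methylation(0.3, 0.9, 0.1)  # low-purity sample

# The bulk values differ although the cancer cells are identical.
print(bulk_a, bulk_b)
# With the true purity the cancer-cell level is recovered in both cases.
print(unmix_methylation(0.8, bulk_a, 0.1), unmix_methylation(0.3, bulk_b, 0.1))
```

Under this simplified model, a naive comparison of the two bulk values would report a difference that reflects purity alone, which is the confounding that the abstract describes.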

eSCAN: Scan Regulatory Regions for Aggregate Association Testing using Whole Genome Sequencing Data

10.1101/2020.11.30.405266 ◽

2020 ◽

Author(s):

Yingxi Yang ◽

Yuchen Yang ◽

Le Huang ◽

Jai G. Broome ◽

Adolfo Correa ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

New Technologies ◽

Real Data ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Association Testing ◽

Wide Range ◽

Sequencing Studies

Abstract: With advances in whole genome sequencing (WGS) technology, multiple statistical methods for aggregate association testing have been developed. Many common approaches aggregate variants in a given genomic window of fixed or varying size and do not rely on existing knowledge to define appropriate test units. As a result, most identified regions are not clearly linked to genes, which limits biological understanding. Functional information from new technologies (such as Hi-C and its derivatives), which can help link enhancers to the genes they affect, can be leveraged to predefine variant sets for aggregate testing in WGS. Therefore, in this paper we propose the eSCAN (Scan the Enhancers) method for genome-wide assessment of enhancer regions in sequencing studies, combining the advantages of dynamic window selection in SCANG with the advantages of increased incorporation of genomic annotation. eSCAN searches biologically meaningful windows, increasing power and aiding biological interpretation, as demonstrated by simulation studies under a wide range of scenarios. We also apply eSCAN to association analysis of blood cell traits using TOPMed WGS data from the Women’s Health Initiative (WHI) and the Jackson Heart Study (JHS). Results from this real data example show that eSCAN is able to capture more significant signals, and that these signals are shorter and drive the association of larger regions detected by other methods.

Discordant bioinformatic predictions of antimicrobial resistance from whole-genome sequencing data of bacterial isolates: An inter-laboratory study

10.1101/793885 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ronan M. Doyle ◽

Denise M. O’Sullivan ◽

Sean D. Aller ◽

Sebastian Bruchmann ◽

Taane Clark ◽

...

Keyword(s):

Antimicrobial Resistance ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Laboratory Study ◽

Clinical Microbiology ◽

Sequence Data ◽

Clinical Samples ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

Abstract

Background: Antimicrobial resistance (AMR) poses a threat to public health. Clinical microbiology laboratories typically rely on culturing bacteria for antimicrobial susceptibility testing (AST). As the implementation costs and technical barriers fall, whole-genome sequencing (WGS) has emerged as a ‘one-stop’ test for epidemiological and predictive AST results. Few published comparisons exist for the myriad analytical pipelines used for predicting AMR. To address this, we performed an inter-laboratory study providing sets of participating researchers with identical short-read WGS data sequenced from clinical isolates, allowing us to assess the reproducibility of the bioinformatic prediction of AMR between participants and identify problem cases and factors that lead to discordant results.

Methods: We produced ten WGS datasets of varying quality from cultured carbapenem-resistant organisms obtained from clinical samples sequenced on either an Illumina NextSeq or HiSeq instrument. Nine participating teams (‘participants’) were provided these sequence data without any other contextual information. Each participant used their own pipeline to determine the species, the presence of resistance-associated genes, and to predict susceptibility or resistance to amikacin, gentamicin, ciprofloxacin and cefotaxime.

Results: Individual participants predicted different numbers of AMR-associated genes and different gene variants from the same clinical samples. The quality of the sequence data, choice of bioinformatic pipeline and interpretation of the results all contributed to discordance between participants. Although much of the inaccurate gene variant annotation did not affect genotypic resistance predictions, we observed low specificity when compared to phenotypic AST results, but this improved in samples with higher read depths. Had the results been used to predict AST and guide treatment, a different antibiotic would have been recommended for each isolate by at least one participant.

Conclusions: We found that participants produced discordant predictions from identical WGS data. These challenges, at the final analytical stage of using WGS to predict AMR, suggest the need for refinements when using this technology in clinical settings. Comprehensive public resistance sequence databases and standardisation in the comparisons between genotype and resistance phenotypes will be fundamental before AST prediction using WGS can be successfully implemented in standard clinical microbiology laboratories.

JuLI: accurate detection of DNA fusions in clinical sequencing for precision oncology

10.1101/521039 ◽

2019 ◽

Author(s):

Hyun-Tae Shin ◽

Nayoung K. D. Kim ◽

Jae Won Yun ◽

Boram Lee ◽

Sungkyu Kyung ◽

...

Keyword(s):

High Throughput Sequencing ◽

False Negative ◽

Detection Algorithm ◽

Clinical Samples ◽

Whole Genome Sequencing Data ◽

Precision Oncology ◽

Sequencing Data ◽

Clinical Sequencing ◽

Accurate Detection ◽

High Depth

Abstract: Accurate detection of genomic fusions by high-throughput sequencing in clinical samples with inadequate tumor purity and formalin-fixed paraffin-embedded (FFPE) tissue is an essential task in precision oncology. We developed the fusion detection algorithm Junction Location Identifier (JuLI), optimized for high-depth clinical sequencing. We implemented novel filtering steps to minimize false positives and a joint calling function to increase sensitivity in clinical settings. We comprehensively validated the algorithm using high-depth sequencing data from cancer cell lines and clinical samples and whole genome sequencing data from NA12878. We showed that JuLI outperformed state-of-the-art fusion callers in cases with high-depth clinical sequencing and rescued a driver fusion that would otherwise have been a false negative in plasma cell-free DNA. JuLI is freely available via GitHub (https://github.com/sgilab/JuLI).

Evolution and expansion of multidrug resistant malaria in Southeast Asia: a genomic epidemiology study

10.1101/621763 ◽

2019 ◽

Author(s):

William L Hamilton ◽

Roberto Amato ◽

Rob W van der Pluijm ◽

Christopher G Jacob ◽

Huynh Hong Quang ◽

...

Keyword(s):

Southeast Asia ◽

International Development ◽

Genetic Relatedness ◽

Multidrug Resistant ◽

Added Value ◽

Chloroquine Resistance ◽

Clinical Samples ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Drug Pressure

Summary

Background: A multidrug resistant co-lineage of Plasmodium falciparum malaria, named KEL1/PLA1, spread across Cambodia c.2008-2013, causing high treatment failure rates to the frontline combination therapy dihydroartemisinin-piperaquine. Here, we report on the evolution and spread of KEL1/PLA1 in subsequent years.

Methods: We analysed whole genome sequencing data from 1,673 P. falciparum clinical samples collected in 2008-2018 from northeast Thailand, Laos, Cambodia and Vietnam. By investigating genome-wide relatedness between parasites, we inferred patterns of shared ancestry in the KEL1/PLA1 population.

Findings: KEL1/PLA1 spread rapidly from 2015 into all of the surveyed countries and now exceeds 80% of the P. falciparum population in several regions. These parasites maintained a high level of genetic relatedness reflecting their common origin. However, several genetic subgroups have recently emerged within this co-lineage with diverse geographical distributions. Some of these emerging KEL1/PLA1 subgroups carry recent mutations in the chloroquine resistance transporter (crt) gene, which arise on a specific genetic background comprising multiple genomic regions.

Interpretation: After emerging and circulating for several years within Cambodia, the P. falciparum KEL1/PLA1 co-lineage diversified into multiple subgroups and acquired new genetic features including novel crt mutations. These subgroups have rapidly spread into neighbouring countries, suggesting enhanced fitness. These findings highlight the urgent need for elimination of this increasingly drug-resistant parasite co-lineage, and the importance of genetic surveillance in accelerating elimination efforts.

Funding: Wellcome Trust, Bill & Melinda Gates Foundation, UK Medical Research Council, UK Department for International Development.

Research in context

Evidence before this study: This study updates our previous work describing the emergence and spread of a multidrug resistant P. falciparum co-lineage (KEL1/PLA1) within Cambodia up to 2013. Since then, a regional genetic surveillance project, GenRe-Mekong, has reported that markers of dihydroartemisinin-piperaquine (DHA-PPQ) resistance have increased in frequency in neighbouring countries. A PubMed search (terms: “artemisinin”, “piperaquine”, “resistance”, “southeast asia”) for articles listed since our previous study (from 30/10/2017 to 05/01/2019) yielded 28 results, including reports of a recent sharp decline in DHA-PPQ clinical efficacy in Vietnam; the spread of genetic markers of DHA-PPQ resistance into neighbouring countries by Imwong and colleagues; and multiple reports associating mutations in the crt gene with piperaquine resistance, including newly emerging crt variants in Southeast Asia.

Added value of this study: We analysed P. falciparum whole genomes collected up to early 2018 from Eastern Southeast Asia (Cambodia and surrounding regions), describing the fine-scale epidemiology of multiple KEL1/PLA1 genetic subgroups that have spread out from Cambodia since 2015 and taken over indigenous parasite populations in northeastern Thailand, southern and central Vietnam and parts of southern Laos. Several newly emerging crt mutations accompanied the spread and expansion of KEL1/PLA1 subgroups, suggesting an active proliferation of biologically fit, multidrug resistant parasites.

Implications of all the available evidence: The problem of P. falciparum multidrug resistance has dramatically worsened in Eastern Southeast Asia since previous reports. KEL1/PLA1 has diversified and spread widely across Eastern Southeast Asia since 2015, becoming the predominant parasite group in several regions. This may have been fuelled by continued parasite exposure to DHA-PPQ, resulting in sustained selection after KEL1/PLA1 became established. Continued drug pressure enabled the acquisition of further mutations, resulting in higher levels of resistance. These data demonstrate the value of pathogen genetic surveillance and the urgent need to eliminate these dangerous parasites.

368. Performance Characteristics of Sequencing Assays for Identification of the SARS-CoV-2 Viral Genome

Open Forum Infectious Diseases ◽

10.1093/ofid/ofab466.569 ◽

2021 ◽

Vol 8 (Supplement_1) ◽

pp. S286-S287

Author(s):

Danny Antaki ◽

Mara Couto-Rodriguez ◽

Tong Liu ◽

Kristin Butcher ◽

Esteban Toro ◽

...

Keyword(s):

In Silico ◽

Viral Genome ◽

Board Member ◽

Panel Member ◽

Cost Effective ◽

Dropout Rate ◽

Clinical Samples ◽

Sequencing Data ◽

Hybrid Capture ◽

Wide Range

Abstract

Background: As the SARS-CoV-2 (SCV-2) virus evolves, diagnostics and vaccines against novel strains rely on viral genome sequencing. Researchers have gravitated towards cost-effective and highly sensitive amplicon-based (e.g. ARTIC) and hybrid capture sequencing (e.g. SARS-CoV-2 NGS Assay) to selectively target the SCV-2 genome. We provide an in silico model to compare these 2 technologies and present data on the high scalability of the Research Use Only (RUO) workflow of the SARS-CoV-2 NGS Assay.

Methods: In silico work included alignments of 383,656 high-quality genome sequences belonging to variant of concern (VOC) or variant of interest (VOI) isolates (GISAID). We profiled mismatches and sequencing dropouts using the ARTIC V3 primers, SARS-CoV-2 NGS Assay probes (Twist Bioscience) and 11 synthesized viral sequences containing mutations, and compared the performance of these assays using clinical samples. Further, the miniaturized hybrid capture workflow was optimized and evaluated to support high-throughput (384-plex) processing. The sequencing data were processed with COVID-DX software.

Results: We detected 101,432 viruses (27%) with ≥1 mismatch in the last 6 base pairs of the 3’ end of ARTIC primers; of these, 413 had ≥2 mismatches in one primer. In contrast, only 38 viruses (0.01%) had enough mutations (≥10) in a hybrid capture probe to have a similar effect on coverage. We observed that mutations in ARTIC primers led to complete dropout of the amplicon for 4/11 isolates and diminished coverage in an additional 4. Twist probes showed uniform coverage throughout with little to no dropout. Both assays detected a wide range of variants (~99.9% coverage at 5X depth) in clinical samples (CT value < 30) collected in NY (Spring 2020-Spring 2021). The distribution of the number of reads and on-target rates were more uniform among specimens within amplicon-based sequencing. However, uneven genome coverage and primer dropouts, some in the spike protein, were observed on VOC/VOI and other isolates, highlighting limitations of an amplicon-based approach.

Conclusion: The RUO workflow of the SARS-CoV-2 NGS Assay is a comprehensive and scalable sequencing tool for variant profiling, and yields more consistent coverage and a smaller dropout rate compared to ARTIC (0.05% vs. 7.7%).

Disclosures: Danny Antaki, PhD, Twist Bioscience (Employee, Shareholder); Mara Couto-Rodriguez, MS, Biotia (Employee); Kristin Butcher, MS, Twist Bioscience (Employee, Shareholder); Esteban Toro, PhD, Twist Bioscience (Employee); Bryan Höglund, BS, Twist Bioscience (Employee, Shareholder); Xavier O. Jirau Serrano, B.S., Biotia (Employee); Joseph Barrows, MS, Biotia (Employee); Christopher Mason, PhD, Biotia (Board Member, Advisor or Review Panel member, Shareholder); Niamh B. O’Hara, PhD, Biotia (Board Member, Employee, Shareholder); Dorottya Nagy-Szakal, MD PhD, Biotia Inc (Employee, Shareholder)
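The primer-mismatch criterion reported in this abstract (mismatches within the last 6 bases at the 3’ end of a primer) can be sketched as a small counting routine. This is only an illustration under stated assumptions: the function name and the toy sequences are hypothetical, both sequences are taken as already aligned without gaps and written 5’ to 3’ on the same strand, and nothing here reproduces the authors' actual pipeline or the real ARTIC V3 primer sequences.

```python
def three_prime_mismatches(primer, target, window=6):
    """Count mismatches in the last `window` bases at the primer's 3' end.

    primer, target -- equal-length, gap-free strings written 5'->3' on the
                      same strand (an assumption of this sketch)
    window         -- number of 3'-terminal bases to inspect
    """
    if len(primer) != len(target):
        raise ValueError("primer and target must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")
    tail_primer = primer[-window:].upper()
    tail_target = target[-window:].upper()
    return sum(1 for p, t in zip(tail_primer, tail_target) if p != t)


# Toy sequences, not real primers.
primer = "ACGTACGTACGT"
variant_near_3p = "ACGTACGTACTT"  # one substitution inside the 3' window
variant_at_5p = "TCGTACGTACGT"    # one substitution outside the 3' window

print(three_prime_mismatches(primer, variant_near_3p))
print(three_prime_mismatches(primer, variant_at_5p))
```

A genome would be flagged in this scheme when any primer yields a count of at least 1, which mirrors the threshold quoted in the Results; substitutions farther from the 3’ end are ignored by design.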

Genomic basis for RNA alterations revealed by whole-genome analyses of 27 cancer types

10.1101/183889 ◽

2017 ◽

Cited By ~ 10

Author(s):

Claudia Calabrese ◽

Natalie R. Davidson ◽

Nuno A. Fonseca ◽

Yao He ◽

...

Keyword(s):

Molecular Mechanisms ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Specific Expression ◽

Alu Elements ◽

Cancer Types ◽

Genome Analyses ◽

Gene Alterations

Abstract: We present the most comprehensive catalogue of cancer-associated gene alterations through characterization of tumor transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes project. Using matched whole-genome sequencing data, we attributed RNA alterations to germline and somatic DNA alterations, revealing likely genetic mechanisms. We identified 444 associations of gene expression with somatic non-coding single-nucleotide variants. We found 1,872 splicing alterations associated with somatic mutation in intronic regions, including novel exonization events associated with Alu elements. Somatic copy number alterations were the major driver of total gene and allele-specific expression (ASE) variation. Additionally, 82% of gene fusions had structural variant support, including 75 of a novel class called “bridged” fusions, in which a third genomic location bridged two different genes. Globally, we observe transcriptomic alteration signatures that differ between cancer types and have associations with DNA mutational signatures. Given this unique dataset of RNA alterations, we also identified 1,012 genes significantly altered through both DNA and RNA mechanisms. Our study represents an extensive catalog of RNA alterations and reveals new insights into the heterogeneous molecular mechanisms of cancer gene alterations.

Deep assessment of human disease-associated ribosomal RNA modifications using Nanopore direct RNA sequencing

10.1101/2021.11.10.467884 ◽

2021 ◽

Author(s):

Isabel S Naarmann-de Vries ◽

Christiane Zorbas ◽

Amina Lemsara ◽

Maja Bencun ◽

Sarah Schudy ◽

...

Keyword(s):

Rna Sequencing ◽

Clinical Samples ◽

Sequencing Data ◽

Rna Modifications ◽

Flow Cells ◽

Wide Range ◽

Catalytically Active ◽

Induced Pluripotent ◽

First Time

The catalytically active component of ribosomes, rRNA, has long been studied and is heavily modified. However, little is known about the functional and pathological consequences of changes in human rRNA modification status. Direct RNA sequencing on the Nanopore platform enables the direct assessment of rRNA modifications. We established a targeted Nanopore direct rRNA sequencing approach and applied it to CRISPR-Cas9 engineered HCT116 cells lacking specific enzymatic activities required to establish defined rRNA base modifications. We analyzed these sequencing data along with wild type samples and in vitro transcribed reference sequences to specifically detect changes in modification status. We show for the first time that direct RNA sequencing is feasible on smaller flow cells, i.e. Flongle flow cells. Our targeted approach reduces RNA input requirements, making it suitable for the analysis of limited samples such as patient-derived material. The analysis of rRNA modifications during cardiomyocyte differentiation of human induced pluripotent stem cells, and of heart biopsies from cardiomyopathy patients, revealed altered modifications of specific sites, among them pseudouridine, 2′-O-methylation of ribose and acetylation of cytidine. Targeted direct rRNA-seq analysis with JACUSA2 opens up the possibility to analyze dynamic changes in rRNA modifications in a wide range of biological and clinical samples.

Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing

10.1101/333617 ◽

2018 ◽

Cited By ~ 11

Author(s):

Isidro Cortés-Ciriano ◽

June-Koo Lee ◽

Ruibin Xi ◽

Dhawal Jain ◽

Youngsook L. Jung ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Human Cancer ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

End Joining ◽

Cancer Types ◽

Non Homologous End Joining

Summary: Chromothripsis is a newly discovered mutational phenomenon involving massive, clustered genomic rearrangements that occurs in cancer and other diseases. Recent studies in cancer suggest that chromothripsis may be far more common than initially inferred from low resolution DNA copy number data. Here, we analyze the patterns of chromothripsis across 2,658 tumors spanning 39 cancer types using whole-genome sequencing data. We find that chromothripsis events are pervasive across cancers, with a frequency of >50% in several cancer types. Whereas canonical chromothripsis profiles display oscillations between two copy number states, a considerable fraction of the events involves multiple chromosomes as well as additional structural alterations. In addition to non-homologous end-joining, we detect signatures of replicative processes and templated insertions. Chromothripsis contributes to oncogene amplification as well as to inactivation of genes such as mismatch-repair related genes. These findings show that chromothripsis is a major process driving genome evolution in human cancer.

SMuRF: Portable and accurate ensemble-based somatic variant calling

10.1101/270413 ◽

2018 ◽

Cited By ~ 2

Author(s):

Weitai Huang ◽

Yu Amanda Guo ◽

Karthik Muthukumar ◽

Probhonjon Baruah ◽

Meimei Chang ◽

...

Keyword(s):

Point Mutations ◽

Variant Calling ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Somatic Variant ◽

Level Data ◽

Machine Learning Approach ◽

Cancer Types ◽

User Friendly ◽

Improved Accuracy

Abstract

Summary: SMuRF is an ensemble method for prediction of somatic point mutations (SNVs) and small insertions/deletions (indels) in cancer genomes. The method integrates predictions and auxiliary features from different somatic mutation callers using a Random Forest machine learning approach. SMuRF is trained on community-curated tumor whole genome sequencing data, is robust across cancer types, and achieves improved accuracy for both SNV and indel predictions of genome and exome-level data. The software is user-friendly and portable by design, operating as an add-on to the community-developed bcbio-nextgen somatic variant calling pipeline.

Benchmarking topological accuracy of bacterial phylogenomic workflows using in silico evolution

10.1101/2021.08.03.454900 ◽

2021 ◽

Author(s):

Boas CL van der Putten ◽

Niek AH Huijsmans ◽

Daniel R Mende ◽

Constance Schultsz

Keyword(s):

De Novo ◽

Phylogenetic Analyses ◽

Bacterial Species ◽

Phylogenetic Reconstruction ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Relevant Alternatives ◽

Wide Range ◽

Similar Accuracy

Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to a genetic diversity usually observed within bacterial species (≥95% average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked 17 distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and SKA achieve accuracy similar to that of reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of de novo assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or SKESA outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction, k-mer alignment methods are relevant alternatives to reference mapping at species level, especially in the absence of suitable reference genomes. We show de novo genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.