Error rates, PCR recombination, and sampling depth in HIV-1 Whole Genome Deep Sequencing

10.1101/077313 ◽

2016 ◽

Author(s):

Fabio Zanini ◽

Johanna Brodin ◽

Jan Albert ◽

Richard A. Neher

Keyword(s):

Deep Sequencing ◽

Large Scale ◽

Rare Variants ◽

Cost Effective ◽

Data Interpretation ◽

Error Rates ◽

Whole Genome ◽

Viral Genomes ◽

Sequencing Errors ◽

Hiv 1

Deep sequencing is a powerful and cost-effective tool to characterize the genetic diversity and evolution of virus populations. While modern sequencing instruments readily cover viral genomes many thousand fold and very rare variants can in principle be detected, sequencing errors, amplification biases, and other artifacts can limit sensitivity and complicate data interpretation. Here, we describe several control experiments and error correction methods for whole-genome deep sequencing of viral genomes. We developed many of these in the course of a large scale whole genome deep sequencing study of HIV-1 populations. We measured the substitution and indel errors that arose during sequencing and PCR and quantified PCR-mediated recombination. We find that depending on the viral load in the samples, rare mutations down to 0.2% can be reproducibly detected. PCR recombination can be avoided by consistently working at low amplicon concentrations.

Faculty Opinions recommendation of Error rates, PCR recombination, and sampling depth in HIV-1 whole genome deep sequencing.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.727160877.793531007 ◽

2017 ◽

Author(s):

Sarah Rowland-Jones ◽

Sophie Andrews

Keyword(s):

Deep Sequencing ◽

Error Rates ◽

Whole Genome ◽

Sampling Depth ◽

Hiv 1

Error rates, PCR recombination, and sampling depth in HIV-1 whole genome deep sequencing

Virus Research ◽

10.1016/j.virusres.2016.12.009 ◽

2017 ◽

Vol 239 ◽

pp. 106-114 ◽

Cited By ~ 21

Author(s):

Fabio Zanini ◽

Johanna Brodin ◽

Jan Albert ◽

Richard A. Neher

Keyword(s):

Deep Sequencing ◽

Error Rates ◽

Whole Genome ◽

Sampling Depth ◽

Hiv 1

Large-Scale Whole-Genome Sequencing Reveals the Genetic Architecture of Primary Membranoproliferative GN and C3 Glomerulopathy

Journal of the American Society of Nephrology ◽

10.1681/asn.2019040433 ◽

2020 ◽

Vol 31 (2) ◽

pp. 365-373 ◽

Cited By ~ 7

Author(s):

Adam P. Levine ◽

Melanie M.Y. Chan ◽

Omid Sadeghi-Alavijeh ◽

Edwin K.S. Wong ◽

H. Terence Cook ◽

...

Keyword(s):

Large Scale ◽

Rare Variants ◽

Alternative Pathway ◽

Atypical Hemolytic Uremic Syndrome ◽

Gene Mutations ◽

Whole Genome Sequence ◽

European Ancestry ◽

Whole Genome ◽

C3 Glomerulopathy ◽

Complement Gene

Background: Primary membranoproliferative GN, including complement 3 (C3) glomerulopathy, is a rare, untreatable kidney disease characterized by glomerular complement deposition. Complement gene mutations can cause familial C3 glomerulopathy, and studies have reported rare variants in complement genes in nonfamilial primary membranoproliferative GN. Methods: We analyzed whole-genome sequence data from 165 primary membranoproliferative GN cases and 10,250 individuals without the condition (controls) as part of the National Institute for Health Research BioResource–Rare Diseases Study. We examined copy number, rare, and common variants. Results: Our analysis included 146 primary membranoproliferative GN cases and 6442 controls who were unrelated and of European ancestry. We observed no significant enrichment of rare variants in candidate genes (genes encoding components of the complement alternative pathway and other genes associated with the related disease atypical hemolytic uremic syndrome; 6.8% in cases versus 5.9% in controls) or exome-wide. However, a significant common variant locus was identified at 6p21.32 (rs35406322) (P=3.29×10⁻⁸; odds ratio [OR], 1.93; 95% confidence interval [95% CI], 1.53 to 2.44), overlapping the HLA locus. Imputation of HLA types mapped this signal to a haplotype incorporating DQA1*05:01, DQB1*02:01, and DRB1*03:01 (P=1.21×10⁻⁸; OR, 2.19; 95% CI, 1.66 to 2.89). This finding was replicated by analysis of HLA serotypes in 338 individuals with membranoproliferative GN and 15,614 individuals with nonimmune renal failure. Conclusions: We found that HLA type, but not rare complement gene variation, is associated with primary membranoproliferative GN. These findings challenge the paradigm of complement gene mutations typically causing primary membranoproliferative GN and implicate an underlying autoimmune mechanism in most cases.

Whole Genome Deep Sequencing of HIV-1 Reveals the Impact of Early Minor Variants Upon Immune Recognition During Acute Infection

PLoS Pathogens ◽

10.1371/journal.ppat.1002529 ◽

2012 ◽

Vol 8 (3) ◽

pp. e1002529 ◽

Cited By ~ 246

Author(s):

Matthew R. Henn ◽

Christian L. Boutwell ◽

Patrick Charlebois ◽

Niall J. Lennon ◽

Karen A. Power ◽

...

Keyword(s):

Deep Sequencing ◽

Acute Infection ◽

Immune Recognition ◽

Whole Genome ◽

The Impact ◽

Hiv 1

Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole Genome Sequencing Studies

10.1101/552950 ◽

2019 ◽

Author(s):

Zilin Li ◽

Xihao Li ◽

Yaowu Liu ◽

Jincheng Shen ◽

Han Chen ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Rare Variant ◽

Type I Error ◽

Rare Variants ◽

Error Rates ◽

Type I ◽

Whole Genome ◽

Rare Variant Association ◽

Dynamic Scan

Abstract: Whole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set based analyses are commonly used to analyze rare variants. However, existing variant-set based approaches need to pre-specify genetic regions for analysis, and hence are not directly applicable to WGS data due to the large number of intergenic and intron regions that consist of a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown a priori, are difficult to specify in practice, and are subject to limitations given that genetic association region sizes are likely to vary across the genome and phenotypes. We propose a computationally efficient and dynamic scan statistic method (Scan the Genome (SCANG)) for analyzing WGS data that flexibly detects the sizes and the locations of rare-variant association regions without the need to specify a fixed window size a priori. The proposed method controls the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected rare-variant association region sizes to vary across the genome. Through extensive simulation studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling the genome-wise type I error rate. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.

807. Same-day Transmission Analysis of Nosocomial Transmission Using Nanopore Whole Genome Sequencing

Open Forum Infectious Diseases ◽

10.1093/ofid/ofab466.1003 ◽

2021 ◽

Vol 8 (Supplement_1) ◽

pp. S497-S498

Author(s):

Mohamad Sater ◽

Remy Schwab ◽

Ian Herriott ◽

Tim Farrell ◽

Miriam Huntley

Keyword(s):

High Resolution ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Variant Calling ◽

Cost Effective ◽

Error Rates ◽

Sequencing Error ◽

Snp Analysis ◽

Whole Genome ◽

Snp Calling

Abstract. Background: Healthcare associated infections (HAIs) are a major contributor to patient morbidity and mortality worldwide. HAIs are increasingly important due to the rise of multidrug resistant pathogens which can lead to deadly nosocomial outbreaks. Current methods for investigating transmissions are slow, costly, or have poor detection resolution. A rapid, cost-effective and high-resolution method to identify transmission events is imperative to guide infection control. Whole genome sequencing of infecting pathogens paired with a single nucleotide polymorphism (SNP) analysis can provide high-resolution clonality determination, yet these methods typically have long turnaround times. Here we examined the utility of the Oxford Nanopore Technologies (ONT) platform, a rapid sequencing technology, for whole genome sequencing based transmission analysis. Methods: We developed a SNP calling pipeline customized for ONT data, which exhibit higher sequencing error rates and can therefore be challenging for transmission analysis. The pipeline leverages the latest basecalling tools as well as a suite of custom variant calling and filtering algorithms to achieve highest accuracy in clonality calls compared to Illumina-based sequencing. We also capitalize on ONT long reads by assembling outbreak-specific genomes in order to overcome the need for an external reference genome. Results: We examined 20 bacterial isolates from 5 HAI investigations previously performed at Day Zero Diagnostics as part of epiXact®, our commercialized Illumina-based HAI sequencing and analysis service. Using the ONT data and pipeline, we achieved greater than 90% SNP-calling sensitivity and precision, allowing 100% accuracy of clonality classification compared to Illumina-based results across common HAI species. We demonstrate the validity and increased resolution of our SNP analysis pipeline using assembled genomes from each outbreak. We also demonstrate that this ONT-based workflow can produce isolate to transmission determination (i.e. including WGS and analysis) in less than 24 hours. [Figure: SNP calling performance. ONT-based SNP calling sensitivity and precision compared to the Illumina-based pipeline.] Conclusion: We demonstrate the utility of ONT for HAI investigation, establishing the potential to transform healthcare epidemiology with same-day high-resolution transmission determination. Disclosures: Mohamad Sater, PhD, Day Zero Diagnostics (Employee, Shareholder); Remy Schwab, MSc, Day Zero Diagnostics (Employee, Shareholder); Ian Herriott, BS, Day Zero Diagnostics (Employee, Shareholder); Tim Farrell, MS, Day Zero Diagnostics, Inc. (Employee, Shareholder); Miriam Huntley, PhD, Day Zero Diagnostics (Employee, Shareholder).

P4-097: RARE VARIANTS IN FAMILIAL LATE-ONSET ALZHEIMER'S DISEASE IDENTIFIED FROM LARGE SCALE WHOLE GENOME SEQUENCING

Alzheimer's & Dementia ◽

10.1016/j.jalz.2019.06.3757 ◽

2019 ◽

Vol 15 ◽

pp. P1312-P1312

Author(s):

Badri N. Vardarajan ◽

James Jaworski ◽

Gary W. Beecham ◽

Sandra Barral ◽

Dolly Reyes-Dumeyer ◽

...

Keyword(s):

Alzheimer's Disease ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Rare Variants ◽

Late Onset ◽

Whole Genome

Strain Structure and Dynamics Revealed by Targeted Deep Sequencing of the Honey Bee Gut Microbiome

mSphere ◽

10.1128/msphere.00694-20 ◽

2020 ◽

Vol 5 (4) ◽

Author(s):

Louis-Marie Bobay ◽

Emily F. Wissel ◽

Kasie Raymann

Keyword(s):

Microbial Communities ◽

Honey Bee ◽

Deep Sequencing ◽

Gut Microbiome ◽

Cost Effective ◽

Whole Genome Shotgun ◽

Strain Level ◽

Whole Genome ◽

Metagenomic Sequencing ◽

Microbiome Composition

ABSTRACT: Host-associated microbiomes can be critical for the health and proper development of animals and plants. The answers to many fundamental questions regarding the modes of acquisition and microevolution of microbiome communities remain to be established. Deciphering strain-level dynamics is essential to fully understand how microbial communities evolve, but the forces shaping the strain-level dynamics of microbial communities remain largely unexplored, mostly because of methodological issues and cost. Here, we used targeted strain-level deep sequencing to uncover the strain dynamics within a host-associated microbial community using the honey bee gut microbiome as a model system. Our results revealed that amplicon sequencing of conserved protein-coding gene regions using species-specific primers is a cost-effective and accurate method for exploring strain-level diversity. In fact, using this method we were able to confirm strain-level results that have been obtained from whole-genome shotgun sequencing of the honey bee gut microbiome but with a much higher resolution. Importantly, our deep sequencing approach allowed us to explore the impact of low-frequency strains (i.e., cryptic strains) on microbiome dynamics. Results show that cryptic strain diversity is not responsible for the observed variations in microbiome composition across bees. Altogether, the findings revealed new fundamental insights regarding strain dynamics of host-associated microbiomes.

IMPORTANCE: The factors driving fine-scale composition and dynamics of gut microbial communities are poorly understood. In this study, we used metagenomic amplicon deep sequencing to decipher the strain dynamics of two key members of the honey bee gut microbiome. Using this high-throughput and cost-effective approach, we were able to confirm results from previous large-scale whole-genome shotgun (WGS) metagenomic sequencing studies while also gaining additional insights into the community dynamics of two core members of the honey bee gut microbiome. Moreover, we were able to show that cryptic strains are not responsible for the observed variations in microbiome composition across bees.

Whole Genome Rare-Variant Association Study of HIV-1 Progression in a Southern African Population

10.1101/2020.12.16.20248307 ◽

2020 ◽

Author(s):

Prisca K. Thami ◽

Wonderful Choga ◽

Delesa D. Mulisa ◽

Collet Dandara ◽

Andrey K. Shevchenko ◽

...

Keyword(s):

Functional Analysis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Rare Variant ◽

Rare Variants ◽

Variant Calling ◽

Analysis Tool ◽

Whole Genome ◽

Host Genetics ◽

Hiv 1

ABSTRACT: Despite the high burden of HIV-1 in Botswana, the population of Botswana is significantly underrepresented in host genetics studies of HIV-1. Furthermore, the bulk of previous genomics studies evaluated common human genetic variations; however, there is increasing evidence of the influence of rare variants on the outcome of diseases, which may be uncovered by comprehensive, complete, and deep genome sequencing. This research aimed to evaluate the role of rare variants in susceptibility to HIV-1 and progression through whole genome sequencing. Whole genome sequences (WGS) of 265 HIV-1 positive and 125 HIV-1 negative unrelated individuals from Botswana were mapped to the human reference genome GRCh38. Population joint variant calling was performed using Genome Analysis Tool Kit (GATK) and BCFTools. Cumulative effects of rare variant sets on susceptibility to HIV-1 and progression (CD4+ T-cell decline) were determined with the optimized Sequence Kernel Association Test (SKAT-O). In silico functional analysis of the prioritized variants was performed through gene-set enrichment using databases in GeneMANIA and Enrichr. Novel rare variants within the ANKRD39 (8.48 × 10⁻⁸), LOC105378523 (7.45 × 10⁻⁷) and GTF3C3 (1.36 × 10⁻⁶) genes were significantly associated with HIV-1 progression. Functional analysis revealed that these genes are involved in viral translation and transcription. These findings highlight the significance of whole genome sequencing in pinpointing rare variants of clinical relevance. The research contributes towards a deeper understanding of the host genetics of HIV-1 and offers promise of population-specific interventions against HIV-1.

METHimpute: Imputation-guided construction of complete methylomes from WGBS data

10.1101/190223 ◽

2017 ◽

Author(s):

Aaron Taudt ◽

David Roquis ◽

Amaryllis Vidalis ◽

René Wardenaar ◽

Frank Johannes ◽

...

Keyword(s):

Large Scale ◽

Bisulfite Sequencing ◽

Hidden Markov ◽

Methylation Status ◽

Population Level ◽

Cost Effective ◽

High Accuracy ◽

Whole Genome ◽

Effective Solution ◽

Genome Bisulfite Sequencing

Abstract: Whole-genome bisulfite sequencing (WGBS) has become the standard method for interrogating plant methylomes at base resolution. However, deep WGBS measurements remain cost prohibitive for large, complex genomes and for population-level studies. As a result, most published plant methylomes are sequenced far below saturation, with a large proportion of cytosines having either missing data or insufficient coverage. Here we present METHimpute, a Hidden Markov Model (HMM) based imputation algorithm for the analysis of WGBS data. Unlike existing methods, METHimpute enables the construction of complete methylomes by inferring the methylation status and level of all cytosines in the genome regardless of coverage. Application of METHimpute to maize, rice and Arabidopsis shows that the algorithm infers cytosine-resolution methylomes with high accuracy from data as low as 6X, compared to data with 60X, thus making it a cost-effective solution for large-scale studies. Although METHimpute has been extensively tested in plants, it should be broadly applicable to other species.
