duphold: scalable, depth-based annotation and curation of high-confidence structural variant calls

Mapping Intimacies ◽

10.1101/465385 ◽

2018 ◽

Cited By ~ 1

Author(s):

Brent S. Pedersen ◽

Aaron R. Quinlan

Keyword(s):

Copy Number ◽

Visual Inspection ◽

Rapid Change ◽

Depth Information ◽

Sequence Coverage ◽

High Confidence ◽

Structural Variant ◽

Split Read ◽

Long Read ◽

Variant Detection

Abstract: Most structural variant detection tools use clusters of discordant read-pair and split-read alignments to identify variants, yet do not integrate depth of sequence coverage as an additional means to support or refute putative events. Here, we present duphold, a new method to efficiently annotate structural variant calls with sequence depth information that can increase (or decrease) confidence in SVs predicted to affect copy number. It indicates not only the change in depth across the event, but also the presence of a rapid change in depth relative to the regions surrounding the breakpoints. It uses a unique algorithm that allows the run time to be nearly independent of the number of variants. This performance is important for large, jointly-called projects with many samples, each of which must be evaluated at thousands of sites. We show that filtering on duphold annotations can greatly improve the specificity of deletion calls and that its annotations match visual inspection. Duphold can annotate structural variant predictions made from both short-read and long-read data. It is available under the MIT license at: https://github.com/brentp/duphold.
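The idea of comparing depth across an event with depth in the surrounding regions can be illustrated with a minimal Python sketch. This is not duphold's algorithm or its annotation format: the function name `depth_fold_changes`, the per-base depth list, the `flank` parameter, and the use of medians are all assumptions made for illustration, and the sketch makes no attempt at the run-time behavior described in the abstract.

```python
from statistics import median


def depth_fold_changes(depth, start, end, flank=1000):
    """Compare depth inside [start, end) with its flanks and with the whole contig.

    depth: list of per-base read depths for one contig (hypothetical input).
    Returns (inside/flank, inside/contig) ratios of median depth.
    """
    inside = depth[start:end]
    # Up to `flank` bases on each side of the event.
    flanks = depth[max(0, start - flank):start] + depth[end:end + flank]
    if not inside or not flanks:
        raise ValueError("event and flanks must each cover at least one base")

    inside_med = median(inside)
    flank_med = median(flanks)
    contig_med = median(depth)
    if flank_med == 0 or contig_med == 0:
        raise ValueError("cannot compute a fold change against zero depth")
    return inside_med / flank_med, inside_med / contig_med


# Toy contig: 30x coverage with a 50-base region at 15x.
toy_depth = [30] * 100 + [15] * 50 + [30] * 100
vs_flank, vs_contig = depth_fold_changes(toy_depth, 100, 150, flank=50)
```

In this toy case both ratios are well below 1, which is the kind of depth drop that would support a putative deletion call; a ratio near 1 would argue against it.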


SVLR: Genome Structural Variant Detection Using Long-Read Sequencing Data

Journal of Computational Biology ◽

10.1089/cmb.2021.0048 ◽

2021 ◽

Author(s):

Wenyan Gu ◽

Aizhong Zhou ◽

Lusheng Wang ◽

Shiwei Sun ◽

Xuefeng Cui ◽

...

Keyword(s):

Sequencing Data ◽

Structural Variant ◽

Long Read ◽

Variant Detection


Abstract 1696: Structural variant detection with long read sequencing reveals driver and passenger mutations in a melanoma cell line

10.1158/1538-7445.sabcs18-1696 ◽

2019 ◽

Author(s):

Aaron Wenger ◽

Marcel Nelen ◽

Meredith Ashby ◽

Wigard P. Kloosterman

Keyword(s):

Cell Line ◽

Melanoma Cell ◽

Melanoma Cell Line ◽

Structural Variant ◽

Long Read ◽

Variant Detection


MsPAC: a tool for haplotype-phased structural variant detection

Bioinformatics ◽

10.1093/bioinformatics/btz618 ◽

2019 ◽

Vol 36 (3) ◽

pp. 922-924 ◽

Cited By ~ 3

Author(s):

Oscar L Rodriguez ◽

Anna Ritz ◽

Andrew J Sharp ◽

Ali Bashir

Keyword(s):

Genomic Data ◽

Supplementary Information ◽

Supplementary Data ◽

High Quality ◽

Structural Variant ◽

Long Read ◽

One Step ◽

Variant Detection ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract: Summary: While next-generation sequencing (NGS) has dramatically increased the availability of genomic data, phased genome assembly and structural variant (SV) analyses are limited by NGS read lengths. Long-read sequencing from Pacific Biosciences and NGS barcoding from 10x Genomics hold the potential for far more comprehensive views of individual genomes. Here, we present MsPAC, a tool that combines both technologies to partition reads, assemble haplotypes (via existing software) and convert assemblies into high-quality, phased SV predictions. MsPAC represents a framework for haplotype-resolved SV calls that moves one step closer to fully resolved, diploid genomes. Availability and implementation: https://github.com/oscarlr/MsPAC. Supplementary information: Supplementary data are available at Bioinformatics online.


Abstract 1696: Structural variant detection with long read sequencing reveals driver and passenger mutations in a melanoma cell line

10.1158/1538-7445.am2019-1696 ◽

2019 ◽

Author(s):

Aaron Wenger ◽

Marcel Nelen ◽

Meredith Ashby ◽

Wigard P. Kloosterman

Keyword(s):

Cell Line ◽

Melanoma Cell ◽

Melanoma Cell Line ◽

Structural Variant ◽

Long Read ◽

Variant Detection


Faculty Opinions recommendation of Systematic assessment of copy number variant detection via genome-wide SNP genotyping.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1123296.580444 ◽

2008 ◽

Author(s):

Yasushi Okazaki

Keyword(s):

Copy Number ◽

Copy Number Variant ◽

Snp Genotyping ◽

Systematic Assessment ◽

Genome Wide ◽

Variant Detection ◽

Copy Number Variant Detection


Evidence for opposing selective forces operating on human-specific duplicated TCAF genes in Neanderthals and humans

Nature Communications ◽

10.1038/s41467-021-25435-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

PingHsun Hsieh ◽

Vy Dang ◽

Mitchell R. Vollger ◽

Yafei Mao ◽

Tzu-Hsueh Huang ◽

...

Keyword(s):

Copy Number ◽

Homo Sapiens ◽

Segmental Duplications ◽

Sensor Protein ◽

Long Read ◽

Number Variation ◽

Selective Forces ◽

Structural Mutations ◽

Cold Sensor ◽

Human Specific

Abstract: TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated given the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations. Conversely, in all archaic-hominin samples the fixation for a specific H4 haplotype without duplication is likely due to positive selection. Here, our results of TCAF copy number expansion, selection signals in hominins, and differential TCAF2 expression between haplogroups and high TCAF2 and TRPM8 expression in liver and prostate in modern-day humans imply TCAF diversification among hominins potentially in response to cold or dietary adaptations.


Decomposing the subclonal structure of tumors with two-way mixture models on copy number aberrations

10.1101/278887 ◽

2018 ◽

Author(s):

An-Shun Tai ◽

Chien-Hua Peng ◽

Shih-Chi Peng ◽

Wen-Ping Hsieh

Keyword(s):

Head And Neck Cancer ◽

Head And Neck ◽

Neck Cancer ◽

Copy Number ◽

Tumor Heterogeneity ◽

Tumor Evolution ◽

Depth Information ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Copy Number Aberrations

Abstract: Multistage tumorigenesis is a dynamic process characterized by the accumulation of mutations. Thus, a tumor mass is composed of genetically divergent cell subclones. With the advancement of next-generation sequencing (NGS), mathematical models have recently been developed to decompose tumor subclonal architecture from collective genome sequencing data. Most of these methods focus on single-nucleotide variants (SNVs). However, somatic copy number aberrations (CNAs) also play critical roles in carcinogenesis. Therefore, modeling subclonal CNA composition holds promise for improving the analysis of tumor heterogeneity and cancer evolution. To address this issue, we developed a two-way mixture Poisson model, named CloneDeMix, for the deconvolution of read-depth information. It can infer the subclonal copy number, mutational cellular prevalence (MCP), subclone composition, and the order in which mutations occurred in the evolutionary hierarchy. The performance of CloneDeMix was systematically assessed in simulations. The accuracy of CNA inference was nearly 93%, and the MCP was also accurately restored. Furthermore, we demonstrated its applicability using head and neck cancer samples from TCGA. Our results inform about the extent of subclonal CNA diversity, and a group of candidate genes that probably initiate lymph node metastasis during tumor evolution was also discovered. Most importantly, these driver genes are located at 11q13.3, which is highly susceptible to copy number change in head and neck cancer genomes. This study successfully estimates subclonal CNAs and exhibits the evolutionary relationships of mutation events. By doing so, we can track tumor heterogeneity and identify crucial mutations during the evolutionary process. Hence, it facilitates not only understanding cancer development but also finding potential therapeutic targets. Briefly, this framework has implications for improved modeling of tumor evolution and the importance of inclusion of subclonal CNAs.
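The link between read depth, subclonal copy number, and cellular prevalence can be sketched in a few lines of Python. This is not the CloneDeMix model itself: the functions `expected_depth` and `poisson_log_likelihood`, the diploid-background assumption, and the toy counts are all hypothetical, chosen only to show how a Poisson likelihood can compare two candidate explanations for the observed depth.

```python
import math


def expected_depth(baseline, copy_number, prevalence):
    """Mean depth when a fraction `prevalence` of cells carries `copy_number`
    copies and the rest are diploid; `baseline` is the diploid mean depth."""
    mean_copies = (1.0 - prevalence) * 2 + prevalence * copy_number
    return baseline * mean_copies / 2.0


def poisson_log_likelihood(counts, lam):
    """Sum of Poisson log-probabilities of the counts given mean `lam` (> 0)."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


# Toy read counts from a region whose diploid baseline depth is assumed to be 30.
observed = [15, 14, 16, 15]
lam_loss = expected_depth(30, copy_number=1, prevalence=1.0)     # one-copy loss in all cells
lam_neutral = expected_depth(30, copy_number=2, prevalence=0.4)  # copy-neutral subclone
ll_loss = poisson_log_likelihood(observed, lam_loss)
ll_neutral = poisson_log_likelihood(observed, lam_neutral)
```

Here the one-copy-loss hypothesis has the higher log-likelihood. Note that different pairs of copy number and prevalence can give the same expected depth (for example, a full one-copy loss and a two-copy loss in half of the cells), so a single region under this simplified formula cannot separate them.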


SVIM-asm: Structural variant detection from haploid and diploid genome assemblies

10.1101/2020.10.27.356907 ◽

2020 ◽

Author(s):

David Heller ◽

Martin Vingron

Keyword(s):

Genetic Information ◽

Source Code ◽

Supplementary Information ◽

Supplementary Data ◽

Diploid Genome ◽

Insertions And Deletions ◽

Structural Variant ◽

Sequencing Technologies ◽

Variant Detection ◽

Genome Assemblies

Abstract: Motivation: With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet, existing SV callers are designed for haploid genome assemblies only, do not support genotyping or detect only a limited set of SV classes. Results: We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against the only other existing SV caller for diploid assemblies, DipCall, SVIM-asm detects more SV classes and reached higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual. Availability and Implementation: SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/[email protected] Supplementary information: Supplementary data are available online.


Multiple Displacement Amplification as a Solution for Low Copy Number Plasmid Sequencing

Frontiers in Microbiology ◽

10.3389/fmicb.2021.617487 ◽

2021 ◽

Vol 12 ◽

Author(s):

Kuan Yao ◽

Narjol González-Escalona ◽

Maria Hoffmann

Keyword(s):

Antibiotic Resistance ◽

Single Molecule ◽

Plasmid Dna ◽

Copy Number ◽

Sequence Data ◽

Multiple Displacement Amplification ◽

Fast Method ◽

Alkaline Lysis ◽

Low Copy Number ◽

Long Read

Plasmids play a major role in bacterial adaptation to environmental stress and often contribute to antibiotic resistance and disease virulence. Although the complete sequence of each plasmid is essential for studying plasmid biology, most antibiotic resistance and virulence plasmids in Salmonella are present only in a low copy number, making extraction and sequencing difficult. Long read sequencing technologies require higher concentrations of DNA to provide optimal results. To resolve this problem, we assessed the sufficiency of multiple displacement amplification (MDA) for replicating Salmonella plasmid DNA to a satisfactory concentration for accurate sequencing and multiplexing. Nine Salmonella enterica isolates, representing nine different serovars carrying plasmids for which sequence data are already available at NCBI, were cultured and their plasmids isolated using an alkaline lysis extraction protocol. We then used the Phi29 polymerase to perform MDA, thereby obtaining enough plasmid DNA for long read sequencing. These amplified plasmids were multiplexed and sequenced on one single molecule, real-time (SMRT) cell with the Pacific Biosciences (Pacbio) Sequel sequencer. We were able to close all Salmonella plasmids (sizes ranged from 38 to 166 Kb) with sequencing coverage from 24 to 2,582X. This protocol, consisting of plasmid isolation, MDA, and multiplex sequencing, is an effective and fast method for closing high-molecular weight and low-copy-number plasmids. This high throughput protocol reduces the time and cost of plasmid closure.
