read pair Latest Research Papers

Evaluation of the performance of copy number variant prediction tools for the detection of deletions from whole genome sequencing data

10.1101/482554 ◽

2018 ◽

Author(s):

Whitney Whitford ◽

Klaus Lehnert ◽

Russell G. Snell ◽

Jessie C. Jacobsen

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome ◽

Read Pair ◽

False Discovery ◽

Software Packages ◽

Variant Detection ◽

Detection Software ◽

CNV Detection

Abstract

Background: Whole genome sequencing (WGS) has increased in popularity and decreased in cost over the past decade, making it a viable and sensitive method for variant detection. In addition to its utility for single nucleotide variant detection, WGS data has the potential to detect copy number variants (CNVs) at fine resolution. Many CNV detection software packages have been developed, exploiting four main types of evidence: read pair, split read, read depth, and assembly-based methods. The aim of this study was to evaluate the efficiency of each of these main approaches in detecting deletions.

Methods: WGS data and high-confidence deletion calls for the individual NA12878 from the Genome in a Bottle consortium served as the benchmark dataset. The performance of Breakdancer, CNVnator, Delly, FermiKit, and Pindel was assessed by comparing the accuracy and sensitivity of each software package in detecting deletions exceeding 1 kb.

Results: There was considerable variability in the outputs of the different WGS CNV detection programs. Breakdancer and Delly performed best, with sensitivities of 92.6% and 96.7% and false discovery rates (FDR) of 34.5% and 68.5%, respectively. In comparison, Pindel, CNVnator, and FermiKit were less effective, with sensitivities of 69.1%, 66.0%, and 15.8% and FDRs of 91.3%, 69.0%, and 31.7%, respectively. Concordance across software packages was poor: only 27 of the 612 benchmark deletions were identified by all five tools.

Conclusions: The WGS-based CNV detection tools evaluated show disparate performance in identifying deletions ≥1 kb, particularly between tools that utilise different input data characteristics. Software exploiting read pair data, namely Breakdancer and Delly, had the highest sensitivity, and Breakdancer also had the second lowest false discovery rate. Therefore, in this analysis read pair methods (Breakdancer in particular) were the best performing approaches for identifying deletions ≥1 kb, balancing accuracy and sensitivity. There is potential for improvement in the detection algorithms, particularly in reducing FDR. This analysis validates the utility of WGS-based CNV detection software for reliably identifying deletions, and these findings will be useful when choosing software for deletion detection in both research and diagnostic medicine.
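
The abstract reports sensitivity, FDR, and cross-tool concordance. The sketch below shows how such figures are commonly computed from deletion calls. It assumes a 50% reciprocal-overlap rule for deciding whether a call matches a benchmark deletion; that rule is a widespread convention but is not stated in the abstract, and the function names and toy intervals are hypothetical.

```python
from typing import List, Set, Tuple

# A deletion as (chromosome, start, end) with half-open coordinates.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if a and b share a chromosome and their overlap covers at least
    min_frac of the length of each interval (assumed matching rule)."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_frac * (a[2] - a[1]) and overlap >= min_frac * (b[2] - b[1])


def evaluate(calls: List[Interval], truth: List[Interval], min_frac: float = 0.5):
    """Return (sensitivity, FDR, indices of benchmark deletions recovered).

    sensitivity = recovered benchmark deletions / all benchmark deletions
    FDR         = calls matching no benchmark deletion / all calls
    """
    found: Set[int] = {
        i for i, t in enumerate(truth)
        if any(reciprocal_overlap(c, t, min_frac) for c in calls)
    }
    false_calls = sum(
        1 for c in calls
        if not any(reciprocal_overlap(c, t, min_frac) for t in truth)
    )
    sensitivity = len(found) / len(truth) if truth else 0.0
    fdr = false_calls / len(calls) if calls else 0.0
    return sensitivity, fdr, found


def concordant(found_sets: List[Set[int]]) -> Set[int]:
    """Indices of benchmark deletions recovered by every tool."""
    return set.intersection(*found_sets) if found_sets else set()
```

For example, against three toy benchmark deletions, a tool with one matching call and one unmatched call scores a sensitivity of 1/3 and an FDR of 0.5; intersecting the `found` sets of several tools gives the kind of all-tool concordance count quoted above.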

Combining probabilistic alignments with read pair information improves accuracy of split-alignments

Bioinformatics ◽

10.1093/bioinformatics/bty398 ◽

2018 ◽

Vol 34 (21) ◽

pp. 3631-3637

Author(s):

Anish M S Shrestha ◽

Naruki Yoshikawa ◽

Kiyoshi Asai

Keyword(s):

Read Pair

Structural Variation Detection with Read Pair Information: An Improved Null Hypothesis Reduces Bias

Journal of Computational Biology ◽

10.1089/cmb.2016.0124 ◽

2017 ◽

Vol 24 (6) ◽

pp. 581-589 ◽

Cited By ~ 1

Author(s):

Kristoffer Sahlin ◽

Mattias Frånberg ◽

Lars Arvestad

Keyword(s):

Null Hypothesis ◽

Structural Variation ◽

Read Pair

McClintock: An integrated pipeline for detecting transposable element insertions in whole genome shotgun sequencing data

10.1101/095372 ◽

2016 ◽

Cited By ~ 2

Author(s):

Michael G. Nelson ◽

Raquel S. Linheiro ◽

Casey M. Bergman

Keyword(s):

Transposable Element ◽

Comparative Evaluation ◽

Whole Genome Shotgun ◽

Detection Methods ◽

Yeast Genome ◽

Whole Genome ◽

Read Pair ◽

Positional Accuracy ◽

Gene Target ◽

Split Read

Abstract

Background: Transposable element (TE) insertions are among the most challenging types of variant to detect in genomic data because of their repetitive nature and complex mechanisms of replication. Nevertheless, the recent availability of large resequencing datasets has spurred the development of many new methods to detect TE insertions in whole genome shotgun sequences. These methods generate output in diverse formats and have many software and data dependencies, making their comparative evaluation challenging for potential users.

Results: Here we develop an integrated bioinformatics pipeline for the detection of TE insertions in whole genome shotgun data, called McClintock (https://github.com/bergmanlab/mcclintock), that automatically runs multiple TE detection methods and generates standardized output for them. We demonstrate the utility of the McClintock system by performing a comparative evaluation of six TE detection methods using simulated and real genome data from the model microbial eukaryote Saccharomyces cerevisiae. We find substantial variation among McClintock component methods in their ability to detect non-reference insertions in the yeast genome, but show that non-reference TEs at nearly all biologically realistic locations can be detected in simulated data by combining multiple methods that use split-read and read-pair evidence. Overall, split-read methods detect fewer non-reference TE insertions than read-pair methods but generally have much higher positional accuracy. Analysis of a large sample of real yeast genomes reveals that most, but not all, McClintock component methods can recover known aspects of TE biology in yeast, such as the transpositional activity status of families, tRNA gene target preferences, and target site duplication structure, albeit with varying levels of positional accuracy.

Conclusions: Our results suggest that no single TE detection method currently provides comprehensive detection of non-reference TEs, even in a simplified model eukaryotic genome like S. cerevisiae. Despite these limitations, the McClintock system provides a framework for testing, developing, and integrating results from multiple TE detection methods toward this ultimate aim, as well as useful guidance for yeast researchers selecting TE detection tools.
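
The difference in positional accuracy between read-pair and split-read evidence can be illustrated with a simplified, hypothetical model; this is not the logic of any particular McClintock component method. Discordant pairs (one mate unique, the other in a TE) only bound the insertion between the innermost forward-strand anchor upstream and the innermost reverse-strand anchor downstream, whereas soft-clipped split reads point at the junction itself. The `Anchor` class and both functions are illustrative names.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Tuple


@dataclass
class Anchor:
    """Uniquely mapped mate of a discordant pair whose other mate hits a TE."""
    start: int     # leftmost reference coordinate (0-based)
    end: int       # rightmost reference coordinate (exclusive)
    forward: bool  # True if mapped to the forward strand


def read_pair_window(anchors: List[Anchor]) -> Optional[Tuple[int, int]]:
    """Interval expected to contain the insertion: between the innermost end of
    the forward-strand (upstream) cluster and the innermost start of the
    reverse-strand (downstream) cluster. None if either cluster is missing."""
    fwd_ends = [a.end for a in anchors if a.forward]
    rev_starts = [a.start for a in anchors if not a.forward]
    if not fwd_ends or not rev_starts:
        return None
    left, right = max(fwd_ends), min(rev_starts)
    # Clusters may overlap slightly (e.g. across a target site duplication).
    return (min(left, right), max(left, right))


def split_read_junction(clip_positions: List[int]) -> int:
    """Junction estimate: median soft-clip breakpoint of the split reads."""
    return int(median(clip_positions))


anchors = [
    Anchor(780, 880, True), Anchor(840, 940, True), Anchor(875, 975, True),
    Anchor(1030, 1130, False), Anchor(1060, 1160, False), Anchor(1120, 1220, False),
]
window = read_pair_window(anchors)                   # (975, 1030): a 55 bp window
junction = split_read_junction([1000, 1001, 1000])  # 1000: a single position
print(window, junction)
```

In this toy case the read-pair evidence narrows the insertion to a 55 bp window, while the split reads place it at a single coordinate, which mirrors the trade-off described in the abstract: pairs are more plentiful, clipped reads are more precise.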

Structural variation detection with read pair information --- An improved null-hypothesis reduces bias

10.1101/036707 ◽

2016 ◽

Author(s):

Kristoffer Sahlin ◽

Mattias Frånberg ◽

Lars Arvestad

Keyword(s):

Statistical Analysis ◽

Fragment Length ◽

Null Hypothesis ◽

Structural Variation ◽

False Positives ◽

Simplified Model ◽

Read Pair ◽

New Model ◽

Mate Pair ◽

Reference Implementation

Abstract. Reads from paired-end and mate-pair libraries are often used to find structural variation in genomes, and one common approach is to use their fragment length for detection. After read pairs are aligned to the reference, read-pair distances are analyzed for statistically significant deviations. However, previously proposed methods are based on a simplified model of observed fragment lengths that does not agree with the data. We show how this model limits the statistical identification of variants and propose a new model, adapted from one we previously introduced for contig scaffolding, that agrees with the data. From this model we derive an improved null hypothesis that, when applied in the variant caller CLEVER, reduces the number of false positives and corrects a bias that contributes to more deletion calls than insertion calls. A reference implementation is freely available at https://github.com/ksahlin/GetDistr.
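
A minimal numerical sketch of why a naive fragment-length model can be biased. It assumes normally distributed library fragment lengths with mean mu and standard deviation sigma, reads of length r, and that a pair "spans" a position when that position falls in its inner gap of o - 2r bases. Longer fragments cover more positions, so pairs spanning a given position are length-biased; under this weighting their expected length is mu + sigma^2 / (mu - 2r) rather than mu. This weighting is an illustrative assumption and not necessarily the exact model used in GetDistr or CLEVER.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the library fragment-length distribution (assumed normal)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def spanning_mean(mu: float, sigma: float, read_len: int) -> float:
    """Expected fragment length among pairs whose inner gap covers a fixed
    position, computed numerically: each length o is weighted by
    f(o) * (o - 2 * read_len), the number of positions its inner gap covers."""
    c = 2 * read_len
    num = den = 0.0
    for o in range(c + 1, int(mu + 10 * sigma) + 1):
        w = normal_pdf(o, mu, sigma) * (o - c)
        num += o * w
        den += w
    return num / den


def closed_form_mean(mu: float, sigma: float, read_len: int) -> float:
    """mu + sigma^2 / (mu - 2r); exact when f puts negligible mass below 2r."""
    return mu + sigma ** 2 / (mu - 2 * read_len)


mu, sigma, r = 500.0, 50.0, 100
print(spanning_mean(mu, sigma, r), closed_form_mean(mu, sigma, r))
```

With mu = 500, sigma = 50 and r = 100, both functions give about 508.33. A test that expects spanning pairs at a variant-free position to average 500 therefore sees a systematic upward shift, i.e. toward longer, deletion-like distances, which is one way to see how a simplified null hypothesis can favour deletion calls over insertion calls.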

Structural Variation Detection with Read Pair Information—An Improved Null-Hypothesis Reduces Bias

Lecture Notes in Computer Science - Research in Computational Molecular Biology ◽

10.1007/978-3-319-31957-5_13 ◽

2016 ◽

pp. 176-188

Author(s):

Kristoffer Sahlin ◽

Mattias Frånberg ◽

Lars Arvestad

Keyword(s):

Null Hypothesis ◽

Structural Variation ◽

Read Pair

Genome scaffolding with PE-contaminated mate-pair libraries

10.1101/025650 ◽

2015 ◽

Cited By ~ 1

Author(s):

Kristoffer Sahlin ◽

Rayan Chikhi ◽

Lars Arvestad

Keyword(s):

Source Code ◽

Linear Programs ◽

Read Pair ◽

Insert Size ◽

Essential Step ◽

Genome Scaffolding ◽

Mate Pair ◽

A Genome ◽

The Impact ◽

Adapter Sequence

Scaffolding is often an essential step in a genome assembly process, in which contigs are ordered and oriented using read pairs from a combination of paired-end libraries and longer-range mate-pair libraries. Although a simple idea, scaffolding is unfortunately hard to get right in practice. One source of problems is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs have the wrong orientation and a much smaller insert size than expected. This contamination has been discussed in previous work on integrated scaffolders in end-to-end assemblers such as Allpaths-LG and MaSuRCA, but those methods rely on the orientation being observable, e.g., by finding the junction adapter sequence in the reads. This is not always the case, which makes the orientation and insert size of a read pair stochastic. Furthermore, modeling of PE-contamination has so far been neglected in stand-alone scaffolders, and its effect on scaffolding quality has not been examined before. We have addressed PE-contamination in an update of our scaffolder BESST. We formulate the problem as an Integer Linear Program (ILP) and use characteristics of the problem, such as contig lengths and insert size, to solve the ILP efficiently using a number of Linear Programs that is linear in the number of contigs. Our results show significant improvement over both integrated and stand-alone scaffolders. The impact of modeling PE-contamination is quantified by comparison with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding, and inflated assembly sizes. The model is implemented in BESST. Source code and usage instructions are available at https://github.com/ksahlin/BESST. BESST can also be downloaded using PyPI.
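
When orientation cannot be observed, whether a given pair is PE contamination or a true mate pair becomes a probabilistic question. A minimal sketch of that idea, assuming a two-component Gaussian mixture over insert sizes with a known contamination fraction: the function names and all parameter values are hypothetical, and this is not BESST's ILP formulation.

```python
import math


def gaussian(x: float, mu: float, sigma: float) -> float:
    """Normal density; both insert-size components are assumed Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def prob_pe_contaminant(insert_size: float, frac_pe: float,
                        mu_pe: float, sd_pe: float,
                        mu_mp: float, sd_mp: float) -> float:
    """Posterior probability that a pair is PE contamination rather than a true
    mate pair, given only its observed insert size (orientation unobserved)."""
    pe = frac_pe * gaussian(insert_size, mu_pe, sd_pe)
    mp = (1.0 - frac_pe) * gaussian(insert_size, mu_mp, sd_mp)
    total = pe + mp
    if total == 0.0:
        # Both densities underflowed: no information beyond the prior.
        return frac_pe
    return pe / total


# Hypothetical library: 10% PE contamination around 300 bp, mate pairs around 3 kb.
params = dict(frac_pe=0.1, mu_pe=300, sd_pe=50, mu_mp=3000, sd_mp=500)
for d in (350, 1500, 2800):
    print(d, prob_pe_contaminant(d, **params))
```

A pair with a 350 bp insert is then almost certainly contamination and one at 2.8 kb almost certainly a true mate pair, while intermediate sizes carry genuine uncertainty. A production model would work in log space rather than rely on the simple underflow fallback used here.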

biobambam: tools for read pair collation based algorithms on BAM files

Source Code for Biology and Medicine ◽

10.1186/1751-0473-9-13 ◽

2014 ◽

Vol 9 (1) ◽

Cited By ~ 87

Author(s):

German Tischler ◽

Steven Leonard

Keyword(s):

Read Pair

ROVER variant caller: read-pair overlap considerate variant-calling software applied to PCR-based massively parallel sequencing datasets

Source Code for Biology and Medicine ◽

10.1186/1751-0473-9-3 ◽

2014 ◽

Vol 9 (1) ◽

pp. 3 ◽

Cited By ~ 12

Author(s):

Bernard J Pope ◽

Tú Nguyen-Dumont ◽

Fleur Hammet ◽

Daniel J Park

Keyword(s):

Massively Parallel Sequencing ◽

Variant Calling ◽

Massively Parallel ◽

Read Pair ◽

Parallel Sequencing
