Structural Variation Detection with Read Pair Information: An Improved Null Hypothesis Reduces Bias

2017 ◽  
Vol 24 (6) ◽  
pp. 581-589 ◽  
Author(s):  
Kristoffer Sahlin ◽  
Mattias Frånberg ◽  
Lars Arvestad

Abstract. Reads from paired-end and mate-pair libraries are often used to find structural variation in genomes, and one common approach is to use their fragment length for detection. After aligning read pairs to the reference, read-pair distances are analyzed for statistically significant deviations. However, previously proposed methods are based on a simplified model of observed fragment lengths that does not agree with data. We show how this model limits the statistical analysis of variant identification and propose a new model, adapted from a model we previously introduced for contig scaffolding, that agrees with data. From this model we derive an improved null hypothesis that, when applied in the variant caller CLEVER, reduces the number of false positives and corrects a bias that contributes to more deletion calls than insertion calls. A reference implementation is freely available at https://github.com/ksahlin/GetDistr.
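The common approach the abstract critiques can be sketched as follows: a minimal, illustrative z-test comparing the mean fragment length over a candidate region against the library mean. This is the simplified null model whose shortcomings the paper addresses, not the paper's improved model, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def fragment_length_zscore(observed_spans, lib_mean, lib_sd):
    """z-score of the mean span of read pairs over a candidate region.

    Under the naive null model, a large positive z suggests a deletion
    (spans look stretched); a large negative z suggests an insertion.
    """
    n = len(observed_spans)
    sample_mean = sum(observed_spans) / n
    # Standard error of the mean under the simplified null hypothesis
    se = lib_sd / math.sqrt(n)
    return (sample_mean - lib_mean) / se

def p_value(z):
    """Two-sided p-value under a standard normal null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Illustrative numbers: library mean 500 bp, sd 50 bp; observed spans
# over a candidate region are systematically longer than expected.
spans = [560, 590, 575, 610, 580, 565, 600, 585]
z = fragment_length_zscore(spans, lib_mean=500, lib_sd=50)
```

A key point of the paper is that this naive null model is biased: because longer fragments are more likely to span (and thus witness) a given locus, observed spans are not a simple sample from the library distribution, which inflates deletion calls relative to insertion calls.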


2006 ◽  
Vol 11 (1) ◽  
pp. 12-24 ◽  
Author(s):  
Alexander von Eye

At the level of manifest categorical variables, a large number of coefficients and models for the examination of rater agreement have been proposed and used. The most popular of these is Cohen's κ. In this article, a new coefficient, κs, is proposed as an alternative measure of rater agreement. Both κ and κs allow researchers to determine whether agreement in groups of two or more raters is significantly beyond chance. Stouffer's z is used to test the null hypothesis that κs = 0. In addition to evaluating rater agreement in a fashion parallel to κ, the coefficient κs allows one to (1) examine subsets of cells in agreement tables, (2) examine cells that indicate disagreement, (3) consider alternative chance models, (4) take covariates into account, and (5) compare independent samples. Results from a simulation study are reported, suggesting that (a) the four measures of rater agreement (Cohen's κ, Brennan and Prediger's κn, raw agreement, and κs) are sensitive to the same data characteristics when evaluating rater agreement and (b) both the z-statistic for Cohen's κ and Stouffer's z for κs are unimodally and symmetrically distributed, but slightly heavy-tailed. Examples use data from verbal processing and applicant selection.
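Two of the building blocks this abstract relies on, Cohen's κ for a square agreement table and Stouffer's method for combining independent z-statistics, can be sketched in a few lines. This is a generic illustration of those standard quantities, not an implementation of von Eye's κs.

```python
from math import sqrt

def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    Rows index rater A's categories, columns rater B's; kappa is
    observed agreement corrected for chance agreement.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed proportion of agreement: the diagonal of the table
    po = sum(table[i][i] for i in range(k)) / n
    # Chance agreement from the marginal distributions of each rater
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    pe = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (po - pe) / (1 - pe)

def stouffer_z(z_scores):
    """Stouffer's method: combine independent z-statistics with equal weights."""
    return sum(z_scores) / sqrt(len(z_scores))

# Example: two raters classifying 50 cases into two categories.
table = [[20, 5],
         [10, 15]]
kappa = cohens_kappa(table)
```

For the example table, observed agreement is 0.7 and chance agreement is 0.5, giving κ = 0.4, which is conventionally read as moderate agreement.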


1991 ◽  
Vol 46 (10) ◽  
pp. 1089-1089 ◽  
Author(s):  
John J. Bartko

1998 ◽  
Vol 53 (7) ◽  
pp. 796-796 ◽  
Author(s):  
Warren W. Tryon

1975 ◽  
Vol 20 (3) ◽  
pp. 212-213 ◽  
Author(s):  
Seymour Feshbach

1979 ◽  
Vol 24 (4) ◽  
pp. 299-301 ◽  
Author(s):  
Allan E. Paull ◽  
Neil H. Timm
