Analytic combinatorics for bioinformatics I: seeding methods

Mapping Intimacies ◽

10.1101/205427 ◽

2017 ◽

Cited By ~ 2

Author(s):

Guillaume J. Filion

Keyword(s):

Success Rate ◽

Generating Function ◽

Error Rates ◽

Type I ◽

Read Mapping ◽

Analytic Combinatorics ◽

Sequencing Errors ◽

Different Types ◽

Speed Up ◽

Combinatorial Construction

Seeding heuristics are the most widely used strategies to speed up sequence alignment in bioinformatics. Such strategies are most successful if they are calibrated, so that the speed-versus-accuracy trade-off can be properly tuned. In the widely used case of read mapping, it has been so far impossible to predict the success rate of competing seeding strategies for lack of a theoretical framework. Here I present an approach to estimate such quantities based on the theory of analytic combinatorics. In a nutshell, the strategy is to specify a combinatorial construction of reads where the seeding heuristic fails, translate this specification into a generating function using formal rules, and finally extract the probabilities of interest from the singularities of the generating function. I use this approach to construct simple estimators of the success rate of the seeding heuristic under different types of sequencing errors. I also show how the analytic combinatorics strategy can be used to compute the associated type I and type II error rates (mapping the read to the wrong location, or being unable to map the read). Finally, I show how analytic combinatorics can be used to estimate average quantities such as the expected number of errors in reads where the seeding heuristic fails. Overall, this work introduces a theoretical and practical framework to find the success rate of seeding heuristics and related problems in bioinformatics.
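
As a rough illustration of the quantity being estimated, the sketch below computes the probability that seeding fails for a single read. It relies on assumptions that are not stated in the abstract: the seed is an exact match of `seed_length` nucleotides, errors are substitutions that occur independently at rate `error_rate`, and seeding fails when the read contains no error-free stretch of `seed_length` nucleotides. It uses a plain dynamic-programming recurrence over the length of the current error-free stretch, not the generating-function machinery of the paper, and all names are hypothetical.

```python
def seeding_failure_probability(read_length, error_rate, seed_length):
    """Probability that a read has no error-free stretch of seed_length.

    Assumes independent substitution errors at a constant rate
    (an illustrative model, not necessarily the one used in the paper).
    """
    if seed_length < 1:
        raise ValueError("seed_length must be at least 1")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie between 0 and 1")

    # state[j] = probability that no seed has been found so far and the
    # current error-free stretch has length j (with j < seed_length).
    state = [0.0] * seed_length
    state[0] = 1.0

    for _ in range(read_length):
        new_state = [0.0] * seed_length
        # An error resets the current error-free stretch to length 0.
        new_state[0] = error_rate * sum(state)
        # A correct nucleotide extends the stretch by one position.
        for j in range(seed_length - 1):
            new_state[j + 1] = (1.0 - error_rate) * state[j]
        # Mass in state[seed_length - 1] followed by a correct nucleotide
        # completes a seed; it is dropped because seeding succeeds there.
        state = new_state

    return sum(state)
```

For instance, `seeding_failure_probability(50, 0.05, 17)` would give the failure probability for a 50-nucleotide read under these assumptions; one minus that value is the corresponding success rate.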

Methionine-Specific Staining of Collagen

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s042482010005408x ◽

1982 ◽

Vol 40 ◽

pp. 294-295

Author(s):

E.M. Kuhn ◽

K.D. Marenus ◽

M. Beer

Keyword(s):

Amino Acids ◽

Collagen Fibers ◽

Type III Collagen ◽

Type I ◽

Electron Microscopic ◽

Good Correspondence ◽

Specific Staining ◽

Type III ◽

Different Types ◽

Sulfur Containing

Fibers composed of different types of collagen cannot be differentiated by conventional electron microscopic stains. We are developing staining procedures aimed at identifying collagen fibers of different types. Pt(Gly-L-Met)Cl binds specifically to sulfur-containing amino acids. Different collagens have methionine (met) residues at somewhat different positions. A good correspondence has been reported between known met positions and Pt(GLM) bands in rat Type I SLS (collagen aggregates in which molecules lie adjacent to each other in exact register). We have confirmed this relationship in Type III collagen SLS (Fig. 1).

Type I error rates and power of several versions of scaled chi-square difference tests in investigations of measurement invariance.

Psychological Methods ◽

10.1037/met0000097 ◽

2017 ◽

Vol 22 (3) ◽

pp. 467-485 ◽

Cited By ~ 4

Author(s):

Jordan Campbell Brace ◽

Victoria Savalei

Keyword(s):

Measurement Invariance ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Chi Square ◽

Type I Error Rates

Correction: “Influence of Selection Bias on the Test Decision – A Simulation Study”

Methods of Information in Medicine ◽

10.3414/me11-01-0043e ◽

2014 ◽

Vol 53 (05) ◽

pp. 343-343

Keyword(s):

Selection Bias ◽

Simulation Study ◽

Error Rate ◽

Type I Error ◽

Block Size ◽

Error Rates ◽

Type I ◽

Type I Error Rate ◽

Representation Error ◽

Numeric Representation

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138–143). In a small number of cases, the way SAS represents numeric values resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).” There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.
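
The kind of problem described here can be reproduced in any language that uses binary floating-point numbers. The sketch below is in Python rather than SAS and uses made-up values, not the data of the corrected study: a difference that should equal a cut-off exactly is stored just below it, so an unrounded comparison puts it in the wrong category, and rounding both sides before comparing restores the intended result. The function name and the choice of nine decimal places are assumptions made for illustration.

```python
def reaches_cutoff(difference, cutoff, decimals=None):
    """Return True if difference >= cutoff.

    If decimals is given, both values are rounded to that many decimal
    places first, which absorbs tiny binary representation errors.
    """
    if decimals is not None:
        difference = round(difference, decimals)
        cutoff = round(cutoff, decimals)
    return difference >= cutoff


# Illustrative values only: mathematically 0.3 - 0.1 equals 0.2, but the
# computed difference is stored as a value slightly below 0.2.
difference = 0.3 - 0.1
cutoff = 0.2

unrounded = reaches_cutoff(difference, cutoff)
rounded = reaches_cutoff(difference, cutoff, decimals=9)
```

Here `unrounded` is `False` while `rounded` is `True`, which mirrors how a borderline case can switch category once rounding is applied.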

The Use of Theory of Linear Mixed-Effects Models to Detect Fraudulent Erasures at an Aggregate Level

Educational and Psychological Measurement ◽

10.1177/0013164421994893 ◽

2021 ◽

pp. 001316442199489

Author(s):

Luyao Peng ◽

Sandip Sinharay

Keyword(s):

Type I Error ◽

Real Data ◽

Mixed Effects ◽

Error Rates ◽

Mixed Effects Models ◽

Type I ◽

Aggregate Level ◽

Linear Mixed Effects Models ◽

Linear Mixed Effects ◽

Best Linear Unbiased

Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the literature of linear mixed-effects models (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has larger power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.

Analysis of local habitat selection and large-scale attraction/avoidance based on animal tracking data: is there a single best method?

Movement Ecology ◽

10.1186/s40462-021-00260-y ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Moritz Mercker ◽

Philipp Schwemmer ◽

Verena Peschko ◽

Leonie Enners ◽

Stefan Garthe

Keyword(s):

Habitat Selection ◽

Large Scale ◽

Error Rates ◽

Process Models ◽

Statistical Hypothesis ◽

Type I ◽

Tracking Data ◽

Selection Models ◽

Animal Tracking ◽

Step Selection

Background: New wildlife telemetry and tracking technologies have become available in the last decade, leading to a large increase in the volume and resolution of animal tracking data. These technical developments have been accompanied by various statistical tools aimed at analysing the data obtained by these methods. Methods: We used simulated habitat and tracking data to compare some of the different statistical methods frequently used to infer local resource selection and large-scale attraction/avoidance from tracking data. Notably, we compared spatial logistic regression models (SLRMs), spatio-temporal point process models (ST-PPMs), step selection models (SSMs), and integrated step selection models (iSSMs) and their interplay with habitat and animal movement properties in terms of statistical hypothesis testing. Results: We demonstrated that only iSSMs and ST-PPMs showed nominal type I error rates in all studied cases, whereas SSMs may slightly and SLRMs may frequently and strongly exceed these levels. iSSMs appeared to have on average a more robust and higher statistical power than ST-PPMs. Conclusions: Based on our results, we recommend the use of iSSMs to infer habitat selection or large-scale attraction/avoidance from animal tracking data. Further advantages over other approaches include short computation times, predictive capacity, and the possibility of deriving mechanistic movement models.

Immunological Determined Plasminogen Groups And Its Possible Biological Relevance

10.1055/s-0038-1652204 ◽

1981 ◽

Author(s):

V Sachs ◽

R Dörner ◽

E Szirmai

Keyword(s):

Human Plasma ◽

Type I ◽

Type II ◽

Precipitation Line ◽

Gene Frequencies ◽

Type III ◽

Different Types ◽

Human Plasminogen ◽

Gel Diffusion ◽

Good Agreement

Using intra-basin absorption with plasminogen-free human plasma, rabbit anti-human plasminogen sera precipitate human plasma in the agar gel diffusion test in three different types: type I is represented by one strong precipitation line, type II by two lines, a large one and a small one, and type III by three slight but distinct lines. The following frequencies of the different types have been observed in a sample of 516 human plasmas: type I 65%, type II 33% and type III 2%. If the types are assumed to be phenotypic groups of a diallelic system in which types I and III represent the homozygous genotypes and type II the heterozygous genotype, the estimated gene frequencies are in good agreement with the expected values. There is also good agreement with the distribution of plasminogen groups determined by electrofocusing by Raum et al. and Hobart. The plasminogen groups may also have a biological meaning, because plasmas of type III always have a lower fibrinolytic activity than plasmas of the other types.
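
The diallelic reasoning can be sketched under Hardy-Weinberg proportions, which is an assumption: the abstract does not say how the expected values were obtained. The example takes the rounded percentages quoted above (65%, 33%, 2%), estimates the two allele frequencies by gene counting, and derives the expected frequencies of the three types. Function and variable names are hypothetical.

```python
def hardy_weinberg_expectation(freq_type_1, freq_type_2, freq_type_3):
    """Estimate allele frequencies and expected type frequencies.

    Types I and III are treated as the two homozygotes and type II as
    the heterozygote; Hardy-Weinberg proportions are assumed.
    """
    total = freq_type_1 + freq_type_2 + freq_type_3
    # Gene counting: each homozygote carries two copies of its allele,
    # each heterozygote carries one copy of each allele.
    p = (freq_type_1 + freq_type_2 / 2.0) / total
    q = 1.0 - p
    expected = (p * p, 2.0 * p * q, q * q)
    return p, q, expected


# Rounded percentages reported for the sample of 516 plasmas.
p, q, expected = hardy_weinberg_expectation(0.65, 0.33, 0.02)
```

Because the input percentages are rounded, the resulting figures are only approximate; comparing `expected` with the observed frequencies is the check that the abstract summarizes as good agreement.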

Type I Error Rates, Coverage of Confidence Intervals, and Variance Estimation in Propensity-Score Matched Analyses

The International Journal of Biostatistics ◽

10.2202/1557-4679.1146 ◽

2009 ◽

Vol 5 (1) ◽

Cited By ~ 65

Author(s):

Peter C Austin

Keyword(s):

Propensity Score ◽

Confidence Intervals ◽

Variance Estimation ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Type I Error Rates

Type I and type II error rates for quantitative trait loci (QTL) mapping studies using recombinant inbred mouse strains

Behavior Genetics ◽

10.1007/bf02359892 ◽

1996 ◽

Vol 26 (2) ◽

pp. 149-160 ◽

Cited By ~ 103

Author(s):

J. K. Belknap ◽

S. R. Mitchell ◽

L. A. O'Toole ◽

M. L. Helms ◽

J. C. Crabbe

Keyword(s):

Quantitative Trait ◽

Recombinant Inbred ◽

Inbred Mouse ◽

Error Rates ◽

Mouse Strains ◽

Type I ◽

Inbred Mouse Strains ◽

Type II ◽

Type II Error ◽

Recombinant Inbred Mouse

The Robustness of the Likelihood Ratio Chi-Square Test for Structural Equation Models: A Meta-Analysis

Journal of Educational and Behavioral Statistics ◽

10.3102/10769986026001105 ◽

2001 ◽

Vol 26 (1) ◽

pp. 105-132 ◽

Cited By ~ 30

Author(s):

Douglas A. Powell ◽

William D. Schafer

Keyword(s):

Structural Equation ◽

Structural Equation Models ◽

Type I Error ◽

Meta Analysis ◽

Generalized Least Squares ◽

Error Rates ◽

Type I ◽

Chi Square ◽

Distribution Free ◽

Projection Techniques

The robustness literature for the structural equation model was synthesized following the method of Harwell which employs meta-analysis as developed by Hedges and Vevea. The study focused on the explanation of empirical Type I error rates for six principal classes of estimators: two that assume multivariate normality (maximum likelihood and generalized least squares), elliptical estimators, two distribution-free estimators (asymptotic and others), and latent projection. Generally, the chi-square tests for overall model fit were found to be sensitive to non-normality and the size of the model for all estimators (with the possible exception of the elliptical estimators with respect to model size and the latent projection techniques with respect to non-normality). The asymptotic distribution-free (ADF) and latent projection techniques were also found to be sensitive to sample sizes. Distribution-free methods other than ADF showed, in general, much less sensitivity to all factors considered.

GxEsum: a novel approach to estimate the phenotypic variance explained by genome-wide GxE interaction based on GWAS summary statistics for biobank-scale data

Genome Biology ◽

10.1186/s13059-021-02403-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jisu Shin ◽

Sang Hong Lee

Keyword(s):

Complex Traits ◽

Error Rates ◽

Type I ◽

Phenotypic Variance ◽

Environment Interaction ◽

Summary Statistics ◽

GxE Interaction ◽

Genome Wide ◽

Scale Data ◽

Variance Explained

Genetic variation in response to the environment, that is, genotype-by-environment interaction (GxE), is fundamental in the biology of complex traits and diseases. However, existing methods are computationally demanding and infeasible to handle biobank-scale data. Here, we introduce GxEsum, a method for estimating the phenotypic variance explained by genome-wide GxE based on GWAS summary statistics. Through comprehensive simulations and analysis of UK Biobank with 288,837 individuals, we show that GxEsum can handle a large-scale biobank dataset with controlled type I error rates and unbiased GxE estimates, and its computational efficiency can be hundreds of times higher than existing GxE methods.
