The most under-used statistical method in corpus linguistics: multi-level (and mixed-effects) models

Stefan Th. Gries

doi:10.3366/cor.2015.0068

Corpora ◽

10.3366/cor.2015.0068 ◽

2015 ◽

Vol 10 (1) ◽

pp. 95-125 ◽

Cited By ~ 63

Author(s):

Stefan Th. Gries

Keyword(s):

Statistical Analysis ◽

Statistical Method ◽

Gold Standard ◽

Corpus Linguistics ◽

Regression Models ◽

Mixed Effects ◽

Mixed Effects Models ◽

Regression Modelling ◽

Corpus Linguistic ◽

Multi Level

Much statistical analysis of psycholinguistic data is now being done with so-called mixed-effects regression models. This development was spearheaded by a few highly influential introductory articles that (i) showed how these regression models are superior to what was the previous gold standard and, perhaps even more importantly, (ii) showed how these models are used practically. Corpus linguistics can benefit from mixed-effects/multi-level models for the same reason that psycholinguistics can – because, for example, speaker-specific and lexically specific idiosyncrasies can be accounted for elegantly; but, in fact, corpus linguistics needs them even more because (i) corpus-linguistic data are observational and, thus, usually unbalanced and messy/noisy, and (ii) most widely used corpora come with a hierarchical structure that corpus linguists routinely fail to consider. Unlike nearly all overviews of mixed-effects/multi-level modelling, this paper is specifically written for corpus linguists, to encourage more of them to adopt these techniques. After a short methodological history, I provide a non-technical introduction to mixed-effects models and then discuss in detail one example – particle placement in English – to show how mixed-effects/multi-level modelling results can be obtained and how they are far superior to those of traditional regression modelling.

Evaluating Logistic Mixed-Effects Models of Corpus-Linguistic Data in Light of Lexical Diffusion

Quantitative Methods in the Humanities and Social Sciences - Mixed-Effects Regression Models in Linguistics ◽

10.1007/978-3-319-69830-4_6 ◽

2018 ◽

pp. 99-116 ◽

Cited By ~ 4

Author(s):

Danielle Barth ◽

Vsevolod Kapatsinski

Keyword(s):

Mixed Effects ◽

Mixed Effects Models ◽

Linguistic Data ◽

Corpus Linguistic ◽

Lexical Diffusion

Beyond t test and ANOVA: applications of mixed-effects models for more rigorous statistical analysis in neuroscience research

Neuron ◽

10.1016/j.neuron.2021.10.030 ◽

2021 ◽

Author(s):

Zhaoxia Yu ◽

Michele Guindani ◽

Steven F. Grieco ◽

Lujia Chen ◽

Todd C. Holmes ◽

...

Keyword(s):

Statistical Analysis ◽

Mixed Effects ◽

T Test ◽

Mixed Effects Models ◽

Neuroscience Research ◽

Rigorous Statistical Analysis

Multi-level mixed effects models for bead arrays

Bioinformatics ◽

10.1093/bioinformatics/btq708 ◽

2010 ◽

Vol 27 (5) ◽

pp. 633-640 ◽

Cited By ~ 2

Author(s):

Ryung S. Kim ◽

Juan Lin

Keyword(s):

Mixed Effects ◽

Mixed Effects Models ◽

Multi Level ◽

Bead Arrays

Statistical analysis of longitudinal neuroimage data with Linear Mixed Effects models

NeuroImage ◽

10.1016/j.neuroimage.2012.10.065 ◽

2013 ◽

Vol 66 ◽

pp. 249-260 ◽

Cited By ~ 158

Author(s):

Jorge L. Bernal-Rusiel ◽

Douglas N. Greve ◽

Martin Reuter ◽

Bruce Fischl ◽

Mert R. Sabuncu

Keyword(s):

Statistical Analysis ◽

Mixed Effects ◽

Mixed Effects Models ◽

Linear Mixed Effects Models ◽

Linear Mixed Effects

The Application of a Novel Statistical Method for Syndromic Surveillance in England

Online Journal of Public Health Informatics ◽

10.5210/ojphi.v7i1.5814 ◽

2015 ◽

Vol 7 (1) ◽

Author(s):

Roger Morbey ◽

Helen Hughes ◽

Alex Elliot ◽

Neville Verlander ◽

Nick Andrews ◽

...

Keyword(s):

Public Health ◽

Statistical Method ◽

Real Time ◽

Syndromic Surveillance ◽

Mixed Effects ◽

Local Models ◽

Multi Level ◽

First Time ◽

Design And Application

This paper describes the design and application of a new statistical method for real-time syndromic surveillance, used by Public Health England. The Rising Activity, Multi-level Mixed effects, Indicator Emphasis (RAMMIE) statistical method was developed and tested alongside existing methods before being applied to a suite of syndromic surveillance in operation in England. The RAMMIE method has proved to be a reliable, effective method for generating automated alarms for syndromic surveillance. The multi-level models have enabled local models to be created for the first time across all systems and models have proved themselves to be robust across all the signals.

Random effects structure for confirmatory hypothesis testing: Keep it maximal

10.31234/osf.io/39mhs ◽

2018 ◽

Author(s):

Dale Barr ◽

Roger Philip Levy ◽

Christoph Scheepers ◽

Harry Tily

Keyword(s):

Hypothesis Testing ◽

Random Effects ◽

Gold Standard ◽

Mixed Effects ◽

Mixed Effects Models ◽

Data Driven ◽

Linear Mixed Effects Models ◽

Linear Mixed Effects ◽

Within Subjects ◽

Maximal Models

Linear mixed-effects models (LMEMs) have become increasingly prominent in psycholinguistics and related areas. However, many researchers do not seem to appreciate how random effects structures affect the generalizability of an analysis. Here, we argue that researchers using LMEMs for confirmatory hypothesis testing should minimally adhere to the standards that have been in place for many decades. Through theoretical arguments and Monte Carlo simulation, we show that LMEMs generalize best when they include the maximal random effects structure justified by the design. The generalization performance of LMEMs including data-driven random effects structures strongly depends upon modeling criteria and sample size, yielding reasonable results on moderately-sized samples when conservative criteria are used, but with little or no power advantage over maximal models. Finally, random-intercepts-only LMEMs used on within-subjects and/or within-items data from populations where subjects and/or items vary in their sensitivity to experimental manipulations always generalize worse than separate F1 and F2 tests, and in many cases, even worse than F1 alone. Maximal LMEMs should be the ‘gold standard’ for confirmatory hypothesis testing in psycholinguistics and beyond.
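
The contrast drawn in this abstract between random-intercepts-only and maximal LMEMs can be written out as two model equations. The notation below is an assumption made for illustration and is not taken from this listing: a single predictor x manipulated within subjects and within items, response y for subject s on item i, and by-subject (S) and by-item (I) random effects with mean zero.

```latex
% Random-intercepts-only LMEM: subjects and items shift the baseline only
y_{si} = \beta_0 + S_{0s} + I_{0i} + \beta_1 x_{si} + \varepsilon_{si}

% Maximal LMEM for this assumed design: subjects and items may also
% differ in their sensitivity to x (by-subject and by-item random slopes)
y_{si} = \beta_0 + S_{0s} + I_{0i} + (\beta_1 + S_{1s} + I_{1i})\, x_{si} + \varepsilon_{si}

% Random effects and residual error (assumed distributions)
(S_{0s}, S_{1s}) \sim \mathcal{N}(\mathbf{0}, \Sigma_S), \quad
(I_{0i}, I_{1i}) \sim \mathcal{N}(\mathbf{0}, \Sigma_I), \quad
\varepsilon_{si} \sim \mathcal{N}(0, \sigma^2)
```

Under these assumptions, the first model is the special case of the second in which the slope terms S_{1s} and I_{1i} are fixed at zero, which is the situation the abstract warns about when subjects or items do vary in their sensitivity to the manipulation.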

Stand Volume Growth Modeling with Mixed-Effects Models and Quantile Regressions for Major Forest Types in the Eastern Daxing’an Mountains, Northeast China

Forests ◽

10.3390/f12081111 ◽

2021 ◽

Vol 12 (8) ◽

pp. 1111

Author(s):

Tao Wang ◽

Longfei Xie ◽

Zheng Miao ◽

Faris Rafi Almay Widagdo ◽

Lihu Dong ◽

...

Keyword(s):

Quantile Regression ◽

Northeast China ◽

Regression Models ◽

Volume Growth ◽

Mixed Effects ◽

Forest Growth ◽

Mixed Effects Models ◽

Percentage Error ◽

Forest Types ◽

Daxing’An Mountains

The relative growth rate (RGRnv) is the standardized measurement of forest growth, whereby excluding the size differences between individuals allows their performance to be compared equally. The RGRnv model was developed using the National Forest Inventory (NFI) data on the Daxing’an Mountains, in Northeast China, which contain Dahurian larch (Larix gmelinii Rupr.), white birch (Betula platyphylla Suk.), and mixed coniferous–broadleaf forests. Four predictor variables—i.e., quadratic mean diameter (Dq), stand basal area (G), average tree height (Ha), and altitude (A)—and four different methods—i.e., the nonlinear mixed-effects models (NLME), three nonlinear quantile regression (NQR3), five nonlinear quantile regression (NQR5), and nine nonlinear quantile regression (NQR9) models—were used in this study. All the models were validated using the leave-one-out method. The results showed that (1) the mixed coniferous–broadleaf forest presented the highest RGRnv; (2) the RGRnv was negatively correlated with the four predictors, and the heteroscedasticity reduced significantly after the weighting function was integrated into the models; and (3) the quantile regression models performed better than NLME, and NQR9 outperformed both NQR3 and NQR5. To make more accurate predictions, parameters of the adjusted mixed-effects and quantile regression models should be recalculated and localized using sampled RGRnv in each region and then applied to predict all the other RGRnv of plots. MAPE% indicates the mean absolute percentage error. The values were stable when the sample numbers were greater than or equal to six across the three forest types, which showed relatively accurate and lowest-cost prediction results.
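
The two evaluation ideas named in this abstract, leave-one-out validation and the mean absolute percentage error (MAPE), can be sketched in a few lines. This is a minimal illustration only: the straight-line fit is a stand-in chosen for brevity, not the nonlinear mixed-effects or quantile regression models of the study, and the function names `mape`, `fit_line` and `loo_predictions` are hypothetical.

```python
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x (stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def loo_predictions(xs, ys):
    """Predict each observation from a model fitted to all the other observations."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        preds.append(intercept + slope * xs[i])
    return preds
```

A call such as `mape(ys, loo_predictions(xs, ys))` then gives the leave-one-out MAPE for lists `xs` and `ys` of equal length with at least three distinct x values.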

Demography and management of the invasive plant species Hypericum perforatum. I. Using multi-level mixed-effects models for characterizing growth, survival and fecundity in a long-term data set

Journal of Applied Ecology ◽

10.1046/j.1365-2664.2003.00821.x ◽

2003 ◽

Vol 40 (3) ◽

pp. 481-493 ◽

Cited By ~ 83

Author(s):

Yvonne M. Buckley ◽

David T. Briese ◽

Mark Rees

Keyword(s):

Plant Species ◽

Hypericum Perforatum ◽

Invasive Plant ◽

Mixed Effects ◽

Mixed Effects Models ◽

Invasive Plant Species ◽

Data Set ◽

Multi Level ◽

Term Data

A segment level analysis of multi-vehicle motorcycle crashes in Ohio using Bayesian multi-level mixed effects models

Safety Science ◽

10.1016/j.ssci.2013.12.006 ◽

2014 ◽

Vol 66 ◽

pp. 47-53 ◽

Cited By ~ 14

Author(s):

Thomas Flask ◽

William H. Schneider ◽

Dominique Lord

Keyword(s):

Mixed Effects ◽

Mixed Effects Models ◽

Multi Level ◽

Motorcycle Crashes ◽

Level Analysis

Corrigendum to “Statistical analysis of longitudinal neuroimage data with Linear Mixed Effects models” [NeuroImage 66 (1 February 2013) 249–260]

NeuroImage ◽

10.1016/j.neuroimage.2014.12.053 ◽

2015 ◽

Vol 108 ◽

pp. 110

Author(s):

Jorge L. Bernal-Rusiel ◽

Douglas N. Greve ◽

Martin Reuter ◽

Bruce Fischl ◽

Mert R. Sabuncu

Keyword(s):

Statistical Analysis ◽

Mixed Effects ◽

Mixed Effects Models ◽

Linear Mixed Effects Models ◽

Linear Mixed Effects
