Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape

Non-Parametric Generalized Additive Models as a Tool for Evaluating Policy Interventions

Mathematics, 2021, Vol 9 (4), pp. 299. DOI: 10.3390/math9040299

Author(s): Jaime Pinilla, Miguel Negrín

Keyword(s): Linear Regression, Regression Models, Linear Trend, Generalized Additive Models, Additive Models, Linear Regression Models, Non Linear, Nonlinear Trends, The Impact, Non Parametric

Interrupted time series analysis is a quasi-experimental design used to evaluate the effectiveness of an intervention. Segmented linear regression models have been the most widely used models for this analysis. However, they assume a linear trend that may not be appropriate in many situations. In this paper, we show how generalized additive models (GAMs), a non-parametric regression-based method, can accommodate nonlinear trends. An analysis with simulated data is carried out to assess the performance of both models. Data were simulated from linear and non-linear (quadratic and cubic) functions. The results show that GAMs improve on segmented linear regression models when the trend is non-linear and also perform well when the trend is linear. We also analyze a real-life application: the impact of the 2012 Spanish cost-sharing reforms on pharmaceutical prescription. Seasonality and an indicator variable for the stockpiling effect are included as explanatory variables. The segmented linear regression model shows a good fit to the data. However, the GAM analysis rejects the hypothesis of a linear trend. The estimated level shift is similar for both models, but the cumulative absolute effect on the number of prescriptions is lower in the GAM.
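
The segmented linear regression model for an interrupted time series is commonly written as y_t = b0 + b1*t + b2*D_t + b3*(t - T0)*D_t + e_t, where D_t equals 1 from the intervention period T0 onward, b2 is the level shift and b3 the change in slope. The NumPy sketch below fits that model by ordinary least squares on simulated data; the helper names `its_design` and `fit_segmented` and all numeric values are illustrative assumptions, and the sketch omits the seasonality and stockpiling terms and any autocorrelation correction. A GAM-based analysis would replace the linear terms in t with smooth functions.

```python
import numpy as np


def its_design(n_periods, t0):
    """Columns: intercept, time, post-intervention indicator, time since intervention."""
    t = np.arange(n_periods, dtype=float)
    post = (t >= t0).astype(float)  # D_t = 1 from the intervention period onward
    return np.column_stack([np.ones(n_periods), t, post, (t - t0) * post])


def fit_segmented(y, t0):
    """OLS estimates [baseline level, baseline slope, level shift, slope change]."""
    X = its_design(len(y), t0)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated series (illustrative values only): level drop of 5 and slope change of -0.1 at period 60.
rng = np.random.default_rng(0)
n, t0 = 120, 60
true_beta = np.array([100.0, 0.5, -5.0, -0.1])
y = its_design(n, t0) @ true_beta + rng.normal(0.0, 1.0, n)
print(np.round(fit_segmented(y, t0), 2))
```

On noise-free data the four coefficients are recovered exactly; with noise, the estimated level shift should land near the simulated value of -5.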

Normative modeling of neuroimaging data using generalized additive models of location scale and shape

2021. DOI: 10.1101/2021.06.14.448106

Author(s): Richard Dinga, Charlotte J. Fraza, Johanna M. M. Bayer, Seyed M. Kia, Christian F. Beckmann, ...

Keyword(s): Generalized Additive Models, Reference Sample, Additive Models, Diagnostic Tools, Modeling Framework, Model Order Selection, Potential Biomarker, Neuroimaging Data, Non Gaussian, Normative Models

Normative modeling aims to quantify the degree to which an individual's brain deviates from a reference sample with respect to one or more variables; such deviations can be used as a potential biomarker of a healthy brain and as a tool to study the heterogeneity of psychiatric disorders. The application of normative models is hindered by methodological challenges and by a lack of standards for their use and evaluation. In this paper, we present generalized additive models for location, scale, and shape (GAMLSS) for normative modeling of neuroimaging data: a flexible modeling framework that can model heteroskedasticity, non-linear effects of variables, and the hierarchical structure of the data. It can model non-Gaussian distributions and allows for automatic model order selection, thus improving the accuracy of normative models while mitigating problems of overfitting. Furthermore, we describe measures and diagnostic tools suitable for evaluating normative models, and we give step-by-step examples of normative modeling, including fitting several candidate models, selecting the best models, and transferring them to new scan sites.
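
A normative model of this kind assigns each person a deviation score z = (y - mu(x)) / sigma(x), where both the location mu and the scale sigma depend on covariates such as age. The sketch below is a simplified, parametric stand-in for GAMLSS, not the authors' code: a Gaussian model with a quadratic mean and a log-linear standard deviation, fitted by maximum likelihood by alternating weighted least squares for the mean with Fisher scoring for the log-scale coefficients. The helper names, polynomial degrees and simulated data are assumptions for illustration; GAMLSS itself also supports smooth terms, random effects and non-Gaussian families.

```python
import numpy as np


def poly_basis(x, degree):
    # Columns 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)


def fit_location_scale(x, y, mu_degree=2, sigma_degree=1, n_iter=50):
    """ML fit of y ~ N(mu(x), sigma(x)^2) with polynomial mu and polynomial log(sigma)."""
    X = poly_basis(x, mu_degree)
    Z = poly_basis(x, sigma_degree)
    gamma = np.zeros(Z.shape[1])
    gamma[0] = np.log(np.std(y))  # start from a constant scale
    for _ in range(n_iter):
        w = np.exp(-2.0 * (Z @ gamma))            # weights 1 / sigma(x)^2
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)  # weighted least squares for mu
        r2 = (y - X @ beta) ** 2
        score = Z.T @ (r2 * w - 1.0)              # gradient of log-likelihood wrt gamma
        info = 2.0 * Z.T @ Z                      # expected information for gamma
        gamma = gamma + np.linalg.solve(info, score)
    return beta, gamma


def deviation_z(x_new, y_new, beta, gamma):
    """Deviation of new observations from the fitted normative (reference) model."""
    mu = poly_basis(x_new, len(beta) - 1) @ beta
    sigma = np.exp(poly_basis(x_new, len(gamma) - 1) @ gamma)
    return (y_new - mu) / sigma


# Simulated reference sample; x stands in for a standardized covariate such as age.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, 2000)
y = 3.0 + 0.5 * x - 0.3 * x**2 + np.exp(-0.5 + 0.3 * x) * rng.normal(size=x.size)
beta, gamma = fit_location_scale(x, y)
z_ref = deviation_z(x, y, beta, gamma)
print(np.round(beta, 2), np.round(gamma, 2), round(float(np.mean(z_ref**2)), 3))
```

Because the scale model contains an intercept, the maximum-likelihood fit makes the mean squared deviation score in the reference sample equal to 1, which is a quick sanity check on convergence.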

Comparison between Highly Complex Location Models and GAMLSS

Entropy, 2021, Vol 23 (4), pp. 469. DOI: 10.3390/e23040469

Author(s): Thiago G. Ramires, Luiz R. Nakamura, Ana J. Righetto, Renan J. Carvalho, Lucas A. Vieira, ...

Keyword(s): Regression Models, Generalized Additive Models, Real Data, Additive Models, Location Model, Main Motivation, Location Models, Location Class, Response Variables

This paper presents a discussion of regression models, especially those belonging to the location class. Our main motivation is that, in some cases, simple distributions with simple interpretations give better results than overly complex distributions. For instance, with the reverse Gumbel (RG) distribution, response variables can be explained using the generalized additive models for location, scale, and shape (GAMLSS) framework, which allows the fitting of several parameters (characteristics) of the probability distribution, such as the mean, mode, variance, and others. Three real data applications are used to compare several location models against the RG under the GAMLSS framework. The intention is to show that using a simple distribution (e.g., RG) with a more sophisticated regression structure may be preferable to using a more complex location model.
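
To make the role of the RG parameters concrete: in the parameterization we understand the R package gamlss.dist to use for its RG family (please check its documentation), the density is f(y | mu, sigma) = (1/sigma) exp{-(y - mu)/sigma - exp[-(y - mu)/sigma]}. Under that assumption mu is the mode, the mean is mu + 0.5772*sigma (Euler's constant) and the variance is pi^2 sigma^2 / 6. The NumPy sketch below encodes these formulas with hypothetical helper names and checks them by inverse-CDF sampling; in a GAMLSS regression, mu and log(sigma) would each be linked to their own predictors.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def rg_logpdf(y, mu, sigma):
    """Log-density of the reverse Gumbel (largest extreme value) distribution."""
    z = (y - mu) / sigma
    return -np.log(sigma) - z - np.exp(-z)


def rg_sample(mu, sigma, size, rng):
    """Inverse-CDF sampling from F(y) = exp(-exp(-(y - mu) / sigma))."""
    u = rng.uniform(size=size)
    return mu - sigma * np.log(-np.log(u))


def rg_summary(mu, sigma):
    """Mode, mean and variance implied by the location and scale parameters."""
    return {"mode": mu, "mean": mu + EULER_GAMMA * sigma, "variance": (np.pi * sigma) ** 2 / 6}


rng = np.random.default_rng(2)
mu, sigma = 10.0, 2.0
draws = rg_sample(mu, sigma, 200_000, rng)
print(rg_summary(mu, sigma), draws.mean(), draws.var())
```

The point of the check is that a single location parameter already pins down the mode directly, while the mean and variance follow from simple combinations of mu and sigma.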

Interactively visualizing distributional regression models with distreg.vis

Statistical Modelling, 2021, pp. 1471082X2110073. DOI: 10.1177/1471082x211007308

Author(s): Stanislaus Stadlmann, Thomas Kneib

Keyword(s): Regression Models, Generalized Additive Models, Additive Models, Estimation Model, Response Distribution, Conditional Response, Link Functions, Covariate Effects, Parametric Response, Distributional Regression

A newly emerging field in statistics is distributional regression, where not only the mean but each parameter of a parametric response distribution can be modelled using a set of predictors. As an extension of generalized additive models, distributional regression uses the familiar link functions (log, logit, etc.), model terms (fixed, random, spatial, smooth, etc.) and available types of distributions, but allows us to go well beyond the exponential family and to model potentially all distributional parameters. Because of this increase in model flexibility, interpreting covariate effects on the shape of the conditional response distribution, its moments and other features derived from this distribution is more challenging than with traditional mean-based methods. In particular, such quantities of interest often do not correspond directly to the modelled parameters but are rather a (potentially complex) combination of them. To ease post-estimation model analysis, we propose a framework and present an implementation in R for the visualization of Bayesian and frequentist distributional regression models fitted using the bamlss, gamlss and betareg R packages.
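
As an illustration of why such quantities are combinations of parameters, consider a beta response parameterized by its mean mu (logit link) and precision phi (log link), the mean-precision form used by betareg, for which Var(Y) = mu(1 - mu) / (1 + phi). The sketch below uses made-up coefficients and hypothetical function names. It shows that even when a covariate has no effect on phi, its effect on the variance is non-monotone and cannot be read off either coefficient. It also cross-checks the formula by sampling with Beta shape parameters a = mu*phi and b = (1 - mu)*phi.

```python
import numpy as np


def beta_regression_moments(x, coef_mu, coef_phi):
    """Mean, precision and variance of Y | x under logit(mu) and log(phi) links."""
    eta_mu = coef_mu[0] + coef_mu[1] * x
    eta_phi = coef_phi[0] + coef_phi[1] * x
    mu = 1.0 / (1.0 + np.exp(-eta_mu))  # inverse logit
    phi = np.exp(eta_phi)               # inverse log
    variance = mu * (1.0 - mu) / (1.0 + phi)
    return mu, phi, variance


# Hypothetical coefficients: x shifts the mean but leaves the precision unchanged.
coef_mu, coef_phi = (0.0, 1.0), (np.log(3.0), 0.0)
x_grid = np.array([-2.0, 0.0, 2.0])
mu, phi, variance = beta_regression_moments(x_grid, coef_mu, coef_phi)
print(np.round(variance, 4))

# Monte Carlo check at x = 0 using the shape parameterization a = mu*phi, b = (1 - mu)*phi.
rng = np.random.default_rng(3)
draws = rng.beta(mu[1] * phi[1], (1.0 - mu[1]) * phi[1], size=200_000)
print(draws.var())
```

The variance peaks where mu = 0.5 and falls on either side, even though the coefficient of x in the precision predictor is zero.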

Error Bound of Mode-Based Additive Models

Entropy, 2021, Vol 23 (6), pp. 651. DOI: 10.3390/e23060651

Author(s): Hao Deng, Jianghong Chen, Biqin Song, Zhibin Pan

Keyword(s): Variable Selection, Hilbert Spaces, Regression Models, Reproducing Kernel, Additive Models, Generalization Error, Proposed Model, Modal Regression, Kernel Hilbert Spaces, Non Gaussian

Due to their flexibility and interpretability, additive models are powerful tools for high-dimensional mean regression and variable selection. However, mean regression models based on the least-squares loss are sensitive to non-Gaussian noise, and their robustness needs to be improved. This paper considers estimation and variable selection via modal regression in reproducing kernel Hilbert spaces (RKHSs). Based on the mode-induced metric and a two-fold Lasso-type regularizer, we propose a sparse modal regression algorithm and establish a bound on its excess generalization error. Experimental results demonstrate the effectiveness of the proposed model.
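
The idea behind a mode-induced objective can be illustrated with a much simpler estimator than the paper's sparse additive RKHS model: linear modal regression, which maximizes sum_i K_h(y_i - x_i^T beta) for a Gaussian kernel K_h with bandwidth h. A standard way to fit it is a modal EM (iteratively reweighted least squares) scheme in which each observation is weighted by the kernel value of its current residual. The NumPy sketch below assumes a fixed bandwidth and simulated noise with a shifted contamination component, and compares the result with ordinary least squares. It includes neither the Lasso-type regularizer nor RKHS component functions, and the function name is hypothetical.

```python
import numpy as np


def modal_linear_regression(X, y, bandwidth, n_iter=200):
    """Maximize sum_i exp(-(y_i - x_i @ beta)^2 / (2 h^2)) by iterative reweighting."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares starting value
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-0.5 * (r / bandwidth) ** 2)   # E-step: kernel weights of residuals
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)  # M-step: weighted least squares
    return beta


# Simulated data: the conditional mode is 1 + 2x, but 30% of the noise is shifted upwards.
rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
outlier = rng.uniform(size=n) < 0.3
noise = np.where(outlier, rng.normal(5.0, 1.0, n), rng.normal(0.0, 0.2, n))
y = 1.0 + 2.0 * x + noise

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_modal = modal_linear_regression(X, y, bandwidth=1.0)
print(np.round(beta_ols, 2), np.round(beta_modal, 2))
```

Least squares tracks the conditional mean (pulled up by the contamination), whereas the kernel-weighted fit settles near the conditional mode.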

Sovereign Risk Indices and Bayesian Theory Averaging

Econometrics, 2020, Vol 8 (2), pp. 22. DOI: 10.3390/econometrics8020022

Author(s): Alex Lenkoski, Fredrik L. Aanes

Keyword(s): Sovereign Debt, Generalized Additive Models, Model Averaging, Research Community, Additive Models, Sovereign Risk, Risk Indices, Growth Development, Non Gaussian, Selection Of

In economic applications, model averaging has found principal use in examining the validity of various theories related to observed heterogeneity in outcomes such as growth, development, and trade. Though often easy to articulate, these theories are imperfectly captured quantitatively. A number of different proxies are often collected for a given theory, and the uneven nature of this collection requires care when employing model averaging. Furthermore, if valid, these theories ought to be relevant outside of any single narrowly focused outcome equation. We propose a methodology that represents theories as latent indices, with these latent processes controlled by model averaging at the proxy level. To make the theory indices generalizable, our framework assumes a collection of outcome equations. We accommodate a flexible set of generalized additive models, enabling non-Gaussian outcomes to be included. Furthermore, selection of relevant theories also occurs at the outcome level, allowing theories to be differentially valid. Our focus is on creating a set of theory-based indices aimed at understanding a country’s potential risk of macroeconomic collapse. These Sovereign Risk Indices are calibrated across a set of different “collapse” criteria, including default on sovereign debt, heightened potential for high unemployment or inflation, and dramatic swings in foreign exchange values. The goal of this exercise is to produce a portable set of country/year theory indices that can find more general use in the research community.
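
The paper's latent-index construction is more elaborate, but the basic model-averaging step at the proxy level can be sketched with a common textbook approximation: enumerate all subsets of the candidate proxies for one outcome equation, weight each Gaussian linear model by exp(-BIC/2) under a uniform prior over models, and report each proxy's posterior inclusion probability. The function names, the BIC approximation, the Gaussian outcome and the simulated proxies are assumptions for illustration only; this is not the authors' estimator.

```python
import itertools

import numpy as np


def gaussian_bic(X, y):
    """BIC of an OLS fit, up to an additive constant shared by all models."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)


def inclusion_probabilities(proxies, y):
    """BIC-weighted posterior inclusion probability of each proxy over all 2^p subsets."""
    n, p = proxies.shape
    masks, bics = [], []
    for mask in itertools.product([False, True], repeat=p):
        cols = [np.ones(n)] + [proxies[:, j] for j in range(p) if mask[j]]
        masks.append(mask)
        bics.append(gaussian_bic(np.column_stack(cols), y))
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))  # subtract the minimum for numerical stability
    weights /= weights.sum()
    return np.array(masks, dtype=float).T @ weights


# Simulated example: only the first of three proxies is related to the outcome.
rng = np.random.default_rng(5)
n = 500
proxies = rng.normal(size=(n, 3))
y = 0.5 + 1.0 * proxies[:, 0] + rng.normal(size=n)
print(np.round(inclusion_probabilities(proxies, y), 3))
```

Exhaustive enumeration is only practical for a handful of proxies; larger proxy sets usually call for sampling over models instead.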

Using generalized additive models to analyze biomedical non-linear longitudinal data

2021. DOI: 10.1101/2021.06.10.447970

Author(s): Ariel I. Mundo, John R. Tipton, Timothy J. Muldoon

Keyword(s): Longitudinal Data, Biomedical Research, Repeated Measures, Generalized Additive Models, Simulated Data, Additive Models, Biomedical Literature, Missing Observations, Non Linear, Biased Estimates

In biomedical research, the outcomes of longitudinal studies have traditionally been analyzed using repeated measures analysis of variance (rm-ANOVA) or, more recently, linear mixed models (LMEMs). Although LMEMs are less restrictive than rm-ANOVA with respect to correlation structure and missing observations, both methodologies assume linearity in the measured response, which results in biased estimates and unreliable inference when they are used to analyze data with non-linear trends, a common occurrence in biomedical research. In contrast, generalized additive models (GAMs) relax the linearity assumption and allow the data to determine the fit of the model while permitting missing observations and different correlation structures. Therefore, GAMs are an excellent choice for analyzing non-linear longitudinal data in biomedical research. This paper summarizes the limitations of rm-ANOVA and LMEMs and uses simulated data to show visually how both methods produce biased estimates when used on non-linear data. We also present the basic theory of GAMs and, using trends of oxygen saturation in tumors reported in the biomedical literature, simulate example longitudinal data (2 treatment groups, 10 subjects per group, 6 repeated measures for each group) to demonstrate how these models can be implemented computationally. We show that GAMs produce estimates consistent with the trends of non-linear biomedical data even when observations are missing (with 40% of the observations missing), allowing reliable inference from the data. To make this work reproducible, the code and data used in this paper are available at: https://github.com/aimundo/GAMs-biomedical-research.
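
A GAM replaces a linear time effect with a smooth function estimated under a wiggliness penalty. The sketch below is a bare-bones penalized regression spline in NumPy (truncated cubic basis, ridge penalty on the knot coefficients, smoothing parameter chosen by generalized cross-validation), compared with a straight-line fit on a simulated non-linear trend. The basis, knots, lambda grid, function names and data are assumptions for illustration. The sketch pools all observations and ignores within-subject correlation, which a full longitudinal GAM would need to account for.

```python
import numpy as np


def truncated_cubic_basis(t, knots):
    """Columns 1, t, t^2, t^3 and (t - k)_+^3 for each knot k."""
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_penalized_spline(t, y, knots, lam):
    """Penalized least squares; returns the coefficients and the GCV score."""
    X = truncated_cubic_basis(t, knots)
    D = np.diag([0.0] * 4 + [1.0] * len(knots))  # penalize only the knot terms
    A = X.T @ X + lam * D
    coef = np.linalg.solve(A, X.T @ y)
    edf = np.trace(np.linalg.solve(A, X.T @ X))  # trace of the hat matrix
    rss = np.sum((y - X @ coef) ** 2)
    n = len(y)
    return coef, n * rss / (n - edf) ** 2


# Simulated non-linear trend observed at irregular times (illustrative only).
rng = np.random.default_rng(6)
t = np.sort(rng.uniform(0.0, 1.0, 120))
f_true = np.sin(2.0 * np.pi * t)
y = f_true + rng.normal(0.0, 0.3, t.size)

knots = np.linspace(0.1, 0.9, 9)
fits = [fit_penalized_spline(t, y, knots, lam) for lam in 10.0 ** np.arange(-8, 3)]
coef, best_gcv = min(fits, key=lambda fit: fit[1])
f_spline = truncated_cubic_basis(t, knots) @ coef

X_lin = np.column_stack([np.ones_like(t), t])
f_linear = X_lin @ np.linalg.lstsq(X_lin, y, rcond=None)[0]
print("MSE vs truth, linear:", np.mean((f_linear - f_true) ** 2))
print("MSE vs truth, spline:", np.mean((f_spline - f_true) ** 2))
```

The straight line cannot follow the curvature no matter how much data is added, while the penalized spline lets the data determine the shape, which is the bias argument made in the abstract.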

Larger Democratic swings in United States counties with Black Lives Matter protests in preliminary results of the 2020 presidential election

2021. DOI: 10.31235/osf.io/92qeg

Author(s): Drew Thomas

Keyword(s): Presidential Election, Regression Models, Generalized Additive Models, Additive Models, Democratic Candidate, Black Lives Matter, Percentage Points, Voting Age, US Presidential Election, Hispanic Voters

Media commentary has suggested that recent Black Lives Matter (BLM) protests, particularly riots, drove voters, especially Hispanic voters, away from Democratic candidate Joe Biden in the 2020 US presidential election. I test these hypotheses with county-level regression models of the 2016-to-2020 swing towards the Democratic presidential candidate, using the presence and intensity of BLM non-riot protests and riots as regressors and controlling for state and many background demographic factors (population density, household size, racial composition, etc.). The models (generalized additive models) that control most aggressively for background factors find small, positive associations between BLM protests and Democratic swing: counties with non-riot BLM protests swung more towards Joe Biden by 0.2 percentage points, and counties with BLM-associated riots swung more towards Joe Biden by a statistically insignificant 0.1 percentage points. The extra BLM-protest swing was not statistically significantly different in counties with relatively many Hispanic voting-age citizens, although it was weaker in counties with relatively many Asian voting-age citizens. Inasmuch as these results reflect causal impacts of BLM protests, the protests enhanced the Democratic swing but were probably not electorally decisive. My most elaborate model suggests that an absence of BLM protests in 2020 would have flipped only one state: Biden might have narrowly lost Arizona.

A general framework for functional regression modelling

Statistical Modelling, 2017, Vol 17 (1-2), pp. 1-35. DOI: 10.1177/1471082x16681317. Cited by ~32

Author(s): Sonja Greven, Fabian Scheipl

Keyword(s): Functional Data, Regression Models, Generalized Additive Models, Large Body, Imaging Study, Additive Models, Gradient Boosting, Functional Regression, Functional Responses, Additive Mixed Models

Researchers are increasingly interested in regression models for functional data. This article discusses a comprehensive framework for additive (mixed) models for functional responses and/or functional covariates, based on the guiding principle of reframing functional regression in terms of corresponding models for scalar data, which allows a large body of existing methods to be adapted to these novel tasks. The framework encompasses many existing as well as new models. It includes regression for ‘generalized’ functional data, mean regression, quantile regression, and generalized additive models for location, scale and shape (GAMLSS) for functional data. It admits many flexible linear, smooth or interaction terms of scalar and functional covariates, as well as (functional) random effects, and allows flexible choices of bases (particularly splines and functional principal components) and corresponding penalties for each term. It covers functional data observed on common (dense) or curve-specific (sparse) grids. Penalized-likelihood-based and gradient-boosting-based inference for these models is implemented in the R packages refund and FDboost, respectively. We also discuss identifiability and computational complexity for the functional regression models covered. A running example on a longitudinal multiple sclerosis imaging study illustrates the flexibility and utility of the proposed model class. Reproducible code for this case study is made available online.
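
The principle of reframing functional regression as a scalar model can be seen in the simplest case, a linear scalar-on-function regression y_i = a + ∫ X_i(s) beta(s) ds + e_i. Expanding beta(s) in K basis functions and approximating the integral by quadrature turns the functional term into K ordinary scalar covariates. The NumPy sketch below assumes curves on a common dense grid, an orthonormal cosine basis, trapezoidal weights and no penalty; the function names are hypothetical, and it is not the refund or FDboost interface, which also handle splines, penalties, sparse grids and functional responses.

```python
import numpy as np


def cosine_basis(s, n_basis):
    """Orthonormal cosine basis on [0, 1]: 1 and sqrt(2) cos(pi k s), k = 1, ..., n_basis - 1."""
    cols = [np.ones_like(s)] + [np.sqrt(2.0) * np.cos(np.pi * k * s) for k in range(1, n_basis)]
    return np.column_stack(cols)


def trapezoid_weights(s):
    """Quadrature weights for a uniform grid s."""
    w = np.full(s.size, s[1] - s[0])
    w[0] = w[-1] = w[0] / 2.0
    return w


def fit_scalar_on_function(curves, s, y, n_basis):
    """Least-squares fit; returns the intercept and beta(s) evaluated on the grid."""
    B = cosine_basis(s, n_basis)
    Z = curves @ (trapezoid_weights(s)[:, None] * B)  # Z[i, k] approximates the integral of X_i(s) B_k(s)
    design = np.column_stack([np.ones(y.size), Z])    # now an ordinary scalar linear model
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], B @ coef[1:]


# Simulated curves and a coefficient function inside the basis span (illustrative only).
rng = np.random.default_rng(7)
s = np.linspace(0.0, 1.0, 101)
n = 200
curves = rng.normal(size=(n, 8)) @ cosine_basis(s, 8).T
beta_true = cosine_basis(s, 5) @ np.array([0.5, 1.0, -0.5, 0.0, 0.25])
y = 1.0 + curves @ (trapezoid_weights(s) * beta_true) + rng.normal(0.0, 0.05, n)

intercept, beta_hat = fit_scalar_on_function(curves, s, y, n_basis=5)
print(round(float(intercept), 3), np.max(np.abs(beta_hat - beta_true)))
```

Once the integral is replaced by the design matrix Z, any scalar regression machinery (penalties, random effects, other response families) can be applied to it, which is the adaptation the framework exploits.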

Exploring the non-linear relationship between various categories of Crimes and GDP: A case study using Generalized Additive Models

2018 IEEE Latin American Conference on Computational Intelligence (LA-CCI), 2018. DOI: 10.1109/la-cci.2018.8625254

Author(s): Satyakama Paul, Ali N Hasan, Bruno Mario

Keyword(s): Linear Relationship, Generalized Additive Models, Additive Models, Non Linear
