Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape

2018 ◽  
Vol 9 ◽  
Author(s):  
Christophe Coupé
Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 299
Author(s):  
Jaime Pinilla ◽  
Miguel Negrín

Interrupted time series analysis is a quasi-experimental design used to evaluate the effectiveness of an intervention. Segmented linear regression models have been the most widely used models for this analysis. However, they assume a linear trend, which may not be appropriate in many situations. In this paper, we show how generalized additive models (GAMs), a non-parametric regression-based method, can accommodate non-linear trends. An analysis with simulated data is carried out to assess the performance of both models. Data were simulated from linear and non-linear (quadratic and cubic) functions. The results show that GAMs improve on segmented linear regression models when the trend is non-linear, and that they also perform well when the trend is linear. A real-life application, the impact of the 2012 Spanish cost-sharing reforms on pharmaceutical prescription, is also analyzed. Seasonality and an indicator variable for the stockpiling effect are included as explanatory variables. The segmented linear regression model shows a good fit to the data. However, the GAM rejects the hypothesis of a linear trend. The estimated level shift is similar for both models, but the cumulative absolute effect on the number of prescriptions is lower in the GAM.
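As a rough illustration of the segmented linear regression baseline mentioned in this abstract, the sketch below fits the usual level-shift and slope-change parameterization by ordinary least squares. The function name `fit_segmented`, the 48-point series, and all coefficient values are hypothetical choices made for this example; they are not taken from the paper, and no seasonality or stockpiling terms are included.

```python
import numpy as np

def fit_segmented(t, y, t0):
    """Least-squares fit of y = b0 + b1*t + b2*post + b3*(t - t0)*post."""
    post = (t >= t0).astype(float)            # 1 from the intervention onward
    X = np.column_stack([
        np.ones_like(t, dtype=float),         # baseline level
        t,                                    # pre-intervention slope
        post,                                 # level shift at t0
        (t - t0) * post,                      # change in slope after t0
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical noise-free series: 48 time points, intervention at t0 = 24.
t = np.arange(48, dtype=float)
t0 = 24.0
post = (t >= t0).astype(float)
y = 100.0 + 0.5 * t - 8.0 * post - 0.3 * (t - t0) * post

b0, b1, level_shift, slope_change = fit_segmented(t, y, t0)
print(round(level_shift, 3), round(slope_change, 3))
```

Because the simulated series is exactly piecewise linear, the fit recovers the level shift and slope change. With a quadratic or cubic trend the same design matrix would be misspecified, which is the situation the abstract says GAMs handle better.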


2021 ◽  
Author(s):  
Richard Dinga ◽  
Charlotte J. Fraza ◽  
Johanna M. M. Bayer ◽  
Seyed M. Kia ◽  
Christian F. Beckmann ◽  
...  

Normative modeling aims to quantify the degree to which an individual's brain deviates from a reference sample with respect to one or more variables, which can be used as a potential biomarker of a healthy brain and as a tool to study the heterogeneity of psychiatric disorders. The application of normative models is hindered by methodological challenges and by a lack of standards for their usage and evaluation. In this paper, we present generalized additive models for location, scale, and shape (GAMLSS) for normative modeling of neuroimaging data, a flexible modeling framework that can model heteroskedasticity, non-linear effects of variables, and hierarchical structure of the data. It can model non-Gaussian distributions, and it allows for automatic model order selection, thus improving the accuracy of normative models while mitigating problems of overfitting. Furthermore, we describe measures and diagnostic tools suitable for evaluating normative models and provide step-by-step examples of normative modeling, including fitting several candidate models, selecting the best models, and transferring them to new scan sites.


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 469
Author(s):  
Thiago G. Ramires ◽  
Luiz R. Nakamura ◽  
Ana J. Righetto ◽  
Renan J. Carvalho ◽  
Lucas A. Vieira ◽  
...  

This paper presents a discussion regarding regression models, especially those belonging to the location class. Our main motivation is that simple distributions with simple interpretations can, in some cases, give better results than overly complex distributions. For instance, with the reverse Gumbel (RG) distribution, it is possible to explain response variables by making use of the generalized additive models for location, scale, and shape (GAMLSS) framework, which allows the fitting of several parameters (characteristics) of the probabilistic distributions, such as the mean, mode, variance, and others. Three real data applications are used to compare several location models against the RG under the GAMLSS framework. The intention is to show that using a simple distribution (e.g., RG) with a more sophisticated regression structure may be preferable to using a more complex location model.


2021 ◽  
pp. 1471082X2110073
Author(s):  
Stanislaus Stadlmann ◽  
Thomas Kneib

A newly emerging field in statistics is distributional regression, where not only the mean but each parameter of a parametric response distribution can be modelled using a set of predictors. As an extension of generalized additive models, distributional regression utilizes the known link functions (log, logit, etc.), model terms (fixed, random, spatial, smooth, etc.) and available types of distributions but allows us to go well beyond the exponential family and to model potentially all distributional parameters. Due to this increase in model flexibility, the interpretation of covariate effects on the shape of the conditional response distribution, its moments and other features derived from this distribution is more challenging than with traditional mean-based methods. In particular, such quantities of interest often do not directly equate to the modelled parameters but are rather a (potentially complex) combination of them. To ease the post-estimation model analysis, we propose a framework and subsequently feature an implementation in R for the visualization of Bayesian and frequentist distributional regression models fitted using the bamlss, gamlss and betareg R packages.
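The point that quantities of interest are combinations of the modelled parameters can be made concrete with a log-normal response, where both parameters depend on a covariate. The sketch below is only an illustration: the linear predictors and their coefficients in `predict` are hypothetical, and the code does not use or reproduce the R implementation described in the abstract.

```python
import math

def lognormal_summaries(mu, sigma):
    """Moments of a log-normal response given its two modelled parameters."""
    mean = math.exp(mu + sigma ** 2 / 2)      # depends on BOTH mu and sigma
    median = math.exp(mu)                     # depends on mu only
    variance = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    return {"mean": mean, "median": median, "variance": variance}

def predict(x):
    """Hypothetical fitted predictors: identity link for mu, log link for sigma."""
    mu = 1.0 + 0.5 * x                        # assumed coefficients
    sigma = math.exp(-1.0 + 0.8 * x)          # assumed coefficients
    return lognormal_summaries(mu, sigma)

for x in (0.0, 1.0, 2.0):
    s = predict(x)
    print(x, round(s["median"], 3), round(s["mean"], 3))
```

Here the covariate moves the expected response through both predictors at once, so the coefficient on `mu` alone does not describe its effect on the mean. This is the interpretive difficulty the abstract's visualization framework is meant to ease.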


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 651
Author(s):  
Hao Deng ◽  
Jianghong Chen ◽  
Biqin Song ◽  
Zhibin Pan

Due to their flexibility and interpretability, additive models are powerful tools for high-dimensional mean regression and variable selection. However, mean regression models based on the least-squares loss are sensitive to non-Gaussian noise, and their robustness needs to be improved. This paper considers estimation and variable selection via modal regression in reproducing kernel Hilbert spaces (RKHSs). Based on the mode-induced metric and a two-fold Lasso-type regularizer, we propose a sparse modal regression algorithm and give the excess generalization error. The experimental results demonstrate the effectiveness of the proposed model.


Econometrics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 22
Author(s):  
Alex Lenkoski ◽  
Fredrik L. Aanes

In economic applications, model averaging has found principal use in examining the validity of various theories related to observed heterogeneity in outcomes such as growth, development, and trade. Though often easy to articulate, these theories are imperfectly captured quantitatively. A number of different proxies are often collected for a given theory and the uneven nature of this collection requires care when employing model averaging. Furthermore, if valid, these theories ought to be relevant outside of any single narrowly focused outcome equation. We propose a methodology which treats theories as represented by latent indices, these latent processes controlled by model averaging on the proxy level. To achieve generalizability of the theory index our framework assumes a collection of outcome equations. We accommodate a flexible set of generalized additive models, enabling non-Gaussian outcomes to be included. Furthermore, selection of relevant theories also occurs on the outcome level, allowing for theories to be differentially valid. Our focus is on creating a set of theory-based indices directed at understanding a country’s potential risk of macroeconomic collapse. These Sovereign Risk Indices are calibrated across a set of different “collapse” criteria, including default on sovereign debt, heightened potential for high unemployment or inflation and dramatic swings in foreign exchange values. The goal of this exercise is to render a portable set of country/year theory indices which can find more general use in the research community.


2021 ◽  
Author(s):  
Ariel I. Mundo ◽  
John R. Tipton ◽  
Timothy J. Muldoon

In biomedical research, the outcome of longitudinal studies has been traditionally analyzed using the repeated measures analysis of variance (rm-ANOVA) or more recently, linear mixed models (LMEMs). Although LMEMs are less restrictive than rm-ANOVA in terms of correlation and missing observations, both methodologies share an assumption of linearity in the measured response, which results in biased estimates and unreliable inference when they are used to analyze data where the trends are non-linear, which is a common occurrence in biomedical research. In contrast, generalized additive models (GAMs) relax the linearity assumption, and allow the data to determine the fit of the model while permitting missing observations and different correlation structures. Therefore, GAMs present an excellent choice to analyze non-linear longitudinal data in the context of biomedical research. This paper summarizes the limitations of rm-ANOVA and LMEMs and uses simulated data to visually show how both methods produce biased estimates when used on non-linear data. We also present the basic theory of GAMs, and using trends of oxygen saturation in tumors reported in the biomedical literature, we simulate example longitudinal data (2 treatment groups, 10 subjects per group, 6 repeated measures for each group) to demonstrate how these models can be computationally implemented. We show that GAMs are able to produce estimates that are consistent with the trends of biomedical non-linear data even in the case when missing observations exist (with 40% of the observations missing), allowing reliable inference from the data. To make this work reproducible, the code and data used in this paper are available at: https://github.com/aimundo/GAMs-biomedical-research.
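The simulation design described in this abstract (2 treatment groups, 10 subjects per group, 6 repeated measures, 40% of observations missing) can be sketched as follows. The mean trends, the subject-level and residual standard deviations, and the seed are hypothetical values chosen for this example; the authors' actual code is in the repository linked in the abstract, and no GAM is fitted here.

```python
import numpy as np

rng = np.random.default_rng(0)                # fixed seed, arbitrary choice
n_groups, n_subjects, n_times = 2, 10, 6
time = np.arange(n_times, dtype=float)

# Hypothetical non-linear mean trends, one per treatment group: shape (2, 6).
trend = np.stack([
    20.0 + 5.0 * time - 0.8 * time ** 2,
    20.0 + 1.0 * time + 0.4 * time ** 2,
])

# Subject-specific offsets induce correlation among repeated measures.
subject_offset = rng.normal(0.0, 1.0, size=(n_groups, n_subjects, 1))
noise = rng.normal(0.0, 0.5, size=(n_groups, n_subjects, n_times))
y = trend[:, None, :] + subject_offset + noise   # shape (2, 10, 6)

# Delete 40% of the observations completely at random.
n_missing = round(0.4 * y.size)
missing_idx = rng.choice(y.size, size=n_missing, replace=False)
y_missing = y.copy()
y_missing.flat[missing_idx] = np.nan

print(y_missing.shape, int(np.isnan(y_missing).sum()))
```

A data set like `y_missing` is unbalanced across subjects and time points, which is the setting in which the abstract contrasts rm-ANOVA and LMEMs with GAMs.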


2021 ◽  
Author(s):  
Drew Thomas

Media commentary has suggested that recent Black Lives Matter (BLM) protests, particularly riots, drove voters, particularly Hispanic voters, away from Democratic candidate Joe Biden in the 2020 US presidential election. I test these hypotheses with county-level regression models of 2016-to-2020 swing towards the Democratic presidential candidate, using the presence and intensity of BLM non-riot protests and riots as regressors, controlling for state and many background demographic factors (population density, household size, racial composition, etc.). The models (generalized additive models) that control most aggressively for background factors find small and positive associations between BLM protests and Democratic swing: counties with non-riot BLM protests swung more towards Joe Biden by 0.2 percentage points, and counties with BLM-associated riots swung more towards Joe Biden by (a statistically insignificant) 0.1 percentage points. The extra BLM-protest swing was not statistically significantly different in counties with relatively many Hispanic voting-age citizens, although it was weaker in counties with relatively many Asian voting-age citizens. Inasmuch as these results reflect causal impacts of BLM protests, the protests enhanced the Democratic swing but were probably not electorally decisive. My most elaborate model suggests that a lack of BLM protests in 2020 would have flipped only one state: Biden might have narrowly lost Arizona.


2017 ◽  
Vol 17 (1-2) ◽  
pp. 1-35 ◽  
Author(s):  
Sonja Greven ◽  
Fabian Scheipl

Researchers are increasingly interested in regression models for functional data. This article discusses a comprehensive framework for additive (mixed) models for functional responses and/or functional covariates based on the guiding principle of reframing functional regression in terms of corresponding models for scalar data, allowing the adaptation of a large body of existing methods for these novel tasks. The framework encompasses many existing as well as new models. It includes regression for ‘generalized’ functional data, mean regression, quantile regression as well as generalized additive models for location, scale and shape (GAMLSS) for functional data. It admits many flexible linear, smooth or interaction terms of scalar and functional covariates as well as (functional) random effects and allows flexible choices of bases (particularly splines and functional principal components) and corresponding penalties for each term. It covers functional data observed on common (dense) or curve-specific (sparse) grids. Penalized-likelihood-based and gradient-boosting-based inference for these models is implemented in the R packages refund and FDboost, respectively. We also discuss identifiability and computational complexity for the functional regression models covered. A running example on a longitudinal multiple sclerosis imaging study serves to illustrate the flexibility and utility of the proposed model class. Reproducible code for this case study is made available online.

