Perils and pitfalls of mixed-effects regression models in biology

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9522 ◽  
Author(s):  
Matthew J. Silk ◽  
Xavier A. Harrison ◽  
David J. Hodgson

Biological systems, at all scales of organisation from nucleic acids to ecosystems, are inherently complex and variable. Biologists therefore use statistical analyses to detect signal among this systemic noise. Statistical models infer trends, find functional relationships and detect differences that exist among groups or are caused by experimental manipulations. They also use statistical relationships to help predict uncertain futures. All branches of the biological sciences now embrace the possibilities of mixed-effects modelling and its flexible toolkit for partitioning noise and signal. The mixed-effects model is not, however, a panacea for poor experimental design, and should be used with caution when inferring or deducing the importance of both fixed and random effects. Here we describe a selection of the perils and pitfalls that are widespread in the biological literature, but can be avoided by careful reflection, modelling and model-checking. We focus on situations where incautious modelling risks exposure to these pitfalls and the drawing of incorrect conclusions. Our stance is that statements of significance, information content or credibility all have their place in biological research, as long as these statements are cautious and well-informed by checks on the validity of assumptions. Our intention is to reveal potential perils and pitfalls in mixed model estimation so that researchers can use these powerful approaches with greater awareness and confidence. Our examples are ecological, but translate easily to all branches of biology.

2016 ◽  
Vol 25 (2) ◽  
pp. 225-230
Author(s):  
Cristina Fernandes do Amarante ◽  
Wagner de Souza Tassinari ◽  
Jose Luis Luque ◽  
Maria Julia Salim Pereira

Abstract The present study used regression models, with an epidemiological approach, to evaluate factors that may influence numerical parasite dominance. A database of 3,746 fish specimens and their respective parasites was used to evaluate the relationship between parasite dominance and biotic characteristics inherent to the studied hosts and the parasite taxa. Multivariate, classical, and mixed effects linear regression models were fitted. The calculations were performed using R software (95% CI). In the fitting of the classical multiple linear regression model, freshwater and planktivorous fish species and body length, as well as the species of the taxa Trematoda, Monogenea, and Hirudinea, were associated with parasite dominance. However, the fitting of the mixed effects model showed that the body length of the host and the species of the taxa Nematoda, Trematoda, Monogenea, Hirudinea, and Crustacea were significantly associated with parasite dominance. Studies that consider specific biological aspects of the hosts and parasites should expand the knowledge regarding factors that influence the numerical dominance of fish parasites in Brazil. The use of a mixed model shows, once again, the importance of choosing a model suited to the characteristics of the data in order to obtain consistent results.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Maud Delattre ◽  
Marie-Anne Poursat

Abstract We consider joint selection of fixed and random effects in general mixed-effects models. The interpretation of estimated mixed-effects models is challenging since changing the structure of one set of effects can lead to different choices of important covariates in the model. We propose a stepwise selection algorithm to perform simultaneous selection of the fixed and random effects. It is based on Bayesian Information criteria whose penalties are adapted to mixed-effects models. The proposed procedure performs model selection in both linear and nonlinear models. It should be used in the low-dimension setting where the number of covariates and the number of random effects are moderate with respect to the total number of observations. The performance of the algorithm is assessed via a simulation study, which also includes a comparison with alternatives when available in the literature. The use of the method is illustrated in a clinical study of the kinetics of an antibiotic agent.


2019 ◽  
Vol 42 (1) ◽  
pp. 81-99
Author(s):  
Marta Lucia Corrales ◽  
Edilberto Cepeda-Cuervo

Gamma regression models are a suitable choice to model continuous variables that take positive real values. This paper presents a gamma regression model with mixed effects from a Bayesian approach. We use the parametrisation of the gamma distribution in terms of the mean and the shape parameter, both of which are modelled through regression structures that may involve fixed and random effects. A computational implementation via Gibbs sampling is provided and illustrative examples (simulated and real data) are presented.
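As a rough illustration of the mean and shape parametrisation mentioned in this abstract, the sketch below simulates data in which both parameters depend on a covariate and the mean also carries a group-level random intercept. It is not the authors' implementation and does no Bayesian fitting or Gibbs sampling. The log links, the coefficient values, and the names `gamma_mean_shape` and `simulate` are all assumptions made for the example.

```python
import math
import random


def gamma_mean_shape(rng, mean, shape):
    """Draw one gamma variate with the given mean and shape.

    random.gammavariate takes (shape, scale) and the gamma mean equals
    shape * scale, so the scale implied by (mean, shape) is mean / shape.
    """
    return rng.gammavariate(shape, mean / shape)


def simulate(n_groups=50, n_per_group=40, seed=1):
    """Simulate grouped gamma responses (all parameter values hypothetical)."""
    rng = random.Random(seed)
    beta0, beta1 = 1.0, 0.5    # assumed fixed effects for the mean (log link)
    gamma0, gamma1 = 1.0, 0.3  # assumed fixed effects for the shape (log link)
    sigma_b = 0.4              # assumed SD of the random intercept
    rows = []
    for group in range(n_groups):
        b = rng.gauss(0.0, sigma_b)  # random intercept shared within a group
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)
            mean = math.exp(beta0 + beta1 * x + b)
            shape = math.exp(gamma0 + gamma1 * x)
            y = gamma_mean_shape(rng, mean, shape)
            rows.append((group, x, mean, shape, y))
    return rows
```

Under this parametrisation the variance of a response is mean² / shape, so a larger shape gives less dispersion around the same mean.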


Methodology ◽  
2017 ◽  
Vol 13 (1) ◽  
pp. 9-22 ◽  
Author(s):  
Pablo Livacic-Rojas ◽  
Guillermo Vallejo ◽  
Paula Fernández ◽  
Ellián Tuero-Herrero

Abstract. In repeated-measures designs, low precision of inferences from univariate or multivariate Analysis of Variance (ANOVA) models is associated with non-normally distributed data, nonspherical covariance structures and free variation of the variances and covariances, lack of knowledge of the error structure underlying the data, and the wrong choice of covariance structure by different selectors. In this study, the statistical power of the Modified Brown-Forsythe (MBF) procedure and of two mixed-model procedures (Akaike’s criterion and the Correctly Identified Model [CIM]) is compared. The data were analyzed using Monte Carlo simulation with the statistical package SAS 9.2, a split-plot design, and six manipulated variables. The results show that the procedures exhibit high statistical power for within-subjects and interaction effects, and moderate to low power for between-groups effects under the different conditions analyzed. For the latter, only the Modified Brown-Forsythe procedure shows high power, mainly for groups with 30 cases and Unstructured (UN) and Heterogeneous Autoregressive (ARH) matrices. For this reason, we recommend using this procedure, since it exhibits higher power for all effects and does not require specifying the type of covariance matrix underlying the data. Future research should compare power using corrected selectors in single-level and multilevel designs for fixed and random effects.


2020 ◽  
Vol 15 ◽  
Author(s):  
Deeksha Saxena ◽  
Mohammed Haris Siddiqui ◽  
Rajnish Kumar

Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation, in which non-linear modules are combined so that the representation is transformed from lower levels to much more abstract ones. Though DL is used widely in almost every field, it has brought a particular breakthrough in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with machine learning, but at times the two are used individually as well. DL seems to be a better platform than machine learning, as it does not require an intermediate feature extraction step and works well with larger datasets. DL is one of the most discussed fields among scientists and researchers these days for diagnosing and solving various biological problems. However, deep learning models need some improvement and experimental validation to be more productive. Objective: To review the available DL models and datasets that are used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed, and tabulated. Types of datasets and some of the popular disease-related data sources for DL were highlighted. Results: We analyzed the frequently used DL methods and data types, and discussed some of the recent deep learning models used for solving different biological problems. Conclusion: The review presents useful insights into DL methods, data types, and the selection of DL models for disease diagnosis.


2021 ◽  
Vol 213 ◽  
pp. 106676
Author(s):  
Saeed Mohammadiun ◽  
Guangji Hu ◽  
Abdorreza Alavi Gharahbagh ◽  
Reza Mirshahi ◽  
Jianbing Li ◽  
...  

2014 ◽  
Vol 14 (2) ◽  
pp. 94-101 ◽  
Author(s):  
Sonia Maria Lima Salgado ◽  
Juliana Costa de Rezende ◽  
José Airton Rodrigues Nunes

The purpose of this study was to select Coffea arabica progenies for resistance to M. paranaensis in an infested coffee growing area using Henderson's mixed model methodology. Forty-one genotypes were selected at the Coffee Active Germplasm Bank of Minas Gerais and evaluated for stem diameter, number of plagiotropic branches, reaction to the nematode, and yield per plant. There was genetic variability among the genotypes studied for all the traits evaluated, and among the populations studied for yield and reaction to the nematode, indicating possibilities for obtaining genetic gains through selection in this population. There was a high rate of genotypic association between all the traits studied. Coffee plants of the Timor Hybrid UFV408-01 population, and F3 progenies derived from crossing Catuaí Vermelho and Amphillo MR 2161, were the most promising in the area infested by M. paranaensis.

