Normalized Information Criteria and Model Selection in the Presence of Missing Data

Mathematics, 2021, Vol 9 (19), pp. 2474
Author(s): Nitzan Cohen, Yakir Berchenko

Information criteria such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are commonly used for model selection. However, the current theory does not support unconventional data, so naive use of these criteria is not suitable for data with missing values. Imputation, at the core of most alternative methods, both distorts the analysis and is computationally demanding. We propose a new approach that enables the use of classic, well-known information criteria for model selection when there are missing data. We adapt the current theory of information criteria through normalization, accounting for the different sample sizes used for each candidate model (focusing on AIC and BIC). Interestingly, when the sample sizes differ, our theoretical analysis finds that AIC_j/n_j is the proper correction of AIC_j that we need to optimize (where n_j is the sample size available to the j-th model), while −(BIC_j − BIC_i)/(n_j − n_i) is the corresponding correction of BIC. Furthermore, we find that the computational complexity of normalized information criteria methods is exponentially better than that of imputation methods. In a series of simulation studies, we find that normalized-AIC and normalized-BIC outperform previous methods (i.e., normalized-AIC is more efficient, and normalized-BIC includes only important variables, although it tends to exclude some of them when correlations are large). We propose three additional methods aimed at increasing the statistical efficiency of normalized-AIC: post-selection imputation, Akaike sub-model averaging, and minimum-variance averaging. The latter succeeds in increasing efficiency further.
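The normalization above is straightforward to compute. As a minimal sketch (assuming Gaussian linear models fit by ordinary least squares; the helper names are illustrative, not from the paper), each candidate model is fit on its own available complete cases and compared on the per-observation scale AIC_j/n_j:

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC for an OLS fit with Gaussian errors (variance profiled out)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * (k + 1) - 2 * loglik  # k coefficients plus the variance

def normalized_aic(y, X):
    """AIC_j / n_j: comparable across models fit on different sample sizes."""
    return gaussian_aic(y, X) / len(y)
```

Raw AIC values from fits on different n_j are not comparable; dividing by the sample size available to each model puts them on a common per-observation scale.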

Economies, 2020, Vol 8 (2), pp. 49
Author(s): Waqar Badshah, Mehmet Bulut

The Bounds test of cointegration uses only unstructured single-path model selection techniques, i.e., information criteria, for model selection. The aim of this paper was twofold: first, to evaluate the performance of five routinely used information criteria {Akaike Information Criterion (AIC), Akaike Information Criterion Corrected (AICC), Schwarz/Bayesian Information Criterion (SIC/BIC), Schwarz/Bayesian Information Criterion Corrected (SICC/BICC), and Hannan and Quinn Information Criterion (HQC)} and three structured approaches (Forward Selection, Backward Elimination, and Stepwise) by assessing their size and power properties at different sample sizes based on Monte Carlo simulations; and second, to make the same assessment on real economic data. The second aim was achieved by evaluating the long-run relationship between three pairs of macroeconomic variables, i.e., Energy Consumption and GDP, Oil Price and GDP, and Broad Money and GDP, for the BRICS (Brazil, Russia, India, China and South Africa) countries using the Bounds cointegration test. It was found that the information criteria and the structured procedures have the same power for sample sizes of 50 or greater; however, BICC and Stepwise are better at small sample sizes. In light of the simulation and real-data results, a modified Bounds test with the Stepwise model selection procedure may be used, as it is strongly supported theoretically and avoids noise in the model selection process.


2021, Vol 20 (3), pp. 450-461
Author(s): Stanley L. Sclove

Abstract The use of information criteria, especially AIC (Akaike's information criterion) and BIC (Bayesian information criterion), for choosing an adequate number of principal components is illustrated.


2012, Vol 18 (65), pp. 323
Author(s): جنان عباس ناصر

In this study, we compare the traditional information criteria (AIC, SIC, HQ, FPE) with the Modified Divergence Information Criterion (MDIC), which is used to determine the order of the autoregressive (AR) model for the data-generating process. We use simulation, generating data from several autoregressive models in which the error term is normally distributed with different values of its parameters (mean and variance), and for different sample sizes.
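For context, classic criterion-based AR order selection can be sketched as follows (plain AIC with conditional least squares; the MDIC itself is not implemented here, and the function names are illustrative):

```python
import numpy as np

def ar_aic(x, p):
    """AIC of an AR(p) model fit by conditional least squares."""
    n = len(x)
    Y = x[p:]
    # Design matrix: intercept plus lags 1..p, aligned with x[p:]
    X = np.column_stack([np.ones(n - p)] +
                        [x[p - k : n - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    r = Y - X @ beta
    s2 = r @ r / len(Y)
    # Gaussian log-likelihood up to a constant; p coefs + intercept + variance
    return len(Y) * np.log(s2) + 2 * (p + 2)

def select_ar_order(x, max_p=6):
    """Return the order p in 1..max_p that minimizes AIC."""
    return min(range(1, max_p + 1), key=lambda p: ar_aic(x, p))
```

Each criterion in the study (AIC, SIC, HQ, FPE, MDIC) substitutes its own penalty for the `2 * (p + 2)` term; the selection loop itself is unchanged.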


2019, Vol 37 (2), pp. 549-562
Author(s): Edward Susko, Andrew J Roger

Abstract The information criteria Akaike information criterion (AIC), AICc, and Bayesian information criterion (BIC) are widely used for model selection in phylogenetics; however, their theoretical justification and performance have not been carefully examined in this setting. Here, we investigate these methods under simple and complex phylogenetic models. We show that AIC can give a biased estimate of its intended target, the expected predictive log likelihood (EPLnL) or, equivalently, the expected Kullback–Leibler divergence between the estimated model and the true distribution for the data. Reasons for bias include commonly occurring issues such as small edge lengths or, in mixture models, small weights. The use of partitioned models is another issue that can cause problems for information criteria. We show that for partitioned models, a different BIC correction is required for it to be a valid approximation to a Bayes factor. The commonly used AICc correction is not clearly defined in partitioned models and can actually create a substantial bias when the number of parameters grows large, as is the case with larger trees and partitioned models. Bias-corrected cross-validation corrections are shown to provide better approximations to EPLnL than AIC. We also illustrate how EPLnL, the estimation target of AIC, can sometimes favor an incorrect model, and we give reasons why selection of incorrectly under-partitioned models might be desirable in partitioned model settings.


2006, Vol 27 (2), pp. 169-180
Author(s): Marc Mazerolle

Abstract In ecology, researchers frequently use observational studies to explain a given pattern, such as the number of individuals in a habitat patch, with a large number of explanatory (i.e., independent) variables. To elucidate such relationships, ecologists have long relied on hypothesis testing to include or exclude variables in regression models, although the conclusions often depend on the approach used (e.g., forward, backward, or stepwise selection). Although better tools surfaced in the mid-1970s, they remain underutilized in certain fields, particularly in herpetology. This is the case for the Akaike information criterion (AIC), which is remarkably superior for model selection (i.e., variable selection) to hypothesis-based approaches. It is simple to compute and easy to understand but, more importantly, for a given data set it provides a measure of the strength of evidence for each model that represents a plausible biological hypothesis, relative to the entire set of models considered. Using this approach, one can then compute a weighted average of the estimate and standard error for any given variable of interest across all the models considered. This procedure, termed model averaging or multimodel inference, yields precise and robust estimates. In this paper, I illustrate the use of the AIC in model selection and inference, as well as the interpretation of results analysed in this framework, with two real herpetological data sets. The AIC and measures derived from it should be routinely adopted by herpetologists.
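The model-averaging step described here rests on Akaike weights, w_i ∝ exp(−Δ_i/2), where Δ_i is each model's AIC difference from the best model in the set. A minimal sketch (helper names are illustrative):

```python
import numpy as np

def akaike_weights(aic_values):
    """Model weights from AIC differences: w_i proportional to exp(-delta_i / 2)."""
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()          # Δ_i relative to the best model
    w = np.exp(-0.5 * delta)
    return w / w.sum()               # normalize so the weights sum to 1

def model_averaged_estimate(estimates, aic_values):
    """Multimodel inference: AIC-weighted average of a parameter estimate."""
    w = akaike_weights(aic_values)
    return float(np.dot(w, estimates))
```

Models within roughly 2 AIC units of the best receive substantial weight, so the averaged estimate is not dominated by a single model.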


2006, Vol 45 (01), pp. 44-50
Author(s): N. H. Augustin, W. Sauerbrei, N. Holländer

Summary. Objectives: We illustrate a recently proposed two-step bootstrap model averaging (bootstrap MA) approach to cope with model selection uncertainty. The predictive performance is investigated in an example and in a simulation study. Results are compared to those derived from other model selection methods. Methods: In the framework of the linear regression model we use the two-step bootstrap MA, which consists of a screening step to eliminate covariates thought to have no influence on the response, and a model-averaging step. We also apply the full model, variable selection using backward elimination based on Akaike’s Information Criterion (AIC), the Bayes Information Criterion (BIC) and the bagging approach. The predictive performance is measured by the mean squared error (MSE) and the coverage of confidence intervals for the true response. Results: We obtained similar results for all approaches in the example. In the simulation the MSE was reduced by all approaches in comparison to the full model. The smallest values are obtained for bootstrap MA. Only the bootstrap MA and the full model correctly estimated the nominal coverage. The backward elimination procedures led to substantial underestimation and bagging to an overestimation of the true coverage. The screening step of bootstrap MA eliminates most of the unimportant factors. Conclusion: The new bootstrap MA approach shows promising results for predictive performance. It increases practical usefulness by eliminating unimportant factors in the screening step.
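The screening step can be illustrated with a simplified variant (this marginal-screening sketch keeps a covariate if it improves AIC over the intercept-only model in a sufficient fraction of bootstrap resamples; the authors' procedure selects within each resample by backward elimination, so the names and threshold here are illustrative):

```python
import numpy as np

def ols_aic(y, X):
    """AIC of an OLS fit, up to an additive constant."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    s2 = r @ r / n
    return n * np.log(s2) + 2 * (k + 1)

def screen_covariates(y, X, n_boot=100, threshold=0.4, seed=0):
    """Screening step (simplified): keep covariate j if adding it to the
    intercept-only model lowers AIC in at least `threshold` of resamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    hits = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # bootstrap resample indices
        yb, Xb = y[idx], X[idx]
        ones = np.ones((n, 1))
        base = ols_aic(yb, ones)
        for j in range(p):
            if ols_aic(yb, np.column_stack([ones, Xb[:, [j]]])) < base:
                hits[j] += 1
    return hits / n_boot >= threshold        # boolean mask of kept covariates
```

Covariates surviving the screen would then be carried into the model-averaging step; the rest are eliminated before any averaging takes place.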

