The distribution of a linear predictor after model selection: Unconditional finite-sample distributions and asymptotic approximations

Author(s): Hannes Leeb
1987, Vol 3 (3), pp. 387-408
Author(s): J.C. Nankervis, N.E. Savin

The distributions of the test statistics are investigated in the context of an AR(1) model where the root is unity or near unity and where the exogenous process is a stable process, a random walk or a time trend. The finite sample distributions are estimated by Monte Carlo methods assuming normal disturbances. The sensitivity of the distributions to both the values of the parameters of the AR(1) model and the process generating the exogenous time series is examined. The Monte Carlo results motivate several theorems which describe the exact sampling behavior of the test statistics. The analytical and empirical results present a mixed picture with respect to the accuracy of the relevant asymptotic approximations.
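The abstract estimates finite-sample distributions by Monte Carlo with normal disturbances. Below is a minimal sketch of that idea for the simplest case only. It assumes there is no exogenous process and no intercept, and it uses the OLS t-statistic for ρ = 1 as the test statistic. The paper's actual statistics and designs are richer. The function names `ar1_t_statistic` and `monte_carlo_quantiles` are hypothetical.

```python
import math
import random


def ar1_t_statistic(n, rho, rng):
    """Simulate y_t = rho * y_{t-1} + e_t with y_0 = 0 and e_t ~ N(0, 1), then
    return the OLS t-statistic for H0: rho = 1 from regressing y_t on y_{t-1}.
    Assumption: no intercept and no exogenous regressor."""
    y = [0.0]
    for _ in range(n):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    lagged, current = y[:-1], y[1:]
    sxx = sum(v * v for v in lagged)
    rho_hat = sum(a * b for a, b in zip(lagged, current)) / sxx
    rss = sum((b - rho_hat * a) ** 2 for a, b in zip(lagged, current))
    s2 = rss / (n - 1)  # residual variance with one estimated coefficient
    return (rho_hat - 1.0) / math.sqrt(s2 / sxx)


def monte_carlo_quantiles(n, rho, reps, probs, seed=0):
    """Estimate quantiles of the finite-sample distribution by simulation."""
    rng = random.Random(seed)
    draws = sorted(ar1_t_statistic(n, rho, rng) for _ in range(reps))
    return [draws[min(int(p * reps), reps - 1)] for p in probs]


# Compare a unit root (rho = 1) with a near-unit root (rho = 0.95).
for rho in (1.0, 0.95):
    q = monte_carlo_quantiles(n=100, rho=rho, reps=2000, probs=(0.05, 0.5, 0.95))
    print(rho, [round(v, 2) for v in q])
```

Under ρ = 1 this statistic is not standard normal, even asymptotically, because its limit is the Dickey-Fuller distribution. The simulated 5% quantile should therefore fall noticeably below -1.645. Under the near-unit root, the whole distribution shifts to the left.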


2019, Vol 6 (Supplement_2), pp. S210-S210
Author(s): Mary T Caserta, Lu Wang, Chin-Yi Chu, Christopher Slaunwhite, Jeanne Holden-Wiltse, ...

Abstract
Background: RSV infection is common in infants, and most of those affected have mild clinical symptoms. However, a substantial number develop severe symptoms requiring hospitalization. We currently lack sensitive and specific predictors that identify most of those who will develop severe disease.
Methods: High-throughput RNA sequencing (RNAseq) of nasal epithelial cells defined airway gene expression patterns in RSV-infected subjects. Using multivariate linear regression with AIC-based model selection, we built a sparse linear predictor of RSV disease severity, the Nasal Gene Severity Score (NGSS1). Using a similar statistical approach, we built an alternative predictor based on genes whose expression was stable over time (NGSS2). We evaluated the predictive performance of both models using leave-one-out cross-validation.
Results: We defined comprehensive airway gene expression profiles from 106 full-term, previously healthy, RSV-infected subjects with a range of disease severity, prospectively enrolled in the AsPIRES study. Nasal samples were obtained during acute infection (days 1–10 of illness; 106 samples) and during convalescence (days 14–28 of illness; 69 samples). All subjects had a primary infection and were assigned a cumulative clinical illness severity score (GRSS) (Table 1). From the RNAseq data, 41 genes were identified as NGSS1. NGSS1 was strongly correlated with disease severity (GRSS) in both the naive (ρ = 0.935) and cross-validated (ρ = 0.813) analyses. As a binary classifier (mild vs. severe), NGSS1 correctly classified 89.6% of subjects after cross-validation (Figure 1). Next, we evaluated genes that were stably expressed in both acute and convalescent samples from the 54 subjects with data at both time points. Repeating the regression-based stepwise model selection identified 13 genes as NGSS2, which was significantly correlated with GRSS (ρ = 0.741). This model had slightly lower but comparable prediction accuracy, with a cross-validated correlation of 0.741 and a cross-validated classification accuracy of 84.0% (Figure 2).
Conclusion: Airway gene expression patterns obtained through a minimally invasive nasal procedure have potential utility as prognostic biomarkers for severe RSV infection in infants.
Disclosures: All authors: No reported disclosures.
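The methods combine AIC-based stepwise selection for a sparse linear predictor with leave-one-out cross-validation. Below is a minimal sketch of that combination on synthetic data. It does not reproduce the authors' RNAseq pipeline. Several choices are assumptions. Selection is forward (the abstract does not state the direction). AIC takes the Gaussian form n·log(RSS/n) + 2k. The data are simulated. All function names are hypothetical.

```python
import math
import random


def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


def design(rows, cols):
    """Intercept column plus the selected feature columns."""
    return [[1.0] + [row[j] for j in cols] for row in rows]


def predict(beta, row, cols):
    return beta[0] + sum(b * row[j] for b, j in zip(beta[1:], cols))


def gaussian_aic(rows, y, cols):
    """AIC of a Gaussian linear model up to a constant: n*log(RSS/n) + 2*k."""
    beta = ols_fit(design(rows, cols), y)
    rss = sum((t - predict(beta, row, cols)) ** 2 for row, t in zip(rows, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * (len(cols) + 1)


def forward_select(rows, y):
    """Greedy forward selection: add the feature that lowers AIC the most and
    stop when no addition lowers it."""
    selected, remaining = [], list(range(len(rows[0])))
    best = gaussian_aic(rows, y, selected)
    while remaining:
        score, j = min((gaussian_aic(rows, y, selected + [j]), j) for j in remaining)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected


def loo_predictions(rows, y):
    """Leave-one-out predictions that rerun the selection in every fold."""
    preds = []
    for i in range(len(y)):
        train_rows, train_y = rows[:i] + rows[i + 1:], y[:i] + y[i + 1:]
        cols = forward_select(train_rows, train_y)
        beta = ols_fit(design(train_rows, cols), train_y)
        preds.append(predict(beta, rows[i], cols))
    return preds


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


# Hypothetical data: 5 candidate features, only features 0 and 2 matter.
rng = random.Random(1)
rows = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [2.0 * r[0] - 1.5 * r[2] + rng.gauss(0, 0.5) for r in rows]
print("selected:", forward_select(rows, y))
print("LOO correlation:", round(pearson(y, loo_predictions(rows, y)), 3))
```

The sketch reruns selection inside each fold, so the held-out observation never influences which features are chosen. The abstract does not say whether the original analysis did the same.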


2019, Vol 1 (1), pp. 427-449
Author(s): Patrícia Espinheira, Luana da Silva, Alisson Silva, Raydonal Ospina

Beta regression models are a class of supervised learning tools for regression problems with a univariate, bounded response. Current fitting procedures for beta regression require variable selection based on (potentially problematic) information criteria. We propose model selection criteria that take into account the leverage, residuals, and influence of the observations, for both linear and nonlinear systematic components. To that end, we propose a Predictive Residual Sum of Squares (PRESS)-like machine learning tool and a prediction coefficient, the P² statistic, as a computational procedure. Monte Carlo simulation results on the finite-sample behavior of the prediction-based model selection criterion P² are provided. We also evaluated two versions of the R² criterion. Finally, applications to real data are presented. The new criterion proved crucial for choosing models that account for the robustness of the maximum likelihood estimation procedure in the presence of influential cases.
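PRESS is easiest to see in classical simple linear regression, which is where the beta-regression criterion takes its name. The leverage identity e₍ᵢ₎ = eᵢ / (1 − hᵢᵢ) gives every leave-one-out residual from a single fit. The sketch below assumes P² = 1 − PRESS/SST, in the style of a predicted R². It does not reproduce the paper's beta-regression residuals or leverages, and the function names are hypothetical.

```python
import random


def fit_line(x, y):
    """OLS intercept and slope for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    slope = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def press(x, y):
    """PRESS from one fit via e_(i) = e_i / (1 - h_ii), where the leverage is
    h_ii = 1/n + (x_i - xbar)^2 / Sxx in simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    b0, b1 = fit_line(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx
        total += ((yi - b0 - b1 * xi) / (1.0 - h)) ** 2
    return total


def press_brute_force(x, y):
    """Same quantity by explicitly refitting without each observation."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - b0 - b1 * x[i]) ** 2
    return total


def p2_statistic(x, y):
    """Prediction coefficient, assumed here to be P^2 = 1 - PRESS / SST."""
    ybar = sum(y) / len(y)
    sst = sum((v - ybar) ** 2 for v in y)
    return 1.0 - press(x, y) / sst


rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(30)]
y = [1.0 + 0.5 * v + rng.gauss(0, 1) for v in x]
print(press(x, y), press_brute_force(x, y), p2_statistic(x, y))
```

The two PRESS values agree up to rounding error, which is why leverage-based criteria are cheap: they need one fit instead of n.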


2011, Vol 14 (2), pp. 443-463
Author(s): Saket Pande, Luis A. Bastidas, Sandjai Bhulai, Mac McKee

We provide analytical bounds on convergence rates for a class of hydrologic models and derive from them a complexity measure based on Vapnik–Chervonenkis (VC) generalization theory. The class of hydrologic models is a spatially explicit, interconnected set of linear reservoirs that aims to represent globally nonlinear hydrologic behavior with locally linear models. Here, by convergence rate we mean the rate at which the empirical risk converges to the expected risk. The derived complexity measure quantifies a model's propensity to overfit the data. We explore how data finiteness can affect model selection for this class of hydrologic models, and we provide theoretical results on how model performance on a finite sample converges to its expected performance as the data size approaches infinity. These bounds can then be used for model selection, because they provide a tradeoff between model complexity and model performance on finite data. The convergence bounds for the considered hydrologic models depend on the magnitude of their parameters, which are the recession parameters of the constituent linear reservoirs. Further, the complexity of hydrologic models not only varies with the magnitude of their parameters but also depends on the network structure of the models (in terms of the spatial heterogeneity of parameters and the nature of hydrologic connectivity).
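The quantity these bounds control is the gap between empirical and expected risk. The sketch below illustrates it numerically for a small network of linear reservoirs. It does not reproduce the VC bounds themselves. Several choices are assumptions: two reservoirs in series, an explicit update with outflow = k·S, squared-error loss, uniform rainfall, and Gaussian observation noise. At the true parameters the expected risk equals the noise variance (0.01), so the empirical risk should approach that value as n grows. The function names are hypothetical.

```python
import random


def simulate_cascade(rain, ks):
    """Route rainfall through linear reservoirs in series. Each reservoir
    releases outflow = k * storage per step (0 < k <= 1 keeps storage
    non-negative), and that outflow becomes the next reservoir's inflow."""
    storages = [0.0] * len(ks)
    flows = []
    for p in rain:
        inflow = p
        for i, k in enumerate(ks):
            outflow = k * storages[i]
            storages[i] += inflow - outflow
            inflow = outflow
        flows.append(inflow)
    return flows


def empirical_risk(rain, observed, ks):
    """Mean squared error of the simulated flow on the observed sample."""
    sim = simulate_cascade(rain, ks)
    return sum((o - s) ** 2 for o, s in zip(observed, sim)) / len(observed)


rng = random.Random(42)
true_ks, noise_sd = (0.3, 0.5), 0.1
for n in (50, 500, 20000):
    rain = [rng.uniform(0.0, 2.0) for _ in range(n)]
    observed = [q + rng.gauss(0.0, noise_sd) for q in simulate_cascade(rain, true_ks)]
    # Expected risk at the true parameters is noise_sd ** 2 = 0.01.
    print(n, round(empirical_risk(rain, observed, true_ks), 4))
```

The recession parameters k set how long each reservoir remembers past rainfall. This loosely echoes the abstract's point that the bounds depend on parameter magnitude.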


2020, Vol 11 (3), pp. 983-1017
Author(s): Zhipeng Liao, Xiaoxia Shi

This paper proposes a new model selection test for the statistical comparison of semi/non-parametric models based on a general quasi-likelihood ratio criterion. An important feature of the new test is its uniformly exact asymptotic size in the overlapping nonnested case, as well as in the easier nested and strictly nonnested cases. The uniform size control is achieved without pretesting, sample splitting, or simulated critical values. We also show that the test has nontrivial power against all √n-local alternatives and against some local alternatives that converge to the null faster than √n. Finally, we provide a framework for conducting uniformly valid post-model-selection inference for model parameters. The finite-sample performance of the nondegenerate test and of the post-model-selection inference procedure is illustrated by Monte Carlo simulation in a mean-regression example.
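For orientation, here is a sketch of the classical Vuong-type quasi-likelihood ratio statistic for two nonnested models, √n · mean(d)/sd(d), where dᵢ is the per-observation log-likelihood difference. This textbook technique is swapped in for illustration. It is not the paper's test, and it lacks the uniform size control described in the abstract. When the two models overlap, the dᵢ shrink toward zero and the normal approximation breaks down. Two Gaussian linear models, simulated data, and all function names are assumptions.

```python
import math
import random
from statistics import NormalDist


def gaussian_loglik_terms(x, y):
    """Per-observation Gaussian log-likelihoods of y regressed on x (with an
    intercept), evaluated at the maximum likelihood estimates."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    b1 = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [v - b0 - b1 * u for u, v in zip(x, y)]
    s2 = sum(e * e for e in resid) / n  # MLE of the error variance
    return [-0.5 * math.log(2 * math.pi * s2) - e * e / (2 * s2) for e in resid]


def vuong_statistic(ll1, ll2):
    """Classical statistic sqrt(n) * mean(d) / sd(d) with d_i = ll1_i - ll2_i."""
    d = [a - b for a, b in zip(ll1, ll2)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / n)
    return math.sqrt(n) * mean / sd


rng = random.Random(7)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [a + rng.gauss(0, 1) for a in x1]  # data generated by model 1
t = vuong_statistic(gaussian_loglik_terms(x1, y), gaussian_loglik_terms(x2, y))
crit = NormalDist().inv_cdf(0.975)  # two-sided 5% standard normal critical value
print(round(t, 2), "reject equal fit" if abs(t) > crit else "no rejection")
```

A large positive statistic favors model 1. Here the models are strictly nonnested and clearly separated, which is the easy case. The overlapping case is the one the abstract says the new test handles uniformly.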

