The distribution of a linear predictor after model selection: Unconditional finite-sample distributions and asymptotic approximations

The distribution of a linear predictor after model selection: conditional finite-sample distributions and asymptotic approximations

Journal of Statistical Planning and Inference ◽

10.1016/j.jspi.2004.04.005 ◽

2005 ◽

Vol 134 (1) ◽

pp. 64-89 ◽

Cited By ~ 8

Author(s):

Hannes Leeb

Keyword(s):

Model Selection ◽

Asymptotic Approximations ◽

Finite Sample ◽

Linear Predictor

Finite Sample Distributions of t and F Statistics in an AR(1) Model with an Exogenous Variable

Econometric Theory ◽

10.1017/s026646660001046x ◽

1987 ◽

Vol 3 (3) ◽

pp. 387-408 ◽

Cited By ~ 31

Author(s):

J.C. Nankervis ◽

N.E. Savin

Keyword(s):

Monte Carlo ◽

Time Trend ◽

Stable Process ◽

Asymptotic Approximations ◽

Test Statistics ◽

Finite Sample ◽

Sampling Behavior ◽

Exogenous Process ◽

Exogenous Time Series ◽

F Statistics

The distributions of the test statistics are investigated in the context of an AR(1) model where the root is unity or near unity and where the exogenous process is a stable process, a random walk or a time trend. The finite sample distributions are estimated by Monte Carlo methods assuming normal disturbances. The sensitivity of the distributions to both the values of the parameters of the AR(1) model and the process generating the exogenous time series is examined. The Monte Carlo results motivate several theorems which describe the exact sampling behavior of the test statistics. The analytical and empirical results present a mixed picture with respect to the accuracy of the relevant asymptotic approximations.
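
The Monte Carlo approach described in this abstract can be illustrated with a minimal sketch. It is not the authors' design: the model form y_t = alpha*y_{t-1} + beta*x_t + u_t without an intercept, the stationary AR(1) exogenous process with coefficient 0.5, the zero starting values, and all function names are assumptions made for illustration only.

```python
import math
import random


def t_stat_for_alpha(n, alpha, beta, rng):
    """Simulate one sample of size n and return the OLS t-statistic
    for the AR coefficient, centred at its true value alpha."""
    s_ll = s_lx = s_xx = s_ly = s_xy = s_yy = 0.0
    y_prev, x_prev = 0.0, 0.0  # assumed zero starting values
    for _ in range(n):
        # Assumed exogenous process: stationary AR(1) with coefficient 0.5.
        x = 0.5 * x_prev + rng.gauss(0.0, 1.0)
        # Normal disturbances, as in the abstract.
        y = alpha * y_prev + beta * x + rng.gauss(0.0, 1.0)
        # Accumulate the cross-products needed for the 2x2 normal equations.
        s_ll += y_prev * y_prev
        s_lx += y_prev * x
        s_xx += x * x
        s_ly += y_prev * y
        s_xy += x * y
        s_yy += y * y
        y_prev, x_prev = y, x
    det = s_ll * s_xx - s_lx * s_lx
    a_hat = (s_xx * s_ly - s_lx * s_xy) / det
    b_hat = (s_ll * s_xy - s_lx * s_ly) / det
    rss = s_yy - a_hat * s_ly - b_hat * s_xy  # residual sum of squares
    sigma2 = rss / (n - 2)
    se_a = math.sqrt(sigma2 * s_xx / det)
    return (a_hat - alpha) / se_a


def simulate_t_stats(n, alpha, beta, reps, seed):
    """Return a list of simulated t-statistics from reps replications."""
    rng = random.Random(seed)
    return [t_stat_for_alpha(n, alpha, beta, rng) for _ in range(reps)]


def rejection_rate(stats, critical_value=1.96):
    """Share of replications in which |t| exceeds the critical value."""
    return sum(abs(t) > critical_value for t in stats) / len(stats)


if __name__ == "__main__":
    for alpha in (0.0, 0.9, 1.0):
        stats = simulate_t_stats(n=100, alpha=alpha, beta=1.0, reps=2000, seed=42)
        mean_t = sum(stats) / len(stats)
        print(f"alpha={alpha}: mean t = {mean_t:.3f}, "
              f"rejection rate = {rejection_rate(stats):.3f}")
```

Comparing the simulated mean and rejection rate across values of alpha gives a rough sense of how far the finite-sample distribution drifts from the standard normal approximation as the root approaches unity.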

415. Airway Gene-Expression Classifiers for Respiratory Syncytial Virus (RSV) Disease Severity in Infants

Open Forum Infectious Diseases ◽

10.1093/ofid/ofz360.488 ◽

2019 ◽

Vol 6 (Supplement_2) ◽

pp. S210-S210

Author(s):

Mary T Caserta ◽

Lu Wang ◽

Chin-Yi Chu ◽

Christopher Slaunwhite ◽

Jeanne Holden-Wiltse ◽

...

Keyword(s):

Gene Expression ◽

Model Selection ◽

Disease Severity ◽

Severity Score ◽

Cross Validation ◽

Expression Profiles ◽

Expression Patterns ◽

Gene Expression Patterns ◽

Multivariate Linear Regression Analysis ◽

Linear Predictor

Background: RSV infection is common in infants, with a majority of those affected displaying mild clinical symptoms. However, a substantial number develop severe symptoms requiring hospitalization. We currently lack sensitive and specific predictors to identify a majority of those who develop severe disease. Methods: High-throughput RNA sequencing (RNAseq) of nasal epithelial cells defined airway gene expression patterns in RSV-infected subjects. Using multivariate linear regression analysis with AIC-based model selection, we built a sparse linear predictor of RSV disease severity, the Nasal Gene Severity Score (NGSS1). Using a similar statistical approach, we built an alternate predictor based upon genes displaying stable expression over time (NGSS2). We evaluated the predictive performance of both models using leave-one-out cross-validation analyses. Results: We defined comprehensive airway gene expression profiles from 106 full-term, previously healthy RSV-infected subjects with a range of RSV disease severity, prospectively enrolled in the AsPIRES study. Nasal samples were obtained during acute infection (day 1–10 of illness; 106 samples) and convalescence (day 14–28 of illness; 69 samples). All subjects had a primary infection and were assigned a cumulative clinical illness severity score (GRSS) (Table 1). From the RNAseq data, 41 genes were identified as the NGSS1, which is strongly correlated with disease severity (GRSS) in both the naive (ρ = 0.935) and cross-validated (ρ = 0.813) analyses. As a binary classifier (mild vs. severe), NGSS1 correctly classifies 89.6% of the subjects following cross-validation (Figure 1). Next, we evaluated genes that were stably expressed in both acute illness and convalescence samples in 54 subjects with data from both time points. Repeating the regression-based stepwise model selection identified 13 genes as NGSS2, which was significantly correlated with GRSS (ρ = 0.741). This model has slightly lower, but comparable, prediction accuracy, with a cross-validated correlation of 0.741 and cross-validated classification accuracy of 84.0% (Figure 2). Conclusion: Airway gene expression patterns, obtained following a minimally invasive nasal procedure, have potential utility as prognostic biomarkers for severe infant RSV infections. Disclosures: All authors: No reported disclosures.

Model Selection Criteria on Beta Regression for Machine Learning

Machine Learning and Knowledge Extraction ◽

10.3390/make1010026 ◽

2019 ◽

Vol 1 (1) ◽

pp. 427-449

Author(s):

Patrícia Espinheira ◽

Luana da Silva ◽

Alisson Silva ◽

Raydonal Ospina

Keyword(s):

Machine Learning ◽

Model Selection ◽

Selection Criteria ◽

Real Data ◽

Estimation Procedure ◽

Computational Procedure ◽

Information Criteria ◽

Beta Regression ◽

Finite Sample ◽

Model Selection Criteria

Beta regression models are a class of supervised learning tools for regression problems with univariate and limited response. Current fitting procedures for beta regression require variable selection based on (potentially problematic) information criteria. We propose model selection criteria that take into account the leverage, residuals, and influence of the observations, with respect to both the linear and nonlinear systematic components. To that end, we propose a Predictive Residual Sum of Squares (PRESS)-like machine learning tool and a prediction coefficient, namely the P² statistic, as a computational procedure. Monte Carlo simulation results on the finite sample behavior of the prediction-based model selection criterion P² are provided. We also evaluated two versions of the R² criterion. Finally, applications to real data are presented. The new criterion proved to be crucial for choosing models while taking into account the robustness of the maximum likelihood estimation procedure in the presence of influential cases.
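
To make the idea of a PRESS-based prediction coefficient concrete, here is a minimal sketch of the classical linear-model analogue. It is not the paper's beta regression criterion, whose residuals and leverages come from the beta regression fit. Here PRESS is the sum of squared leave-one-out prediction errors from a simple least-squares line, and P² = 1 - PRESS/SST; the function names and the toy data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def total_sum_of_squares(ys):
    """Sum of squared deviations of y from its mean."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)


def press(xs, ys):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict the held-out point.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total


def r_squared(xs, ys):
    """In-sample coefficient of determination."""
    intercept, slope = fit_line(xs, ys)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - rss / total_sum_of_squares(ys)


def p_squared(xs, ys):
    """Prediction coefficient: 1 - PRESS / SST."""
    return 1.0 - press(xs, ys) / total_sum_of_squares(ys)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy data, for illustration only
    ys = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
    print("R2:", r_squared(xs, ys))
    print("P2:", p_squared(xs, ys))
```

Because each held-out point is predicted by a model that never saw it, P² in this linear setting cannot exceed the in-sample R², and the gap between the two is largest when a few high-leverage observations drive the fit.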

A Finite Sample Comparison of Automatic Model Selection Methods

IFAC Proceedings Volumes ◽

10.1016/s1474-6670(17)35013-9 ◽

2003 ◽

Vol 36 (16) ◽

pp. 1753-1758

Author(s):

Dietmar Bauer ◽

Stijn de Waele

Keyword(s):

Model Selection ◽

Selection Methods ◽

Finite Sample ◽

Automatic Model Selection

Finite Sample Performances of the Model Selection Approach in Nonparametric Model Specification for Time Series

Communications in Statistics - Theory and Methods ◽

10.1080/03610920802531314 ◽

2009 ◽

Vol 38 (14) ◽

pp. 2302-2320

Author(s):

Zijun Wang

Keyword(s):

Time Series ◽

Model Selection ◽

Model Specification ◽

Nonparametric Model ◽

Finite Sample ◽

Model Selection Approach ◽

Selection Approach

Finite sample performance of the model selection approach in co-integration analysis

Journal of Statistical Computation and Simulation ◽

10.1080/00949650701766962 ◽

2009 ◽

Vol 79 (4) ◽

pp. 349-360 ◽

Cited By ~ 4

Author(s):

Zijun Wang ◽

David A. Bessler

Keyword(s):

Model Selection ◽

Finite Sample ◽

Model Selection Approach ◽

Selection Approach ◽

Integration Analysis

Parameter-dependent convergence bounds and complexity measure for a class of conceptual hydrological models

Journal of Hydroinformatics ◽

10.2166/hydro.2011.005 ◽

2011 ◽

Vol 14 (2) ◽

pp. 443-463 ◽

Cited By ~ 8

Author(s):

Saket Pande ◽

Luis A. Bastidas ◽

Sandjai Bhulai ◽

Mac McKee

Keyword(s):

Model Selection ◽

Linear Models ◽

Convergence Rates ◽

Model Performance ◽

Hydrologic Model ◽

Complexity Measure ◽

Model Complexity ◽

Finite Sample ◽

Hydrologic Models ◽

Complexity Measures

We provide analytical bounds on convergence rates for a class of hydrologic models and consequently derive a complexity measure based on the Vapnik–Chervonenkis (VC) generalization theory. The class of hydrologic models is a spatially explicit interconnected set of linear reservoirs with the aim of representing globally nonlinear hydrologic behavior by locally linear models. Here, by convergence rate, we mean convergence of the empirical risk to the expected risk. The derived measure of complexity measures a model's propensity to overfit data. We explore how data finiteness can affect model selection for this class of hydrologic model and provide theoretical results on how model performance on a finite sample converges to its expected performance as data size approaches infinity. These bounds can then be used for model selection, as the bounds provide a tradeoff between model complexity and model performance on finite data. The convergence bounds for the considered hydrologic models depend on the magnitude of their parameters, which are the recession parameters of constituting linear reservoirs. Further, the complexity of hydrologic models not only varies with the magnitude of their parameters but also depends on the network structure of the models (in terms of the spatial heterogeneity of parameters and the nature of hydrologic connectivity).

The Finite-Sample Distribution of Post-Model-Selection Estimators, and Uniform Versus Non-Uniform Approximations

SSRN Electronic Journal ◽

10.2139/ssrn.221188 ◽

2000 ◽

Cited By ~ 4

Author(s):

Hannes Leeb ◽

Benedikt M. Poetscher

Keyword(s):

Model Selection ◽

Finite Sample ◽

Sample Distribution ◽

Uniform Approximations ◽

Finite Sample Distribution

A nondegenerate Vuong test and post selection confidence intervals for semi/nonparametric models

Quantitative Economics ◽

10.3982/qe1312 ◽

2020 ◽

Vol 11 (3) ◽

pp. 983-1017

Author(s):

Zhipeng Liao ◽

Xiaoxia Shi

Keyword(s):

Model Selection ◽

Size Control ◽

Parametric Models ◽

Model Parameters ◽

Local Alternatives ◽

Finite Sample ◽

Inference Procedure ◽

Uniform Size ◽

Nonparametric Models ◽

Asymptotic Size

This paper proposes a new model selection test for the statistical comparison of semi/non‐parametric models based on a general quasi‐likelihood ratio criterion. An important feature of the new test is its uniformly exact asymptotic size in the overlapping nonnested case, as well as in the easier nested and strictly nonnested cases. The uniform size control is achieved without using pretesting, sample‐splitting, or simulated critical values. We also show that the test has nontrivial power against all √n‐local alternatives and against some local alternatives that converge to the null faster than √n. Finally, we provide a framework for conducting uniformly valid post model selection inference for model parameters. The finite sample performance of the nondegenerate test and that of the post model selection inference procedure are illustrated in a mean‐regression example by Monte Carlo.
