Longitudinal variable selection by cross-validation in the case of many covariates

2007 ◽  
Vol 26 (4) ◽  
pp. 919-930 ◽  
Author(s):  
E. Cantoni ◽  
C. Field ◽  
J. Mills Flemming ◽  
E. Ronchetti

2014 ◽  
Vol 70 (5) ◽  
Author(s):  
Nor Fazila Rasaruddin ◽  
Mas Ezatul Nadia Mohd Ruah ◽  
Mohamed Noor Hasan ◽  
Mohd Zuli Jaafar

This paper presents the determination of the iodine value (IV) of pure and frying palm oils using Partial Least Squares (PLS) regression with variable selection. A total of 28 samples of pure and frying palm oils were acquired from markets; seven were considered high-priced palm oils, while the remaining were low-priced. PLS regression models were developed for the determination of IV from Fourier Transform Infrared (FTIR) spectra recorded in absorbance mode over the range 650 cm⁻¹ to 4000 cm⁻¹. A Savitzky-Golay derivative was applied before developing the prediction models. The models were constructed from wavelengths selected in the FTIR region using the selectivity ratio (SR) plot and the correlation coefficient with the IV parameter. Each model was validated through the root mean square error of cross-validation (RMSECV) and the cross-validation correlation coefficient (R²cv). With SR-based selection, the best models were the mean-centred model for the pure samples and the model combining row scaling and standardization for the frying samples. With correlation-coefficient-based selection, the best models were the model combining row scaling and standardization for the pure samples and the mean-centred model for the frying samples. Row scaling of the variables is not necessary when developing the models, since its effect on model quality is insignificant.
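
For readers who want to experiment with this kind of workflow, the sketch below (not the authors' actual pipeline) applies correlation-coefficient wavelength selection followed by PLS regression scored by RMSECV and R²cv. The data are synthetic stand-ins for the FTIR spectra and IV values; the sample count matches the study, but the selection threshold, component count, fold count, and all names are illustrative assumptions.

```python
# Minimal sketch: correlation-based wavelength selection + PLS with RMSECV.
# Synthetic data stands in for the FTIR spectra and IV values; the threshold
# and component count are illustrative assumptions, not values from the paper.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 28, 500                 # 28 samples as in the study
X = rng.normal(size=(n_samples, n_wavelengths))    # stand-in absorbance spectra
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=n_samples)  # stand-in IV

# Variable selection: keep wavelengths whose |correlation| with IV is high.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_wavelengths)])
selected = np.abs(corr) > 0.3                      # illustrative cutoff
X_sel = X[:, selected]

# Mean centring is handled internally by PLSRegression; scale=True would add
# standardization of the variables on top of the centring.
pls = PLSRegression(n_components=3, scale=False)
y_cv = cross_val_predict(pls, X_sel, y, cv=7).ravel()   # 7-fold cross-validation

rmsecv = np.sqrt(np.mean((y - y_cv) ** 2))
r2cv = np.corrcoef(y, y_cv)[0, 1] ** 2
print(f"RMSECV = {rmsecv:.3f}, R2cv = {r2cv:.3f}")
```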


2014 ◽  
Vol 23 (7) ◽  
pp. 811-820 ◽  
Author(s):  
Kévin Le Rest ◽  
David Pinaud ◽  
Pascal Monestiez ◽  
Joël Chadoeuf ◽  
Vincent Bretagnolle

Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 868-892 ◽  
Author(s):  
Yuchen Chen ◽  
Yuhong Yang

Previous research has extensively discussed the selection of regularization parameters in the application of regularization methods to high-dimensional regression. The popular “One Standard Error Rule” (1se rule) used with cross-validation (CV) selects the most parsimonious model whose prediction error is not much worse than the minimum CV error. This paper examines the validity of the 1se rule from a theoretical angle and studies its estimation accuracy and its performance in regression estimation and variable selection, particularly for the Lasso in a regression framework. Our theoretical result shows that when a regression procedure produces an estimator converging relatively fast to the true regression function, the standard error estimation formula in the 1se rule is asymptotically justified. The numerical results show the following: 1. the 1se rule in general does not provide a good estimate of the intended standard deviation of the cross-validation error; the estimation bias can be 50–100% upwards or downwards in various situations; 2. the results support that the 1se rule usually outperforms regular CV in sparse variable selection and alleviates the over-selection tendency of the Lasso; 3. in regression estimation or prediction, the 1se rule often performs worse. In addition, comparisons are made over two real data sets: Boston Housing Prices (large sample size n, small/moderate number of variables p) and Bardet–Biedl data (large p, small n). Data-guided simulations are carried out to provide insight into the relative performance of the 1se rule and regular CV.
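
To make the 1se rule concrete, the sketch below implements its standard construction (not the authors' code) for the Lasso: after cross-validation, it computes the fold-wise standard error of the CV error at each penalty and picks the largest penalty whose mean CV error stays within one standard error of the minimum. The data are synthetic, and the fold count is an illustrative choice.

```python
# Minimal sketch of the "One Standard Error Rule" for the Lasso: among all
# penalties whose mean CV error is within one standard error of the minimum,
# choose the largest (most parsimonious) one. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                                # sparse truth: 5 active variables
y = X @ beta + rng.normal(size=n)

cv = LassoCV(cv=10, random_state=0).fit(X, y)

# mse_path_ has shape (n_alphas, n_folds); alphas_ is in decreasing order.
mean_mse = cv.mse_path_.mean(axis=1)
se_mse = cv.mse_path_.std(axis=1, ddof=1) / np.sqrt(cv.mse_path_.shape[1])

i_min = mean_mse.argmin()
threshold = mean_mse[i_min] + se_mse[i_min]
# First index meeting the threshold = largest alpha, i.e. the sparsest model.
i_1se = np.argmax(mean_mse <= threshold)
alpha_1se = cv.alphas_[i_1se]

lasso_1se = Lasso(alpha=alpha_1se).fit(X, y)
print(f"alpha_min = {cv.alpha_:.4f}, alpha_1se = {alpha_1se:.4f}")
print(f"nonzero coefs, min-CV vs 1se: "
      f"{np.sum(Lasso(alpha=cv.alpha_).fit(X, y).coef_ != 0)} vs "
      f"{np.sum(lasso_1se.coef_ != 0)}")
```

Because LassoCV stores alphas_ in decreasing order, taking the first index that meets the threshold yields the sparsest admissible model, which is the over-selection-alleviating behaviour the paper evaluates in its finding 2.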


2008 ◽  
Vol 48 (2) ◽  
pp. 370-383 ◽  
Author(s):  
Dmitry A. Konovalov ◽  
Nigel Sim ◽  
Eric Deconinck ◽  
Yvan Vander Heyden ◽  
Danny Coomans

2013 ◽  
Vol 03 (02) ◽  
pp. 79-102 ◽  
Author(s):  
Hans C. van Houwelingen ◽  
Willi Sauerbrei
