Using variable combination population analysis for variable selection in multivariate calibration

2015 ◽  
Vol 862 ◽  
pp. 14-23 ◽  
Author(s):  
Yong-Huan Yun ◽  
Wei-Ting Wang ◽  
Bai-Chuan Deng ◽  
Guang-Bi Lai ◽  
Xin-bo Liu ◽  
...  
Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8051
Author(s):  
Chunwang Dong ◽  
Chongshan Yang ◽  
Zhongyuan Liu ◽  
Rentian Zhang ◽  
Peng Yan ◽  
...  

Catechin is a major reactive substance involved in black tea fermentation and has a decisive effect on the final quality and taste of the finished tea. In this study, we applied hyperspectral technology together with chemometric methods, using different pretreatment and variable filtering algorithms to reduce noise interference. After the dimensionality of the spectral data was reduced by principal component analysis (PCA), an optimal prediction model for catechin content was constructed, followed by visual analysis of the catechin content of leaves fermented for different periods of time. The results showed that zero-mean normalization (Z-score), multiplicative scatter correction (MSC), and standard normal variate (SNV) can effectively improve model accuracy, while the shuffled frog leaping algorithm (SFLA), the variable combination population analysis genetic algorithm (VCPA-GA), and variable combination population analysis iteratively retaining informative variables (VCPA-IRIV) can significantly reduce the spectral data and increase the calculation speed of the model. Nonlinear models performed better than linear ones. Extreme learning machine (ELM) models based on the optimal variables reached prediction accuracies of 0.989 for total catechins and 0.994 for epicatechin gallate (ECG), and support vector regression (SVR) models reached prediction accuracies of 0.972, 0.993, 0.990, and 0.994 for EGC, C, EC, and EGCG, respectively. The optimal model offers accurate prediction, and visual analysis can show the distribution of catechin content in leaves at different fermentation periods. The findings provide a useful reference for intelligent digital assessment of black tea during processing.
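The two scatter-correction pretreatments named in this abstract, SNV and MSC, can be illustrated with a minimal sketch. It uses the textbook definitions of both transforms and is not taken from the paper; the function names `snv` and `msc` and the toy spectra are assumptions made for illustration.

```python
from statistics import mean, stdev


def snv(spectra):
    """Standard normal variate: scale each spectrum (row) by its own mean and std."""
    corrected = []
    for row in spectra:
        m, s = mean(row), stdev(row)
        corrected.append([(v - m) / s for v in row])
    return corrected


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is regressed on the reference (default: the mean spectrum),
    row = intercept + slope * reference, and then corrected as
    (row - intercept) / slope.
    """
    n_bands = len(spectra[0])
    if reference is None:
        reference = [mean(row[j] for row in spectra) for j in range(n_bands)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for row in spectra:
        row_mean = mean(row)
        # Ordinary least-squares slope and intercept of row on reference.
        slope = sum((r - ref_mean) * (v - row_mean)
                    for r, v in zip(reference, row)) / ref_ss
        intercept = row_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in row])
    return corrected


# Toy spectra (hypothetical values): rows 2 and 3 are scaled/offset copies of row 1.
toy_spectra = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 5.0, 7.0, 9.0],
]
snv_spectra = snv(toy_spectra)
msc_spectra = msc(toy_spectra)
```

Because the toy rows differ only by a scale factor and an offset, both transforms map them onto a common curve, which is the scatter-removal behaviour these pretreatments are meant to provide.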


2019 ◽  
Vol 81 ◽  
pp. 213-222 ◽  
Author(s):  
Lauro C.M. de Paula ◽  
Anderson S. Soares ◽  
Telma W. Soares ◽  
Celso G.C. Junior ◽  
Clarimar J. Coelho ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7248
Author(s):  
Fugen Jiang ◽  
Mykola Kutia ◽  
Arbi J. Sarkissian ◽  
Hui Lin ◽  
Jiangping Long ◽  
...  

Forest growing stem volume (GSV) reflects the richness of forest resources as well as the quality of forest ecosystems. Remote sensing technology enables robust and efficient GSV estimation because it greatly reduces survey time and cost while facilitating periodic monitoring. Because of their red-edge bands and short revisit period, Sentinel-2 images were selected for GSV estimation in the Wangyedian forest farm, Inner Mongolia, China. The variable combination was shown to significantly affect the accuracy of the estimation model. After spectral variables, texture features, and topographic factors were extracted, a stepwise random forest (SRF) method was proposed to select variable combinations and establish random forest regressions (RFR) for GSV estimation. The linear stepwise regression (LSR), Boruta, Variable Selection Using Random Forests (VSURF), and random forest (RF) methods were then used as references for comparison with the proposed SRF in predictor selection and GSV estimation. Using the observed GSV data and the Sentinel-2 images, the distributions of GSV were generated by the RFR models with the variable combinations determined by LSR, RF, Boruta, VSURF, and SRF. The results show that the texture features of Sentinel-2's red-edge bands can significantly improve the accuracy of GSV estimation. The SRF method can effectively select the optimal variable combination, and the SRF-based model achieves the highest estimation accuracy, reducing the relative root mean square error by 16.4%, 14.4%, 16.3%, and 10.6% compared with the LSR-, RF-, Boruta-, and VSURF-based models, respectively. The GSV distribution generated by the SRF-based model matched the field observations well. The results of this study are expected to provide a reference for GSV estimation of coniferous plantations.
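The abstract does not describe the SRF procedure in detail, so the following is only a generic greedy forward stepwise selection sketch, not the authors' method. The `score` callable stands in for whatever accuracy measure is used (for example, cross-validated accuracy of a random forest regression); the variable names and the toy utilities are hypothetical.

```python
def forward_stepwise(candidates, score):
    """Greedy forward selection.

    Starting from an empty set, repeatedly add the candidate that raises
    score(selected) the most; stop when no candidate improves the score.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-variable extension of the current selection.
        trials = [(score(selected + [c]), c) for c in remaining]
        top_score, top_var = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no remaining variable improves the model
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected, best


# Toy scorer (hypothetical): each variable has a fixed utility, and every
# added variable costs a complexity penalty of 1.5.
toy_utility = {"B8": 3.0, "B5_texture": 5.0, "slope": 1.0, "noise": -2.0}


def toy_score(subset):
    return sum(toy_utility[v] for v in subset) - 1.5 * len(subset)


chosen, chosen_score = forward_stepwise(toy_utility, toy_score)
```

In a real workflow the toy scorer would be replaced by a function that fits a model on the chosen predictors and returns a validation score, which makes each step considerably more expensive than in this sketch.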


2018 ◽  
Vol 34 (5) ◽  
pp. 789-798 ◽  
Author(s):  
Yuechun Zhang ◽  
Jun Sun ◽  
Junyan Li ◽  
Xiaohong Wu ◽  
Chunmei Dai

Abstract. To help ensure that safe and healthy tomatoes reach consumers, this study proposes a method for the quantitative determination of cadmium content in tomato leaves based on hyperspectral imaging technology. Tomato leaves with seven cadmium stress gradients were studied. Hyperspectral images of all samples were first acquired with the hyperspectral imaging system, and the spectral data were then extracted from the images. To simplify the model, three algorithms, competitive adaptive reweighted sampling (CARS), variable combination population analysis (VCPA), and bootstrapping soft shrinkage (BOSS), were used to select feature wavelengths in the range from 431 to 962 nm. The results showed that BOSS improved prediction performance and greatly reduced the number of features compared with the other two selection methods. The BOSS model achieved the best accuracy in calibration and prediction, with R²c of 0.9907, RMSEC of 0.4257 mg/kg, R²p of 0.9821, and RMSEP of 0.6461 mg/kg. Hence, hyperspectral technology combined with BOSS feature selection is feasible for detecting the cadmium content of tomato leaves and may offer a new approach to cadmium detection in other crops.
Keywords: Feature selection, Hyperspectral image technology, Non-destructive analysis, Regression model, Tomato leaves.
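The calibration and prediction figures quoted in this abstract are root mean square error and coefficient of determination values. The sketch below uses the standard definitions, with R² computed as one minus the ratio of residual to total sum of squares; the paper may define R² differently (for example, as a squared correlation), and the function names and sample values are assumptions.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the measurements."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical reference and predicted values in mg/kg, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(measured, predicted))       # 0.5
print(r_squared(measured, predicted))  # 0.8
```

Applying the same two functions to the calibration set gives RMSEC and R²c, and applying them to the held-out prediction set gives RMSEP and R²p.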

