High-Dimensional, Small-Sample Product Quality Prediction Method Based on MIC-Stacking Ensemble Learning

2021 · Vol 12 (1) · pp. 23
Author(s): Jiahao Yu, Rongshun Pan, Yongman Zhao

Accurate quality prediction makes it possible to identify and eliminate quality hazards. It is difficult to construct an accurate mathematical quality model for high-dimensional, small-sample production data because of the many interacting quality characteristics and the complex mechanisms at work, and overfitting is a common problem in high-dimensional, small-sample industrial product quality prediction. This paper proposes a stacking-based ensemble learning and measurement model that uses eight algorithms as candidate base learners. The maximal information coefficient (MIC) is used to quantify the correlation between the base learners, and models with low mutual correlation and strong predictive power are chosen to build the stacking ensemble, which effectively avoids overfitting and yields better predictive performance. With improved prediction performance as the optimization goal, the data preprocessing stage uses boxplots, ordinary least squares (OLS), and multivariate imputation by chained equations (MICE) to detect and replace outliers, and the CatBoost algorithm to construct combined features, from which strong combinations are selected to form a new feature set. Concrete slump data from the University of California, Irvine (UCI) machine learning repository were used for comprehensive verification experiments. The experimental results show that, compared with the best single model, the minimum-correlation stacking ensemble achieves higher precision and stronger robustness, providing a new way to guarantee the accuracy of final product quality prediction.
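A minimal sketch of the core idea, assuming the minepy package for MIC and scikit-learn estimators (the candidate learners, synthetic data, and the pair chosen for stacking are illustrative, not the paper's): compute pairwise MIC between the base learners' out-of-fold predictions and stack a low-correlation, high-accuracy subset.

    import numpy as np
    from minepy import MINE                                   # MIC implementation (assumed dependency)
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
    from sklearn.linear_model import Ridge
    from sklearn.svm import SVR
    from sklearn.model_selection import cross_val_predict

    # Stand-in for a high-dimensional, small-sample quality dataset.
    X, y = make_regression(n_samples=100, n_features=30, noise=5.0, random_state=0)

    candidates = {
        "ridge": Ridge(),
        "svr": SVR(),
        "rf": RandomForestRegressor(random_state=0),
        "gbr": GradientBoostingRegressor(random_state=0),
    }

    # Out-of-fold predictions for every candidate base learner.
    oof = {name: cross_val_predict(est, X, y, cv=5) for name, est in candidates.items()}

    def mic(a, b):
        m = MINE()
        m.compute_score(a, b)
        return m.mic()

    # Pairwise MIC between base-learner predictions; low values indicate
    # complementary (less redundant) learners for the stacking ensemble.
    names = list(oof)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            print(names[i], names[j], round(mic(oof[names[i]], oof[names[j]]), 3))

    # Stack a low-MIC pair of accurate learners (chosen by inspection here).
    stack = StackingRegressor(
        estimators=[("ridge", Ridge()), ("rf", RandomForestRegressor(random_state=0))],
        final_estimator=Ridge(),
        cv=5,
    ).fit(X, y)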

2021
Author(s): Junying Hu, Xiaofei Qian, Jun Pei, Changchun Tan, Panos M. Pardalos, ...

1994 · Vol 33 (02) · pp. 180-186
Author(s): H. Brenner, O. Gefeller

Abstract: The traditional concept of describing the validity of a diagnostic test neglects the presence of chance agreement between test result and true (disease) status. Sensitivity and specificity, as the fundamental measures of validity, can thus only be considered in conjunction with each other to provide an appropriate basis for the evaluation of the capacity of the test to discriminate truly diseased from truly undiseased subjects. In this paper, chance-corrected analogues of sensitivity and specificity are presented as supplemental measures of validity, which pay attention to the problem of chance agreement and offer the opportunity to be interpreted separately. While recent proposals of chance-correction techniques, suggested by several authors in this context, lead to measures which are dependent on disease prevalence, our method does not share this major disadvantage. We discuss the extension of the conventional ROC-curve approach to chance-corrected measures of sensitivity and specificity. Furthermore, point and asymptotic interval estimates of the parameters of interest are derived under different sampling frameworks for validation studies. The small sample behavior of the estimates is investigated in a simulation study, leading to a logarithmic modification of the interval estimate in order to hold the nominal confidence level for small samples.
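As background, a generic illustration of the chance-correction principle (not necessarily the exact definitions derived in the paper): for an observed agreement or validity proportion p with expected chance agreement p_c, the chance-corrected analogue takes the Cohen-kappa form

    \kappa = \frac{p - p_c}{1 - p_c},

so that \kappa = 1 for perfect agreement and \kappa = 0 when observed agreement is no better than chance. The paper's contribution is to construct such corrected analogues of sensitivity and specificity that, unlike earlier proposals, do not depend on disease prevalence.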


Author(s): Longxiang Li, Annelise J. Blomberg, Rebecca A. Stern, Choong-Min Kang, Stefania Papatheodorou, ...

Sensors · 2021 · Vol 21 (11) · pp. 3663
Author(s): Zun Shen, Qingfeng Wu, Zhi Wang, Guoyi Chen, Bin Lin

(1) Background: Diabetic retinopathy, one of the most serious complications of diabetes, is the primary cause of blindness in developed countries, so its prediction has a positive impact on early detection and treatment. The problem addressed in this study was the prediction of diabetic retinopathy from high-dimensional, small-sample datasets such as biochemical and physical data. (2) Methods: This study proposed the XGB-Stacking model, built on XGBoost and stacking. First, a wrapped feature selection algorithm, XGBIBS (Improved Backward Search Based on XGBoost), was used to reduce feature redundancy and improve the performance of a single ensemble learning classifier. Second, given the limitations of any single classifier, a stacking model fusion method, Sel-Stacking (Select-Stacking), which keeps Label-Proba as the input matrix of the meta-classifier and determines the optimal combination of learners by a global search, was used in the XGB-Stacking model. (3) Results: XGBIBS greatly improved the prediction accuracy and the feature reduction rate of a single classifier, and the Sel-Stacking model improved accuracy to varying degrees over any single classifier. The experiments showed that the XGB-Stacking prediction model, based on the XGBIBS algorithm and the Sel-Stacking method, made effective predictions of diabetic retinopathy. (4) Conclusion: The XGB-Stacking prediction model of diabetic retinopathy based on biochemical and physical data performed very well, which is highly significant for improving the efficiency of diabetic retinopathy screening and reducing the cost of diagnosis.
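A simplified, hypothetical sketch of the wrapped backward-search idea (the actual XGBIBS algorithm and the Sel-Stacking combination search are the authors' own and are more involved): greedily drop features from an XGBoost classifier as long as cross-validated accuracy does not degrade.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier                          # assumes the xgboost package

    # Stand-in for a high-dimensional, small-sample biochemical/physical dataset.
    X, y = make_classification(n_samples=120, n_features=25, n_informative=8, random_state=0)
    features = list(range(X.shape[1]))

    def cv_accuracy(cols):
        clf = XGBClassifier(n_estimators=100, max_depth=3)
        return cross_val_score(clf, X[:, cols], y, cv=5, scoring="accuracy").mean()

    best = cv_accuracy(features)
    improved = True
    while improved and len(features) > 1:
        improved = False
        for f in list(features):
            trial = [c for c in features if c != f]
            score = cv_accuracy(trial)
            if score >= best:           # keep the reduced set if accuracy is not hurt
                best, features, improved = score, trial, True
                break

    print("selected features:", features, "cv accuracy:", round(best, 3))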


Entropy · 2021 · Vol 23 (6) · pp. 743
Author(s): Xi Liu, Shuhang Chen, Xiang Shen, Xiang Zhang, Yiwen Wang

Neural signal decoding is a critical technology in brain-machine interfaces (BMIs), used to interpret movement intention from the activity of many neurons recorded from paralyzed patients. The Kalman filter is a commonly used decoding algorithm for deriving movement states from high-dimensional neural firing observations, but its performance is limited and less effective for noisy, nonlinear neural systems with high-dimensional measurements. In this paper, we propose a nonlinear maximum correntropy information filter aimed at better state estimation when filtering a noisy, high-dimensional measurement system. We reconstruct the measurement model between the high-dimensional measurements and the low-dimensional states using a neural network, and derive the state estimate under the correntropy criterion to cope with non-Gaussian noise and eliminate large initial uncertainty. Analyses of convergence and robustness are also given. The effectiveness of the proposed algorithm is evaluated by applying it to multiple segments of neural spiking data from two rats to interpret the movement states while the subjects performed a two-lever discrimination task. Our results demonstrate better and more robust state estimation performance compared with other filters.
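For reference, the correntropy measure underlying the maximum correntropy criterion, in its standard Gaussian-kernel form (the paper's filter derivation builds on this but is not reproduced here), is

    V_\sigma(X, Y) = \mathbb{E}\big[\kappa_\sigma(X - Y)\big],
    \qquad
    \kappa_\sigma(e) = \exp\!\left(-\frac{e^2}{2\sigma^2}\right),

so that maximizing the sample correntropy \frac{1}{N}\sum_{i=1}^{N}\kappa_\sigma(e_i) of the estimation errors down-weights large, non-Gaussian outliers relative to the quadratic cost implicit in the standard Kalman filter.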


2021 · Vol 11 (1)
Author(s): Florent Le Borgne, Arthur Chatton, Maxime Léger, Rémi Lenain, Yohann Foucher

Abstract: In clinical research there is growing interest in propensity score-based methods for estimating causal effects. G-computation is an alternative because of its high statistical power, and machine learning is increasingly used because of its possible robustness to model misspecification. In this paper, we propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and that is able to deal with small samples. We evaluated the performance of several methods through simulation, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner, across six scenarios characterised by various sample sizes, numbers of covariates, and relationships between covariates, exposure status, and outcome. We also illustrated the application of these methods by estimating the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. For estimating the individual outcome probabilities in the two counterfactual worlds of G-computation, the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes; the support vector machine also performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation combined with the super learner was a well-performing method for drawing causal inferences, even from small samples.
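A minimal sketch of the G-computation step described here (illustrative only; the outcome model below is a plain logistic regression where the paper's super learner or other machine-learning models would be plugged in): fit an outcome model, predict each subject's outcome probability in the two counterfactual worlds, and contrast the averages.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 300                                            # small-sample setting
    X = pd.DataFrame(rng.normal(size=(n, 3)), columns=["x1", "x2", "x3"])
    A = rng.binomial(1, 1 / (1 + np.exp(-X["x1"])))    # binary exposure depends on covariates
    Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * A + X["x2"]))))  # binary outcome

    data = X.assign(A=A)
    outcome_model = LogisticRegression(max_iter=1000).fit(data, Y)

    # Counterfactual datasets: everyone exposed vs. everyone unexposed.
    exposed, unexposed = data.assign(A=1), data.assign(A=0)
    p1 = outcome_model.predict_proba(exposed)[:, 1].mean()
    p0 = outcome_model.predict_proba(unexposed)[:, 1].mean()

    print("marginal risk under exposure:", round(p1, 3))
    print("marginal risk under no exposure:", round(p0, 3))
    print("risk difference (G-computation estimate):", round(p1 - p0, 3))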

