Semi-supervised Feature Importance Evaluation with Ensemble Learning

Author(s):  
Hasna Barkia ◽  
Haytham Elghazel ◽  
Alex Aussem
Author(s):  
Ameni Filali ◽  
Chiraz Jlassi ◽  
Najet Arous

To uncover an appropriate latent subspace for data representation, we propose in this paper a new extension of the random forests method, leading to an unsupervised feature selection approach called Feature Selection with Random Forests (RFS), based on SOM variants, that evaluates the out-of-bag feature importance from a set of partitions. Each partition is created using a bootstrap sample and a random subset of features. Empirical results on 19 benchmark datasets indicate that RFS, combined with a recursive feature elimination (RFE) method, can yield significant improvements in clustering accuracy with a very small subset of features. Simulations are performed on nine different benchmarks, including face data, handwritten digit data, and document data. Promising experimental results and theoretical analysis demonstrate the efficiency and effectiveness of the proposed method for feature selection in comparison with competitive representative algorithms.
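The abstract above describes scoring features by out-of-bag importance over an ensemble of partitions, each built from a bootstrap sample and a random feature subset. The following is a minimal, hedged sketch of that idea: KMeans stands in for the SOM variant, its cluster labels serve as pseudo-targets for a random forest, and impurity-based importances are accumulated across partitions. All names and parameters here are illustrative assumptions, not the authors' actual RFS algorithm.

```python
# Illustrative sketch of partition-based unsupervised feature scoring.
# Assumptions: KMeans replaces the SOM variant; impurity-based random
# forest importances replace the paper's out-of-bag importance measure.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, _ = load_iris(return_X_y=True)  # labels are ignored: unsupervised setting
n_samples, n_features = X.shape

rng = np.random.default_rng(0)
scores = np.zeros(n_features)
n_partitions = 10

for _ in range(n_partitions):
    idx = rng.integers(0, n_samples, n_samples)            # bootstrap sample
    feats = rng.choice(n_features, size=3, replace=False)  # random feature subset
    # Cluster the bootstrap sample on the feature subset to get a partition.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
        X[idx][:, feats]
    )
    # Fit a forest on all features against the partition's pseudo-labels
    # and accumulate its feature importances.
    rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[idx], labels)
    scores += rf.feature_importances_

ranking = np.argsort(scores)[::-1]  # most important features first
print(ranking)
```

A recursive feature elimination step, as in the paper, would repeat this loop while dropping the lowest-ranked feature each round.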


Author(s):  
Much Aziz Muslim ◽  
Yosza Dasril

<span>Bankruptcy can inflict heavy losses on a company's stakeholders, including owners, investors, employees, and consumers. One way to mitigate this risk is to predict the likelihood of bankruptcy from the company's financial data. This study therefore aims to find the best predictive model or method for company bankruptcy using the Polish companies bankruptcy dataset. The prediction analysis combines feature selection with ensemble learning. Features are selected using XGBoost feature importance with a weight value filter of 10. The ensemble learning method used is stacking, which is composed of base models and a meta learner. The base models are K-nearest neighbor, decision tree, SVM, and random forest, while the meta learner is LightGBM. The stacking model outperforms the base models, reaching an accuracy of 97%.</span>
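The stacking setup described above can be sketched with scikit-learn's `StackingClassifier`. This is a hedged illustration, not the study's pipeline: the dataset is a built-in stand-in for the Polish bankruptcy data, the XGBoost importance filter is omitted, and `GradientBoostingClassifier` is swapped in for LightGBM so the example stays self-contained.

```python
# Illustrative stacking ensemble: KNN, decision tree, SVM and random
# forest as base models, a gradient-boosted meta learner on top.
# Assumption: GradientBoostingClassifier stands in for LightGBM.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in binary dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# The meta learner is trained on out-of-fold predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=GradientBoostingClassifier(random_state=0),
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(f"stacking accuracy: {acc:.3f}")
```

By default `StackingClassifier` uses 5-fold cross-validation to generate the meta learner's training features, which is what keeps the second level from overfitting to the base models' in-sample predictions.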


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Haowen Deng ◽  
Youyou Zhou ◽  
Lin Wang ◽  
Cheng Zhang

Abstract Background Neonatal jaundice may cause severe neurological damage if poorly evaluated and diagnosed when high bilirubin occurs. This study explored how to effectively integrate high-dimensional genetic features into the prediction of neonatal jaundice. Methods The study recruited 984 neonates from the Suzhou Municipal Central Hospital in China and applied an ensemble learning approach combining high-dimensional genetic features and clinical risk factors (CRF) to predict physiological neonatal jaundice of full-term newborns within one week after birth. Sigmoid recalibration was then applied to validate the reliability of the method. Results Prediction with CRF alone reached a maximum Area Under the Curve (AUC) of 79.5%, which was marginally improved, by 3.5%, when genetic variants (GV) were included. Feature importance showed that 36 GVs contributed 55.5% of the prediction of neonatal jaundice in terms of gain from splits. Further analysis revealed that the main contribution of GV was to reduce the false-positive rate, i.e., to increase the specificity of the prediction. Conclusions Our study sheds light on the theoretical and practical value of GV in the prediction of neonatal jaundice.
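The sigmoid recalibration mentioned in the Methods (Platt scaling: fitting a logistic function to a classifier's scores on held-out data) can be sketched with scikit-learn's `CalibratedClassifierCV`. The dataset and base model below are illustrative assumptions, not the study's cohort or ensemble.

```python
# Illustrative sigmoid (Platt) recalibration of a classifier's scores.
# Assumptions: synthetic data and GradientBoostingClassifier stand in
# for the study's neonatal cohort and ensemble model.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# method="sigmoid" fits a logistic mapping from raw scores to
# calibrated probabilities on cross-validated held-out folds.
calibrated = CalibratedClassifierCV(
    GradientBoostingClassifier(random_state=0), method="sigmoid", cv=3
)
calibrated.fit(X_tr, y_tr)

proba = calibrated.predict_proba(X_te)[:, 1]  # calibrated probabilities
print(proba.min(), proba.max())
```

Well-calibrated probabilities matter here because the study's claimed benefit of genetic variants is a lower false-positive rate, which depends on trustworthy probability thresholds.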


Author(s):  
Ruijie Du ◽  
Shuangcheng Wang ◽  
Cuiping Leng ◽  
Yunbin Fu
