Feature Selection Metric Using AUC Margin for Small Samples and Imbalanced Data Classification Problems

Author(s):  
Malak Alshawabkeh ◽  
Javed A. Aslam ◽  
Jennifer Dy ◽  
David Kaeli
Author(s):  
Feras Namous ◽  
Hossam Faris ◽  
Ali Asghar Heidari ◽  
Monther Khalafat ◽  
Rami S. Alkhawaldeh ◽  
...  

Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. Such problems arise in many application areas, including rare medical diagnosis, risk management, and fault detection. Traditional classification algorithms yield poor results on imbalanced classification problems. In this paper, a K-Means cluster-based undersampling ensemble algorithm is proposed to address the imbalanced data classification problem. The proposed method combines K-Means cluster-based undersampling with boosting. The experimental results show that the proposed algorithm outperforms the sampling ensemble algorithms of previous studies.
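The abstract does not spell out how the undersampling step works, so the following is only a minimal sketch of one common variant, not the authors' algorithm: the majority class is clustered into as many groups as there are minority samples, and one majority point is kept per cluster. The function names (`kmeans`, `cluster_undersample`), the choice of k, and the "one random member per cluster" rule are all assumptions made for illustration; the boosting stage mentioned in the abstract is omitted.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct positions in the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters


def cluster_undersample(majority, minority, seed=0):
    """Keep one majority point per cluster (illustrative assumption).

    The number of clusters is set to the minority class size, so the
    reduced majority set is at most as large as the minority set.
    """
    k = min(len(minority), len(majority))
    _, clusters = kmeans(majority, k, seed=seed)
    rng = random.Random(seed)
    # Empty clusters are skipped, so fewer than k points may be returned.
    return [rng.choice(members) for members in clusters if members]
```

Calling `cluster_undersample(majority, minority)` with two lists of numeric tuples returns a reduced majority set that can be concatenated with the minority samples before training a classifier or a boosted ensemble.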


2020 ◽  
Vol 8 (5) ◽  
pp. 3436-3440

Imbalanced data classification problems aim to predict a dependent variable from data with a skewed class distribution. Such problems arise in many application areas, including medical disease diagnosis, risk management, and fault detection, and they remain challenging in machine learning and data mining. In this paper, a K-Means cluster-based oversampling algorithm is proposed to address the imbalanced data classification problem. The experimental results show that the proposed algorithm outperforms existing oversampling algorithms from previous studies.
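The abstract gives no detail on how the oversampling is carried out, so the sketch below shows only one plausible reading, not the paper's method: the minority class is clustered with k-means, and synthetic samples are created by interpolating between two minority points drawn from the same cluster until a target size is reached. The names `kmeans_clusters` and `cluster_oversample`, the default `k=2`, and the interpolation rule are assumptions made for illustration.

```python
import random


def kmeans_clusters(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters of the last assignment."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the centroid with the smallest squared distance to p.
            nearest = min(
                range(k),
                key=lambda i: sum((x - c) ** 2 for x, c in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return [c for c in clusters if c]


def cluster_oversample(minority, target_size, k=2, seed=0):
    """Grow the minority class to target_size with in-cluster interpolation.

    Each synthetic point lies on the segment between two minority points
    from the same cluster (illustrative assumption), so new samples stay
    inside regions already occupied by the minority class.
    """
    rng = random.Random(seed)
    clusters = kmeans_clusters(minority, k, seed=seed)
    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        members = rng.choice(clusters)
        a, b = rng.choice(members), rng.choice(members)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return list(minority) + synthetic
```

Setting `target_size` to the majority class size yields a balanced training set; the original minority points come first in the returned list, followed by the synthetic ones.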

