A cluster-based hybrid sampling approach for imbalanced data classification

2020, Vol 91 (5), pp. 055101
Author(s): Shou Feng, Chunhui Zhao, Ping Fu

2014, Vol 2014, pp. 1-7
Author(s): Qiang Wang

Imbalanced datasets are common in real-world applications. Resampling is an effective remedy because it produces a relatively balanced class distribution. This paper proposes a hybrid sampling SVM approach that combines an oversampling technique with an undersampling technique to address the imbalanced data classification problem. The approach first applies undersampling to remove majority-class samples that carry less classification information, and then applies oversampling to gradually generate new positive (minority-class) samples. The resulting balanced training dataset replaces the original imbalanced one. Experimental results on real-world datasets show that the proposed approach can identify informative samples and handle the imbalanced data classification problem.
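
The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's method. The abstract does not say how a majority sample's "classification information" is measured or how new positive samples are created. So the undersampling step here assumes that majority samples far from every minority sample are the least informative. The oversampling step swaps in a SMOTE-style interpolation between a minority sample and its nearest minority neighbor, adding one synthetic point at a time. The helper names (`undersample_majority`, `oversample_minority`, `hybrid_resample`) and the `keep_ratio` parameter are hypothetical. The SVM training step is omitted; the balanced `(X, y)` output would be passed to any SVM implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def undersample_majority(
    majority: Sequence[Point], minority: Sequence[Point], keep_ratio: float
) -> List[Point]:
    """Keep the fraction `keep_ratio` of majority samples nearest the minority class.

    Assumption (hypothetical criterion): a majority sample far from every
    minority sample lies away from the class boundary and is treated as
    carrying less classification information, so it is dropped first.
    """
    def nearest_minority_distance(p: Point) -> float:
        return min(math.dist(p, q) for q in minority)

    ranked = sorted(majority, key=nearest_minority_distance)
    n_keep = max(1, math.ceil(keep_ratio * len(majority)))
    return ranked[:n_keep]


def oversample_minority(
    minority: Sequence[Point], target: int, rng: random.Random
) -> List[Point]:
    """Add synthetic minority samples one at a time until `target` is reached.

    Assumption: SMOTE-style interpolation between a randomly chosen minority
    sample and its nearest minority neighbor (k = 1). If the minority class
    already has at least `target` samples, it is returned unchanged.
    """
    originals = list(minority)
    if len(originals) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    samples = list(originals)
    while len(samples) < target:
        base = rng.choice(originals)
        neighbor = min(
            (q for q in originals if q != base), key=lambda q: math.dist(base, q)
        )
        gap = rng.random()  # uniform in [0, 1): position along the segment
        samples.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return samples


def hybrid_resample(
    majority: Sequence[Point],
    minority: Sequence[Point],
    keep_ratio: float = 0.5,
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Undersample the majority class, then oversample the minority class to match.

    Returns (X, y) with label 0 for majority and 1 for minority samples.
    """
    rng = random.Random(seed)
    kept = undersample_majority(majority, minority, keep_ratio)
    balanced_minority = oversample_minority(minority, len(kept), rng)
    X = kept + balanced_minority
    y = [0] * len(kept) + [1] * len(balanced_minority)
    return X, y


if __name__ == "__main__":
    majority = [(float(i), 0.0) for i in range(10)]
    minority = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    X, y = hybrid_resample(majority, minority, keep_ratio=0.5)
    print(y.count(0), y.count(1))  # prints "5 5"
```

With this toy data, the sketch keeps the five majority points nearest the minority class, `(0.0, 0.0)` through `(4.0, 0.0)`, and adds two synthetic minority points, giving five samples per class. Ranking by distance to the nearest minority sample is a cheap proxy for closeness to the class boundary, which is where an SVM's support vectors lie.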