A Hybrid Sampling SVM Approach to Imbalanced Data Classification

2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Qiang Wang

Imbalanced datasets are common in real applications. Resampling is an effective remedy because it produces a relatively balanced class distribution. In this paper, a hybrid sampling SVM approach is proposed that combines an oversampling technique and an undersampling technique to address the imbalanced data classification problem. The proposed approach first uses an undersampling technique to delete majority-class samples that carry less classification information and then applies an oversampling technique to gradually create new positive samples. The resulting balanced training dataset replaces the original imbalanced one. Experimental results on real-world datasets show that the proposed approach can identify informative samples and handle the imbalanced data classification problem.
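The two-stage order described in this abstract (undersample the majority class first, then oversample the minority class) can be sketched roughly as follows. This is an illustration, not the paper's algorithm: the abstract does not say how "classification information" is measured, so the sketch assumes that majority samples lying closer to the minority class are more informative, and it creates new positive samples by simple interpolation between random minority pairs. The function name `hybrid_resample` and the parameter `keep_ratio` are hypothetical.

```python
import math
import random


def hybrid_resample(majority, minority, keep_ratio=0.5, seed=0):
    """Undersample the majority class, then oversample the minority class.

    Assumption (not from the paper): a majority point's distance to its
    nearest minority point stands in for "classification information";
    closer points are treated as more informative and are kept.
    Requires at least two minority points.
    """
    rng = random.Random(seed)

    # Step 1: undersampling. Keep the majority points nearest the minority class.
    def nearest_minority_dist(p):
        return min(math.dist(p, q) for q in minority)

    n_keep = max(len(minority), int(len(majority) * keep_ratio))
    kept = sorted(majority, key=nearest_minority_dist)[:n_keep]

    # Step 2: oversampling. Interpolate between random minority pairs
    # until both classes have the same number of samples.
    synthetic = []
    while len(minority) + len(synthetic) < len(kept):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))

    return kept, minority + synthetic
```

The balanced pair of lists returned here would then replace the original training set before an SVM is fitted; the SVM training step itself is not shown.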

2013 ◽  
Vol 443 ◽  
pp. 741-745
Author(s):  
Hu Li ◽  
Peng Zou ◽  
Wei Hong Han ◽  
Rong Ze Xia

Much real-world data is imbalanced, i.e., one category contains significantly more samples than the others. Traditional classification methods treat all categories equally and are often ineffective on such data. Based on a comprehensive analysis of existing research, we propose a new imbalanced data classification method based on clustering. The method first clusters both the majority class and the minority class. Then the clustered minority class is over-sampled by SMOTE, while the clustered majority class is randomly under-sampled. Through clustering, the proposed method can avoid losing useful information during resampling. Experiments on several UCI datasets show that the proposed method can effectively improve classification results on imbalanced data.
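The cluster-then-resample idea in this abstract might look like the sketch below. It is written under several assumptions that the abstract does not confirm: plain Lloyd's k-means is used for clustering, each cluster is resampled in proportion to its size, and the oversampling step is a simplified SMOTE-style interpolation between random pairs inside one cluster (real SMOTE interpolates toward one of a sample's nearest neighbours). The names `kmeans`, `smote_like`, `cluster_resample`, and the parameters `k` and `target` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k clusters (lists of points)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Move each center to its cluster mean; keep the old center if empty.
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters


def smote_like(cluster, n_new, rng):
    """Create n_new points by interpolating between random pairs in a cluster."""
    if len(cluster) < 2:
        return [cluster[0]] * n_new  # singleton cluster: fall back to duplication
    out = []
    for _ in range(n_new):
        a, b = rng.sample(cluster, 2)
        t = rng.random()
        out.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return out


def cluster_resample(majority, minority, k=2, target=None, seed=0):
    """Resample both classes toward roughly `target` samples each."""
    rng = random.Random(seed)
    target = target or (len(majority) + len(minority)) // 2
    maj_clusters = [c for c in kmeans(majority, k, seed=seed) if c]
    min_clusters = [c for c in kmeans(minority, k, seed=seed) if c]

    # Randomly under-sample each majority cluster in proportion to its size,
    # so every cluster stays represented.
    new_major = []
    for c in maj_clusters:
        n = max(1, round(target * len(c) / len(majority)))
        new_major += rng.sample(c, min(n, len(c)))

    # Over-sample each minority cluster; synthetic points stay inside it.
    new_minor = list(minority)
    for c in min_clusters:
        n = round(target * len(c) / len(minority)) - len(c)
        new_minor += smote_like(c, max(0, n), rng)

    return new_major, new_minor
```

Because synthetic points are generated only between members of the same cluster, they do not fall in the gaps between separate minority groups, which is one plausible reading of how clustering helps preserve information during resampling.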


This chapter addresses the challenges, opportunities, and scope for formulating and designing new procedures for the imbalanced classification problem. The problem poses a challenge to predictive modelling because many of the AI, ML, and DL algorithms widely used for classification are designed on the assumption that every class has an equal number of examples. This assumption leads to poor performance, especially on the minority class, even though the minority class is usually the most important one in imbalanced classification and the most sensitive to classification errors. The chapter discusses unequal class distributions in training datasets and offers new and deeper insights into them. Most real-time and real-world classification tasks involve imbalanced distributions and therefore need specialized techniques that yield models with fewer errors and improved performance.


Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. Such problems arise in many application areas, such as rare medical diagnosis, risk management, and fault detection. Traditional classification algorithms yield poor results on them. In this paper, a K-Means cluster-based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem. The proposed method combines K-Means cluster-based undersampling with boosting. The experimental results show that the proposed algorithm outperforms the sampling ensemble algorithms of previous studies.
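The undersampling half of this method can be illustrated with a small sketch. It assumes one common form of K-Means cluster-based undersampling, in which the majority class is clustered and only the real sample nearest each cluster center is kept. The abstract does not state which variant the paper uses, and the boosting stage is not shown. The names `kmeans_centers`, `kmeans_undersample`, and `n_keep` are hypothetical.

```python
import math
import random


def kmeans_centers(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Move each center to its cluster mean; keep the old center if empty.
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers


def kmeans_undersample(majority, n_keep, seed=0):
    """Keep at most n_keep majority samples, one representative per cluster.

    Assumption: the representative is the real sample closest to the
    cluster center, so no synthetic majority points are introduced.
    """
    centers = kmeans_centers(majority, n_keep, seed=seed)
    kept = []
    for c in centers:
        nearest = min(majority, key=lambda p: math.dist(p, c))
        if nearest not in kept:  # two centers may share a nearest sample
            kept.append(nearest)
    return kept
```

A typical call would set `n_keep` to the number of minority samples, so that the reduced majority set and the minority set are about the same size before a boosting classifier is trained on them.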


2020 ◽  
Vol 8 (5) ◽  
pp. 3436-3440

Imbalanced data classification aims to predict a dependent variable from data with a skewed class distribution. Such problems arise in many application areas, such as medical disease diagnosis, risk management, and fault detection, and they remain challenging in machine learning and data mining. In this paper, a K-Means cluster-based oversampling algorithm is proposed to solve the imbalanced data classification problem. The experimental results show that the proposed algorithm outperforms the existing oversampling algorithms of previous studies.

