A cluster-based hybrid sampling approach for imbalanced data classification

Imbalanced datasets arise in many real-world applications. Resampling is an effective remedy because it produces a relatively balanced class distribution. This paper proposes a hybrid sampling SVM approach that combines an undersampling technique and an oversampling technique to address the imbalanced data classification problem. The approach first uses undersampling to delete majority-class samples that carry little classification information, and then applies oversampling to gradually create new minority-class (positive) samples. The resulting balanced training dataset replaces the original imbalanced one. Experimental results on real-world datasets indicate that the proposed approach can identify informative samples and handle the imbalanced data classification problem.
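The two-stage pipeline described in the abstract (undersample the majority class, then oversample the minority class) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the abstract does not say how "less classification information" is measured or how new samples are generated. The sketch assumes that majority points far from the minority-class centroid are the less informative ones, and it creates new minority points by linear interpolation between random pairs of existing ones. The function name `hybrid_resample` and the parameter `keep_ratio` are hypothetical, and the clustering and SVM training steps are omitted.

```python
import math
import random


def hybrid_resample(majority, minority, keep_ratio=0.5, seed=0):
    """Return (kept_majority, augmented_minority).

    The two lists have equal size when the majority class is the larger one
    and 0 < keep_ratio <= 1. Both heuristics below are assumptions made for
    illustration only; they are not taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(minority[0])

    # Centroid of the minority class, used as a crude proxy for where the
    # class boundary is likely to lie.
    centroid = [sum(p[d] for p in minority) / len(minority) for d in range(dim)]

    # Undersampling (assumed heuristic): keep the majority points closest to
    # the minority centroid, but never fewer than the minority class size.
    n_keep = max(len(minority), int(len(majority) * keep_ratio))
    kept = sorted(majority, key=lambda p: math.dist(p, centroid))[:n_keep]

    # Oversampling (assumed heuristic): add points interpolated between
    # random pairs of original minority points until the classes match.
    augmented = list(minority)
    while len(augmented) < len(kept):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        augmented.append(tuple(x + t * (y - x) for x, y in zip(a, b)))

    return kept, augmented
```

For example, with 20 majority points and 3 minority points, `hybrid_resample(majority, minority, keep_ratio=0.5)` keeps 10 majority points and grows the minority class to 10, and the balanced set could then be passed to any classifier, such as an SVM.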

Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach

Progress in Artificial Intelligence and Pattern Recognition - Lecture Notes in Computer Science ◽

10.1007/978-3-030-01132-1_32 ◽

2018 ◽

pp. 280-287

Author(s):

Keider Hoyos ◽

Jorge Fernández ◽

Beatriz Martinez ◽

Óscar Henao ◽

Álvaro Orozco ◽

...

Keyword(s):

Imbalanced Data ◽

Data Classification ◽

Relevant Information ◽

Imbalanced Data Classification ◽

Sampling Approach

Imbalanced Data Classification: A Novel Re-sampling Approach Combining Versatile Improved SMOTE and Rough Sets

Computer Information Systems and Industrial Management - Lecture Notes in Computer Science ◽

10.1007/978-3-319-45378-1_4 ◽

2016 ◽

pp. 31-42 ◽

Cited By ~ 3

Author(s):

Katarzyna Borowska ◽

Jarosław Stepaniuk

Keyword(s):

Rough Sets ◽

Imbalanced Data ◽

Data Classification ◽

Imbalanced Data Classification ◽

Sampling Approach

A novel imbalanced data classification approach for suicidal ideation detection on social media

Computing ◽

10.1007/s00607-021-00984-0 ◽

2021 ◽

Author(s):

Mohamed Ali Ben Hassine ◽

Safa Abdellatif ◽

Sadok Ben Yahia

Keyword(s):

Social Media ◽

Suicidal Ideation ◽

Imbalanced Data ◽

Data Classification ◽

Classification Approach ◽

Imbalanced Data Classification

Radial-Based Undersampling for imbalanced data classification

Pattern Recognition ◽

10.1016/j.patcog.2020.107262 ◽

2020 ◽

Vol 102 ◽

pp. 107262 ◽

Cited By ~ 7

Author(s):

Michał Koziarski

Keyword(s):

Imbalanced Data ◽

Data Classification ◽

Imbalanced Data Classification

Research of Medical High-Dimensional Imbalanced Data Classification Ensemble Feature Selection Algorithm with Random Forest

2017 International Conference on Smart Grid and Electrical Automation (ICSGEA) ◽

10.1109/icsgea.2017.158 ◽

2017 ◽

Cited By ~ 2

Author(s):

Min Zhu ◽

Bo Su ◽

Gangmin Ning

Keyword(s):

Feature Selection ◽

Random Forest ◽

Imbalanced Data ◽

Data Classification ◽

High Dimensional ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Imbalanced Data Classification

Data reduction and stacking for imbalanced data classification

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-179335 ◽

2019 ◽

Vol 37 (6) ◽

pp. 7239-7249

Author(s):

Ireneusz Czarnowski ◽

Piotr Jędrzejowicz

Keyword(s):

Data Reduction ◽

Imbalanced Data ◽

Data Classification ◽

Imbalanced Data Classification
