Class Imbalance Learning

Mapping Intimacies ◽

10.34048/2017.1.f1 ◽

2017 ◽

Author(s):

Sudarsun Santhiappan ◽

Balaraman Ravindran

Keyword(s):

Machine Learning ◽

Real World ◽

Class Imbalance ◽

Classification Problem ◽

Classification Algorithms ◽

Challenges And Opportunities ◽

Data Points ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

Real World Problems

The data classification task assigns labels to data points using a model learned from a collection of pre-labeled data points. The Class Imbalance Learning (CIL) problem concerns the performance of classification algorithms in the presence of under-represented data and severe class distribution skews. Because imbalanced datasets have inherently complex characteristics, learning from them requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. It is important to study CIL because real-world classification problems rarely follow balanced class distributions. In this article, we present how machine learning has become an integral part of modern lifestyle and how some real-world problems are modeled as CIL problems. We also provide a detailed survey of the fundamentals of and solutions to class imbalance learning. We conclude the survey by presenting some of the challenges and opportunities in class imbalance learning.

An Empirical Study of Boosting Methods on Severely Imbalanced Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.2510 ◽

2014 ◽

Vol 513-517 ◽

pp. 2510-2513 ◽

Cited By ~ 1

Author(s):

Xu Ying Liu

Keyword(s):

Empirical Study ◽

Real World ◽

Class Imbalance ◽

Imbalanced Data ◽

Real World Applications ◽

Under Sampling ◽

The Difference ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

F Measure

Real-world applications now produce large volumes of data, which pose a great challenge to class-imbalance learning: a large number of majority class examples and severe class-imbalance. Previous studies on class-imbalance learning mainly focused on relatively small or moderate class-imbalance. In this paper we conduct an empirical study to explore the difference between learning with small or moderate class-imbalance and learning with severe class-imbalance. The experimental results show that: (1) Traditional methods cannot handle severe class-imbalance effectively. (2) AUC, G-mean and F-measure can be very inconsistent under severe class-imbalance, which seldom happens when class-imbalance is moderate. Moreover, G-mean is not appropriate for severe class-imbalance learning because it is not sensitive to changes in the imbalance ratio. (3) When AUC and G-mean are the evaluation metrics, EasyEnsemble is the best method, followed by BalanceCascade and under-sampling. (4) For under-sampling, stopping a little short of full balance handles severe class-imbalance better. It is also important to handle false positives when designing methods for severe class-imbalance.
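Finding (2) can be illustrated with a small calculation. The sketch below is not taken from the paper: the confusion-matrix counts are invented, and `imbalance_metrics` is a hypothetical helper. It holds the per-class rates fixed (recall 0.9, false positive rate 0.1) and changes only the imbalance ratio, using the standard definitions of G-mean (geometric mean of recall and specificity) and F-measure (here F1).

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Compute G-mean and F-measure (F1) from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(recall * specificity)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"g_mean": g_mean, "f1": f1}

# Illustrative counts only: 100 minority examples in both cases,
# with 1,000 majority examples (1:10) versus 100,000 (1:1000).
moderate = imbalance_metrics(tp=90, fn=10, fp=100, tn=900)
severe = imbalance_metrics(tp=90, fn=10, fp=10_000, tn=90_000)

print("moderate:", moderate)
print("severe:  ", severe)
```

With these counts the G-mean is the same in both settings, while the F-measure falls sharply in the severe case because the false positives swamp the true positives. This is consistent with the abstract's remarks that G-mean is insensitive to the imbalance ratio and that false positives matter under severe imbalance.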

A Method for Class-Imbalance Learning in Android Malware Detection

Electronics ◽

10.3390/electronics10243124 ◽

2021 ◽

Vol 10 (24) ◽

pp. 3124

Author(s):

Jun Guan ◽

Xu Jiang ◽

Baolei Mao

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Computational Cost ◽

Class Imbalance ◽

Sampling Technique ◽

Minority Class ◽

Android Malware ◽

Android Malware Detection ◽

Imbalance Learning ◽

Class Imbalance Learning

More and more Android application developers are adopting methods against reverse engineering, such as adding a shell. As a result, certain features cannot be obtained through decompilation, which causes a serious sample imbalance in machine-learning-based Android malware detection. Hence, researchers have focused on solving class-imbalance to improve the performance of Android malware detection. However, the main disadvantages of existing class-imbalance learning methods are the loss of valuable samples and the computational cost. In this paper, we propose a Class-Imbalance Learning (CIL) method that first selects representative features, then uses the K-Means clustering algorithm and under-sampling to retain the important samples of the majority class while reducing their number. After that, we use the Synthetic Minority Over-Sampling Technique (SMOTE) algorithm to generate minority class samples for data balance, and finally use the Random Forest (RF) algorithm to build a malware detection model. Experimental results indicate that CIL effectively improves the performance of machine-learning-based Android malware detection, especially under class imbalance. Compared with existing class-imbalance learning methods, CIL is also effective on datasets from the Machine Learning Repository of the University of California, Irvine (UCI) and performs better on some of them.
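The abstract does not say how K-Means and under-sampling are combined, so the following is only a sketch of one common variant, not the authors' procedure: cluster the majority class into as many clusters as samples to keep, then retain the real sample closest to each centroid. The names `kmeans` and `cluster_undersample`, the iteration count, and the synthetic 2-D data are all assumptions made for illustration; the SMOTE and Random Forest stages are omitted.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, clusters

def cluster_undersample(majority, n_keep, seed=0):
    """Keep at most n_keep majority samples: the one nearest each centroid."""
    centroids, clusters = kmeans(majority, n_keep, seed=seed)
    kept = []
    for centroid, members in zip(centroids, clusters):
        if members:
            kept.append(min(members, key=lambda p: sq_dist(p, centroid)))
    return kept

# Hypothetical data: 200 majority points reduced to roughly 20.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
kept = cluster_undersample(majority, n_keep=20)
print(len(kept), "majority samples retained")
```

Keeping one real sample per cluster spreads the retained points across the majority class region, which is one way to reduce the loss of valuable samples that plain random under-sampling risks.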

AWSMOTE: An SVM-Based Adaptive Weighted SMOTE for Class-Imbalance Learning

Scientific Programming ◽

10.1155/2021/9947621 ◽

2021 ◽

Vol 2021 ◽

pp. 1-18

Author(s):

Jia-Bao Wang ◽

Chun-An Zou ◽

Guang-Hui Fu

Keyword(s):

Real World ◽

Class Imbalance ◽

Sample Space ◽

Minority Class ◽

Adaptive Weighting ◽

Variable Space ◽

Real World Datasets ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

Data Level

In class-imbalance learning, the Synthetic Minority Oversampling Technique (SMOTE) is widely used to tackle class-imbalance problems at the data level. However, SMOTE blindly selects neighboring minority class points when interpolating among them, and it inevitably introduces collinearity between the newly generated points and the original ones. To combat these problems, we propose an adaptive-weighting SMOTE method, termed AWSMOTE. AWSMOTE applies two types of SVM-based weights to SMOTE. One weight is used in variable space to combat the drawbacks of collinearity, while the other is used in sample space to purposefully choose support vectors from the minority class as the neighboring points in the interpolation. AWSMOTE is compared with SMOTE and its improved versions on six simulated datasets and 22 real-world datasets. The results demonstrate the effectiveness and advantages of the proposed approach.
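For reference, the baseline SMOTE interpolation that the abstract criticizes can be sketched as below. This is plain SMOTE, not AWSMOTE: the SVM-based weights are not shown because the abstract does not specify them. The function name `smote`, the neighbor count `k`, and the toy data are assumptions for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points by linear interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbors, which is
    why new points are collinear with existing ones.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbor = rng.choice(neighbors)  # chosen uniformly, i.e. "blindly"
        gap = rng.random()                # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbor)))
    return synthetic

# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10)
print(new_points)
```

The uniform `rng.choice(neighbors)` step is where a weighted scheme in sample space would replace the blind neighbor selection, for example by favoring support vectors as the abstract describes.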

SMOTEMultiBoost: Leveraging the SMOTE with MultiBoost to Confront the Class Imbalance in Supervised Learning

Journal of Information Communication Technologies and Robotic Applications ◽

10.51239/jictra.v0i0.227 ◽

2020 ◽

Author(s):

Naveed Ahmad Khan Jhamat ◽

Ghulam Mustafa ◽

Zhendong Niu

Keyword(s):

False Negative ◽

Class Imbalance ◽

Sampling Technique ◽

Classification Algorithms ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Limiting Error ◽

Improved Performance ◽

Imbalance Learning ◽

Class Imbalance Learning

Researchers increasingly confront the class imbalance problem as the amount of complicated data grows. Common classification algorithms perform poorly on imbalanced datasets. In class imbalance learning, larger class cases typically outnumber smaller class cases. Because common classification algorithms aim to improve overall accuracy, they raise performance on the larger class while lowering performance on the smaller class. Furthermore, these algorithms treat false positives and false negatives evenly and assume an equal cost for every misclassification. Meanwhile, different ensemble solutions have been proposed over the years for class imbalance learning, but these approaches hamper performance on the larger class by emphasizing the smaller class cases. The likely reasons for this degraded overall outcome are low diversity in ensemble solutions and overfitting or underfitting in data resampling techniques. To overcome these problems, we suggest a hybrid ensemble method that combines the MultiBoost ensemble with the Synthetic Minority Over-sampling TEchnique (SMOTE). Our suggested solution leverages the effectiveness of its elements. It therefore improves the outcome for the smaller class by reinforcing its space and limiting prediction error. In experiments, the proposed method shows improved performance compared with numerous other algorithms and techniques.
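The point about accuracy-driven algorithms and equal misclassification costs can be made concrete with a toy comparison. The labels, predictions, and the 10:1 cost ratio below are invented for illustration and are not from the paper; the helper functions are hypothetical and do not implement SMOTEMultiBoost.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def minority_recall(y_true, y_pred, positive=1):
    """Fraction of minority (positive) cases that were detected."""
    hits = sum(t == positive and p == positive
               for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total if total else 0.0

def total_cost(y_true, y_pred, fn_cost=10.0, fp_cost=1.0, positive=1):
    """Cost-weighted error; the default costs are assumed, not measured."""
    fn = sum(t == positive and p != positive
             for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive
             for t, p in zip(y_true, y_pred))
    return fn * fn_cost + fp * fp_cost

# 5 minority cases (label 1) and 95 majority cases (label 0).
y_true = [1] * 5 + [0] * 95
always_majority = [0] * 100                      # ignores the minority class
detector = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85  # finds 4 of 5, 10 false alarms

for name, y_pred in [("always_majority", always_majority),
                     ("detector", detector)]:
    print(name, accuracy(y_true, y_pred),
          minority_recall(y_true, y_pred), total_cost(y_true, y_pred))
```

Under these assumptions the trivial classifier wins on accuracy (0.95 against 0.89) yet detects no minority cases and incurs the higher cost, which is why accuracy alone is a poor objective when the classes are skewed.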
