AWSMOTE: An SVM-Based Adaptive Weighted SMOTE for Class-Imbalance Learning

Scientific Programming ◽

10.1155/2021/9947621 ◽

2021 ◽

Vol 2021 ◽

pp. 1-18

Author(s):

Jia-Bao Wang ◽

Chun-An Zou ◽

Guang-Hui Fu

Keyword(s):

Real World ◽

Class Imbalance ◽

Sample Space ◽

Minority Class ◽

Adaptive Weighting ◽

Variable Space ◽

Real World Datasets ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

Data Level

In class-imbalance learning, the Synthetic Minority Oversampling Technique (SMOTE) is widely used to tackle class-imbalance problems at the data level. However, SMOTE selects neighboring minority-class points blindly when interpolating among them, and it inevitably introduces collinearity between the generated points and the original ones. To address these problems, we propose an adaptive-weighting SMOTE method, termed AWSMOTE. AWSMOTE introduces two types of SVM-based weights into SMOTE. One weight is applied in the variable space to counter the drawbacks of collinearity, while the other is applied in the sample space to deliberately choose support vectors from the minority class as the neighboring points for interpolation. AWSMOTE is compared with SMOTE and its improved variants on six simulated datasets and 22 real-world datasets. The results demonstrate the effectiveness and advantages of the proposed approach.
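
The interpolation step that this abstract criticizes can be sketched as follows. This is plain SMOTE only, not AWSMOTE: the SVM-based weights are not specified in the abstract and are not implemented here. The function name `smote_sample`, its parameters, and the toy minority points are illustrative assumptions.

```python
import math
import random


def smote_sample(minority, k=3, n_new=5, seed=0):
    """Plain SMOTE interpolation (hypothetical helper, not AWSMOTE).

    Returns a list of (base, neighbour, new_point) triples so the
    geometry of each synthetic point can be inspected.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a minority point uniformly at random ("blind" selection).
        base = rng.choice(minority)
        # Find its k nearest minority neighbours, excluding itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        new_point = tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        synthetic.append((base, neighbour, new_point))
    return synthetic


# Toy 2-D minority class (made-up values for illustration).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (3.0, 3.5), (2.5, 2.8)]
samples = smote_sample(minority)
for base, neighbour, new_point in samples:
    print(base, neighbour, new_point)
```

Each synthetic point lies on the line segment between its base point and the chosen neighbour, which appears to be the collinearity the abstract refers to.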


Transfer synthetic over-sampling for class-imbalance learning with limited minority class data

Frontiers of Computer Science ◽

10.1007/s11704-018-7182-1 ◽

2019 ◽

Vol 13 (5) ◽

pp. 996-1009 ◽

Cited By ~ 1

Author(s):

Xu-Ying Liu ◽

Sheng-Tao Wang ◽

Min-Ling Zhang

Keyword(s):

Class Imbalance ◽

Minority Class ◽

Imbalance Learning ◽

Class Imbalance Learning


An Empirical Study of Boosting Methods on Severely Imbalanced Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.2510 ◽

2014 ◽

Vol 513-517 ◽

pp. 2510-2513 ◽

Cited By ~ 1

Author(s):

Xu Ying Liu

Keyword(s):

Empirical Study ◽

Real World ◽

Class Imbalance ◽

Imbalanced Data ◽

Real World Applications ◽

Under Sampling ◽

The Difference ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

F Measure

Real-world applications now involve large volumes of data, which poses great challenges to class-imbalance learning: a large number of majority-class examples and severe class-imbalance. Previous studies on class-imbalance learning mainly focused on small or moderate class-imbalance. In this paper we conduct an empirical study to explore the difference between learning with small or moderate class-imbalance and learning with severe class-imbalance. The experimental results show that: (1) Traditional methods cannot handle severe class-imbalance effectively. (2) AUC, G-mean and F-measure can be very inconsistent under severe class-imbalance, which seldom happens when class-imbalance is moderate; moreover, G-mean is not appropriate for severe class-imbalance learning because it is not sensitive to changes in the imbalance ratio. (3) When AUC and G-mean are the evaluation metrics, EasyEnsemble is the best method, followed by BalanceCascade and under-sampling. (4) For under-sampling, stopping a little short of full balance works better for severe class-imbalance. It is also important to handle false positives when designing methods for severe class-imbalance.
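
A small worked example may help show why G-mean and F-measure can disagree as the imbalance ratio grows. The sketch below uses the standard definitions of both metrics; the confusion-matrix counts are hypothetical and are not taken from the study. It assumes a classifier whose true-positive rate (0.9) and false-positive rate (0.1) stay fixed while the number of majority-class (negative) examples increases.

```python
import math


def g_mean(tp, fn, fp, tn):
    """Geometric mean of true-positive rate and true-negative rate."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr)


def f_measure(tp, fn, fp, tn):
    """F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 100 positives in both cases, TPR 0.9 and FPR 0.1.
moderate = dict(tp=90, fn=10, fp=100, tn=900)        # 1,000 negatives
severe = dict(tp=90, fn=10, fp=10_000, tn=90_000)    # 100,000 negatives

for name, counts in (("moderate", moderate), ("severe", severe)):
    print(name, round(g_mean(**counts), 4), round(f_measure(**counts), 4))
```

With these made-up counts, G-mean stays at 0.9 in both settings because it depends only on the per-class rates, whereas the F-measure falls from about 0.62 to about 0.018 as false positives swamp the true positives.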


VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams

Data Mining and Knowledge Discovery ◽

10.1007/s10618-021-00786-0 ◽

2021 ◽

Author(s):

Alessio Bernardo ◽

Emanuele Della Valle

Keyword(s):

Data Streams ◽

Concept Drift ◽

Class Imbalance ◽

Imbalanced Data ◽

Real Data ◽

Minority Class ◽

Machine Learning Classification ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

Better Than

The world is constantly changing, and so is the massive amount of data produced. However, only a few studies deal with online class-imbalance learning, which combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy, to be prepended to any streaming machine learning classification algorithm, that oversamples the minority class using a new version of SMOTE and Borderline-SMOTE inspired by data sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose minority-class performance is better than the state of the art. Moreover, we analyze the time/memory consumption and the concept drift recovery speed.


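Class Imbalanced Learning Menggunakan Algoritma Synthetic Minority Over-sampling Technique – Nominal (SMOTE-N) pada Dataset Tuberculosis Anak [Class Imbalanced Learning Using the Synthetic Minority Over-sampling Technique – Nominal (SMOTE-N) Algorithm on a Childhood Tuberculosis Dataset]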

Jurnal Buana Informatika ◽

10.24002/jbi.v10i2.2441 ◽

2019 ◽

Vol 10 (2) ◽

pp. 134

Author(s):

Yulia Ery Kurniawati

Keyword(s):

Naive Bayes ◽

Class Imbalance ◽

Sampling Technique ◽

Naïve Bayes ◽

Bayes Classifier ◽

Imbalanced Learning ◽

Naïve Bayes Classifier ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

Data Level

Class Imbalance Learning (CIL) is a learning process for data representation and information extraction from poorly distributed data, in support of effective decision making. SMOTE-N is a data-level approach in CIL that uses over-sampling. SMOTE-N generates synthetic instances to balance the number of instances in the minority class. This study applies SMOTE-N to the Childhood Tuberculosis (TB Anak) dataset, which exhibits class imbalance. Over-sampling was chosen to avoid losing important information, because the TB Anak dataset contains few instances. A Naïve Bayes classifier is used to evaluate the model built from the balanced dataset. The results show that SMOTE-N can improve performance in CIL.


A Method for Class-Imbalance Learning in Android Malware Detection

Electronics ◽

10.3390/electronics10243124 ◽

2021 ◽

Vol 10 (24) ◽

pp. 3124

Author(s):

Jun Guan ◽

Xu Jiang ◽

Baolei Mao

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Computational Cost ◽

Class Imbalance ◽

Sampling Technique ◽

Minority Class ◽

Android Malware ◽

Android Malware Detection ◽

Imbalance Learning ◽

Class Imbalance Learning

More and more Android application developers are adopting methods against reverse engineering, such as adding a shell, so that certain features cannot be obtained through decompilation; this causes a serious sample imbalance in machine-learning-based Android malware detection. Hence, researchers have focused on how to solve class-imbalance to improve the performance of Android malware detection. However, the main disadvantages of existing class-imbalance learning methods are the loss of valuable samples and the computational cost. In this paper, we propose a Class-Imbalance Learning (CIL) method that first selects representative features, then uses the K-Means clustering algorithm and under-sampling to retain the important samples of the majority class while reducing their number. After that, we use the Synthetic Minority Over-Sampling Technique (SMOTE) algorithm to generate minority-class samples for data balance, and finally use the Random Forest (RF) algorithm to build a malware detection model. The experimental results indicate that CIL effectively improves the performance of machine-learning-based Android malware detection, especially under class imbalance. Compared with existing class-imbalance learning methods, CIL is also effective on datasets from the University of California, Irvine (UCI) Machine Learning Repository and performs better on some of them.
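
The clustering-based under-sampling step described above might look roughly like the sketch below. The abstract does not say how "important" majority samples are chosen, so keeping the points nearest each K-Means centroid is an assumption made here for illustration. The helper names `kmeans` and `cluster_undersample`, all parameters, and the toy data are hypothetical; feature selection, SMOTE, and the Random Forest stage are omitted.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd-style K-Means on tuples of floats."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


def cluster_undersample(majority, k, per_cluster, seed=0):
    """Keep the per_cluster points closest to each centroid (assumed rule)."""
    centroids, clusters = kmeans(majority, k, seed=seed)
    kept = []
    for centroid, cluster in zip(centroids, clusters):
        ranked = sorted(cluster, key=lambda p: math.dist(p, centroid))
        kept.extend(ranked[:per_cluster])
    return kept


# Toy majority class: 60 points around three made-up centres.
rng = random.Random(42)
centres = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
majority = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in centres for _ in range(20)]

kept = cluster_undersample(majority, k=3, per_cluster=4)
print(len(majority), "->", len(kept))
```

Sampling within clusters rather than uniformly is one way to shrink the majority class while still covering each region it occupies; the reduced set would then be combined with SMOTE-generated minority samples before training a classifier.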


Class Imbalance Learning

10.34048/2017.1.f1 ◽

2017 ◽

Author(s):

Sudarsun Santhiappan ◽

Balaraman Ravindran

Keyword(s):

Machine Learning ◽

Real World ◽

Class Imbalance ◽

Classification Problem ◽

Classification Algorithms ◽

Challenges And Opportunities ◽

Data Points ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

Real World Problems

The data classification task assigns labels to data points using a model learned from a collection of pre-labeled data points. The Class Imbalance Learning (CIL) problem is concerned with the performance of classification algorithms in the presence of under-represented data and severe class distribution skews. Due to the inherently complex characteristics of imbalanced datasets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. It is important to study CIL because it is rare to find a real-world classification problem that follows a balanced class distribution. In this article, we present how machine learning has become an integral part of modern life and how some real-world problems are modeled as CIL problems. We also provide a detailed survey of the fundamentals of, and solutions to, class imbalance learning. We conclude the survey by presenting some of the challenges and opportunities in class imbalance learning.
