The Influence of Class Imbalance on Cost-Sensitive Learning: An Empirical Study

Class imbalance is a challenging problem for machine learning in many real-world applications. Many methods have been proposed to address it, including sampling and cost-sensitive learning. Cost-sensitive learning has attracted significant attention in recent years, but precise misclassification costs are difficult to determine in practice. Other factors also influence classification performance, including the input feature subset and the intrinsic parameters of the classifier. This chapter presents an effective wrapper framework that incorporates the evaluation measures (AUC and G-mean) directly into the objective function of cost-sensitive learning, improving classification performance by simultaneously optimizing the feature subset, the intrinsic parameters, and the misclassification cost parameter. The optimization is based on Particle Swarm Optimization (PSO). The authors evaluate the proposed framework with two common methods, support vector machines and feed-forward neural networks. Experimental results on various standard benchmark datasets with different imbalance ratios, and on a real-world problem, show that the proposed method is effective in comparison with commonly used sampling techniques.
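The idea of using an evaluation measure as the objective when choosing the misclassification cost can be illustrated with a minimal sketch. It is not the chapter's framework: a plain grid search stands in for PSO, the feature subset and classifier parameters are left out, and the scores, the candidate costs, and the function names (`g_mean`, `cost_threshold`, `best_cost`) are hypothetical. It assumes the classifier outputs a posterior probability for the positive (minority) class, so a false-negative cost `cost_fn` and a false-positive cost `cost_fp` translate into the decision threshold `cost_fp / (cost_fp + cost_fn)`.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def cost_threshold(cost_fn, cost_fp=1.0):
    """Cost-minimizing threshold on P(positive | x) for the given costs."""
    return cost_fp / (cost_fp + cost_fn)

def best_cost(y_true, scores, candidate_costs):
    """Pick the false-negative cost that maximizes G-mean.

    Grid search is used here purely for illustration in place of PSO.
    """
    best = None
    for cost_fn in candidate_costs:
        thr = cost_threshold(cost_fn)
        y_pred = [1 if s >= thr else 0 for s in scores]
        gm = g_mean(y_true, y_pred)
        if best is None or gm > best[1]:
            best = (cost_fn, gm)
    return best

# Hypothetical toy data: 2 minority examples, 8 majority examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.40, 0.25, 0.30, 0.11, 0.05, 0.05, 0.12, 0.08, 0.02, 0.15]

cost, gm = best_cost(y_true, scores, [1, 2, 4, 9])
print(cost, round(gm, 4))
```

With equal costs the threshold is 0.5 and no minority example is detected on this toy data, so G-mean is zero; raising the false-negative cost lowers the threshold until both minority examples are recovered at the price of some false positives.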


An Empirical Study for the Multi-class Imbalance Problem with Neural Networks

Lecture Notes in Computer Science - Progress in Pattern Recognition, Image Analysis and Applications ◽

10.1007/978-3-540-85920-8_59 ◽

2008 ◽

pp. 479-486 ◽

Cited By ~ 8

Author(s):

R. Alejo ◽

J. M. Sotoca ◽

G. A. Casañ

Keyword(s):

Neural Networks ◽

Empirical Study ◽

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem


Effects of Class Imbalance in Test Suites: An Empirical Study of Spectrum-Based Fault Localization

2012 IEEE 36th Annual Computer Software and Applications Conference Workshops ◽

10.1109/compsacw.2012.89 ◽

2012 ◽

Cited By ~ 7

Author(s):

Cheng Gong ◽

Zheng Zheng ◽

Wei Li ◽

Peng Hao

Keyword(s):

Empirical Study ◽

Fault Localization ◽

Class Imbalance ◽

Test Suites


An Empirical Study of Boosting Methods on Severely Imbalanced Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.2510 ◽

2014 ◽

Vol 513-517 ◽

pp. 2510-2513 ◽

Cited By ~ 1

Author(s):

Xu Ying Liu

Keyword(s):

Empirical Study ◽

Real World ◽

Class Imbalance ◽

Imbalanced Data ◽

Real World Applications ◽

Under Sampling ◽

The Difference ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

F Measure

Real-world applications now involve large volumes of data, which pose a great challenge to class-imbalance learning: a large number of majority class examples and severe class imbalance. Previous studies on class-imbalance learning mainly focused on relatively small or moderate class imbalance. In this paper we conduct an empirical study to explore the difference between learning with small or moderate class imbalance and learning with severe class imbalance. The experimental results show that: (1) Traditional methods cannot handle severe class imbalance effectively. (2) AUC, G-mean and F-measure can be very inconsistent under severe class imbalance, which seldom happens when class imbalance is moderate. G-mean is not appropriate for severe class-imbalance learning because it is not sensitive to changes in the imbalance ratio. (3) When AUC and G-mean are the evaluation metrics, EasyEnsemble is the best method, followed by BalanceCascade and under-sampling. (4) Under-sampling to slightly below full balance handles severe class imbalance better. It is also important to handle false positives when designing methods for severe class imbalance.
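The inconsistency between G-mean and F-measure described in point (2) can be made concrete with a small worked sketch. The confusion-matrix counts below are hypothetical and are not taken from the paper; they assume a classifier with a fixed true-positive rate of 0.9 and a fixed false-positive rate of 0.05, applied once to a moderately imbalanced set (1:10) and once to a severely imbalanced one (1:1000).

```python
import math

def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

def f_measure(tp, fn, fp):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical counts, same TPR (0.9) and FPR (0.05) in both cases.
# Moderate imbalance: 1,000 positives vs 10,000 negatives.
moderate = dict(tp=900, fn=100, fp=500, tn=9500)
# Severe imbalance: 10 positives vs 10,000 negatives.
severe = dict(tp=9, fn=1, fp=500, tn=9500)

for name, c in (("moderate", moderate), ("severe", severe)):
    gm = g_mean(c["tp"], c["fn"], c["fp"], c["tn"])
    f1 = f_measure(c["tp"], c["fn"], c["fp"])
    print(f"{name}: G-mean={gm:.4f}  F-measure={f1:.4f}")
```

Because G-mean depends only on the per-class rates, it is identical in both cases, while F-measure drops sharply once the same false-positive rate produces many more false positives than there are positive examples. This is one way to read the abstract's remarks that G-mean is insensitive to the imbalance ratio and that false positives matter under severe imbalance.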


Studying cost-sensitive learning for multi-class imbalance in Internet traffic classification

The Journal of China Universities of Posts and Telecommunications ◽

10.1016/s1005-8885(11)60319-1 ◽

2012 ◽

Vol 19 (6) ◽

pp. 63-72 ◽

Cited By ~ 9

Author(s):

Zhen LIU ◽

Qiong LIU

Keyword(s):

Class Imbalance ◽

Internet Traffic ◽

Traffic Classification ◽

Cost Sensitive Learning ◽

Internet Traffic Classification


Two-Stage Cost-Sensitive Learning for Data Streams With Concept Drift and Class Imbalance

IEEE Access ◽

10.1109/access.2020.3031603 ◽

2020 ◽

Vol 8 ◽

pp. 191942-191955

Author(s):

Yange Sun ◽

Yi Sun ◽

Honghua Dai

Keyword(s):

Data Streams ◽

Concept Drift ◽

Class Imbalance ◽

Two Stage ◽

Cost Sensitive Learning


An empirical study of cost-sensitive learning in cultural modeling

Information Systems and e-Business Management ◽

10.1007/s10257-012-0198-4 ◽

2012 ◽

Vol 11 (3) ◽

pp. 437-455 ◽

Cited By ~ 1

Author(s):

Peng Su ◽

Wenji Mao ◽

Daniel Zeng

Keyword(s):

Empirical Study ◽

Cost Sensitive Learning ◽

Cultural Modeling
