COSTE: Complexity-based OverSampling TEchnique to alleviate the class imbalance problem in software defect prediction

Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem

Information Systems ◽

10.1016/j.is.2015.02.006 ◽

2015 ◽

Vol 51 ◽

pp. 62-71 ◽

Cited By ~ 56

Author(s):

Michael J. Siers ◽

Md Zahidul Islam

Keyword(s):

Class Imbalance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Potential Solution ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Software Defect ◽

Decision Forest

Tackling Class Imbalance Problem in Software Defect Prediction Through Cluster-Based Over-Sampling With Filtering

IEEE Access ◽

10.1109/access.2019.2945858 ◽

2019 ◽

Vol 7 ◽

pp. 145725-145737 ◽

Cited By ~ 4

Author(s):

Lina Gong ◽

Shujuan Jiang ◽

Li Jiang

Keyword(s):

Class Imbalance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Software Defect

An Ensemble Oversampling Model for Class Imbalance Problem in Software Defect Prediction

IEEE Access ◽

10.1109/access.2018.2817572 ◽

2018 ◽

Vol 6 ◽

pp. 24184-24195 ◽

Cited By ~ 26

Author(s):

Shamsul Huda ◽

Kevin Liu ◽

Mohamed Abdelrazek ◽

Amani Ibrahim ◽

Sultan Alyahya ◽

...

Keyword(s):

Class Imbalance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Software Defect

Class Imbalance Reduction (CIR): A Novel Approach to Software Defect Prediction in the Presence of Class Imbalance

Symmetry ◽

10.3390/sym12030407 ◽

2020 ◽

Vol 12 (3) ◽

pp. 407 ◽

Cited By ~ 2

Author(s):

Kiran Kumar Bejjanki ◽

Jayadev Gyani ◽

Narsimha Gugulothu

Keyword(s):

Open Source Software ◽

Class Imbalance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Learning Tools ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Software Defect ◽

Novel Approach ◽

Improved Performance

Software defect prediction (SDP) is a technique used to predict the occurrence of defects in the early stages of the software development process. Early prediction of defects reduces the overall cost of software and increases its reliability. Most defect prediction methods proposed in the literature suffer from the class imbalance problem. In this paper, a novel class imbalance reduction (CIR) algorithm is proposed to create symmetry between the defect and non-defect records in imbalanced datasets by considering the distribution properties of the datasets. It is compared with SMOTE (synthetic minority oversampling technique), which is built into many machine learning tools and is considered a benchmark for handling class imbalance problems, and with K-Means SMOTE. We conducted the experiment on forty open source software defect datasets from the PRedictOr Models In Software Engineering (PROMISE) repository using eight different classifiers and evaluated the results with six performance measures. The results show that the proposed CIR method improves performance over SMOTE and K-Means SMOTE.
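For readers unfamiliar with the SMOTE baseline named in this abstract, the following is a minimal sketch of SMOTE-style interpolation: each synthetic defect record is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is not the CIR algorithm, whose details the abstract does not give. The function name `smote_like`, its parameters, and the toy data are assumptions made for illustration.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    k: number of nearest minority neighbours to choose from (illustrative default).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy usage: four defect records with two metrics each (made-up values).
defect_records = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 2.0]]
new_records = smote_like(defect_records, n_new=5)
```

Because every synthetic point lies between two existing minority points, the new records stay inside the region already occupied by the minority class.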

Feature Clustering and Ensemble Learning Based Approach for Software Defect Prediction

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999201109201259 ◽

2020 ◽

Vol 13 ◽

Author(s):

R. Srivastava ◽

Aman Kumar Jain

Keyword(s):

Clustering Algorithm ◽

Class Imbalance ◽

Software Defect Prediction ◽

Class Imbalance Problem ◽

Feature Clustering ◽

Significance Level ◽

Software Products ◽

Imbalance Problem ◽

Software Defect ◽

Software Modules

Objective: Defects in delivered software products not only have financial implications but also damage the reputation of the organisation and waste time and human resources. This paper aims to detect defects in software modules. Methods: Our approach sequentially combines the SMOTE algorithm to deal with the class imbalance problem, the K-means clustering algorithm to obtain a set of key features based on inter-class and intra-class coefficients of correlation, and ensemble modelling to predict defects in software modules. After careful examination, an ensemble framework of XGBoost, Decision Tree and Random Forest is used for the prediction of software defects, owing to the numerous merits of the ensembling approach. Results: We have used five open-source datasets from the NASA PROMISE Repository for Software Engineering. The results obtained from our approach have been compared with those of the individual algorithms used in the ensemble. Confidence intervals for the performance evaluation metrics of our approach, namely Accuracy, Precision, Recall, F1 score and AUC score, have also been constructed at a significance level of 0.01. Conclusion: The results are presented graphically.
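The abstract mentions a confidence interval at a significance level of 0.01 but does not say how it was constructed. As a rough sketch under that caveat, the normal-approximation (Wald) interval for a proportion such as accuracy can be computed as below. The function name and the example counts are hypothetical, and the authors may have used a different method.

```python
from math import sqrt
from statistics import NormalDist

def accuracy_confidence_interval(correct, total, alpha=0.01):
    """Normal-approximation interval for accuracy at significance level alpha.

    This is an assumed construction; the source does not state its method.
    """
    p = correct / total
    # Two-sided critical value, about 2.576 for alpha = 0.01.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical counts: 850 correct predictions out of 1000 test modules.
low, high = accuracy_confidence_interval(850, 1000)
```

A smaller significance level widens the interval, so an interval reported at 0.01 is wider than one reported at 0.05 for the same counts.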

Support Vector based Oversampling Technique for Handling Class Imbalance in Software Defect Prediction

2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence) ◽

10.1109/confluence51648.2021.9377068 ◽

2021 ◽

Author(s):

Ruchika Malhotra ◽

Vaibhav Agrawal ◽

Vedansh Pal ◽

Tushar Agarwal

Keyword(s):

Class Imbalance ◽

Defect Prediction ◽

Support Vector ◽

Software Defect Prediction ◽

Software Defect

Handling Class-Imbalance with KNN (Neighbourhood) Under-Sampling for Software Defect Prediction

Artificial Intelligence Review ◽

10.1007/s10462-021-10044-w ◽

2021 ◽

Author(s):

Somya Goyal

Keyword(s):

Class Imbalance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Under Sampling

Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

10.1109/icscc51209.2021.9528170 ◽

2021 ◽

Author(s):

Sushant Kumar Pandey ◽

Anil Kumar Tripathi

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Prediction Models ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques ◽

Defect Prediction Models

A Hybrid Approach to Coping with High Dimensionality and Class Imbalance for Software Defect Prediction

2012 11th International Conference on Machine Learning and Applications ◽

10.1109/icmla.2012.145 ◽

2012 ◽

Cited By ~ 4

Author(s):

Kehan Gao ◽

Taghi M. Khoshgoftaar ◽

Amri Napolitano

Keyword(s):

Hybrid Approach ◽

Class Imbalance ◽

High Dimensionality ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect

A Framework for Homogeneous Cross-Project Defect Prediction

International Journal of Software Innovation ◽

10.4018/ijsi.2021010105 ◽

2021 ◽

Vol 9 (1) ◽

pp. 52-68

Author(s):

Lipika Goel ◽

Mayank Sharma ◽

Sunil Kumar Khatri ◽

D. Damodaran

Keyword(s):

Class Imbalance ◽

The Other ◽

Training Data ◽

Defect Prediction ◽

Rank Test ◽

Ensemble Model ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Proposed Model ◽

Cross Project

Often, the prior defect data of the same project is unavailable, so researchers have asked whether defect data from other projects can be used for prediction. This has made cross-project defect prediction an open research issue. In this approach, the training data often suffers from the class imbalance problem. This work focuses on homogeneous cross-project defect prediction. A novel ensemble model that serves two purposes is proposed. First, it handles the class imbalance problem of the dataset. Second, it predicts the target class. To handle the imbalance problem, the training dataset is divided into data frames, and each data frame is balanced. An ensemble model using maximum voting over all random forest classifiers is implemented. The proposed model shows better performance than the baseline models. The Wilcoxon signed-rank test is performed to validate the proposed model.
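The split, balance, and vote scheme described in this abstract can be sketched roughly as follows. Several choices here are assumptions, since the abstract does not specify them: the frames are built by a stratified round-robin split, balancing is done by random oversampling of the minority class, and a simple nearest-centroid learner stands in for the random forest classifiers. All function names are hypothetical.

```python
import random
from collections import Counter

def split_frames(data, n_frames, rng):
    """Deal rows of each class round-robin into n_frames (assumed split rule)."""
    frames = [[] for _ in range(n_frames)]
    for label in (0, 1):
        rows = [r for r in data if r[1] == label]
        rng.shuffle(rows)
        for i, row in enumerate(rows):
            frames[i % n_frames].append(row)
    return frames

def balance(frame, rng):
    """Randomly oversample the smaller class until both classes match in size."""
    by_label = {0: [], 1: []}
    for row in frame:
        by_label[row[1]].append(row)
    small, large = sorted(by_label.values(), key=len)
    if not small:
        return list(frame)  # nothing to resample from
    extra = [rng.choice(small) for _ in range(len(large) - len(small))]
    return small + large + extra

def fit_centroids(frame):
    """Stand-in learner (not a random forest): one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in frame:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict_one(centroids, x):
    """Return the label of the closest class centroid."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def train_ensemble(data, n_frames=3, seed=0):
    """Train one model per balanced frame."""
    rng = random.Random(seed)
    return [fit_centroids(balance(f, rng)) for f in split_frames(data, n_frames, rng)]

def vote(models, x):
    """Maximum (majority) voting over the per-frame models."""
    return Counter(predict_one(m, x) for m in models).most_common(1)[0][0]
```

Each row is a `(features, label)` pair with label 1 for defective modules. Using an odd number of frames avoids tied votes in the two-class case.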
