COSTE: Complexity-based OverSampling TEchnique to alleviate the class imbalance problem in software defect prediction

2021, Vol 129, pp. 106432
Author(s): Shuo Feng, Jacky Keung, Xiao Yu, Yan Xiao, Kwabena Ebo Bennin, ...

IEEE Access, 2018, Vol 6, pp. 24184-24195
Author(s): Shamsul Huda, Kevin Liu, Mohamed Abdelrazek, Amani Ibrahim, Sultan Alyahya, ...

Symmetry, 2020, Vol 12 (3), pp. 407
Author(s): Kiran Kumar Bejjanki, Jayadev Gyani, Narsimha Gugulothu

Software defect prediction (SDP) is a technique for predicting the occurrence of defects in the early stages of the software development process. Early prediction of defects reduces the overall cost of software and increases its reliability. Most defect prediction methods proposed in the literature suffer from the class imbalance problem. In this paper, a novel class imbalance reduction (CIR) algorithm is proposed that creates symmetry between the defect and non-defect records in imbalanced datasets by considering the distribution properties of the datasets. CIR is compared with K-Means SMOTE and with SMOTE (synthetic minority oversampling technique), which is built into many machine learning tools and is considered a benchmark for handling class imbalance problems. We conducted the experiment on forty open-source software defect datasets from the PRedictOr Models In Software Engineering (PROMISE) repository using eight different classifiers and evaluated the results with six performance measures. The results show that the proposed CIR method performs better than SMOTE and K-Means SMOTE.
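For readers unfamiliar with the SMOTE baseline mentioned above, the following is a minimal sketch of its core idea: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is not the CIR algorithm, whose details the abstract does not give. The function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two software metrics per module.
defective = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
clean = [[5.0, 5.0], [6.0, 5.5], [5.5, 6.0], [6.5, 6.5], [7.0, 5.0], [5.0, 7.0]]

# Oversample the defective class until both classes have the same size.
new_points = smote(defective, n_synthetic=len(clean) - len(defective), k=2)
balanced_defective = defective + new_points
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already occupied by the minority class.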


Author(s): R. Srivastava, Aman Kumar Jain

Objective: Defects in delivered software products not only have financial implications but also damage the reputation of the organisation and waste time and human resources. This paper aims to detect defects in software modules. Methods: Our approach sequentially combines the SMOTE algorithm to deal with the class imbalance problem, the K-means clustering algorithm to obtain a set of key features based on inter-class and intra-class coefficients of correlation, and ensemble modelling to predict defects in software modules. After careful examination, an ensemble framework of XGBoost, Decision Tree and Random Forest is used to predict software defects, owing to the numerous merits of the ensembling approach. Results: We used five open-source datasets from the NASA Promise Repository for Software Engineering. The results obtained from our approach were compared with those of the individual algorithms used in the ensemble. A confidence interval for the accuracy of our approach with respect to the performance evaluation metrics, namely Accuracy, Precision, Recall, F1 score and AUC score, was also constructed at a significance level of 0.01. Conclusion: Results are depicted graphically.
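The abstract does not say how its confidence intervals were constructed. As one common possibility, the sketch below computes a two-sided normal-approximation (Wald) interval for an accuracy score at a significance level of 0.01, that is, a 99% interval. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
import math
from statistics import NormalDist


def accuracy_confidence_interval(accuracy, n_samples, alpha=0.01):
    """Two-sided normal-approximation (Wald) interval for a proportion."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    # Critical value of the standard normal for the chosen significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n_samples)
    # Clip to the valid range of a proportion
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


# Hypothetical numbers: 90% accuracy measured on 500 test modules.
low, high = accuracy_confidence_interval(0.90, 500, alpha=0.01)
```

The interval narrows as the number of test modules grows. The Wald approximation becomes unreliable for small samples or for scores very close to 0 or 1.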


2021, Vol 9 (1), pp. 52-68
Author(s): Lipika Goel, Mayank Sharma, Sunil Kumar Khatri, D. Damodaran

Often, prior defect data for the same project is unavailable, so researchers have asked whether defect data from other projects can be used for prediction. This has made cross-project defect prediction an open research issue. In this approach, the training data often suffers from the class imbalance problem. This work focuses on homogeneous cross-project defect prediction. A novel ensemble model that serves two purposes is proposed. First, it handles the class imbalance problem of the dataset. Second, it predicts the target class. To handle the imbalance problem, the training dataset is divided into data frames, and each data frame is balanced. An ensemble model using the maximum voting of all random forest classifiers is implemented. The proposed model performs better than the other baseline models. The Wilcoxon signed-rank test is performed to validate the proposed model.
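The abstract does not specify how the data frames are formed or balanced. One plausible reading is sketched below: the majority (non-defective) class is split into chunks roughly the size of the minority class, each chunk is paired with all minority samples, one model is trained per balanced frame, and the final label is decided by majority vote. To keep the example self-contained, a nearest-centroid classifier stands in for the random forests used in the paper. All function names and the toy data are assumptions.

```python
import math
import random
from collections import Counter


def balanced_frames(majority, minority, seed=0):
    """Pair each minority-sized chunk of the majority class with all minority samples."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    size = len(minority)
    # The last chunk may be smaller if the sizes do not divide evenly.
    chunks = [shuffled[i:i + size] for i in range(0, len(shuffled), size)]
    return [(chunk, minority) for chunk in chunks]


def centroid(points):
    return [sum(column) / len(points) for column in zip(*points)]


def train_nearest_centroid(clean, defective):
    """Stand-in base learner (assumption); the source uses random forests."""
    c_clean, c_defective = centroid(clean), centroid(defective)

    def predict(x):
        # Label 1 = defective, 0 = clean
        return 1 if math.dist(x, c_defective) < math.dist(x, c_clean) else 0

    return predict


def majority_vote(models, x):
    """Return the label predicted by most models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two software metrics per module.
defective = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
clean = [[5.0, 5.0], [6.0, 5.5], [5.5, 6.0], [6.5, 6.5], [7.0, 5.0],
         [5.0, 7.0], [6.0, 6.0], [5.5, 5.5], [6.5, 5.0]]

frames = balanced_frames(clean, defective)
models = [train_nearest_centroid(c, d) for c, d in frames]

label_a = majority_vote(models, [1.2, 2.0])  # a module near the defective cluster
label_b = majority_vote(models, [6.0, 6.0])  # a module near the clean cluster
```

An odd number of frames avoids tied votes. With an even number, a tie-breaking rule would have to be chosen explicitly.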

