An Empirical Study on Software Defect Prediction Using CodeBERT Model

Cong Pan; Minyan Lu; Biao Xu

doi:10.3390/app11114793

Applied Sciences ◽

10.3390/app11114793 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4793

Author(s):

Cong Pan ◽

Minyan Lu ◽

Biao Xu

Keyword(s):

Deep Learning ◽

Software Engineering ◽

Empirical Study ◽

Empirical Studies ◽

Language Model ◽

Prediction Performance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Cross Project

Deep learning-based software defect prediction has become popular in recent years. The recent release of the CodeBERT model has made it possible to perform many software engineering tasks. We propose several CodeBERT models targeting software defect prediction: CodeBERT-NT, CodeBERT-PS, CodeBERT-PK, and CodeBERT-PT. We conduct empirical studies with these models in cross-version and cross-project software defect prediction to investigate whether a neural language model such as CodeBERT can improve prediction performance. We also investigate the effects of different prediction patterns in software defect prediction using CodeBERT models. The empirical results are further discussed.

A Ranking-Oriented Approach to Cross-Project Software Defect Prediction: An Empirical Study

Proceedings of the 27th International Conference on Software Engineering and Knowledge Engineering ◽

10.18293/seke2016-047 ◽

2016 ◽

Cited By ~ 2

Author(s):

Guoan You ◽

Yutao Ma

Keyword(s):

Empirical Study ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Oriented Approach ◽

Cross Project

Local versus Global Models for Just-In-Time Software Defect Prediction

Scientific Programming ◽

10.1155/2019/2384706 ◽

2019 ◽

Vol 2019 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Xingguang Yang ◽

Huiqun Yu ◽

Guisheng Fan ◽

Kai Shi ◽

Liqiong Chen

Keyword(s):

Cross Validation ◽

Prediction Models ◽

Prediction Performance ◽

Defect Prediction ◽

Just In Time ◽

Software Defect Prediction ◽

Local Models ◽

Global Models ◽

Software Defect ◽

Cross Project

Just-in-time software defect prediction (JIT-SDP) is an active topic in software defect prediction that aims to identify defect-inducing changes. Recent studies have found that the variability of defect data sets can affect the performance of defect predictors, and that using local models can help improve the performance of prediction models. However, previous studies have focused on module-level defect prediction. Whether local models are still valid in the context of JIT-SDP is an important issue. To this end, we compare the performance of local and global models through a large-scale empirical study based on six open-source projects with 227,417 changes. The experiment considers three evaluation scenarios: cross-validation, cross-project-validation, and timewise-cross-validation. To build local models, the experiment uses k-medoids clustering to divide the training set into several homogeneous regions. In addition, logistic regression and effort-aware linear regression (EALR) are used to build classification models and effort-aware prediction models, respectively. The empirical results show that local models perform worse than global models in classification performance. However, local models have significantly better effort-aware prediction performance than global models in the cross-validation and cross-project-validation scenarios. In particular, when the number of clusters k is set to 2, local models obtain optimal effort-aware prediction performance. Therefore, local models are promising for effort-aware JIT-SDP.
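
The partitioning step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Manhattan distance, the two-feature toy changes, and the helper names `k_medoids`, `assign`, and `nearest_region` are all assumptions made for illustration. The sketch splits the training changes into k regions around medoids and routes a new change to its nearest region, where a local model (logistic regression or EALR in the study) would then be trained and applied.

```python
import random

def manhattan(a, b):
    """Manhattan distance between two equal-length metric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, medoids):
    """Map each medoid index to the indices of the points closest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: manhattan(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, k, seed=0, max_iter=100):
    """Simple alternating k-medoids; assumes the points are distinct."""
    rng = random.Random(seed)
    medoids = sorted(rng.sample(range(len(points)), k))
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        # In each cluster, pick the member with the lowest total distance
        # to the other members; an empty cluster keeps its old medoid.
        updated = sorted(
            min(members,
                key=lambda c: sum(manhattan(points[c], points[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        )
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)

def nearest_region(change, points, medoids):
    """Return the medoid index of the region a new change falls into."""
    return min(medoids, key=lambda m: manhattan(change, points[m]))

# Hypothetical change metrics, e.g. (lines added, files touched).
changes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, regions = k_medoids(changes, k=2)
for medoid, members in regions.items():
    print("region around change", medoid, "->", members)
```

A global model would be trained once on all of `changes`; a local setup would train one model per entry of `regions` and use `nearest_region` to choose which one scores an incoming change.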

Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study

Symmetry ◽

10.3390/sym12071147 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1147 ◽

Cited By ~ 2

Author(s):

Abdullateef O. Balogun ◽

Shuib Basri ◽

Saipunidzam Mahamad ◽

Said J. Abdulkadir ◽

Malek A. Almomani ◽

...

Keyword(s):

Feature Selection ◽

Empirical Study ◽

Prediction Models ◽

Empirical Studies ◽

Experimental Results ◽

Defect Prediction ◽

Software Defect Prediction ◽

Search Methods ◽

Software Defect ◽

The Impact

Feature selection (FS) is a feasible solution for mitigating the high-dimensionality problem, and many FS methods have been proposed in the context of software defect prediction (SDP). However, empirical studies on the impact and effectiveness of FS methods on SDP models often lead to contradictory experimental results and inconsistent findings. These contradictions can be attributed to study limitations such as small datasets, limited FS search methods, and unsuitable prediction models in the respective scope of the studies. It is hence critical to conduct an extensive empirical study to address these contradictions, to guide researchers, and to buttress the scientific tenacity of experimental conclusions. In this study, we investigated the impact of 46 FS methods using Naïve Bayes and Decision Tree classifiers over 25 software defect datasets from 4 software repositories (NASA, PROMISE, ReLink, and AEEEM). The ensuing prediction models were evaluated based on accuracy and AUC values. Scott–KnottESD and the novel Double Scott–KnottESD rank statistical methods were used for statistical ranking of the studied FS methods. The experimental results showed that there is no single best FS method, as their respective performance depends on the choice of classifiers, performance evaluation metrics, and dataset. However, we recommend the use of statistical-based, probability-based, and classifier-based filter feature ranking (FFR) methods, respectively, in SDP. For filter subset selection (FSS) methods, correlation-based feature selection (CFS) with metaheuristic search methods is recommended. For wrapper feature selection (WFS) methods, the IWSS-based WFS method is recommended as it outperforms the conventional SFS and LHS-based WFS methods.
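
As a rough illustration of what a filter feature ranking (FFR) method does, the sketch below scores each metric by the absolute Pearson correlation between its values and the defect label, then orders the metrics by that score. The abstract does not list the 46 methods studied, so the choice of Pearson correlation, the metric names, and the toy data are assumptions for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_features(rows, labels, names):
    """Order feature names by |correlation with the label|, strongest first."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    return sorted(names, key=lambda name: scores[name], reverse=True)

# Hypothetical module metrics: lines of code, a complexity score, a constant.
names = ["loc", "complexity", "noise"]
rows = [
    (10, 3, 5),
    (20, 1, 5),
    (80, 2, 5),
    (90, 4, 5),
]
labels = [0, 0, 1, 1]  # 1 = defective module (toy labels)

ranking = rank_features(rows, labels, names)
print(ranking)
```

A filter method like this scores features without training a classifier, which is what distinguishes it from the wrapper methods mentioned in the abstract; keeping the top few entries of `ranking` gives the reduced feature set.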

An empirical study on the effectiveness of data resampling approaches for cross‐project software defect prediction

IET Software ◽

10.1049/sfw2.12052 ◽

2021 ◽

Author(s):

Kwabena Ebo Bennin ◽

Amjed Tahir ◽

Stephen G. MacDonell ◽

Jürgen Börstler

Keyword(s):

Empirical Study ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Cross Project

Software Defect Prediction Via Deep Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1858.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 343-349

Keyword(s):

Deep Learning ◽

Random Forest ◽

Defect Prediction ◽

Software Defect Prediction ◽

Limited Data ◽

Training Models ◽

Software Defect ◽

Bayes Network ◽

Fold Cross Validation ◽

Cross Project

Existing defect prediction models are trained on limited historical data, which has been studied by a variety of pioneers and researchers. Cross-project defect prediction, which often reuses data from other projects, works well when the training data is sufficient to meet the project's demands. However, current studies on software defect prediction require some degree of heterogeneity of metric values, which does not always lead to accurate predictions. Inspired by current research, this paper takes advantage of state-of-the-art deep learning and random forest to perform various experiments on five different datasets. Our model predicts defects with 90% accuracy using 10-fold cross-validation. The results show that Random Forest and deep learning give more accurate predictions than Bayes network and SVM on all five datasets. We also derived deep learning models that can be competitive classifiers and are more robust for defect prediction.
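
The 10-fold cross-validation protocol mentioned in this abstract can be sketched as follows. This is a generic illustration rather than the paper's code: the function name `k_fold_indices`, the shuffling seed, and the sample count are assumptions. Each sample lands in exactly one test fold, and the model is trained on the remaining folds each round.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Usage: train on `train`, evaluate on `test`, then average the k scores.
splits = list(k_fold_indices(25, k=10))
for round_number, (train, test) in enumerate(splits, start=1):
    print(round_number, len(train), len(test))
```

The reported accuracy would then be the mean of the per-fold accuracies, so a single lucky split cannot inflate the result.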

An investigation of cross-project learning in online just-in-time software defect prediction

Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering ◽

10.1145/3377811.3380403 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sadia Tabassum ◽

Leandro L. Minku ◽

Danyi Feng ◽

George G. Cabral ◽

Liyan Song

Keyword(s):

Defect Prediction ◽

Just In Time ◽

Software Defect Prediction ◽

Project Learning ◽

Software Defect ◽

Cross Project

Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

10.1109/icscc51209.2021.9528170 ◽

2021 ◽

Author(s):

Sushant Kumar Pandey ◽

Anil Kumar Tripathi

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Prediction Models ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques ◽

Defect Prediction Models

Software defect prediction using K‐PCA and various kernel‐based extreme learning machine: an empirical study

IET Software ◽

10.1049/iet-sen.2020.0119 ◽

2020 ◽

Vol 14 (7) ◽

pp. 768-782

Author(s):

Sushant Kumar Pandey ◽

Deevashwer Rathee ◽

Anil Kumar Tripathi

Keyword(s):

Empirical Study ◽

Extreme Learning Machine ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Machine

An Empirical Study of Model-Agnostic Interpretation Technique for Just-in-Time Software Defect Prediction

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Collaborative Computing: Networking, Applications and Worksharing ◽

10.1007/978-3-030-92635-9_25 ◽

2021 ◽

pp. 420-438

Author(s):

Xingguang Yang ◽

Huiqun Yu ◽

Guisheng Fan ◽

Zijie Huang ◽

Kang Yang ◽

...

Keyword(s):

Empirical Study ◽

Defect Prediction ◽

Just In Time ◽

Software Defect Prediction ◽

Software Defect

A Top-k Learning to Rank Approach to Cross-Project Software Defect Prediction

2018 25th Asia-Pacific Software Engineering Conference (APSEC) ◽

10.1109/apsec.2018.00048 ◽

2018 ◽

Cited By ~ 1

Author(s):

Feng Wang ◽

Jinxiao Huang ◽

Yutao Ma

Keyword(s):

Learning To Rank ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Cross Project
