A Novel Feature Subset Selection Algorithm for Software Defect Prediction

With the exponential growth of information technology, there is high demand for quality-focused software development. A key concern during software development is detecting software defects in the earlier stages: hidden faults that go undetected degrade the effectiveness and quality of the software in use and of its maintenance. Traditional software defect prediction models involve projects that share the same metrics. In recent years, Cross Project Defect Prediction (CPDP), which predicts defects in one software project using datasets from other software projects, has become an active topic. However, traditional CPDP approaches still require metrics common to the datasets of both projects in order to construct defect prediction models, so these methods become infeasible when the cross-project datasets use different metrics. To overcome this issue in software defect prediction with heterogeneous cross-project datasets, this paper introduces Boosted Relief Feature Subset Selection (BRFSS) to handle two projects with heterogeneous feature sets. BRFSS employs a mapping approach to embed the data from the two domains into a comparable, lower-dimensional feature space. Based on a similarity measure, the differences among the mapped domains of the datasets are used in the prediction process. This work used five different software groups with six different datasets to perform heterogeneous cross-project defect prediction using firefly particle swarm optimization. To produce optimal defect predictions in the heterogeneous environment, particle swarm optimization is enhanced by incorporating the firefly algorithm. The simulation results were compared with other standard models, and the outcome demonstrated the efficiency of the prediction process when using firefly-enabled particle swarm optimization.
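BRFSS builds on Relief-style feature weighting. As a reference point only, the sketch below shows the classic two-class Relief weight update in plain Python. It is an assumption-labelled illustration, not the paper's Boosted Relief method: it omits boosting, the cross-domain mapping, and the firefly/PSO stage, and the function name `relief_weights` is hypothetical.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Classic two-class Relief feature weights (NOT the boosted BRFSS variant).

    Every instance serves once as the sampled instance R (m = n), so the result
    is deterministic. Feature differences are scaled by each feature's range,
    so diff() lies in [0, 1].
    """
    n, d = len(X), len(X[0])
    lo = [min(row[a] for row in X) for a in range(d)]
    hi = [max(row[a] for row in X) for a in range(d)]
    # A constant feature gets span 1.0 to avoid division by zero (its diff is always 0).
    span = [(hi[a] - lo[a]) or 1.0 for a in range(d)]

    def diff(a: int, i: int, j: int) -> float:
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i: int, j: int) -> float:
        # Range-normalized Manhattan distance over all features.
        return sum(diff(a, i, j) for a in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # Relief needs both a nearest hit and a nearest miss.
        h = min(hits, key=lambda j: dist(i, j))
        m = min(misses, key=lambda j: dist(i, j))
        for a in range(d):
            # Reward features that separate R from its nearest miss and
            # penalize features that separate R from its nearest hit.
            w[a] += (diff(a, i, m) - diff(a, i, h)) / n
    return w
```

On toy data where the first column separates the classes and the second is noise, `relief_weights([[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]], [0, 0, 1, 1])` gives a positive weight (about 0.8) for the first feature and a negative weight for the second, so a subset selector would keep the first.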

An Evidential approach on Feature Subset Selection in Software Defect Prediction

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i12.4149 ◽

2017 ◽

Vol 5 (12) ◽

pp. 41-49

Author(s):

M. Jaikumar ◽

V. Kathiresan

Keyword(s):

Subset Selection ◽

Feature Subset Selection ◽

Defect Prediction ◽

Feature Subset ◽

Software Defect Prediction ◽

Software Defect

Performance Analysis of Feature Selection Methods in Software Defect Prediction: A Search Method Approach

Applied Sciences ◽

10.3390/app9132764 ◽

2019 ◽

Vol 9 (13) ◽

pp. 2764 ◽

Cited By ~ 8

Author(s):

Abdullateef Oluwagbemiga Balogun ◽

Shuib Basri ◽

Said Jadid Abdulkadir ◽

Ahmad Sobri Hashim

Keyword(s):

Software Metrics ◽

Prediction Models ◽

Predictive Performance ◽

Search Method ◽

Feature Subset Selection ◽

Defect Prediction ◽

Feature Subset ◽

Software Defect Prediction ◽

Software Defect

Software Defect Prediction (SDP) models are built using software metrics derived from software systems. The quality of SDP models depends largely on the quality of the software metrics (dataset) used to build them. High dimensionality is one of the data quality problems that affect the performance of SDP models. Feature selection (FS) is a proven method for addressing the dimensionality problem. However, the choice of FS method for SDP is still a problem, as most empirical studies on FS methods for SDP produce contradictory and inconsistent quality outcomes. These FS methods behave differently due to their different underlying computational characteristics. This could be due to the choice of search method used in FS, because the impact of FS depends on the choice of search method. It is hence imperative to comparatively analyze the performance of FS methods based on different search methods in SDP. In this paper, four filter feature ranking (FFR) and fourteen filter feature subset selection (FSS) methods were evaluated using four different classifiers over five software defect datasets obtained from the National Aeronautics and Space Administration (NASA) repository. The experimental analysis showed that the application of FS improves the predictive performance of classifiers and that the performance of FS methods can vary across datasets and classifiers. Among the FFR methods, Information Gain demonstrated the greatest improvements in the performance of the prediction models. Among the FSS methods, Consistency Feature Subset Selection based on Best First Search had the best influence on the prediction models. However, prediction models based on FFR proved to be more stable than those based on FSS methods. Hence, we conclude that FS methods improve the performance of SDP models, and that there is no single best FS method, as their performance varied according to datasets and the choice of the prediction model.
However, we recommend the use of FFR methods as the prediction models based on FFR are more stable in terms of predictive performance.
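Information Gain, the best-performing FFR method in this study, scores each feature A by IG(C; A) = H(C) - H(C | A) and ranks features by that score. The sketch below is a minimal illustration of that ranking, assuming the metrics have already been discretized (SDP metrics are usually continuous); the function names are hypothetical and this is not the paper's experimental setup.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy H(C) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """IG(C; A) = H(C) - sum_v P(A=v) * H(C | A=v) for a discrete feature A."""
    n = len(labels)
    groups: Dict[int, List[int]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(
    X: Sequence[Sequence[int]], y: Sequence[int]
) -> List[Tuple[int, float]]:
    """Return (feature index, IG score) pairs, best feature first."""
    d = len(X[0])
    scores = [(a, information_gain([row[a] for row in X], y)) for a in range(d)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

For example, with `X = [[1, 0], [1, 1], [0, 0], [0, 1]]` and `y = [1, 1, 0, 0]`, feature 0 perfectly separates the classes (IG = 1 bit) and feature 1 carries no information (IG = 0), so feature 0 is ranked first. An FFR pipeline would then keep the top-k features of this list.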

EMPIRICAL ASSESSMENT OF MACHINE LEARNING BASED SOFTWARE DEFECT PREDICTION TECHNIQUES

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213008003947 ◽

2008 ◽

Vol 17 (02) ◽

pp. 389-400 ◽

Cited By ~ 41

Author(s):

VENKATA UDAYA B. CHALLAGULLA ◽

FAROKH B. BASTANI ◽

I-LING YEN ◽

RAYMOND A. PAUL

Keyword(s):

Machine Learning ◽

Feature Subset Selection ◽

Defect Prediction ◽

Data Sets ◽

Feature Subset ◽

Software Defect Prediction ◽

Software Defect ◽

Intelligent Software ◽

Instance Based Learning ◽

Prediction Techniques

Automated reliability assessment is essential for systems that entail dynamic adaptation based on runtime mission-specific requirements. One approach along this direction is to monitor and assess the system using machine learning-based software defect prediction techniques. Due to the dynamic nature of the software data collected, instance-based learning algorithms are proposed for this purpose. To evaluate the accuracy of these methods, the paper presents an empirical analysis of four different real-time software defect data sets using different predictor models. The results show that a combination of 1R and instance-based learning along with a consistency-based subset evaluation technique provides relatively better consistency in achieving accurate predictions compared with other models. No direct relationship is observed between the skewness present in the data sets and the prediction accuracy of these models. Principal Component Analysis (PCA) does not show a consistent advantage in improving the accuracy of the predictions. While random reduction of attributes gave poor accuracy results, simple Feature Subset Selection methods performed better than PCA for most prediction models. Based on these results, the paper presents a high-level design of an Intelligent Software Defect Analysis Tool (ISDAT) for dynamic monitoring and defect assessment of software modules.
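Consistency-based subset evaluation scores a feature subset by how often instances that look identical on that subset still carry different class labels. The sketch below uses the common inconsistency-rate formulation (for each value pattern, count the instances not in its majority class, then divide by the number of instances) and searches for the smallest subset that is as consistent as the full feature set. It is an illustration under stated assumptions: features are discrete, the function names are hypothetical, and the exhaustive search is only practical for a handful of features.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import DefaultDict, Sequence, Tuple


def consistency(X: Sequence[Sequence[int]], y: Sequence[int], subset: Sequence[int]) -> float:
    """1 - inconsistency rate of the data projected onto `subset`."""
    groups: DefaultDict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row[a] for a in subset)][label] += 1
    # Instances that share a pattern but not its majority class are inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return 1.0 - inconsistent / len(y)


def smallest_consistent_subset(X: Sequence[Sequence[int]], y: Sequence[int]) -> Tuple[int, ...]:
    """Smallest feature subset whose consistency matches the full feature set."""
    d = len(X[0])
    target = consistency(X, y, range(d))
    # Exhaustive search by subset size (exponential; a heuristic such as
    # best-first search would replace this on real defect datasets).
    for k in range(1, d + 1):
        for subset in combinations(range(d), k):
            if consistency(X, y, subset) >= target:
                return subset
    return tuple(range(d))
```

With `X = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]` and `y = [0, 0, 1, 1]`, projecting onto feature 1 alone gives consistency 0.5, while feature 0 alone is fully consistent, so the search returns `(0,)`. Because dropping features can only merge patterns, no subset is ever more consistent than the full set, which is why matching the full-set value is the stopping criterion.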

A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction

Computational Intelligence and Neuroscience ◽

10.1155/2021/5069016 ◽

2021 ◽

Vol 2021 ◽

pp. 1-19

Author(s):

Abdullateef O. Balogun ◽

Shuib Basri ◽

Saipunidzam Mahamad ◽

Luiz Fernando Capretz ◽

Abdullahi Abubakar Imam ◽

...

Keyword(s):

Feature Selection ◽

Rank Aggregation ◽

Feature Subset Selection ◽

Selection Problem ◽

Defect Prediction ◽

Feature Subset ◽

Software Defect Prediction ◽

Local Optima ◽

Software Defect ◽

Wrapper Feature Selection

The high dimensionality of software metric features has long been noted as a data quality problem that affects the performance of software defect prediction (SDP) models. This drawback makes it necessary to apply feature selection (FS) algorithm(s) in SDP processes. FS approaches can be categorized into three types, namely, filter FS (FFS), wrapper FS (WFS), and hybrid FS (HFS). HFS has been established as superior because it combines the strength of both FFS and WFS methods. However, selecting the most appropriate FFS (filter rank selection problem) for HFS is a challenge because the performance of FFS methods depends on the choice of datasets and classifiers. In addition, the local optima stagnation and high computational costs of WFS due to large search spaces are inherited by the HFS method. Therefore, as a solution, this study proposes a novel rank aggregation-based hybrid multifilter wrapper feature selection (RAHMFWFS) method for the selection of relevant and irredundant features from software defect datasets. The proposed RAHMFWFS is divided into two stepwise stages. The first stage involves a rank aggregation-based multifilter feature selection (RMFFS) method that addresses the filter rank selection problem by aggregating individual rank lists from multiple filter methods, using a novel rank aggregation method to generate a single, robust, and non-disjoint rank list. In the second stage, the aggregated ranked features are further preprocessed by an enhanced wrapper feature selection (EWFS) method based on a dynamic reranking strategy that is used to guide the feature subset selection process of the HFS method. This, in turn, reduces the number of evaluation cycles while amplifying or maintaining its prediction performance. The feasibility of the proposed RAHMFWFS was demonstrated on benchmarked software defect datasets with Naïve Bayes and Decision Tree classifiers, based on accuracy, the area under the curve (AUC), and F-measure values. 
The experimental results showed the effectiveness of RAHMFWFS in addressing filter rank selection and local optima stagnation problems in HFS, as well as the ability to select optimal features from SDP datasets while maintaining or enhancing the performance of SDP models. To conclude, the proposed RAHMFWFS achieved good performance by improving the prediction performances of SDP models across the selected datasets, compared to existing state-of-the-art HFS methods.
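The two-stage idea (aggregate several filter rank lists, then let a wrapper walk the consensus ranking) can be sketched as follows. This is not the paper's novel rank aggregation method or its dynamic reranking EWFS: it substitutes a simple Borda-style sum of positions for the aggregation and a plain incremental wrapper for the second stage. The feature names and the `score` callback (which in practice might be a cross-validated AUC of Naïve Bayes or a Decision Tree) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def aggregate_ranks(rank_lists: Sequence[Sequence[str]]) -> List[str]:
    """Borda-style aggregation: sum each feature's position across filter lists."""
    features = set(rank_lists[0])
    assert all(set(r) == features for r in rank_lists), "lists must rank the same features"
    total: Dict[str, int] = {f: 0 for f in features}
    for ranking in rank_lists:
        for pos, f in enumerate(ranking):
            total[f] += pos
    # Lower total position = better consensus rank; names break ties deterministically.
    return sorted(features, key=lambda f: (total[f], f))


def incremental_wrapper(
    ranked: Sequence[str], score: Callable[[List[str]], float]
) -> List[str]:
    """Walk the consensus ranking; keep a feature only if it raises the wrapper score."""
    selected: List[str] = []
    best = float("-inf")
    for f in ranked:
        candidate = selected + [f]
        s = score(candidate)
        if s > best:
            selected, best = candidate, s
    return selected
```

For instance, three filters ranking `["loc", "cbo", "rfc", "wmc"]`, `["cbo", "loc", "wmc", "rfc"]` and `["loc", "rfc", "cbo", "wmc"]` aggregate to `["loc", "cbo", "rfc", "wmc"]`. Because the wrapper only evaluates one candidate per ranked feature, it needs far fewer evaluation cycles than a full subset search, which is the cost the abstract's second stage targets.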

RFC: A feature selection algorithm for software defect prediction

Journal of Systems Engineering and Electronics ◽

10.23919/jsee.2021.000032 ◽

2021 ◽

Vol 32 (2) ◽

pp. 389-398

Author(s):

Xu Xiaolong ◽

Chen Wen ◽

Wang Xinheng

Keyword(s):

Feature Selection ◽

Defect Prediction ◽

Software Defect Prediction ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Software Defect

A conservative feature subset selection algorithm with missing data

Neurocomputing ◽

10.1016/j.neucom.2009.05.019 ◽

2010 ◽

Vol 73 (4-6) ◽

pp. 585-590 ◽

Cited By ~ 12

Author(s):

Alex Aussem ◽

Sergio Rodrigues de Morais

Keyword(s):

Missing Data ◽

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Selection Algorithm

A novel Markov boundary based feature subset selection algorithm

Neurocomputing ◽

10.1016/j.neucom.2009.05.018 ◽

2010 ◽

Vol 73 (4-6) ◽

pp. 578-584 ◽

Cited By ~ 17

Author(s):

Sérgio Rodrigues de Morais ◽

Alex Aussem

Keyword(s):

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Selection Algorithm

Feature Subset Selection Algorithm for High-Minded Dimensional Data by Using Fast Cluster

International Journal of Engineering Trends and Technology ◽

10.14445/22315381/ijett-v14p246 ◽

2014 ◽

Vol 14 (5) ◽

pp. 232-237

Author(s):

B.Swarna Kumari ◽

M.Doorvasulu Naidu

Keyword(s):

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Selection Algorithm

The feature subset selection algorithm

Journal of Electronics (China) ◽

10.1007/s11767-003-0088-5 ◽

2003 ◽

Vol 20 (1) ◽

pp. 57-61

Author(s):

Yongguo Liu ◽

Xueming Li ◽

Zhongfu Wu

Keyword(s):

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Selection Algorithm
