Cross-Project Defect Prediction Method Based on Manifold Feature Transformation

2021 ◽  
Vol 13 (8) ◽  
pp. 216
Author(s):  
Yu Zhao ◽  
Yi Zhu ◽  
Qiao Yu ◽  
Xiaoying Chen

Traditional software defect prediction methods train the prediction model on part of a project's data and predict the defect labels of the remaining data in the same project. In practice, however, the project to be predicted is usually brand new and lacks enough labeled data to build a defect prediction model, so traditional methods are no longer applicable. Cross-project defect prediction builds the model from the labeled data of projects of the same type that are similar to the target project, which addresses the lack of labeled data in traditional methods. However, the difference in data distribution between the source project and the target project reduces prediction performance. To solve this problem, this paper proposes a cross-project defect prediction method based on manifold feature transformation. The method transforms the original feature space of the projects into a manifold space, reduces the difference in data distribution between the transformed source project and the transformed target project in that space, and finally uses the transformed source project to train a naive Bayes prediction model with better performance. A comparative experiment was carried out on the Relink and AEEEM datasets. The results show that, compared with the benchmark method and several cross-project defect prediction methods, the proposed method effectively reduces the distribution difference between the source and target projects and obtains a higher F1 value, an indicator commonly used to measure the performance of two-class models.
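As a reminder of what the F1 value measures, the following is a minimal sketch of the standard definition (the harmonic mean of precision and recall for the defective class). The function name `f1_score` and the convention that label 1 marks a defective module are assumptions made for this illustration; the code is not taken from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (defective) class, given two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined; report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = defective module, 0 = clean module.
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
print(f1_score(actual, predicted))  # precision = recall = 2/3 here
```

Because F1 ignores true negatives, it is less flattered than plain accuracy when clean modules heavily outnumber defective ones.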

2018 ◽  
Vol 232 ◽  
pp. 03017
Author(s):  
Jie Zhang ◽  
Gang Wang ◽  
Haobo Jiang ◽  
Fangzheng Zhao ◽  
Guilin Tian

Software defect prediction has been an important part of software engineering research since the 1970s. The technique analyzes the metrics and defect information of historical software modules to predict defects in new software modules. Currently, most software defect prediction models are built on data from a single software project: the training data sets used to construct the model and the test data sets used to validate it come from the same project. In practice, however, traditional prediction methods perform poorly for new projects or projects with little historical data. When historical data is insufficient, the prediction model cannot be trained adequately, and high prediction accuracy is difficult to achieve. Cross-project prediction, in turn, faces the problem of differences in data distribution. To address these problems, this paper presents a software defect prediction model that combines transfer learning with a traditional software defect prediction model. The model uses existing project data sets to predict software defects across projects. The main work of this article includes: 1) Data preprocessing. This includes feature correlation analysis and noise reduction, which limit the effect of over-fitting and noisy data on the prediction results. 2) Transfer learning. This step analyzes two different but related project data sets and reduces the impact of data distribution differences. 3) Artificial neural networks. To address the class imbalance of the data sets, an artificial neural network with dynamic selection of training samples reduces the influence of the imbalance between positive and negative samples on the prediction results. The model is evaluated on the Relink and AEEEM data sets using the F-measure, the ROC curve, and AUC. Experiments show that the model has high predictive performance.
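For readers unfamiliar with AUC, the sketch below computes it through its rank interpretation: the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one, with ties counted as one half. The function name `auc` and the 0/1 label convention are assumptions for illustration; this is not the evaluation code used in the paper.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical defect scores: three of the four positive/negative pairs are ordered correctly.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a sort-based rank computation would be preferable on large data sets.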


2020 ◽  
Author(s):  
Zhubo Xu ◽  
Weifeng Qin

Football is loved by people all over the world, and the sales potential of its leagues should not be underestimated. Moreover, football has been developed in our country since ancient times and has a huge fan base, and fans are the main target of football league sales. Predicting the sales effect of a football league helps the seller formulate a suitable sales strategy and avoid product surplus. This article introduces prediction research on football league sales effect based on the BP neural network, and intends to provide ideas and methods for predicting that effect. It puts forward a basic method for sales effect prediction and a BP neural network prediction method to analyze and predict the sales effect of the football league. In addition, it proposes the steps of designing the BP neural network, building the BP neural network sales effect prediction model, and applying that model. The experimental results show that the difference between the fitted part of the neural network model and the real values of the football league sales effect sample data is in the range of , the error percentage difference is small, and the prediction results are valid.


2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Yanli Shao ◽  
Jingru Zhao ◽  
Xingqi Wang ◽  
Weiwei Wu ◽  
Jinglong Fang

As the scale and complexity of software increase, software security issues have become a focus of society. Software defect prediction (SDP) is an important means of helping developers discover and repair, in advance, potential defects that may endanger software security, thereby improving software security and reliability. Currently, cross-project defect prediction (CPDP) and cross-company defect prediction (CCDP) are widely studied to improve defect prediction performance, but problems remain, such as inconsistent metrics and large differences in data distribution between source and target projects. Therefore, a new CCDP method based on metric matching and sample weight setting is proposed in this study. First, a clustering-based metric matching method is proposed. A multigranularity metric feature vector is extracted to unify the metric dimension while retaining as much of the information contained in the metrics as possible. Metric clustering is then used to eliminate metric redundancy, and representative metrics are extracted through principal component analysis (PCA) to support one-to-one metric matching. This strategy not only solves the metric inconsistency and redundancy problems but also transforms the cross-company heterogeneous defect prediction problem into a homogeneous one. Second, a sample weight setting method is proposed to transform the source data distribution. Statistical frequency information about the source samples is used as an impact factor to increase the weight of source samples that are more similar to the target samples, which improves the similarity of the data distributions of the source and target projects and thereby yields a more accurate prediction model. Finally, after these two steps, several classical machine learning methods are applied to build the prediction model, and 12 project datasets from NASA and PROMISE are used for performance comparison. Experimental results show that the proposed method has superior prediction performance over other mainstream CCDP methods.
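The idea of giving more weight to source samples that resemble the target data can be sketched as follows. The abstract does not specify the frequency statistic, so this illustration assumes one plausible reading: count how often each source sample appears among the k nearest source neighbours of a target sample. The function name `source_weights`, the Euclidean distance, and the "1 + count" weighting are all assumptions, not the authors' formulation.

```python
import math


def source_weights(source, target, k=1):
    """Weight each source sample by how often it is one of the k nearest
    source neighbours of a target sample (assumed frequency statistic)."""
    counts = [0] * len(source)
    for t in target:
        # Indices of source samples ordered by Euclidean distance to this target sample.
        order = sorted(range(len(source)), key=lambda i: math.dist(source[i], t))
        for i in order[:k]:
            counts[i] += 1
    # Add one so that every source sample keeps a non-zero weight.
    return [1 + c for c in counts]


# Hypothetical two-metric feature vectors, already matched between projects.
source = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
target = [(0.9, 1.0), (1.2, 0.8), (0.1, 0.0)]
print(source_weights(source, target))  # [2, 3, 1]
```

Weights of this kind can be passed to any classifier that accepts per-sample weights, so that source samples far from the target distribution contribute less to training.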


IET Software ◽  
2021 ◽  
Author(s):  
Qingan Huang ◽  
Le Ma ◽  
Siyu Jiang ◽  
Guobin Wu ◽  
Hengjie Song ◽  
...  

Algorithms ◽  
2019 ◽  
Vol 12 (1) ◽  
pp. 13
Author(s):  
Shengbing Ren ◽  
Wanying Zhang ◽  
Hafiz Shahbaz Munir ◽  
Lei Xia

Software defect prediction is an important means of guaranteeing software quality. Because a project often lacks sufficient historical data to train a classifier, cross-project defect prediction (CPDP) has been recognized as a fundamental approach. However, traditional defect prediction methods represent samples by their feature attributes, which cannot avoid negative transfer and may result in poorly performing models in CPDP. This paper proposes a multi-source cross-project defect prediction method based on dissimilarity space (DM-CPDP). The method not only retains the original information but also captures the relationship with other objects, so it can enhance the ability of the sample attributes to discriminate the class label. The method first uses a density-based clustering method to construct the prototype set from the cluster centers of samples in the target set. Then, the arc-cosine kernel is used to calculate the dissimilarities between the prototype set and the samples of the source domain or the target set, which forms the dissimilarity space. In this space, the training set is obtained with the earth mover’s distance (EMD) method. The unlabeled samples converted from the target set are labeled with the k-Nearest Neighbor (KNN) algorithm. Finally, the model is learned from the training data with the TrAdaBoost method and used to predict new potential defects. The experimental results show that this approach performs better than other traditional CPDP methods.
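The mapping into a dissimilarity space can be illustrated with a small sketch in which each sample is re-expressed as its vector of dissimilarities to a set of prototypes. Several choices here are assumptions made for illustration and are not stated in the abstract: the degree-0 arc-cosine kernel k(x, y) = 1 - θ/π (θ being the angle between x and y), the conversion of kernel value to dissimilarity as 1 - k, and the function names. Prototype selection by density-based clustering is not shown; the prototypes are given directly.

```python
import math


def arc_cosine_kernel0(x, y):
    """Degree-0 arc-cosine kernel: 1 - theta/pi, theta = angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("the angle is undefined for a zero vector")
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return 1.0 - math.acos(cos_theta) / math.pi


def to_dissimilarity_space(samples, prototypes):
    """Represent each sample by its dissimilarities (assumed: 1 - kernel) to the prototypes."""
    return [[1.0 - arc_cosine_kernel0(s, p) for p in prototypes] for s in samples]


# Hypothetical prototypes and samples with two metrics each.
prototypes = [(1.0, 0.0), (0.0, 1.0)]
samples = [(2.0, 0.0), (1.0, 1.0)]
for row in to_dissimilarity_space(samples, prototypes):
    print(row)  # roughly [0.0, 0.5], then [0.25, 0.25]
```

After this transformation every sample, whether from a source project or the target project, has as many coordinates as there are prototypes, which is what allows projects to be compared in a common space.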


Author(s):  
Shengbing Ren ◽  
Wanying Zhang ◽  
Hafiz Shahbaz Munir ◽  
Lei Xia

Software defect prediction is an important means of guaranteeing software quality. Because a project often lacks sufficient historical data to train a classifier, cross-project defect prediction (CPDP) has been recognized as a fundamental approach. However, traditional defect prediction methods represent samples by their feature attributes, which cannot avoid negative transfer and may result in poorly performing models in CPDP. This paper proposes a multi-source cross-project defect prediction method based on dissimilarity space (DM-CPDP). The method first uses a density-based clustering method to construct the prototype set from the cluster centers of samples in the target set. Then, the arc-cosine kernel is used to form the dissimilarity space, and in this space the training set is obtained with the earth mover’s distance (EMD) method. The unlabeled samples converted from the target set are labeled with the KNN algorithm. Finally, we use the TrAdaBoost method to establish the prediction model. The experimental results show that our approach performs better than other traditional CPDP methods.

