Analysis of Data Mining Techniques for Software Effort Estimation

Software development effort estimation is important for quality management in the software development industry, yet automating it remains challenging. Machine learning algorithms applied alone often fail to achieve satisfactory results. This paper presents an integrated data mining framework that incorporates domain knowledge into a series of data analysis and modeling processes, including visualization, feature selection, and model validation. An empirical study of the software effort estimation problem on a benchmark dataset shows the necessity and effectiveness of the proposed approach.

COMPARATIVE ANALYSIS OF SOFTWARE EFFORT ESTIMATION USING DATA MINING TECHNIQUE AND FEATURE SELECTION

JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) ◽ 10.33480/jitk.v6i2.1968 ◽ 2021 ◽ Vol 6 (2) ◽ pp. 167-174

Author(s): Abdul Latif ◽ Lady Agustin Fitriana ◽ Muhammad Rifqi Firdaus

Keyword(s): Machine Learning ◽ Data Mining ◽ Feature Selection ◽ Linear Regression ◽ Software Development ◽ Data Mining Algorithm ◽ Effort Estimation ◽ Software Effort Estimation ◽ Data Mining Technique ◽ Software Business

Software development involves several interrelated factors that influence development effort and productivity. Improving the estimation techniques available to project managers facilitates more effective time and budget control in software development. Software effort estimation (also called software cost/effort estimation) can help a software development company overcome the difficulties of estimating development effort. This study compares the machine learning methods Linear Regression (LR), Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Decision Tree Random Forest (DTRF) for estimating software cost/effort. These approaches are tested on 10 datasets of software development projects, in order to determine which machine learning and non-machine learning methods estimate software effort most accurately. The study also examines whether attribute selection with Particle Swarm Optimization (PSO) improves estimation accuracy compared with using no PSO. The best-performing data mining algorithm is Linear Regression, with an average RMSE of 1603.024 over the 10 datasets tested. Adding PSO feature selection reduces the average RMSE to 1552.999. Compared with the original linear regression model, PSO feature selection therefore reduces the error of software effort estimation by 3.12%.
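The reported 3.12% figure can be checked as the relative reduction between the two average RMSE values. The sketch below is illustrative only: the function names `rmse` and `relative_reduction` are not from the paper, and the per-project data behind the averages is not available here, so only the two reported averages are used.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Average RMSE values reported in the abstract.
baseline_rmse = 1603.024   # Linear Regression without PSO
pso_rmse = 1552.999        # Linear Regression with PSO feature selection

print(f"{relative_reduction(baseline_rmse, pso_rmse):.2%}")  # prints 3.12%
```

Because RMSE is an error measure, a lower value means a better estimate, which is why the improvement is expressed as a reduction.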

Implementation of Data Mining Techniques for Software Development Effort Estimation

Computational Intelligence Techniques and Their Applications to Software Engineering Problems ◽ 10.1201/9781003079996-3 ◽ 2020 ◽ pp. 29-47

Author(s): Deepti Gupta ◽ Sushma Malik

Keyword(s): Data Mining ◽ Software Development ◽ Development Effort ◽ Effort Estimation ◽ Data Mining Techniques ◽ Software Development Effort ◽ Software Development Effort Estimation

Application of data mining techniques for identifying the holistic athlete's characteristics

PsycEXTRA Dataset ◽ 10.1037/e548052012-458 ◽ 2007

Author(s): Stavroula Psouni ◽ Dimitris Psounis

Keyword(s): Data Mining ◽ Data Mining Techniques

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE (SMS Spam Classification Using Support Vector Machine)

Jurnal Pilar Nusa Mandiri ◽ 10.33480/pilar.v15i2.693 ◽ 2019 ◽ Vol 15 (2) ◽ pp. 275-280

Author(s): Agus Setiyono ◽ Hilman F Pardede

Keyword(s): Data Mining ◽ Support Vector Machine ◽ Decision Tree ◽ Naïve Bayes ◽ Support Vector ◽ Spam Detection ◽ Support Vector Machine Algorithm ◽ Data Mining Techniques ◽ To Receive

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for a human to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate several data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms. Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
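To illustrate one of the techniques compared above, here is a minimal sketch of a Multinomial Naïve Bayes text classifier with Laplace smoothing, written in plain Python. It is not the authors' implementation: the toy messages, the whitespace tokenizer, and the function names (`train_multinomial_nb`, `predict`, `accuracy`) are assumptions made for illustration, and it does not reproduce the reported accuracies.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit word counts and class priors; `alpha` is the Laplace smoothing term."""
    vocab = {word for doc in docs for word in doc.split()}
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    return {
        "vocab": vocab,
        "alpha": alpha,
        "log_prior": {c: math.log(n / len(labels)) for c, n in class_counts.items()},
        "word_counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
    }


def predict(model, doc):
    """Return the class with the highest log posterior for `doc`."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for word in doc.split():
            if word not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["word_counts"][label][word]
            score += math.log((count + alpha) / (model["totals"][label] + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy messages, for illustration only.
train_docs = ["win free prize now", "free cash offer", "meeting at noon", "lunch at noon tomorrow"]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, "free prize"))
print(predict(model, "meeting at noon"))
```

Accuracy, the metric reported in the abstract, is simply the share of messages classified correctly, which is what `accuracy` computes on a labeled test set.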

How Sweet and Ripe are the Fruits? Data Mining Techniques for Classifying and Predicting ‘Quick-Wins’ Direct Capital Investment in Indonesia as One Approach to Business intelligence Orientation and Knowledge Management Scenarios of Indonesian Enterprises

ACMIT Proceedings ◽ 10.33555/acmit.v1i1.13 ◽ 2019 ◽ Vol 1 (1) ◽ pp. 121-131

Author(s): Ali Fauzi

Keyword(s): Data Mining ◽ Business Intelligence ◽ Direct Investment ◽ Capital Investment ◽ Export Market ◽ Added Value ◽ Knowledge Generation ◽ Management Scenarios ◽ Data Mining Techniques ◽ Knowledge Based Economy

The big data available on Indonesian FDI (foreign direct investment) and DCI (direct capital investment) has not yet been exploited to generate further ideas or a basis for decision making. Examples of exploiting such data with data mining techniques include clustering/labeling with K-Means and classification/prediction with Naïve Bayes over DCI categories. One form of DCI is the ‘Quick-Wins’, a.k.a. ‘Low-Hanging-Fruits’, Direct Capital Investment, abbreviated QWDI. Despite its unfavorable factors (exploitation of natural resources, low added-value creation, low-skill and low-wage employment, environmental impacts, etc.), QWDI can contribute greatly to quick and high job creation, export market penetration, and the advancement of technology potential. By using some basic data mining techniques as complements to the usual statistical/query analysis, or to analysis by similar studies, this study is intended to support government planners, start-up companies, and financial institutions in further DCI development. The idea of business intelligence orientation and knowledge generation scenarios is another valuable basis. In turn, enablement through Information and Communication Technology (ICT) will play a strategic role in the growth of Indonesian enterprises and serve as a foundation for a ‘knowledge based economy’ in Indonesia.
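As an illustration of the clustering/labeling step mentioned above, the following is a minimal one-dimensional K-Means sketch in plain Python. It is not taken from the study: the function name `kmeans_1d`, the sample values, and the choice of initial centroids are all hypothetical, and real investment records would have more than one attribute.

```python
def kmeans_1d(values, initial_centroids, max_iterations=100):
    """Cluster numbers by alternating nearest-centroid assignment and mean updates."""
    centroids = list(initial_centroids)

    def nearest(value):
        # Index of the centroid closest to `value`.
        return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))

    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            clusters[nearest(value)].append(value)
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated

    labels = [nearest(value) for value in values]
    return centroids, labels


# Hypothetical investment sizes (arbitrary units), for illustration only.
sizes = [1, 2, 3, 10, 11, 12]
centroids, labels = kmeans_1d(sizes, initial_centroids=[1, 12])
print(centroids, labels)
```

The resulting cluster labels could then serve as the categories that a classifier such as Naïve Bayes is trained to predict, which matches the clustering-then-classification pairing the abstract describes.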
