Improved Random Forest Algorithm for Software Defect Prediction through Data Mining Techniques

Heart disease is currently the leading cause of death among all diseases. Because resources in the medical field are limited, predicting heart disease has become a major problem. Classification algorithms such as Decision Tree and Random Forest are used to support early diagnosis and treatment. In this paper, data mining techniques are used to predict heart disease from dataset values and to compare the accuracy of these two algorithms. The method proceeds as follows. In the first phase, a dataset with 13 attributes is collected and classified using the Decision Tree and Random Forest algorithms. Finally, the accuracy of both algorithms is recorded. We observed that Random Forest produced better results than Decision Tree in predicting heart disease.
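
The Decision Tree versus Random Forest comparison can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a CART-style tree split on Gini impurity, a Random Forest made of trees grown on bootstrap samples that each consider about sqrt(13) = 3 candidate features per split, and a synthetic stand-in for the 13-attribute dataset. `make_data` is hypothetical and produces random numbers, not patient records; all other function names are illustrative too.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, max_depth, n_features=None, rng=None):
    """Grow a tree: ("leaf", label) or ("node", feature, threshold, left, right).
    n_features=None searches every feature (plain decision tree); an integer
    searches a random subset at each split (Random Forest style)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return ("leaf", majority)
    all_features = list(range(len(X[0])))
    features = rng.sample(all_features, n_features) if n_features else all_features
    split = best_split(X, y, features)
    if split is None:
        return ("leaf", majority)
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return ("node", f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, n_features, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, n_features, rng))

def predict_tree(tree, row):
    """Walk from the root to a leaf and return its label."""
    while tree[0] == "node":
        _, f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree[1]

def random_forest(X, y, n_trees, max_depth, seed=0):
    """Bagging plus random feature subsets: each tree sees a bootstrap sample
    and considers about sqrt(n_features) candidate features per split."""
    rng = random.Random(seed)
    m = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, m, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

def accuracy(preds, labels):
    return sum(p == lab for p, lab in zip(preds, labels)) / len(labels)

def make_data(n, seed):
    """HYPOTHETICAL synthetic stand-in for a 13-attribute dataset (not real patient data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(13)]
        signal = row[0] + row[1] - row[2] + 0.5 * row[3] + rng.gauss(0, 0.5)
        X.append(row)
        y.append(1 if signal > 0 else 0)
    return X, y

if __name__ == "__main__":
    X_train, y_train = make_data(120, seed=1)
    X_test, y_test = make_data(80, seed=2)
    tree = build_tree(X_train, y_train, max_depth=4)
    forest = random_forest(X_train, y_train, n_trees=10, max_depth=4, seed=0)
    print("Decision Tree accuracy:", accuracy([predict_tree(tree, r) for r in X_test], y_test))
    print("Random Forest accuracy:", accuracy([predict_forest(forest, r) for r in X_test], y_test))
```

On this toy data, which model scores higher depends on the seeds and sizes chosen; the sketch only shows how both accuracies are measured on the same held-out split.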

Addressing Class Imbalance in Software Defect Prediction Using Two-Step Clustering-Based Undersampling and Bagging Technique

Jurnal Informatika, Vol 6 (1), pp. 107-113, 2019. DOI: 10.31311/ji.v6i1.5448

Author(s): Muhammad Faittullah Akbar, Ilham Kurniawan, Ahmad Fauzi

Keyword(s): Machine Learning, Data Mining, Area Under The Curve, Defect Prediction, Software Defect Prediction, Software Defect, Random Undersampling, Imbalanced Class, Program Data

Class imbalance is a frequent problem in real-world datasets, where one class (the minority class) contains a small number of data points and the other (the majority class) contains a large number of data points. It is very difficult to develop an effective model with data mining and machine learning algorithms without data preprocessing that balances the imbalanced dataset. Random undersampling and oversampling have been used in many studies to ensure that the different classes contain the same number of data points. In this study, we propose combining two-step clustering-based random undersampling with the bagging technique to improve the accuracy of software defect prediction (SDP). The proposed method is evaluated on five datasets from the NASA Metrics Data Program repository, with the area under the curve (AUC) as the main evaluation measure. The results show that the proposed method performs very well on all datasets (AUC > 0.9). In terms of sensitivity (SN), the second experiment outperformed the first on most datasets (3 of 5). In terms of specificity (SP), the first experiment did not outperform the second in all datasets. Overall, the second experiment performed better than the first, because AUC is the main evaluation measure for imbalanced classification problems such as SDP. We therefore conclude that the proposed method performs optimally on both small-scale and large-scale datasets.
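
The balancing step and the evaluation measures named in this abstract can be made concrete with a short Python sketch. It shows only plain random undersampling (the baseline the abstract mentions) and the AUC, SN, and SP calculations, taking 1 as the defective (minority) class; it does not implement the two-step clustering step or bagging, and the function names are hypothetical. AUC is computed as the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one (ties count as half), which equals the area under the ROC curve.

```python
import random

def random_undersample(X, y, minority=1, seed=0):
    """Randomly discard majority-class rows until both classes are the same size.
    Assumes the minority class really is the smaller one."""
    rng = random.Random(seed)
    minority_idx = [i for i, lab in enumerate(y) if lab == minority]
    majority_idx = [i for i, lab in enumerate(y) if lab != minority]
    keep = minority_idx + rng.sample(majority_idx, len(minority_idx))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]

def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative);
    ties count as half. Needs at least one positive and one negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(preds, labels):
    """SN = TP / (TP + FN) and SP = TN / (TN + FP), with 1 = defective."""
    pairs = list(zip(preds, labels))
    tp = sum(p == 1 and lab == 1 for p, lab in pairs)
    fn = sum(p == 0 and lab == 1 for p, lab in pairs)
    tn = sum(p == 0 and lab == 0 for p, lab in pairs)
    fp = sum(p == 1 and lab == 0 for p, lab in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example with made-up scores (not results from the paper).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(round(auc(scores, labels), 3))  # → 0.833
preds = [1 if s >= 0.5 else 0 for s in scores]
sn, sp = sensitivity_specificity(preds, labels)
print(round(sn, 3), sp)  # → 0.667 1.0
X_bal, y_bal = random_undersample(list(range(10)), [1, 1] + [0] * 8)
print(sorted(y_bal))  # → [0, 0, 1, 1]
```

Because AUC depends only on how scores rank defective against clean modules, it is not dominated by the majority class the way plain accuracy is, which is why the abstract treats it as the main measure.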

Investigation of Software Defect Prediction Using Data Mining Framework

Research Journal of Applied Sciences Engineering and Technology, Vol 11 (1), pp. 63-69, 2015. DOI: 10.19026/rjaset.11.1676. Cited by: ~1

Author(s): M. Anbu, G.S. Anandha Mala

Keyword(s): Data Mining, Defect Prediction, Software Defect Prediction, Software Defect, Using Data

Crime analysis in India using data mining techniques

International Journal of Engineering & Technology, Vol 7 (2.6), pp. 253, 2018. DOI: 10.14419/ijet.v7i2.6.10779. Cited by: ~2

Author(s): Deepika K K, Smitha Vinod

Keyword(s): Data Mining, Neural Networks, Random Forest, Crime Analysis, Random Forest Algorithm, Indian States, Data Mining Techniques, Crime Data, Crime Detection, Using Data

This paper proposes an approach for crime detection in India using data mining techniques. The approach consists of four steps: data pre-processing, clustering, classification, and visualization. Data mining techniques are often applied in criminology, the field that studies the characteristics of crime, because they give good results; crime analysis consists of exploring crime data. Crimes are identified using k-means clustering, with clusters formed according to the similarity of crime attributes. The Random Forest algorithm and neural networks are then applied to the data for classification. For visualization, Google marker clustering is used to mark crime spots on a map of India. Accuracy is verified with the WEKA tool. The approach is intended to help India's crime department analyze crime with better prediction. The paper focuses on crime in the Indian states and union territories from 2001 to 2012.
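
The clustering step can be illustrated with a minimal k-means (Lloyd's algorithm) sketch in Python. The abstract does not specify the distance measure, initialization, or attributes, so this assumes Euclidean distance and a deterministic farthest-first initialization (chosen here only so the output is reproducible; the paper may seed k-means differently). The region vectors are made-up toy values, not data from the 2001 to 2012 records, and the function names are illustrative.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm. Returns (centroids, cluster index for each point)."""
    centroids = farthest_first_init(points, k)
    assign = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:  # converged: no point changed cluster
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return centroids, assign

# Hypothetical per-region attribute vectors (e.g. two normalized crime rates); toy values only.
regions = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, labels = kmeans(regions, k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```

Regions that share a label have similar attribute values; in the described pipeline, such cluster labels would then feed the classification and map-visualization steps.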

A Comparison of Software Defect Prediction Metrics Using Data Mining Algorithms

Journal of Innovative Science and Engineering (JISE), pp. 11-21, 2020. DOI: 10.38088/jise.693098

Author(s): Zeynep Behrin GÜVEN AYDIN, Rüya ŞAMLI

Keyword(s): Data Mining, Defect Prediction, Software Defect Prediction, Data Mining Algorithms, Software Defect, Using Data, Mining Algorithms
