Use of Data Mining for Prediction of Customer Loyalty

Andri Wijaya; Abba Suganda Girsang

doi:10.21512/commit.v10i1.1660

CommIT (Communication and Information Technology) Journal ◽

2015 ◽

Vol 10 (1) ◽

pp. 41 ◽

Cited By ~ 3

Author(s):

Andri Wijaya ◽

Abba Suganda Girsang

Keyword(s):

Data Mining ◽

Customer Loyalty ◽

Classification Accuracy ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Training Set ◽

Use Of Data ◽

C4.5 Algorithm

This article analyzes customer loyalty using three data mining methods, the C4.5, Naive Bayes, and Nearest Neighbor algorithms, applied to real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia. The dataset contains 2269 records. The study also evaluates the effect of training-set size on classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, on the order of 81%, followed by Naive Bayes at 76% and Nearest Neighbor at 55%. In addition, the numerical evaluation suggests that a proportion of 80% is optimal for the training set.
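The training-set-size evaluation described in this abstract can be illustrated with a minimal sketch. It assumes a plain 1-nearest-neighbor classifier and synthetic two-feature data; the function names, the `loyal`/`not_loyal` labels, and the data generator are hypothetical and are not taken from the paper, whose dataset and tooling are not reproduced here.

```python
import random

def nearest_neighbor_predict(train, point):
    """Return the label of the training row closest to `point` (1-NN)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda row: sq_dist(row[0], point))
    return label

def accuracy_for_split(rows, train_fraction, seed=0):
    """Shuffle, split by `train_fraction`, and score 1-NN on the held-out rows."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)

def make_synthetic_rows(n, seed=0):
    """Hypothetical stand-in data: two Gaussian clusters, one per class."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice(["loyal", "not_loyal"])
        center = 1.0 if label == "loyal" else -1.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows

rows = make_synthetic_rows(300)
# Sweep the training proportion and record held-out accuracy for each.
results = {f: accuracy_for_split(rows, f) for f in (0.5, 0.6, 0.7, 0.8, 0.9)}
for fraction, acc in results.items():
    print(f"train fraction {fraction:.0%}: accuracy {acc:.3f}")
```

On synthetic data the best proportion will vary with the seed; the sketch only shows the shape of the experiment, not the reported 80% result.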

Prediksi Kelulusan dan Putus Studi Mahasiswa dengan Pendekatan Bertingkat pada Perguruan Tinggi

SIMADA (Jurnal Sistem Informasi & Manajemen Basis Data) ◽

10.30873/simada.v3i2.2359 ◽

2021 ◽

Vol 3 (2) ◽

pp. 140-148

Author(s):

Hermanto Hermanto

Keyword(s):

Data Mining ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Drop Out ◽

Training Data ◽

K Nearest Neighbor ◽

Quality Of Higher Education

The problem of college failure, on-time graduation, and the factors that cause them remains an interesting research topic (C. Marquez-Vera, C. Romero and S. Ventura, 2011). This study compares three data mining classification algorithms, namely Naive Bayes, Decision Tree, and K-Nearest Neighbor, for predicting graduation and dropout risk among students, with the aims of improving the quality of higher education and identifying the most accurate algorithm for graduation and dropout prediction. The best algorithm for predicting graduation and dropout is the decision tree, with a best accuracy of 99.15% at a training data ratio of 30%. Keywords: Data Mining; Naive Bayes Algorithm; Decision Tree; K-Nearest Neighbor; Predict Graduation; Drop Out.

Performance of Naïve Bayes, C4.5 and KNN using Breast Cancer, Iris and Hypothyroid Datasets

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8795.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2193-2197

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Specific Pattern ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Digital Format ◽

Tree Classifier

Data mining usually refers to the discovery of specific patterns in, or the analysis of, data from a large dataset. Classification is an efficient data mining technique in which the classes that the data are assigned to are predefined using existing datasets. Classifying medical records by their symptoms using a computerized method, and storing the predicted information in digital format, is of great importance in diagnosing various diseases in the medical field. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on their accuracy in matching records to the diagnosis report and on their execution rate, which indicates how fast the records are classified. The classification algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), compared to determine which is best suited for classifying any kind of medical dataset. The Breast Cancer, Iris, and Hypothyroid datasets are used to determine which of the three algorithms classifies the datasets with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented as tables and graphs, show the performance and the importance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. Based on the performance outcomes, the C4.5 algorithm performs considerably better than the Naïve Bayes and K-Nearest Neighbor algorithms.

PREDIKSI TINGKAT KELULUSAN TEPAT WAKTU DENGAN METODE NAÏVE BAYES DAN K-NEAREST NEIGHBOR

Jurnal Informasi dan Komputer ◽

10.35959/jik.v7i1.118 ◽

2019 ◽

Vol 7 (1) ◽

pp. 7-16

Author(s):

Sidik Rahmatullah

Keyword(s):

Data Mining ◽

Human Capital ◽

Nearest Neighbor ◽

Naive Bayes ◽

Soft Skills ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Hard Skills

A graduate is the status a student attains after completing the educational process in accordance with the graduation requirements set by the study program. As one of the direct outputs of the educational process carried out by the study program, quality graduates are characterized by mastery of academic competencies, including hard skills and soft skills, as stated in the quality objectives and demonstrated by the graduates' performance in society in accordance with their profession and field of knowledge. A quality study program has a good graduate management system, so that it can turn its graduates into human capital for the program. This research uses data mining to predict student graduation with two methods, Naive Bayes and K-Nearest Neighbor. The results can predict whether a student will graduate on time or late. The experiments used graduation data of undergraduate (S1) Information Systems students at STMIK Dian Cipta Cendikia Kotabumi, with 600 records for training and 180 records for testing. The experiments show that Naive Bayes yields an accuracy of 85%, while the K-Nearest Neighbor algorithm yields an accuracy of 68.89%.

Dengue Fever Prediction using Datamining Classification Technique

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8810.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 8685-8688

Keyword(s):

Data Mining ◽

Dengue Fever ◽

Nearest Neighbor ◽

Naive Bayes ◽

Developed Countries ◽

Naïve Bayes ◽

The Body ◽

Maximum Accuracy ◽

Life Threatening ◽

The Developed Countries

Dengue is a life-threatening disease in all developed countries like India. It is a viral disease associated with the breeding of the Aedes mosquito and is transmitted by female mosquitoes. A predictive system that can identify and minimize the loss due to this problem can be constructed. The dataset used here includes body temperature, vomiting, metallic taste, joint pain, etc. The main objective of this paper is to classify the data and to identify the maximum accuracy in predicting dengue fever using a yes/no label. The classification techniques used here are Bayes classification, nearest neighbor (KNN), naïve Bayes, rule Bayes, ID3, and decision tree. Of the classification algorithms, naïve Bayes achieved the maximum accuracy of 72%. RapidMiner is the data mining tool used to run the classification techniques.

OPTIMASI DATA MINING MENGGUNAKAN ALGORITMA NAÏVE BAYES DAN C4.5 UNTUK KLASIFIKASI KELULUSAN MAHASISWA

Jurnal Teknologi Informasi dan Komputer ◽

10.36002/jutik.v5i1.634 ◽

2019 ◽

Vol 5 (1) ◽

Author(s):

Ni Luh Ratniasih

Keyword(s):

Data Mining ◽

Naive Bayes ◽

Method Comparison ◽

Naïve Bayes ◽

Bayes Method ◽

C4.5 Algorithm ◽

Long Time ◽

Student Graduation ◽

Small Capacity ◽

Information Values

ABSTRACT Data presented to convey information are often displayed in tabular form. If the amount of data displayed is small, the information may not be difficult to process. But if the amount of data presented is very large, it is feared that there will be obstacles to absorbing the information accurately and quickly, because it takes a long time to read the displayed data in detail through to the end. The data discussed in this study are data on STMIK STIKOM Bali students. The historical data are converted into a decision tree, which makes the information easier to absorb. This research applies data mining by comparing the naïve Bayes method with the C4.5 algorithm, a method for performing classification, implemented with the RapidMiner tool. Keywords: C4.5, KNN, Student Graduation
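C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below computes that criterion for categorical attributes only; the student records and the attribute names `gpa`, `attendance`, and `on_time` are hypothetical and do not come from the STMIK STIKOM Bali data.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())

def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` on `target`, divided by split information."""
    labels = [r[target] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    total = len(rows)
    # Weighted entropy of the target after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([r[attribute] for r in rows])
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical records, for illustration only.
rows = [
    {"gpa": "high", "attendance": "good", "on_time": "yes"},
    {"gpa": "high", "attendance": "poor", "on_time": "yes"},
    {"gpa": "low", "attendance": "good", "on_time": "no"},
    {"gpa": "low", "attendance": "poor", "on_time": "no"},
]
best = max(["gpa", "attendance"], key=lambda a: gain_ratio(rows, a, "on_time"))
print(best, gain_ratio(rows, "gpa", "on_time"))
```

A full C4.5 implementation also handles continuous attributes, missing values, and pruning, none of which are shown here.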

The Comparison of Data Mining Methods Using C4.5 Algorithm and Naive Bayes in Predicting Heart Disease

Tech-E ◽

10.31253/te.v4i2.543 ◽

2021 ◽

Vol 4 (2) ◽

pp. 44

Author(s):

Rino Rino

Keyword(s):

Data Mining ◽

Heart Disease ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Set ◽

A Value ◽

C4.5 Algorithm ◽

Calculation Results ◽

Mining Methods ◽

Bayes Algorithm

Heart disease is a condition in which fatty deposits in the coronary arteries change the role and shape of the arteries so that blood flow to the heart is obstructed. Data mining methods can predict this disease; the C4.5 algorithm and Naive Bayes are among the methods often used in research. The data set in this research was obtained from the UCI Machine Learning Repository and has 3546 records and 13 attributes. The Naïve Bayes algorithm achieves a higher accuracy of 81.40%, compared with the C4.5 algorithm, which achieves only 79.07%. Based on the calculation results, it can be concluded that the Naïve Bayes algorithm is a very good classifier because it has a value between 0.709 and 1.00. Because the Naïve Bayes algorithm has a higher accuracy than the C4.5 algorithm, the researchers decided to use the Naïve Bayes algorithm in predicting heart disease.

Sentiment Analysis Berbasis Algoritma Naïve Bayes Classsifier untuk Identifikasi Persepsi Masyarakat Terhadap Produk / Layanan Perusahaan

JOINS (Journal of Information System) ◽

10.33633/joins.v5i1.3608 ◽

2020 ◽

Vol 5 (1) ◽

pp. 126-135

Author(s):

Affandy Affandy ◽

Oktania Nandiyati

Keyword(s):

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Bayes Classifier ◽

Use Of Data ◽

Strategic Plans ◽

Good Classification ◽

Calculation Results ◽

A Company

Twitter is the most popular microblogging service in Indonesia, with nearly 23 million users. In the current era of big data, tweets from customers, observers, potential consumers, or the community of users of a company's products or services greatly help companies understand the industrial and consumer landscape, and so determine strategic plans that contribute to the company's growth. However, the use of data from social media such as Twitter is hampered by a number of technical difficulties in collecting, processing, and analysing it. Specifically, this research applies the Naïve Bayes Classifier (NBC) algorithm to sentiment analysis of tweet data in a prototype application intended to make it easier for companies and organizations to learn people's perceptions of their products or services. The NBC algorithm was chosen because it can classify well even with small training data, while also offering high accuracy and processing speed when handling large training data. The evaluation found that the prototype runs well: the keywords entered trigger the Twitter API to crawl the data, the mining process can be monitored at each stage, and at the end of the process the system shows the final sentiment level values and a chart of the calculation results log over a certain period of time.
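As an illustration of how a Naïve Bayes classifier can score short texts, here is a minimal multinomial sketch with add-one (Laplace) smoothing. It is not the prototype described in the abstract: the class name `NaiveBayesText`, the whitespace tokenizer, and the four example tweets are all assumptions made for the example, and no Twitter API call is involved.

```python
from collections import Counter, defaultdict
from math import log

class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = log(doc_count / total_docs)
            for word in text.lower().split():
                score += log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training tweets, for illustration only.
train_texts = [
    "great service fast delivery",
    "love this product",
    "terrible support slow response",
    "bad product broken on arrival",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict("fast delivery great product"))
print(model.predict("slow broken support"))
```

Working in log space avoids numeric underflow when many token probabilities are multiplied, which matters once texts get longer than a tweet.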

Analisis Komparatif Evaluasi Performa Algoritma Klasifikasi pada Readmisi Pasien Diabetes

Jurnal Buana Informatika ◽

10.24002/jbi.v7i4.770 ◽

2016 ◽

Vol 7 (4) ◽

Author(s):

Mochammad Yusa ◽

Ema Utami ◽

Emha T. Luthfi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Cross Validation ◽

Nearest Neighbor ◽

Naive Bayes ◽

Kappa Statistic ◽

Naïve Bayes ◽

Validation Dataset ◽

K Nearest Neighbor ◽

Fold Cross Validation

Abstract. Readmission is associated with quality measures for patients in hospitals. Different attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, age, and others, make the calculation of quality of care complicated. Data mining classification techniques can address this problem. In this paper, three classifiers, i.e. Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, are evaluated under various parameter settings using 10-fold cross validation. Performance is evaluated in terms of Accuracy, Mean Absolute Error (MAE), and Kappa Statistic. The selected dataset consists of 47 attributes and 49,735 records. The results show that the k-NN classifier with k=100 performs better in terms of accuracy and Kappa Statistic, while Naive Bayes outperforms the other classifiers in terms of MAE. Keywords: k-NN, naive bayes, diabetes, readmission

Abstract (translated from the Indonesian Abstrak). The readmission process is associated with calculating the quality of patient care in hospitals. Differences in attributes related to diabetic patients, such as the medication process, ethnicity, race, lifestyle, age, and others, make the quality calculation tend to be complicated. Data mining classification techniques can be a solution for this quality calculation. Classification is one of the data mining techniques whose development has been quite significant. In this study, the classification models Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, with various parameter settings, are evaluated on Accuracy, Mean Absolute Error (MAE), and Kappa Statistic using the 10-fold cross validation method. The evaluated dataset has 47 attributes and 49,735 records. The results show that the best accuracy, MAE, and Kappa Statistic performance is obtained from the Naive Bayes model. Keywords: k-NN, naive bayes, diabetes, readmission
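Two of the evaluation ingredients named here, k-fold cross validation and the Kappa statistic, can be sketched in a few lines. The fold generator below uses contiguous, unshuffled folds for brevity, and the example labels (`r` for readmitted, `n` for not) are hypothetical; Cohen's kappa is computed as (observed agreement minus chance agreement) divided by (1 minus chance agreement).

```python
from collections import Counter

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def cohen_kappa(actual, predicted):
    """Agreement between actual and predicted labels, corrected for chance."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_freq, predicted_freq = Counter(actual), Counter(predicted)
    expected = sum(actual_freq[c] * predicted_freq[c] for c in actual_freq) / (n * n)
    # Kappa is undefined when chance agreement is 1; this sketch returns 0.0 then.
    return (observed - expected) / (1 - expected) if expected < 1 else 0.0

folds = list(k_fold_indices(25, k=10))

# Hypothetical labels, for illustration only.
actual = ["r", "r", "r", "r", "n", "n", "n", "n", "n", "n"]
predicted = ["r", "r", "r", "n", "n", "n", "n", "n", "n", "r"]
kappa = cohen_kappa(actual, predicted)
print(f"accuracy={accuracy(actual, predicted):.2f} kappa={kappa:.3f}")
```

In practice the rows are usually shuffled (and often stratified by class) before the folds are cut; that step is left out here.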

Perbandingan Akurasi, Recall, dan Presisi Klasifikasi pada Algoritma C4.5, Random Forest, SVM dan Naive Bayes

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i2.2937 ◽

2021 ◽

Vol 5 (2) ◽

pp. 640

Author(s):

Mulkan Azhari ◽

Zakaria Situmorang ◽

Rika Rosnelly

Keyword(s):

Random Forest ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Random Forest Algorithm ◽

Svm Algorithm ◽

Testing Data ◽

C4.5 Algorithm ◽

Bayes Algorithm ◽

Using Data

This study aims to compare the performance of several classification algorithms, namely C4.5, Random Forest, SVM, and Naive Bayes. The research data consist of 200 records of JISC participants, of which 140 (70%) are training data and 60 (30%) are testing data. The classification was simulated using the RapidMiner data mining tool. The results showed that the C4.5 algorithm obtained an accuracy of 86.67%, the Random Forest algorithm 83.33%, the SVM algorithm 95%, and the Naive Bayes algorithm 86.67%. The highest accuracy is achieved by the SVM algorithm and the lowest by the Random Forest algorithm.
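The accuracy, recall, and precision measures in this entry's title can be computed directly from actual and predicted labels, as in the sketch below. The `pass`/`fail` labels are hypothetical. The second part is only an arithmetic check: with 60 test records, the reported percentages correspond to 52, 50, and 57 correct predictions; those counts are inferred from the percentages and are not stated in the abstract.

```python
def classification_metrics(actual, predicted, positive):
    """Accuracy, precision, and recall for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels, for illustration only.
actual = ["pass"] * 4 + ["fail"] * 4
predicted = ["pass", "pass", "pass", "fail", "pass", "fail", "fail", "fail"]
metrics = classification_metrics(actual, predicted, positive="pass")
print(metrics)

# Arithmetic check on the reported accuracies for a 60-record test set.
test_size = 60
reported = {"C4.5": 86.67, "Random Forest": 83.33, "SVM": 95.0, "Naive Bayes": 86.67}
implied_correct = {name: round(pct / 100 * test_size) for name, pct in reported.items()}
for name, count in implied_correct.items():
    print(f"{name}: {count}/{test_size} = {round(count / test_size * 100, 2)}%")
```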

Random Subclasses Ensembles by Using 1-Nearest Neighbor Framework

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001417500318 ◽

2017 ◽

Vol 31 (10) ◽

pp. 1750031

Author(s):

Amir Ahmad ◽

Hamza Abujabal ◽

C. Aswani Kumar

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Ensemble Methods ◽

Naïve Bayes ◽

Training Data ◽

Classifier Ensemble ◽

Base Classifier ◽

Decision Boundaries ◽

Better Than

A classifier ensemble is a combination of diverse and accurate classifiers. Generally, a classifier ensemble performs better than any single classifier in the ensemble. Naive Bayes classifiers are simple but popular classifiers for many applications. As it is difficult to create diverse naive Bayes classifiers, naive Bayes ensembles are not very successful. In this paper, we propose Random Subclasses (RS) ensembles for Naive Bayes classifiers. In the proposed method, new subclasses for each class are created by using a 1-Nearest Neighbor (1-NN) framework that uses randomly selected points from the training data. A classifier considers each subclass as a class of its own. As the method to create subclasses is random, diverse datasets are generated. Each classifier in an ensemble learns on one dataset from the pool of diverse datasets. Diverse training datasets ensure diverse classifiers in the ensemble. New subclasses create easy-to-learn decision boundaries that in turn create accurate naive Bayes classifiers. We developed two variants of RS: in the first variant, RS(2), two subclasses per class were created, whereas in the second variant, RS(4), four subclasses per class were created. We studied the performance of these methods against other popular ensemble methods by using naive Bayes as the base classifier. RS(4) outperformed other popular ensemble methods. A detailed study was carried out to understand the behavior of RS ensembles.
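One plausible reading of the subclass-creation step is sketched below: for each class, pick a few random training points of that class as anchors, then give every point of the class the subclass of its nearest anchor. This is an interpretation of the abstract, not the paper's exact procedure; the function name `random_subclass_labels`, its parameters, and the toy points are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def random_subclass_labels(points, labels, subclasses_per_class=2, seed=0):
    """Relabel each point as (class, subclass) via 1-NN to random same-class anchors."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    new_labels = [None] * len(points)
    for label, members in by_class.items():
        k = min(subclasses_per_class, len(members))
        anchors = rng.sample(members, k)  # randomly selected training points
        for i in members:
            nearest = min(range(k), key=lambda j: sq_dist(points[i], points[anchors[j]]))
            new_labels[i] = (label, nearest)
    return new_labels

# Hypothetical toy data: two well-separated classes of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

# A different seed per ensemble member gives a differently relabeled dataset.
datasets = [random_subclass_labels(points, labels, 2, seed=s) for s in range(3)]
subclass_labels = datasets[0]
# A predicted subclass maps back to its original class by taking element 0.
recovered = [sub[0] for sub in subclass_labels]
print(subclass_labels)
```

Each relabeled dataset would then train one base classifier, and a predicted `(class, subclass)` pair would be collapsed back to its class before the ensemble votes.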
