Use of Data Mining for Prediction of Customer Loyalty

Author(s): Andri Wijaya, Abba Suganda Girsang

This article discusses the analysis of customer loyalty using three data mining methods (the C4.5, Naive Bayes, and Nearest Neighbor algorithms) applied to real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia. The dataset contains 2269 records. The study also evaluates the effect of training-set size on classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, on the order of 81%, followed by Naive Bayes (76%) and Nearest Neighbor (55%). In addition, the numerical evaluation suggests that a training-set proportion of 80% is optimal.
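The training-proportion experiment can be pictured as a holdout split repeated at several proportions. The sketch below is an illustrative assumption, not the authors' procedure: `holdout_split` is a hypothetical helper, the records are placeholder integers, and the split uses a seeded shuffle followed by a cut at `int(n * train_fraction)`.

```python
import random


def holdout_split(records, train_fraction, seed=0):
    """Shuffle a copy of `records` and split it into (train, test).

    The first int(len(records) * train_fraction) shuffled records become the
    training set; the remaining records become the test set.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Sweep the training proportion over a dataset of 2269 records
# (placeholder integers stand in for the real customer records).
records = list(range(2269))
for fraction in (0.5, 0.6, 0.7, 0.8, 0.9):
    train, test = holdout_split(records, fraction)
    print(f"{fraction:.0%} training: {len(train)} train / {len(test)} test")
```

Under this rounding rule, an 80% proportion gives 1815 training and 454 test records; each split would then be used to train and score the three classifiers.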

2021, Vol 3 (2), pp. 140-148
Author(s): Hermanto Hermanto

Student failure in college, on-time graduation, and the factors that cause them remain an active research topic (C. Marquez-Vera, C. Romero and S. Ventura, 2011). This study compares three data mining classification algorithms, namely Naive Bayes, Decision Tree, and K-Nearest Neighbor, for predicting students' graduation and dropout risk, with the goals of improving the quality of higher education and identifying the most accurate algorithm for graduation and dropout prediction. The best algorithm for predicting graduation and dropout is the decision tree, with a best accuracy of 99.15% at a training data ratio of 30%. Keywords: Data Mining; Naive Bayes Algorithm; Decision Tree; K-Nearest Neighbor; Graduation Prediction; Drop Out.


Data mining usually refers to the discovery of specific patterns in, or the analysis of, large datasets. Classification is one efficient data mining technique, in which the classes into which data are sorted are predefined from existing datasets. Classifying medical records by their symptoms with computerized methods, and storing the predicted information in digital form, is of great importance for diagnosing various diseases in the medical field. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The classification algorithms are compared on their accuracy in matching the diagnosis report and on their execution rate, i.e., how fast the records are classified. The algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), with the aim of determining which is best suited for classifying any kind of medical dataset. The Breast Cancer, Iris, and Hypothyroid datasets are used to determine which of the three algorithms identifies the records of patients with particular health problems with the highest accuracy. The experimental results, presented as tables and graphs, show the performance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. Based on these results, the C4.5 algorithm performs considerably better than both Naïve Bayes and K-Nearest Neighbor.
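For reference, a minimal K-Nearest Neighbor classifier can be sketched as below, assuming numeric attributes, Euclidean distance, and majority voting. The function name `knn_predict`, the choice `k=3`, and the toy measurements are hypothetical; they are not the data or settings used in the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 4-attribute measurements with three class labels.
train_X = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2),
    (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5),
    (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9),
]
train_y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
print(knn_predict(train_X, train_y, (5.0, 3.4, 1.5, 0.2)))  # setosa
```

Because KNN stores the training data and defers all work to prediction time, its classification speed depends directly on dataset size, which is one reason execution rate is worth measuring alongside accuracy.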


2019, Vol 7 (1), pp. 7-16
Author(s): Sidik Rahmatullah

A graduate is the status a student attains after completing the educational process in accordance with the graduation requirements set by the study program. As one of the direct outputs of a study program's educational process, high-quality graduates are characterized by mastery of academic competencies, including hard skills and soft skills, as stated in the quality objectives and demonstrated by the graduates' performance in society according to their profession and field of study. A high-quality study program has a good graduate management system, enabling it to turn its graduates into human capital for the program. This research uses data mining to predict student graduation rates with two methods, Naive Bayes and K-Nearest Neighbor. The results can predict whether students graduate on time or late. Experiments were carried out on graduate data from the S1 (bachelor's) Information Systems program at STMIK Dian Cipta Cendikia Kotabumi, using 600 records for training and 180 records for testing. The experimental results show that Naive Bayes achieves an accuracy of 85%, while the K-Nearest Neighbor algorithm achieves an accuracy of 68.89%.
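Accuracy here is the share of test records classified correctly. Assuming the reported percentages refer to the 180 test records, they correspond to the following counts. The counts are back-calculated from the percentages and are not stated in the abstract.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{\text{correctly classified test records}}{\text{total test records}} \\
\text{Naive Bayes: } \frac{153}{180} &= 0.85 = 85\% \\
\text{K-NN: } \frac{124}{180} &\approx 0.6889 = 68.89\%
\end{aligned}
```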


2019, Vol 8 (4), pp. 8685-8688

Dengue is a life-threatening disease in developing countries such as India. It is a viral disease transmitted by female Aedes mosquitoes. A predictive system can be constructed to identify the disease and minimize the losses it causes. The dataset used here includes attributes such as body temperature, vomiting, metallic taste, and joint pain. The main objective of this paper is to classify the data and find the technique with the highest accuracy for predicting dengue fever from yes/no attribute values. The classification techniques used are Bayes classification, nearest neighbor (KNN), Naïve Bayes, rule Bayes, ID3, and decision tree. Among these, Naïve Bayes achieved the highest accuracy, 72%. RapidMiner is the data mining tool used to apply the classification techniques.
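Because the attributes take yes/no values, a Bernoulli-style Naive Bayes is a natural fit. The sketch below assumes Laplace smoothing (`alpha=1`) and uses hypothetical attribute names and a six-record toy dataset; it is not the RapidMiner configuration used in the paper.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate class priors and per-class P(attribute = yes).

    rows:   list of dicts mapping attribute name -> True (yes) / False (no)
    labels: class label of each row
    alpha:  Laplace smoothing constant
    """
    attributes = sorted({a for row in rows for a in row})
    class_counts = defaultdict(int)
    yes_counts = defaultdict(lambda: defaultdict(int))
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for a in attributes:
            if row.get(a, False):
                yes_counts[label][a] += 1
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    p_yes = {
        c: {a: (yes_counts[c][a] + alpha) / (n + 2 * alpha) for a in attributes}
        for c, n in class_counts.items()
    }
    return priors, p_yes


def predict_bernoulli_nb(model, row):
    """Return the class with the highest log-posterior for a yes/no row."""
    priors, p_yes = model
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for a, p in p_yes[c].items():
            # Attributes missing from the row are treated as "no".
            score += math.log(p if row.get(a, False) else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical yes/no symptom records labeled dengue "yes" or "no".
names = ("high_temperature", "vomiting", "metallic_taste", "joint_pain")
data = [
    ((1, 1, 1, 1), "yes"), ((1, 0, 1, 1), "yes"), ((1, 1, 0, 1), "yes"),
    ((0, 0, 0, 0), "no"), ((1, 0, 0, 0), "no"), ((0, 1, 0, 0), "no"),
]
rows = [dict(zip(names, map(bool, values))) for values, _ in data]
labels = [label for _, label in data]
model = train_bernoulli_nb(rows, labels)
print(predict_bernoulli_nb(model, dict.fromkeys(names, True)))   # yes
print(predict_bernoulli_nb(model, dict.fromkeys(names, False)))  # no
```

Working in log-probabilities avoids numeric underflow when many attributes are multiplied together, and smoothing keeps an unseen yes/no combination from zeroing out a class.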


2019, Vol 5 (1)
Author(s): Ni Luh Ratniasih

ABSTRACT
Data presented to convey information are often displayed as tabulations. If the displayed data are small in volume, extracting the information may not be difficult. If the data are very large, however, absorbing the information accurately and quickly may be hard, because reading the displayed data in detail to the end takes a long time. The data discussed in this study are data on STMIK STIKOM Bali students. The historical data are converted into a decision tree, which makes the information easier to absorb. This research applies data mining by comparing the naïve Bayes method with the C4.5 algorithm, both of which are classification techniques, implemented with the RapidMiner tool.
Keywords: C4.5, KNN, Student Graduation
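C4.5 chooses the attribute to split on by gain ratio, i.e., information gain divided by split information. A minimal sketch for categorical attributes follows; the attribute values and labels in the usage lines are hypothetical, and a full C4.5 implementation would also handle numeric thresholds, missing values, and pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute, as used by C4.5 to pick splits.

    values: the attribute value of each record
    labels: the class label of each record
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy remaining after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical student records: one informative and one uninformative attribute.
labels = ["on_time", "on_time", "late", "late"]
print(gain_ratio(["high", "high", "low", "low"], labels))  # 1.0
print(gain_ratio(["x", "y", "x", "y"], labels))            # 0.0
```

The tree is grown by splitting on the highest-ratio attribute and recursing on each branch, which is how historical records end up as a readable decision tree.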


Tech-E, 2021, Vol 4 (2), pp. 44
Author(s): Rino Rino

Heart disease is a condition in which fatty deposits build up in the coronary arteries, changing the function and shape of the arteries so that blood flow to the heart is obstructed. Data mining methods can predict this disease; two methods often used in research are the C4.5 algorithm and Naive Bayes. The dataset in this research was obtained from the UCI Machine Learning Repository and has 3546 records and 13 attributes. The Naïve Bayes algorithm achieved a higher accuracy of 81.40%, compared with 79.07% for the C4.5 algorithm. Based on the calculation results, Naïve Bayes falls into the "very good classification" category because its value lies between 0.709 and 1.00. Since Naïve Bayes has higher accuracy than C4.5, the researchers decided to use it to predict heart disease.
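Many heart-disease attributes are numeric, and one common way for Naive Bayes to handle numeric attributes is to model each one with a per-class Gaussian. The abstract does not say which variant was used, so treat this as an assumption; the two attributes and six records below are hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance


def train_gaussian_nb(X, y, var_floor=1e-9):
    """Per-class prior plus per-attribute mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),
            [fmean(col) for col in columns],
            # A small floor keeps a constant attribute from giving zero variance.
            [pvariance(col) + var_floor for col in columns],
        )
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest Gaussian log-posterior."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for x, mu, var in zip(row, means, variances):
            score -= 0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))


# Hypothetical records: (age, cholesterol) with a disease / healthy label.
X = [(63.0, 280.0), (58.0, 300.0), (67.0, 290.0),
     (41.0, 200.0), (45.0, 210.0), (39.0, 190.0)]
y = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]
model = train_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (60.0, 285.0)))  # disease
print(predict_gaussian_nb(model, (42.0, 205.0)))  # healthy
```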


2020, Vol 5 (1), pp. 126-135
Author(s): Affandy Affandy, Oktania Nandiyati

Twitter is the most popular microblogging service in Indonesia, with nearly 23 million users. In the current era of big data, tweets from customers, observers, potential consumers, or communities of users of a company's products or services can greatly help the company understand its industry and consumer landscape and set strategic plans that contribute to its growth. However, using social media data such as Twitter's is hampered by a number of technical difficulties in collecting, processing, and analyzing it. Specifically, this research applies the Naïve Bayes Classifier (NBC) algorithm to sentiment analysis of tweet data in a prototype application intended to make it easier for companies and organizations to learn people's perceptions of their products or services. The NBC algorithm was chosen because it classifies well even with small training data, while also offering high accuracy and processing speed on large training data. The evaluation found that the prototype runs well: entered keywords trigger the Twitter API to crawl data, the mining process can be monitored at each stage, and at the end of the process the system shows the final sentiment level values and a chart of the calculation-results log over a given period of time.
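Naive Bayes sentiment analysis of tweets is commonly done with a multinomial model over word counts. The following is a minimal sketch under that assumption, with whitespace tokenization, Laplace smoothing, and made-up English example tweets; the prototype's actual preprocessing (for Indonesian text, stop words, and so on) is not described in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab, alpha


def predict_multinomial_nb(model, doc):
    """Return the sentiment label with the highest log-posterior."""
    class_docs, word_counts, vocab, alpha = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][w] + alpha) / (total + alpha * len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


tweets = ["great service very fast", "love this product", "fast delivery great price",
          "terrible service very slow", "hate the slow app", "bad product broken"]
sentiments = ["positive"] * 3 + ["negative"] * 3
model = train_multinomial_nb(tweets, sentiments)
print(predict_multinomial_nb(model, "great fast service"))  # positive
print(predict_multinomial_nb(model, "slow broken app"))     # negative
```

Training is a single counting pass, which fits the abstract's point that NBC stays fast as the training data grows.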


2016, Vol 7 (4)
Author(s): Mochammad Yusa, Ema Utami, Emha T. Luthfi

Abstract. Readmission is associated with quality measures for patients in hospitals. The many attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, and age, make the calculation of quality of care complicated. Data mining classification techniques can address this problem. In this paper, three classifiers, Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, are evaluated with various parameter settings using 10-fold cross-validation. Performance is evaluated in terms of Accuracy, Mean Absolute Error (MAE), and the Kappa statistic. The selected dataset consists of 47 attributes and 49,735 records. The results show that the k-NN classifier with k=100 performs better in terms of accuracy and the Kappa statistic, while Naive Bayes outperforms the other classifiers in terms of MAE. Keywords: k-NN, naive bayes, diabetes, readmission

Abstrak. The readmission process is associated with measuring the quality of patient care in hospitals. Differences in the attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, and age, make the quality calculation complicated. Data mining classification techniques can be a solution for this calculation; classification is a data mining technique that has developed significantly. In this research, the Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes classification algorithms with various parameter settings are evaluated on Accuracy, Mean Absolute Error (MAE), and the Kappa statistic using 10-fold cross-validation. The evaluated dataset has 47 attributes and 49,735 records. The results show that the best accuracy, MAE, and Kappa statistic are obtained with the Naive Bayes algorithm.
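The three evaluation ingredients named above can be sketched as follows. `k_fold_indices` produces contiguous folds (records should be shuffled beforehand), `cohens_kappa` computes the Kappa statistic from actual and predicted labels, and `mean_absolute_error` assumes one common definition for probabilistic classifiers: the absolute difference between predicted class probabilities and the 0/1 actual class, averaged over classes and records. The abstract does not state which MAE definition was used, and all function names are hypothetical.

```python
from collections import Counter


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def cohens_kappa(actual, predicted):
    """Agreement between actual and predicted labels, corrected for chance."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_freq, predicted_freq = Counter(actual), Counter(predicted)
    expected = sum(actual_freq[c] * predicted_freq[c] for c in actual_freq) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def mean_absolute_error(actual, probabilities, classes):
    """Mean |predicted probability - 0/1 actual| over classes and records."""
    total = 0.0
    for label, dist in zip(actual, probabilities):
        total += sum(abs(dist[c] - (1.0 if c == label else 0.0)) for c in classes) / len(classes)
    return total / len(actual)


folds = list(k_fold_indices(49735, k=10))
print([len(test) for _, test in folds])  # five folds of 4974, five of 4973
print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
print(mean_absolute_error(["yes", "no"],
                          [{"yes": 0.8, "no": 0.2}, {"yes": 0.4, "no": 0.6}],
                          ["yes", "no"]))
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, which is why it can rank classifiers differently from raw accuracy on imbalanced data.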


2021, Vol 5 (2), pp. 640
Author(s): Mulkan Azhari, Zakaria Situmorang, Rika Rosnelly

This study aims to compare the performance of several classification algorithms, namely C4.5, Random Forest, SVM, and Naive Bayes. The research data are JISC participant data totaling 200 records: 140 training records (70%) and 60 testing records (30%). The classification simulation used the RapidMiner data mining tool. The results show that the C4.5 algorithm obtained an accuracy of 86.67%, Random Forest 83.33%, SVM 95%, and Naive Bayes 86.67%. The highest accuracy was achieved by SVM and the lowest by Random Forest.


Author(s): Amir Ahmad, Hamza Abujabal, C. Aswani Kumar

A classifier ensemble is a combination of diverse and accurate classifiers. Generally, a classifier ensemble performs better than any single classifier in the ensemble. Naive Bayes classifiers are simple but popular in many applications. Because it is difficult to create diverse naive Bayes classifiers, naive Bayes ensembles are not very successful. In this paper, we propose Random Subclasses (RS) ensembles for naive Bayes classifiers. In the proposed method, new subclasses are created for each class using a 1-Nearest Neighbor (1-NN) framework with randomly selected points from the training data. A classifier treats each subclass as a class of its own. Because the subclasses are created randomly, diverse datasets are generated. Each classifier in an ensemble learns on one dataset from this pool, and diverse training datasets ensure diverse classifiers in the ensemble. The new subclasses create easy-to-learn decision boundaries, which in turn yield accurate naive Bayes classifiers. We developed two variants of RS: RS(2), which creates two subclasses per class, and RS(4), which creates four. We compared these methods with other popular ensemble methods, using naive Bayes as the base classifier. RS(4) outperformed the other popular ensemble methods. A detailed study was carried out to understand the behavior of RS ensembles.
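The subclass-creation step described above can be sketched as follows: for each class, a few training points of that class are drawn at random as anchors, and every point of the class is relabeled with the subclass of its nearest anchor (1-NN). This is an illustrative reading of the abstract, not the authors' code; the function name `random_subclasses`, the Euclidean distance, and the seeded sampling are assumptions. Training the naive Bayes members and mapping predicted subclasses back to their parent classes are left out.

```python
import math
import random


def random_subclasses(X, y, subclasses_per_class=2, seed=0):
    """Relabel each point with a (class, subclass) pair via 1-NN to random anchors."""
    rng = random.Random(seed)
    new_labels = [None] * len(y)
    for cls in sorted(set(y)):
        members = [i for i, label in enumerate(y) if label == cls]
        # Draw distinct anchor points of this class at random.
        anchors = rng.sample(members, min(subclasses_per_class, len(members)))
        for i in members:
            # 1-NN: the point joins the subclass of its closest anchor.
            nearest = min(range(len(anchors)),
                          key=lambda j: math.dist(X[i], X[anchors[j]]))
            new_labels[i] = (cls, nearest)
    return new_labels


X = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5), (5, 6), (20, 20), (20, 21)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
for seed in (1, 2):
    print(random_subclasses(X, y, subclasses_per_class=2, seed=seed))
```

Calling it with different seeds yields differently relabeled copies of the training data, which is where the ensemble's diversity comes from; `subclasses_per_class=2` and `4` correspond to RS(2) and RS(4).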

