The Comparison of Decision Tree Based Insurance Churn Prediction between Spark ML and SPSS

Author(s):  
Wei Zhang ◽  
Tong Mo ◽  
Weiping Li ◽  
Hanyu Huang ◽  
Xiaogang Tian
2017 ◽  
Vol 117 (1) ◽  
pp. 90-109 ◽  
Author(s):  
Eui-Bang Lee ◽  
Jinwha Kim ◽  
Sang-Gun Lee

Purpose: The purpose of this paper is to identify the influence of the frequency of word exposure in online news, based on the availability heuristic concept. This approach differs from most churn prediction studies, which focus on subscriber data. Design/methodology/approach: This study examined churn prediction through words presented in previous studies and additionally identified words that generate churn, using data mining technology in combination with logistic regression, decision tree graphing, neural network models, and a partial least squares (PLS) model. Findings: This study found prediction rates similar to those delivered by subscriber data-based analyses. In addition, because previous studies do not clearly suggest the effects of the factors, this study uses decision tree graphing and PLS modeling to identify which words exert positive or negative influences. Originality/value: These findings imply an expansion of churn prediction, advertising effect, and various psychological studies. The study also proposes concrete ideas to advance the competitive advantage of companies, which not only helps corporate development but also improves industry-wide efficiency.


2018 ◽  
Vol 7 (3.34) ◽  
pp. 291
Author(s):  
M Malleswari ◽  
R.J Manira ◽  
Praveen Kumar ◽  
Murugan .

Big data analytics has been the focus of large-scale data processing. Machine learning and big data have a great future in prediction. Churn prediction is one of the subdomains of big data. Its advantage lies in preventing customer attrition, especially in telecom. Churn prediction is a day-to-day affair involving millions, so a solution that prevents customer attrition can save a lot. This paper proposes a comparison of three machine learning techniques, namely the decision tree, random forest, and gradient-boosted tree algorithms, using Apache Spark. Apache Spark is a data processing engine used in big data that provides in-memory processing, which makes processing faster. The analysis is made by extracting the features of the data set and training the model. Scala is a programming language that combines object-oriented and functional programming, which makes it a powerful language. The analysis is implemented using Apache Spark, and modelling is done using Scala ML. The accuracy of the decision tree model was 86%, that of the random forest model 87%, and that of the gradient-boosted tree model 85%.


Author(s):  
Ait Daoud Rachid ◽  
Amine Abdellah ◽  
Bouikhalene Belaid ◽  
Lbibb Rachid

With the growth of the e-commerce sector, customers have more choices, which encourages them to divide their purchases among several e-commerce sites and compare competitors’ products; this in turn increases the risk of churning. A review of the literature on customer churn models reveals that no prior research had considered both partial and total defection in non-contractual online environments. Instead, prior studies focused on either total or partial defection. This study proposes a customer churn prediction model in an e-commerce context, wherein a clustering phase is based on the integration of the k-means method and the Length-Recency-Frequency-Monetary (LRFM) model. This phase is employed to define churn and is followed by a multi-class prediction phase based on three classification techniques: simple decision tree, artificial neural networks, and decision tree ensemble. The dependent variable classifies a customer as one who continues loyal buying patterns (Non-churned), a partial defector (Partially-churned), or a total defector (Totally-churned). Macro-averaging measures, including average accuracy and the macro-averages of precision, recall, and F-1, are used to evaluate the classifiers’ performance under 10-fold cross-validation. Using real data from an online store, the results show the efficiency of the decision tree ensemble model over the other models in identifying both future partial and total defection.
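
The macro-averaged precision, recall, and F-1 mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `macro_metrics`, the toy labels in the usage note, and the choice to compute macro F-1 as the mean of per-class F-1 scores (rather than the F-1 of the macro precision and recall) are assumptions made here for illustration.

```python
from collections import Counter

# Class names taken from the abstract; everything else is illustrative.
LABELS = ("Non-churned", "Partially-churned", "Totally-churned")


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return macro-averaged precision, recall, and F1 over `labels`.

    Each class is scored one-vs-rest, then the per-class scores are
    averaged with equal weight, so rare classes count as much as common ones.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

In a 10-fold cross-validation setting, one would typically call `macro_metrics` on each held-out fold and average the results; the fold splitting itself is left out of this sketch.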

