Clustering and Classification Based on Distributed Automatic Feature Engineering for Customer Segmentation

Symmetry ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1557
Author(s):  
Zne-Jung Lee ◽  
Chou-Yuan Lee ◽  
Li-Yun Chang ◽  
Natsuki Sano

To stay competitive and extract valuable information, decision-makers must apply in-depth machine learning or data mining to their data. Clustering and classification are two methods traditionally used in data mining. Clustering divides data into groups according to similarity or shared features. Classification, on the other hand, builds a model from given training data and then predicts the target class or label of the test data. In recent years, many researchers have focused on hybrids of clustering and classification. These techniques have achieved admirable results, but there is still room to improve performance, for example through distributed processing. Therefore, this paper proposes clustering and classification based on distributed automatic feature engineering (AFE) for customer segmentation. In the proposed algorithm, AFE uses the artificial bee colony (ABC) algorithm to select valuable features of the input data, and then RFM (recency, frequency, monetary) provides the basic data analytics. AFE first initializes the number of clusters k. The clustering methods k-means, Ward's method, and fuzzy c-means (FCM) then cluster the examples into different groups. Finally, an improved fuzzy decision tree classifies the target data and generates decision rules that explain the situation in detail. AFE also determines the split number in the improved fuzzy decision tree to increase classification accuracy. The proposed approach is distributed and runs on the Apache Spark platform. The paper addresses the problem of clustering and classification for machine learning. The results show that the classification accuracy of the proposed approach outperforms that of other approaches. We also provide useful strategies and decision rules from the data analytics for decision-makers.
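The RFM step and the k-means clustering mentioned in the abstract can be illustrated with a minimal, non-distributed sketch. It is not the authors' implementation: the transaction records, the function names `rfm_table` and `kmeans`, and the choice to seed centroids with the first k points are all assumptions made for illustration, and ABC feature selection, Ward's method, FCM, and Spark are left out.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("a", date(2021, 1, 5), 20.0),
    ("a", date(2021, 3, 1), 35.0),
    ("b", date(2020, 11, 20), 500.0),
    ("c", date(2021, 2, 27), 15.0),
    ("c", date(2021, 3, 2), 10.0),
    ("c", date(2021, 3, 9), 12.0),
]


def rfm_table(transactions, today):
    """Return {customer: (recency_in_days, frequency, monetary_total)}."""
    table = {}
    for customer, day, amount in transactions:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_ago = (today - day).days
        # Recency is the number of days since the most recent purchase.
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


rfm = rfm_table(TRANSACTIONS, today=date(2021, 3, 10))
customers = sorted(rfm)
# Features are left unscaled to keep the sketch short; in practice they
# would normally be standardized so that monetary value does not dominate.
labels, centroids = kmeans([rfm[c] for c in customers], k=2)
```

Here `labels` gives one cluster index per customer in `customers` order; the number of clusters `k` is fixed by hand, whereas the abstract states that AFE initializes it.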

Data analytics has grown within the machine learning context. Whatever the purpose for which data is used or exploited, whether customer segmentation or marketing targeting, it must first be processed and represented as feature vectors. Many algorithms, such as clustering, regression, and classification, need to be described and clarified in order to facilitate processing and statistical analysis. The previous chapters showed the importance of big data analysis (the Why?); as with every major innovation, the biggest confusion lies in its exact scope (What?) and its implementation (How?). In this chapter, we look at the different analytics algorithms and techniques that can be used to exploit large amounts of data.


Author(s):  
Mohammed Al Zobbi ◽  
Belal Alsinglawi ◽  
Omar Mubin ◽  
Fady Alnajjar

Coronavirus Disease 2019 (COVID-19) has affected day-to-day life and slowed down the global economy. Most countries are enforcing strict quarantine to control the havoc caused by this highly contagious disease. Since the outbreak of COVID-19, many data analyses have been carried out to support decision-makers. We propose a method that combines data analytics and machine learning classification to evaluate the effectiveness of lockdown regulations. Governments should review lockdown regulations on a regular basis to keep reasonable control over the outbreak. The model aims to measure the efficiency of lockdown procedures in various countries, and it shows a direct correlation between lockdown procedures and the infection rate. Lockdown efficiency is measured by finding a correlation coefficient between the lockdown attributes and the infection rate. The lockdown attributes are retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, residential, and schools. Our results show that combining all the independent attributes in our study yielded a higher correlation (0.68) with the dependent value, the third quartile (Q3). The Mean Absolute Error (MAE) was lowest when all attributes were combined.
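The two measures named in the abstract, a correlation coefficient and the Mean Absolute Error, can be sketched as follows. The abstract does not say which correlation coefficient was used, so Pearson's is an assumption here, and the mobility and infection figures are made-up numbers rather than data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weekly values: change in workplace mobility (percent from
# baseline) and an infection rate. Both series are invented.
workplace_mobility = [-10.0, -20.0, -30.0, -40.0, -50.0]
infection_rate = [5.0, 4.0, 3.5, 2.0, 1.0]

r = pearson(workplace_mobility, infection_rate)
mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```

With these invented series `r` is strongly positive, meaning that less mobility goes together with a lower infection rate; real data would be noisier.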


2013 ◽  
Vol 3 (2) ◽  
pp. 40-57
Author(s):  
Shigeaki Sakurai

This paper introduces knowledge discovery methods that apply inductive learning techniques to textual data. The author presents three methods for extracting features from textual data: the first uses a key concept dictionary, the second a key phrase pattern dictionary, and the third a named entity extractor. These features are used to generate rules that represent relationships between the features and text classes. The rules are described in the format of a fuzzy decision tree. The features are also used to train a classification model based on an SVM (Support Vector Machine). The model can classify new textual data into the text classes with high classification accuracy. Lastly, the paper introduces two application tasks based on these methods and verifies their effect.


Web Services ◽  
2019 ◽  
pp. 684-700
Author(s):  
Adiraju Prashantha Rao

As information grows ever faster in this new century, the excess of data is causing great trouble for people. However, a great deal of potential and highly useful value is hidden in this huge volume of data. Big Data has drawn huge attention from researchers in information sciences and from policy and decision makers in governments and enterprises. Data analytics is the science of examining raw data with the purpose of drawing conclusions about that information. It is about discovering knowledge from large volumes of data and applying it to the business. Machine learning is ideal for exploiting the opportunities hidden in big data. This chapter shows how to discover and display the patterns buried in the data using machine learning.


2009 ◽  
pp. 201-217
Author(s):  
Malcolm J. Beynon

This chapter considers the role of fuzzy decision trees as a tool for intelligent data analysis in domestic travel research. It demonstrates the readability and interpretability that the findings from a fuzzy decision tree analysis can offer. The analysis is first presented on a small problem, so that it can be followed in full. An investigation of traffic fatalities in the states of the US then offers an example of a more comprehensive fuzzy decision tree analysis. Graphical representations of the fuzzy membership functions show how the necessary linguistic terms are defined. The final fuzzy decision trees, for both the tutorial problem and the US traffic fatalities problem, show the structured form the analysis offers, as well as the more readable decision rules contained therein.
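As a minimal illustration of how membership functions can define linguistic terms, the sketch below uses triangular functions. The abstract does not state the shape of the chapter's membership functions, so the triangular form, the terms "low", "medium", and "high", and the 0 to 100 scale are all assumptions.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at left and right, rising linearly to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic terms for a variable measured on a 0..100 scale.
# Each term is (left, peak, right); the outer feet lie outside the scale so
# that "low" reaches 1 at 0 and "high" reaches 1 at 100.
TERMS = {
    "low": (-50.0, 0.0, 50.0),
    "medium": (0.0, 50.0, 100.0),
    "high": (50.0, 100.0, 150.0),
}


def fuzzify(x):
    """Return the degree to which x belongs to each linguistic term."""
    return {name: triangular(x, *shape) for name, shape in TERMS.items()}


memberships = fuzzify(25.0)
```

A value of 25 belongs half to "low" and half to "medium", which is the kind of graded reading that makes the resulting decision rules interpretable in linguistic terms.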


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Zhu Gu ◽  
Chaohu He

Since the reform and opening up, the economy of our country has developed rapidly, and people's living conditions have steadily improved. As a result, people have more time to pay attention to their health, which has promoted the rapid development of the sports and fitness industry in our country. However, the current state of member management in the sports and fitness industry has not kept pace with this development. Based on this, this article uses a fuzzy decision tree algorithm to build a decision tree from the characteristics of customer data and to analyze the loss of existing customers, which is of strategic significance for improving the competitiveness of a club. The article selects the 7 most commonly used data sets from the UCI repository as the initial experimental data for model training in three different formats, and then conducts experiments on the data of a specific club's members, using these data files as training samples to construct a fuzzy decision tree for customer churn and to analyze the main factors behind it. Experiments show that the fuzzy decision tree ID3 algorithm based on mobile computing achieves its highest accuracy on the Iris data set, reaching 97.8%, and its lowest on the Wine data set, only 65.2%. The mobile computing-based fuzzy decision tree ID3 algorithm proposed in this paper obtained the highest correct rate (86.32%). This shows that, compared with traditional analysis methods, the fuzzy decision tree obtained for customer churn analysis offers high classification accuracy and is understandable, so that good classification accuracy can be achieved even when the tree is small.

