Clustering and Classification Based on Distributed Automatic Feature Engineering for Customer Segmentation

Zne-Jung Lee; Chou-Yuan Lee; Li-Yun Chang; Natsuki Sano

doi:10.3390/sym13091557

Symmetry ◽

10.3390/sym13091557 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1557

Author(s):

Zne-Jung Lee ◽

Chou-Yuan Lee ◽

Li-Yun Chang ◽

Natsuki Sano

Keyword(s):

Machine Learning ◽

Classification Accuracy ◽

Data Analytics ◽

Decision Rules ◽

Customer Segmentation ◽

Decision Makers ◽

Feature Engineering ◽

Fuzzy Decision ◽

Fuzzy Decision Tree ◽

Clustering And Classification

To stay ahead of competitors and obtain valuable information, decision-makers must apply machine learning or data mining to their data analytics. Clustering and classification are two common methods in data mining. Clustering divides data into groups according to similarity or shared features. Classification builds a model from given training data and then predicts the target class or label for test data. In recent years, many researchers have focused on hybrids of clustering and classification. These techniques have achieved admirable results, but there is still room to improve performance, for example through distributed processing. Therefore, this paper proposes clustering and classification based on distributed automatic feature engineering (AFE) for customer segmentation. In the proposed algorithm, AFE uses artificial bee colony (ABC) to select valuable features of the input data, and RFM (recency, frequency, monetary) then provides the basic data analytics. AFE first initializes the number of clusters k. The clustering methods of k-means, Wald method, and fuzzy c-means (FCM) then cluster the examples into different groups. Finally, an improved fuzzy decision tree classifies the target data and generates decision rules that explain the situation in detail. AFE also determines the split number in the improved fuzzy decision tree to increase classification accuracy. The proposed method is distributed and runs on the Apache Spark platform. The paper addresses the problem of clustering and classification for machine learning. According to the results, the classification accuracy of the proposed method outperforms other approaches. The paper also provides useful strategies and decision rules from data analytics for decision-makers.
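The RFM and clustering steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transaction data, the function names (`rfm`, `min_max_scale`, `kmeans`), and the deterministic centroid initialization are all assumptions made for illustration, and the ABC feature selection, fuzzy c-means, and Spark distribution steps are omitted.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("a", date(2021, 1, 5), 20.0),
    ("a", date(2021, 3, 1), 35.0),
    ("b", date(2020, 6, 10), 500.0),
    ("c", date(2021, 2, 20), 15.0),
    ("c", date(2021, 2, 25), 25.0),
    ("c", date(2021, 3, 2), 10.0),
]


def rfm(rows, today):
    """Return {customer: (recency_in_days, frequency, monetary_total)}."""
    table = {}
    for cust, day, amount in rows:
        last, freq, total = table.get(cust, (day, 0, 0.0))
        table[cust] = (max(last, day), freq + 1, total + amount)
    return {c: ((today - last).days, f, m) for c, (last, f, m) in table.items()}


def min_max_scale(points):
    """Rescale every feature to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(p, lo, hi))
        for p in points
    ]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (illustrative choice)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


table = rfm(transactions, today=date(2021, 3, 10))
customers = list(table)
labels = kmeans(min_max_scale([table[c] for c in customers]), k=2)
segments = dict(zip(customers, labels))
```

In this toy data, the two recent, frequent, low-spend customers end up in one segment and the single lapsed, high-spend customer in the other. A production version would more plausibly rely on a library clusterer than on this hand-written loop.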

Techniques and Methods That Help to Make Big Data the Simplest Recipe for Success

Big Data Analytics for Entrepreneurial Success - Advances in Business Information Systems and Analytics ◽

10.4018/978-1-5225-7609-9.ch006 ◽

2019 ◽

pp. 161-194

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analysis ◽

Statistical Analysis ◽

Data Analytics ◽

Big Data Analysis ◽

Customer Segmentation ◽

Learning Context ◽

Feature Vectors

Data analytics has grown within a machine learning context. Whatever the purpose for which data is used, whether customer segmentation or marketing targeting, it must first be processed and represented as feature vectors. Many algorithms, such as clustering, regression, and classification, need to be presented and clarified in order to facilitate processing and statistical analysis. The previous chapters showed the importance of big data analysis (the Why?); as with every major innovation, the biggest confusion lies in its exact scope (What?) and its implementation (How?). This chapter looks at the different algorithms and analytics techniques that can be used to exploit large amounts of data.

Measurement Method for Evaluating the Lockdown Policies during the COVID-19 Pandemic

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17155574 ◽

2020 ◽

Vol 17 (15) ◽

pp. 5574 ◽

Cited By ~ 2

Author(s):

Mohammed Al Zobbi ◽

Belal Alsinglawi ◽

Omar Mubin ◽

Fady Alnajjar

Keyword(s):

Machine Learning ◽

Infection Rate ◽

Global Economy ◽

Data Analytics ◽

Measurement Method ◽

Mean Absolute Error ◽

Absolute Error ◽

Decision Makers ◽

Machine Learning Classification ◽

Data Analyses

Coronavirus Disease 2019 (COVID-19) has affected day-to-day life and slowed down the global economy. Most countries are enforcing strict quarantine to control the havoc of this highly contagious disease. Since the outbreak of COVID-19, many data analyses have been done to provide close support to decision-makers. We propose a method comprising data analytics and machine learning classification for evaluating the effectiveness of lockdown regulations. Governments should review lockdown regulations on a regular basis to enable reasonable control over the outbreak. The model aims to measure the efficiency of lockdown procedures for various countries. The model shows a direct correlation between lockdown procedures and the infection rate. Lockdown efficiency is measured by finding a correlation coefficient between lockdown attributes and the infection rate. The lockdown attributes include retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, residential, and schools. Our results show that combining all the independent attributes in our study resulted in a higher correlation (0.68) to the dependent value Interquartile 3 (Q3). Mean Absolute Error (MAE) was lowest when all attributes were combined.
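The two measures named in this abstract, a correlation coefficient and the mean absolute error, can be sketched as follows. The abstract does not say which correlation coefficient the authors used, so Pearson's is an assumption here, and the mobility and infection figures are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: percentage change in workplace mobility per week
# and the infection rate observed in the same week.
mobility_change = [-60, -45, -30, -10, 0]
infection_rate = [1.2, 1.9, 2.8, 4.1, 5.0]

r = pearson(mobility_change, infection_rate)
```

A value of `r` near +1 or -1 indicates a strong linear relationship between an attribute and the infection rate, while a lower MAE indicates predictions that sit closer to the observed values.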

Analysis of Textual Data Based on Inductive Learning Techniques

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2013040103 ◽

2013 ◽

Vol 3 (2) ◽

pp. 40-57

Author(s):

Shigeaki Sakurai

Keyword(s):

Classification Accuracy ◽

Inductive Learning ◽

Classification Model ◽

Support Vector ◽

Fuzzy Decision ◽

Fuzzy Decision Tree ◽

Named Entity ◽

High Classification Accuracy ◽

Learning Techniques ◽

Textual Data

This paper introduces knowledge discovery methods that apply inductive learning techniques to textual data. The author discusses three methods for extracting features from the textual data: the first uses a key concept dictionary, the second a key phrase pattern dictionary, and the third a named entity extractor. These features are used to generate rules that represent relationships between the features and text classes. The rules are described in the format of a fuzzy decision tree. The features are also used to acquire a classification model based on a support vector machine (SVM). The model can classify new textual data into the text classes with high classification accuracy. Lastly, the paper introduces two application tasks based on these methods and verifies the effect of the methods.

Discovering Knowledge Hidden in Big Data from Machine-Learning Techniques

Advances in Data Mining and Database Management - Web Data Mining and the Development of Knowledge-Based Decision Support Systems ◽

10.4018/978-1-5225-1877-8.ch010 ◽

2017 ◽

pp. 167-183 ◽

Cited By ~ 1

Author(s):

Adiraju Prashantha Rao

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Decision Makers ◽

Machine Learning Techniques ◽

Human Beings ◽

Raw Data ◽

Learning Techniques ◽

Data Analytic ◽

Information Growth

As information growth accelerates in this new century, excessive data is causing great trouble for people. However, a great deal of potential and highly useful value is hidden in this huge volume of data. Big Data has drawn considerable attention from researchers in information sciences and from policy and decision makers in governments and enterprises. Data analytics is the science of examining raw data in order to draw conclusions about that information. It is about discovering knowledge from large volumes of data and applying it to the business. Machine learning is ideal for exploiting the opportunities hidden in big data. This chapter aims to discover and display the patterns buried in the data using machine learning.

Discovering Knowledge Hidden in Big Data From Machine-Learning Techniques

Web Services ◽

10.4018/978-1-5225-7501-6.ch037 ◽

2019 ◽

pp. 684-700

Author(s):

Adiraju Prashantha Rao

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Decision Makers ◽

Machine Learning Techniques ◽

Human Beings ◽

Raw Data ◽

Learning Techniques ◽

Data Analytic ◽

Information Growth

A Fuzzy Decision Tree Analysis of Traffic Fatalities in the US

Intelligent Data Analysis ◽

10.4018/978-1-59904-982-3.ch012 ◽

2009 ◽

pp. 201-217

Author(s):

Malcolm J. Beynon

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Decision Rules ◽

Decision Tree Analysis ◽

Fuzzy Decision ◽

Fuzzy Decision Tree ◽

Traffic Fatalities ◽

Tree Analysis ◽

The Us ◽

Fuzzy Decision Trees

This chapter considers the role of fuzzy decision trees as a tool for intelligent data analysis in domestic travel research. It demonstrates the readability and interpretability that the findings from a fuzzy decision tree analysis can offer, first through a small problem that allows the analysis to be followed in full. The investigation of traffic fatalities in the states of the US then offers an example of a more comprehensive fuzzy decision tree analysis. Graphical representations of the fuzzy membership functions show how the necessary linguistic terms are defined. The final fuzzy decision trees, for both the tutorial problem and the US traffic fatalities, show the structured form the analysis offers, as well as the more readable decision rules they contain.
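To make the idea of membership functions defining linguistic terms concrete, here is a minimal sketch using triangular membership functions. The chapter's actual membership shapes and variables are not given in the abstract, so the triangular form, the terms `low`, `medium`, and `high`, and the 0 to 20 scale are all assumptions.

```python
def triangular(x, left, peak, right):
    """Degree (0..1) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic terms for a variable measured on a 0..20 scale.
# Each term is (left, peak, right); neighbouring terms overlap on purpose.
TERMS = {
    "low": (-10, 0, 10),
    "medium": (0, 10, 20),
    "high": (10, 20, 30),
}


def fuzzify(x):
    """Map a crisp value to its membership degree in every linguistic term."""
    return {name: triangular(x, *shape) for name, shape in TERMS.items()}


memberships = fuzzify(5)
```

Because the terms overlap, a value such as 5 belongs partly to `low` and partly to `medium`, which is what lets a fuzzy decision tree send one example down several branches with different weights.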

Application of Fuzzy Decision Tree Algorithm Based on Mobile Computing in Sports Fitness Member Management

Wireless Communications and Mobile Computing ◽

10.1155/2021/4632722 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Zhu Gu ◽

Chaohu He

Keyword(s):

Decision Tree ◽

Mobile Computing ◽

Classification Accuracy ◽

Decision Tree Algorithm ◽

Fuzzy Decision ◽

Fuzzy Decision Tree ◽

Data Set ◽

Tree Algorithm ◽

Id3 Algorithm ◽

Fitness Industry

After the reform and opening-up, the economy of our country has developed rapidly, and people's living conditions have steadily improved. As a result, people have more time to pay attention to their health, which has promoted the rapid development of the country's sports and fitness industry. However, the current state of member management in the sports fitness industry has not kept pace with this development. This article therefore uses a fuzzy decision tree algorithm to build a decision tree from the characteristics of customer data and to analyze the loss of existing customers, which is of strategic significance for improving the competitiveness of a club. The article selects the 7 most commonly used data sets from the UCI repository as the initial experimental data for model training in three different formats, and then conducts experiments on the data of a specific club's members, using these data files as training samples to construct a fuzzy decision tree for churn analysis and to identify the main factors behind customer churn. Experiments show that the fuzzy decision tree ID3 algorithm based on mobile computing has its highest accuracy on the Iris data set, reaching 97.8%, and its lowest on the Wine data set, only 65.2%. The mobile computing-based fuzzy decision tree ID3 algorithm proposed in this paper obtained the highest correct rate (86.32%). This shows that, compared with traditional analysis methods, the fuzzy decision tree obtained for customer churn analysis offers high classification accuracy and is easy to understand, so that good classification accuracy can be achieved even when the tree is small.
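ID3 chooses the attribute to split on by information gain. One common way to carry this over to fuzzy data is to let each example count by its membership degree rather than as a whole unit. The sketch below follows that formulation; it is an assumption, not the paper's algorithm, and the function names and churn data are hypothetical.

```python
from math import log2


def fuzzy_entropy(memberships, labels):
    """Class entropy where each example is weighted by its membership degree."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    mass = {}
    for m, y in zip(memberships, labels):
        mass[y] = mass.get(y, 0.0) + m
    return -sum((w / total) * log2(w / total) for w in mass.values() if w > 0)


def fuzzy_information_gain(term_memberships, labels):
    """Gain from splitting on an attribute.

    term_memberships maps each linguistic term of the attribute to a list
    holding every example's membership degree in that term.
    """
    base = fuzzy_entropy([1.0] * len(labels), labels)
    total = sum(sum(ms) for ms in term_memberships.values())
    if total == 0:
        return 0.0
    weighted = sum(
        (sum(ms) / total) * fuzzy_entropy(ms, labels)
        for ms in term_memberships.values()
    )
    return base - weighted


# Hypothetical club members: two churned, two stayed.
labels = ["churn", "churn", "stay", "stay"]

# An attribute whose terms separate the classes perfectly...
visits = {"rare": [1.0, 1.0, 0.0, 0.0], "frequent": [0.0, 0.0, 1.0, 1.0]}
# ...and one whose terms carry no information about the class.
age = {"young": [0.5, 0.5, 0.5, 0.5], "old": [0.5, 0.5, 0.5, 0.5]}

gain_visits = fuzzy_information_gain(visits, labels)
gain_age = fuzzy_information_gain(age, labels)
```

An ID3-style learner would split on `visits` here because it has the larger gain, then repeat the same calculation inside each branch.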
