Performance comparison of six Data mining models for soft churn customer prediction in Telecom

Marin Mandić; Goran Kraljević; Ivan Boban

doi:10.7251/ijeec1801029m

IJEEC - INTERNATIONAL JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTING ◽

10.7251/ijeec1801029m ◽

2019 ◽

Vol 2 (1) ◽

Author(s):

Marin Mandić ◽

Goran Kraljević ◽

Ivan Boban

Keyword(s):

Data Mining ◽

Prediction Models ◽

Principal Component ◽

Performance Comparison ◽

Data Sets ◽

Network Algorithms ◽

Imbalance Problem ◽

Definition Of ◽

Multiple Conditions ◽

Selection Of

Because of intense market competition, telecom operators are affected by churn, so it is very important for them to identify which users are likely to leave and switch to a competing telecom company. This research uses data on user behaviour from telecom systems to identify behavioural patterns and thereby recognize churn. A valuable contribution of this paper is a new definition of prepaid soft churn based on multiple conditions. During data preparation, useful attributes were selected using Principal Component Analysis (PCA). The attribute values were also normalized to properly balance the influence of all attributes. A common problem with telecom churn prediction data is imbalance in the target variable. This is also the case for the data used in this paper, where churners make up 12%. Undersampling and oversampling were compared as methods for resolving the data imbalance problem. Data sets produced with undersampling and oversampling were used to train decision tree, logistic regression and neural network algorithms, yielding six prediction models for detecting churn among prepaid telecom users. The performance of the six developed data mining models was also analysed and compared.
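The two resampling strategies named in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the function names `undersample` and `oversample`, the toy rows, and the choice to balance the classes to exactly 50/50 are all assumptions made for illustration; only the 12% churner share comes from the abstract.

```python
import random

def split_by_class(rows, label_of):
    """Return (minority, majority) lists for a binary 0/1 label."""
    pos = [r for r in rows if label_of(r) == 1]
    neg = [r for r in rows if label_of(r) == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)

def undersample(rows, label_of, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = split_by_class(rows, label_of)
    out = minority + rng.sample(majority, len(minority))
    rng.shuffle(out)
    return out

def oversample(rows, label_of, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = split_by_class(rows, label_of)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    out = majority + minority + extra
    rng.shuffle(out)
    return out

# Hypothetical toy data: 1000 prepaid users, 12% of them labelled as churners.
rows = [{"user_id": i, "churn": 1 if i < 120 else 0} for i in range(1000)]
label = lambda r: r["churn"]

under = undersample(rows, label)   # smaller set, no duplicated users
over = oversample(rows, label)     # larger set, churners repeated

print(len(under), sum(label(r) for r in under))
print(len(over), sum(label(r) for r in over))
```

Training each of the three algorithms on each resampled set would give the six models the abstract compares. Undersampling discards majority-class rows, while oversampling keeps every row but repeats minority-class rows, which is the trade-off such a comparison examines.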

A Survey of Bayesian Data Mining

Data Mining ◽

10.4018/978-1-59140-051-6.ch001 ◽

2011 ◽

pp. 1-26 ◽

Cited By ~ 1

Author(s):

Stefan Arnborg

Keyword(s):

Data Mining ◽

Bayesian Analysis ◽

Graphical Models ◽

Bayesian Approach ◽

Data Sets ◽

Media Distribution ◽

Dirichlet Prior ◽

Approach Problem ◽

The Bayesian Approach ◽

Selection Of

This chapter reviews the fundamentals of inference and motivates Bayesian analysis. The method is illustrated with dependency tests in data sets with categorical variables and with Dirichlet prior distributions. Principles and problems in deriving causality conclusions are reviewed and illustrated with Simpson’s paradox. The selection of decomposable and directed graphical models illustrates the Bayesian approach. Bayesian and EM classification are briefly described. The material is illustrated with two cases, one in the personalization of media distribution and one in schizophrenia research. These cases show how to approach problem types that exist in many other application areas.

Automatic and semantic pre-selection of features using ontology for data mining on data sets related to cancer

International Conference on Information Society (i-Society 2014) ◽

10.1109/i-society.2014.7009060 ◽

2014 ◽

Author(s):

Adriana da Silva Jacinto ◽

Ricardo da Silva Santos ◽

Jose Maria Parente de Oliveira

Keyword(s):

Data Mining ◽

Data Sets ◽

Selection Of

Are Female Paraphilias Hiding in Plain Sight? Risqué Male–Male Erotica for Women in Sinophone and Anglophone Regions

Archives of Sexual Behavior ◽

10.1007/s10508-021-02107-4 ◽

2021 ◽

Author(s):

Anna Madill ◽

Yao Zhao

Keyword(s):

Principal Component ◽

Legal Issues ◽

Market Force ◽

Data Sets ◽

Sexual Content ◽

Component Structure ◽

English Speaking ◽

Categorical Principal Component Analysis ◽

Sexual Stories ◽

Definition Of

Female-oriented male–male erotica is a genre of popular culture often known as Boys’ Love (BL), yaoi, and danmei. It is one of the largest by-and-for women sexual subcultures and a global phenomenon. With the largest data sets in the field, we ask: Which risqué sexual content do Sinophone (Chinese-speaking) and Anglophone (English-speaking) participants particularly enjoy in BL and does this differ between cultures?, and Are there sub-demographics in Sinophone and in Anglophone culture who enjoy particular forms of risqué sexual content in BL and do these forms relate also to enjoyment of particular storylines and concern with legal issues? The material studied meets the DSM-5 definition of the paraphilic, and little is known about paraphilias in women or in the general population. Using Categorical Principal Component Analysis we explored one 15-response question from our Sinophone (N = 1922) and Anglophone (N = 1715) BL fandom surveys: Which risqué sexual content do you particularly enjoy in BL? We also tested for associations with seven demographic and other BL content-related questions. Notably, the component structure was nearly replicated between the two independent samples, in order of strength: BDSM Specialist, Mechanoid/Animal Sex Specialist, Underage Sex Specialist, and Minority Paraphilia Specialist. In both samples, it was the avid BL fans and/or those who liked explicitly sexual stories, a largely overlapping demographic, who most engage the risqué content, while, for the Sinophone, this included also more non-heterosexual and/or other-gendered people. We conclude that women’s paraphilias have been largely overlooked because they might be expressed more commonly through fantasy than action, that their mass expression has awaited both the means and the market force, and that the current conceptualization of, and assumptions about, paraphilias are overly modeled on those of men.

Constructionist Perspective of Organizational Data Mining

Encyclopedia of Information Science and Technology, First Edition ◽

10.4018/978-1-59140-553-5.ch094 ◽

2005 ◽

pp. 535-539

Author(s):

Isabel Ramos ◽

João Álvaro Carvalho

Keyword(s):

Social Sciences ◽

Data Mining ◽

History Of Science ◽

Knowledge Creation ◽

Organizational Knowledge ◽

Cultural Root ◽

Organizational Knowledge Creation ◽

History Of ◽

Definition Of ◽

Selection Of

Scientific or organizational knowledge creation has been addressed from different perspectives throughout the history of science and, in particular, of the social sciences. The process is guided by the set of values, beliefs and norms shared by the members of the community to which the creator of this knowledge belongs, that is, it is guided by the adopted paradigm (Lincoln & Guba, 2000). The adopted paradigm determines how the nature of the studied reality is understood, the criteria that will be used to assess the validity of the created knowledge, and the construction and selection of methods, techniques and tools to structure and support the creation of knowledge. This set of ontological, epistemological, and methodological assumptions, which characterizes the paradigm one implicitly or explicitly uses to make sense of the surrounding reality, is the cultural root of intellectual enterprises. Those assumptions constrain the accomplishment of activities such as construction of theories, definition of inquiry strategies, interpretation of perceived phenomena, and dissemination of knowledge (Schwandt, 2000).

CONDITIONS FOR THE APPLICABILITY OF THE PRINCIPAL COMPONENT ANALYSIS TO THE CHARACTERIZATION OF THE 1/f-NOISE

Fluctuation and Noise Letters ◽

10.1142/s0219477506003100 ◽

2006 ◽

Vol 06 (01) ◽

pp. L17-L28 ◽

Cited By ~ 1

Author(s):

JOSÉ MANUEL LÓPEZ-ALONSO ◽

JAVIER ALDA

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Random Variable ◽

Component Analysis ◽

Optical Systems ◽

Data Sets ◽

Regular Time ◽

Pca Method ◽

Definition Of

Principal Component Analysis (PCA) has been applied to the characterization of 1/f noise. Applying PCA to 1/f noise requires the definition of a stochastic multidimensional variable. The components of this variable describe the temporal evolution of the phenomena sampled at regular time intervals. In this paper we analyze the conditions on the number of observations and on the dimension of the multidimensional random variable that are necessary to use the PCA method in a sound manner. We have tested the obtained conditions on simulated and experimental data sets obtained from imaging optical systems. The results can be extended to other fields where this kind of noise is relevant.

Constructionist Perspective of Organizational Data Mining

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch137 ◽

2008 ◽

pp. 2296-2301

Author(s):

Isabel Ramos ◽

João Álvaro Carvalho

Keyword(s):

Social Sciences ◽

Data Mining ◽

History Of Science ◽

Knowledge Creation ◽

Organizational Knowledge ◽

Cultural Root ◽

Organizational Knowledge Creation ◽

History Of ◽

Definition Of ◽

Selection Of

Data Mining: Modeling, Algorithms, Applications and Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.2786 ◽

2014 ◽

Vol 926-930 ◽

pp. 2786-2789

Author(s):

Jing Zhu Li ◽

Qian Li ◽

Tai Yu Liu ◽

Wei Hong Niu

Keyword(s):

Data Mining ◽

Algorithm Design ◽

Software Tools ◽

Current Data ◽

Data Sets ◽

Data Mining Algorithm ◽

Practical Application ◽

Current Field ◽

Comprehensive Description ◽

Definition Of

Data mining is a multidisciplinary field that emerged gradually in the 20th century. This paper reviews data mining modeling, algorithms, applications and software tools. It covers the definition of data mining, the scope and characteristics of data sets, and various practical data mining situations; summarizes the basic steps and processes of data mining in practical application; discusses data mining tasks and modeling issues in a variety of applications; cites the algorithms that are currently popular in the field and briefly analyzes the issues to consider in algorithm design; gives an overview of current data mining algorithms in a number of areas; describes more comprehensively the current performance of data mining software tools and the state of their development; and finally considers the prospects and direction of the development of data mining.

Privacy-Preserving Hybrid K-Means

Censorship, Surveillance, and Privacy ◽

10.4018/978-1-5225-7113-1.ch049 ◽

2019 ◽

pp. 1009-1026

Author(s):

Zhiqiang Gao ◽

Yixiao Sun ◽

Xiaolong Cui ◽

Yutao Wang ◽

Yanyu Duan ◽

...

Keyword(s):

Data Mining ◽

Differential Privacy ◽

Privacy Preserving ◽

Local Optimum ◽

Data Sets ◽

Swarm Optimization ◽

Second Stage ◽

Private Data ◽

Privacy Budget ◽

Selection Of

This article describes how the most widely used clustering algorithm, k-means, is prone to falling into a local optimum. Notably, traditional clustering approaches are performed directly on private data and fail to cope with malicious attacks in massive data mining tasks by attackers with arbitrary background knowledge. This can result in violations of individuals' privacy, as well as leaks through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means under Spark. In the first stage, particle swarm optimization is executed in resilient distributed datasets to initiate the selection of clustering centroids in the k-means on Spark. In the second stage, k-means is executed on the condition that a privacy budget is set as ε/2t with Laplace noise added in each round of iterations. Extensive experimentation on public UCI data sets shows that, on the premise of guaranteeing utility of privacy data and scalability, their approach outperforms the state-of-the-art varieties of k-means by utilizing swarm intelligence and rigorous paradigms of differential privacy.
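As a rough illustration of the second stage, the sketch below shows one noisy centroid update under a per-round budget. It is not the authors' Spark implementation. It assumes that ε/2t means ε/(2t) for t rounds, that each round spends that budget once on a sum query and once on a count query, and that the points are one-dimensional values already scaled to [0, 1] so both queries have sensitivity 1. The function names are hypothetical.

```python
import random

def per_round_budget(epsilon, rounds):
    """Assumed reading of the abstract: each query in each round gets epsilon / (2t)."""
    return epsilon / (2 * rounds)

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_centroid(points, eps_query, rng):
    """One differentially private 1-D centroid update for points in [0, 1].

    Both the sum and the count have sensitivity 1 under that assumption,
    so each receives Laplace noise with scale 1 / eps_query.
    """
    noisy_sum = sum(points) + laplace_noise(1.0 / eps_query, rng)
    noisy_count = len(points) + laplace_noise(1.0 / eps_query, rng)
    centroid = noisy_sum / max(noisy_count, 1.0)   # guard against tiny or negative counts
    return min(max(centroid, 0.0), 1.0)            # clamp back into the data range

rng = random.Random(42)
eps_query = per_round_budget(epsilon=1.0, rounds=5)

# Hypothetical cluster of 500 points in [0, 1].
cluster = [rng.random() for _ in range(500)]
centroid = noisy_centroid(cluster, eps_query, rng)
print(eps_query, centroid)
```

Under these assumptions the 2t queries together consume the whole budget ε. A smaller ε or a larger number of rounds t gives a larger noise scale per query, which is the utility cost the abstract's experiments weigh against privacy.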

Limitations in the Applicability of Bankruptcy Prediction Models

Economic and social alternatives ◽

10.37075/isa.2021.2.02 ◽

2021 ◽

Vol 27 (2) ◽

pp. 19-31

Keyword(s):

Prediction Models ◽

Bankruptcy Prediction ◽

Bankruptcy Risk ◽

Wide Range ◽

Quality Of Accounting Information ◽

Uniform Definition ◽

Definition Of ◽

Specific Definition ◽

Selection Of

The application of bankruptcy prediction models raises a wide range of problems. A review of the literature shows the lack of a uniform definition of bankruptcy. The existing diversity in definitions complicates the comparability of different studies, so when applying bankruptcy prediction models in practice it is appropriate to take into account the specific definition of bankruptcy on which each model is based. The selection of companies in the various studies has also been the subject of much criticism. The literature also raises the question of the quality of accounting information. There are also discussions about which indicators should be included in the models. Many studies have demonstrated the benefits of including market information as well as non-financial information in bankruptcy risk analysis. There is also no consensus on whether data on the cash flow of companies should be used to increase the predictive power of the models.
