Performance comparison of six Data mining models for soft churn customer prediction in Telecom

Author(s):  
Marin Mandić ◽  
Goran Kraljević ◽  
Ivan Boban

Because of strong competition in the market, telecom operators are affected by churn, so it is very important for them to identify which users are likely to leave and switch to a competing telecom company. This research uses data on user behaviour from telecom systems to identify behavioural patterns and thereby recognize churn. A new definition of prepaid soft churn based on multiple conditions is a valuable contribution of this paper. During data preparation, useful attributes were selected using Principal Component Analysis (PCA). The attribute values were also normalized in order to balance the influence of all attributes. A common problem with telecom churn prediction data is imbalance in the target variable. This is also the case for the data used in this paper, where the percentage of churners is 12%. Undersampling and oversampling were compared as methods for resolving the data imbalance problem. Data sets with undersampling and oversampling were used to train decision tree, logistic regression and neural network algorithms, so six prediction models for detecting churn among prepaid telecom users were created in this paper. The performance of the six developed data mining models was also analysed and compared.
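The two resampling strategies compared in the abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names `undersample` and `oversample`, the toy rows, and the fixed seed are assumptions made for the example, and only the 12% churn share comes from the abstract.

```python
import random


def _split_by_class(labels):
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def undersample(rows, labels, rng):
    """Randomly drop majority-class rows until both classes have equal size."""
    minority, majority = _split_by_class(labels)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def oversample(rows, labels, rng):
    """Randomly duplicate minority-class rows (with replacement) until balanced."""
    minority, majority = _split_by_class(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    kept = majority + minority + extra
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 100 users, 12 churners (label 1), one dummy attribute each.
rng = random.Random(42)
labels = [1] * 12 + [0] * 88
rows = [[float(i)] for i in range(100)]

under_rows, under_labels = undersample(rows, labels, rng)
over_rows, over_labels = oversample(rows, labels, rng)
print(len(under_labels), sum(under_labels))  # 24 rows, 12 churners
print(len(over_labels), sum(over_labels))    # 176 rows, 88 churners
```

Each balanced set would then be fed to the three learners, giving the six models. A common precaution, assumed here rather than stated in the abstract, is to resample only the training split so that the evaluation data keep the original class ratio.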

Data Mining ◽  
2011 ◽  
pp. 1-26 ◽  
Author(s):  
Stefan Arnborg

This chapter reviews the fundamentals of inference and motivates Bayesian analysis. The method is illustrated with dependency tests in data sets with categorical variables, using Dirichlet prior distributions. Principles and problems in deriving causality conclusions are reviewed and illustrated with Simpson’s paradox. The selection of decomposable and directed graphical models illustrates the Bayesian approach. Bayesian and EM classification are briefly described. The material is illustrated with two cases, one in personalization of media distribution and one in schizophrenia research. These cases show how to approach problem types that exist in many other application areas.
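A Bayesian dependency test for two categorical variables can be sketched as a comparison of marginal likelihoods under Dirichlet priors. The sketch below assumes uniform Dirichlet priors (all parameters equal to 1) and compares a "dependent" model, with one probability per table cell, against an "independent" model, with separate row and column distributions. The function names and example tables are hypothetical and are not taken from the chapter.

```python
from math import lgamma


def log_marginal(counts):
    """Log marginal likelihood of an observed sequence with the given category
    counts, integrating the category probabilities over a uniform Dirichlet prior."""
    k = len(counts)
    n = sum(counts)
    return lgamma(k) - lgamma(n + k) + sum(lgamma(c + 1) for c in counts)


def log_bayes_factor_dependence(table):
    """Log Bayes factor of the dependent model against the independent model
    for a contingency table given as a list of rows of counts."""
    cells = [c for row in table for c in row]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    log_dependent = log_marginal(cells)
    log_independent = log_marginal(row_totals) + log_marginal(col_totals)
    return log_dependent - log_independent


# Hypothetical tables: one perfectly associated, one perfectly uniform.
associated = [[50, 0], [0, 50]]
uniform = [[25, 25], [25, 25]]
print(log_bayes_factor_dependence(associated))  # positive: favours dependence
print(log_bayes_factor_dependence(uniform))     # negative: favours independence
```

A positive value means the data are better explained by the dependent model. Other prior choices change the numbers, so the uniform prior here should be read as one convenient assumption.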


Author(s):  
Anna Madill ◽  
Yao Zhao

Abstract: Female-oriented male–male erotica is a genre of popular culture often known as Boys’ Love (BL), yaoi, and danmei. It is one of the largest by-and-for women sexual subcultures and a global phenomenon. With the largest data sets in the field, we ask: Which risqué sexual content do Sinophone (Chinese-speaking) and Anglophone (English-speaking) participants particularly enjoy in BL and does this differ between cultures?, and Are there sub-demographics in Sinophone and in Anglophone culture who enjoy particular forms of risqué sexual content in BL and do these forms relate also to enjoyment of particular storylines and concern with legal issues? The material studied meets the DSM-5 definition of the paraphilic, and little is known about paraphilias in women or in the general population. Using Categorical Principal Component Analysis we explored one 15-response question from our Sinophone (N = 1922) and Anglophone (N = 1715) BL fandom surveys: Which risqué sexual content do you particularly enjoy in BL? We also tested for associations with seven demographic and other BL content-related questions. Notably, the component structure was nearly replicated between the two independent samples, in order of strength: BDSM Specialist, Mechanoid/Animal Sex Specialist, Underage Sex Specialist, and Minority Paraphilia Specialist. In both samples, it was the avid BL fans and/or those who liked explicitly sexual stories, a largely overlapping demographic, who most engage the risqué content, while, for the Sinophone, this included also more non-heterosexual and/or other-gendered people. We conclude that women’s paraphilias have been largely overlooked because they might be expressed more commonly through fantasy than action, that their mass expression has awaited both the means and the market force, and that current conceptualization of, and assumptions about, paraphilias is overly modeled on that of men.


Author(s):  
Isabel Ramos ◽  
João Álvaro Carvalho

Scientific or organizational knowledge creation has been addressed from different perspectives throughout the history of science and, in particular, of the social sciences. The process is guided by the set of values, beliefs and norms shared by the members of the community to which the creator of this knowledge belongs, that is, it is guided by the adopted paradigm (Lincoln & Guba, 2000). The adopted paradigm determines how the nature of the studied reality is understood, the criteria that will be used to assess the validity of the created knowledge, and the construction and selection of methods, techniques and tools to structure and support the creation of knowledge. This set of ontological, epistemological, and methodological assumptions that characterize the paradigm one implicitly or explicitly uses to make sense of the surrounding reality is the cultural root of intellectual enterprises. Those assumptions constrain the accomplishment of activities such as construction of theories, definition of inquiry strategies, interpretation of perceived phenomena, and dissemination of knowledge (Schwandt, 2000).


2006 ◽  
Vol 06 (01) ◽  
pp. L17-L28 ◽  
Author(s):  
JOSÉ MANUEL LÓPEZ-ALONSO ◽  
JAVIER ALDA

Principal Component Analysis (PCA) has been applied to the characterization of 1/f noise. Applying PCA to 1/f noise requires the definition of a stochastic multidimensional variable. The components of this variable describe the temporal evolution of the phenomenon sampled at regular time intervals. In this paper we analyze the conditions on the number of observations and on the dimension of the multidimensional random variable that are necessary to use the PCA method in a sound manner. We have tested the obtained conditions on simulated and experimental data sets obtained from imaging optical systems. The results can be extended to other fields where this kind of noise is relevant.
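One way to picture the multidimensional variable is to cut a regularly sampled signal into N observations of M consecutive samples each and run PCA on the resulting N x M matrix. The sketch below does this with the standard library only and finds the leading component by power iteration. Everything in it is an assumption for illustration: the random walk is merely a stand-in for correlated noise (it is not a 1/f generator), and the function name, segment length and iteration count are hypothetical rather than the paper's choices.

```python
import math
import random


def first_principal_component(observations, iterations=200):
    """Return (eigenvalue, unit eigenvector) of the sample covariance matrix,
    estimated by power iteration from N observations of dimension M."""
    n = len(observations)
    m = len(observations[0])
    means = [sum(row[j] for row in observations) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in observations]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    vec = [1.0 / math.sqrt(m)] * m  # starting direction (assumed not orthogonal to the answer)
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
        eigenvalue = norm
    return eigenvalue, vec


# Stand-in signal: a random walk sampled at regular intervals, cut into
# N = 40 observations of M = 8 consecutive samples each.
rng = random.Random(1)
series, level = [], 0.0
for _ in range(40 * 8):
    level += rng.gauss(0.0, 1.0)
    series.append(level)
observations = [series[i * 8:(i + 1) * 8] for i in range(40)]

eigenvalue, component = first_principal_component(observations)
print(eigenvalue, component)
```

With N observations the sample covariance has rank at most N - 1, so N and M cannot be chosen independently. This is one simple reason why conditions of the kind studied in the paper matter.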


2008 ◽  
pp. 2296-2301
Author(s):  
Isabel Ramos ◽  
João Álvaro Carvalho

Scientific or organizational knowledge creation has been addressed from different perspectives throughout the history of science and, in particular, of the social sciences. The process is guided by the set of values, beliefs and norms shared by the members of the community to which the creator of this knowledge belongs, that is, it is guided by the adopted paradigm (Lincoln & Guba, 2000). The adopted paradigm determines how the nature of the studied reality is understood, the criteria that will be used to assess the validity of the created knowledge, and the construction and selection of methods, techniques and tools to structure and support the creation of knowledge. This set of ontological, epistemological, and methodological assumptions that characterize the paradigm one implicitly or explicitly uses to make sense of the surrounding reality is the cultural root of intellectual enterprises. Those assumptions constrain the accomplishment of activities such as construction of theories, definition of inquiry strategies, interpretation of perceived phenomena, and dissemination of knowledge (Schwandt, 2000).


2014 ◽  
Vol 926-930 ◽  
pp. 2786-2789
Author(s):  
Jing Zhu Li ◽  
Qian Li ◽  
Tai Yu Liu ◽  
Wei Hong Niu

Data mining is a multidisciplinary field that developed gradually during the 20th century. This paper reviews data mining modeling, algorithms, applications and software tools. It covers the definition of data mining, the scope and characteristics of data sets, and various practical data mining situations; summarizes the basic steps and processes of data mining in practical applications; discusses data mining tasks and modeling issues in a variety of applications; lists the algorithms that are currently most popular in the field and briefly analyzes the issues to consider in algorithm design; gives an overview of current data mining algorithms in a number of areas; describes more comprehensively the current performance of data mining software tools and the circumstances of their developers; and, finally, considers the prospects and direction of data mining development.


Author(s):  
Zhiqiang Gao ◽  
Yixiao Sun ◽  
Xiaolong Cui ◽  
Yutao Wang ◽  
Yanyu Duan ◽  
...  

This article describes how the most widely used clustering algorithm, k-means, is prone to falling into a local optimum. Notably, traditional clustering approaches are performed directly on private data and fail to cope with malicious attacks in massive data mining tasks against attackers' arbitrary background knowledge. This can result in violation of individuals' privacy, as well as leaks through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means under Spark. In the first stage, particle swarm optimization is executed in resilient distributed datasets to initiate the selection of clustering centroids in the k-means on Spark. In the second stage, k-means is executed on the condition that a privacy budget is set as ε/2t with Laplace noise added in each round of iterations. Extensive experimentation on public UCI data sets shows that, on the premise of guaranteeing utility of privacy data and scalability, their approach outperforms the state-of-the-art varieties of k-means by utilizing swarm intelligence and rigorous paradigms of differential privacy.
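The second stage, a k-means round with Laplace noise, can be sketched roughly as below. This is a single-machine, one-dimensional illustration in plain Python, not the authors' Spark implementation and without the particle swarm initialization. Several things are assumptions: points are taken to lie in [0, 1] so that one point changes a cluster sum by at most 1, the per-round budget is passed in as `round_budget` because the abstract's "ε/2t" schedule is not unpacked here, and splitting that budget equally between counts and sums is a common convention rather than something the abstract states.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_kmeans_round(points, centroids, round_budget, rng):
    """One Lloyd update on 1-D points in [0, 1], with Laplace noise added to
    each cluster's count and sum before the new centroid is computed."""
    half = round_budget / 2.0  # assumed split: half for counts, half for sums
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in points:
        nearest = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        sums[nearest] += x
        counts[nearest] += 1
    updated = []
    for j, old in enumerate(centroids):
        noisy_count = counts[j] + laplace(1.0 / half, rng)  # count sensitivity 1
        noisy_sum = sums[j] + laplace(1.0 / half, rng)      # sum sensitivity 1 on [0, 1]
        if noisy_count < 1.0:
            updated.append(old)  # too little (noisy) mass: keep the previous centroid
        else:
            updated.append(min(1.0, max(0.0, noisy_sum / noisy_count)))
    return updated


# Hypothetical toy run: two clusters, three rounds with a fixed per-round budget.
rng = random.Random(0)
points = [0.10, 0.20, 0.15, 0.90, 1.00, 0.95]
centroids = [0.0, 1.0]
for _ in range(3):
    centroids = noisy_kmeans_round(points, centroids, round_budget=1.0, rng=rng)
print(centroids)
```

A smaller `round_budget` means a larger noise scale and therefore less accurate centroids, which is the privacy and utility trade-off the abstract refers to.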


2021 ◽  
Vol 27 (2) ◽  
pp. 19-31

The problems associated with the application of bankruptcy prediction models span a wide range. A review of the literature shows the lack of a uniform definition of bankruptcy. The existing diversity in the definitions of bankruptcy complicates the comparability of the different studies, which is why it is considered appropriate, when applying bankruptcy prediction models in practice, to take into account the specific definition of bankruptcy on which they are based. The selection of companies in the various studies has also been the subject of much criticism. The literature also raises the question of the quality of accounting information. There are also discussions about which indicators should be included in the models. Many studies have demonstrated the benefits of including market information as well as non-financial information in bankruptcy risk analysis. There is also no consensus on whether data on the cash flow of companies should be used to increase the predictive power of the models.

