An Expanded Assessment of Data Mining Approaches for Analyzing Actuarial Student Success Rate

2016 ◽  
Vol 3 (1) ◽  
pp. 22-44 ◽  
Author(s):  
Alan Olinsky ◽  
Phyllis Schumacher ◽  
John Quinn

One way to enhance the likelihood that more university students will graduate within the specific major that they begin with is to attract the type of students who have typically (historically) done well in that field of study. This paper expands upon a study that utilizes data mining techniques to analyze the characteristics of students who enroll as actuarial students and then either drop out of the major or graduate as actuarial students. Several predictive models including logistic regression, neural networks and decision trees are obtained using input variables describing academic attributes of the students. The models are then compared and the best fitting model is determined. The regression model turns out to be the best predictor. Since this is a very well understood method, it can easily be explained. The decision tree, although its underpinnings are somewhat difficult to explain, gives a clear and well understood output. In addition, the non-predictive method of cluster analysis is applied in order to group these students into distinct classifications based on the values of the input variables. Finally, a new approach to modeling in SAS®, called Rapid Predictive Modeler (RPM), is described and utilized. The results of the RPM also select the regression model as the best predictor.

Data Mining ◽  
2013 ◽  
pp. 1819-1834
Author(s):  
Alan Olinsky ◽  
Phyllis A. Schumacher ◽  
John Quinn

One way to enhance the likelihood that more students will graduate within the specific major that they begin with is to attract the type of students who have typically (historically) done well in that field of study. This chapter details a study that utilizes data mining techniques to analyze the characteristics of students who enroll as actuarial students and then either drop out of the major or graduate as actuarial students. Several predictive models including logistic regression, neural networks and decision trees are obtained. The models are then compared and the best fitting model is determined. The regression model turns out to be the best predictor. Since this is a very well understood method, it can easily be explained. The decision tree, although its underpinnings are somewhat difficult to explain, gives a clear and well understood output. Not only is the resulting model a good one for predicting success in the major, it also allows us the ability to better counsel students.


Author(s):  
Yevgeniy Bodyanskiy ◽  
Olena Vynokurova ◽  
Oleksii Tyshchenko

This work is devoted to the synthesis of adaptive hybrid systems based on Computational Intelligence (CI) methods (especially artificial neural networks (ANNs)) and ideas from the Group Method of Data Handling (GMDH), with the aim of obtaining qualitatively new results in Data Mining, Intelligent Control and other scientific areas. GMDH-artificial neural networks (GMDH-ANNs) are currently well known. Their nodes are two-input N-Adalines. However, these ANNs can require a considerable number of hidden layers to reach the necessary approximation quality. The introduced Q-neurons can provide higher quality using quadratic approximation. Their main advantage is a high learning rate. Universal approximating properties of the GMDH-ANNs can be achieved with the help of compartmental R-neurons, each representing a two-input RBFN with grid partitioning of the input variables' space. An adjustment procedure is provided for the synaptic weights as well as for both the centers and the receptive fields. At the same time, Epanechnikov kernels (whose derivatives are linear in the adjusted parameters) can be used instead of conventional Gaussian functions in order to speed up the learning process. More complex tasks deal with stochastic time series processing. Tasks of this kind can be solved with the help of the introduced adaptive W-neurons (wavelets). The learning algorithms are characterized by both tracking and smoothing properties based on the quadratic learning criterion. Robust algorithms which eliminate the influence of abnormal outliers on the learning process are introduced too. Theoretical results are illustrated by multiple experiments that confirm the proposed approach's effectiveness.
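
The contrast between Gaussian and Epanechnikov receptive fields can be illustrated with a minimal Python sketch. It assumes the usual textbook forms of the two kernels in unnormalised, one-dimensional form; the abstract gives no formulas, so the function names and the `center`/`width` parameterisation are illustrative assumptions rather than the authors' notation.

```python
import math

def gaussian_kernel(x, center, width):
    """Conventional Gaussian receptive field: smooth and never exactly zero."""
    u = (x - center) / width
    return math.exp(-0.5 * u * u)

def epanechnikov_kernel(x, center, width):
    """Epanechnikov-style receptive field: a clipped parabola with compact support."""
    u = (x - center) / width
    return max(0.0, 1.0 - u * u)

def epanechnikov_grad_center(x, center, width):
    """Derivative of the Epanechnikov kernel with respect to the center.

    Inside the support, d/dc [1 - (x - c)^2 / w^2] = 2 (x - c) / w^2,
    which is linear in the center c for a fixed width w.
    """
    u = (x - center) / width
    if abs(u) >= 1.0:
        return 0.0  # outside the receptive field the kernel is flat
    return 2.0 * (x - center) / (width * width)
```

Under these assumptions the gradient needs no exponential and depends linearly on the center, which is one plausible reading of why such kernels are cheaper to adjust than Gaussian ones.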


2010 ◽  
Vol 37 (3) ◽  
pp. 389-400 ◽  
Author(s):  
Lu Sun ◽  
Jun Yang ◽  
Hani Mahmassani ◽  
Wenjun Gu ◽  
Bum-Jin Kim

In this paper, we developed a methodological framework for traffic-stream modeling based on data mining, a steepest-ascent algorithm, and a genetic algorithm. The new method is adaptive in nature and has greater flexibility and generality than existing methods. It provides an optimum overall fit to the observed data. Specifically, the advantages of adaptive regression are that (1) knot positions and model parameters are estimated optimally and simultaneously using a genetic algorithm, and presetting of knot positions can be performed in terms of either density or speed; (2) the method is automatic and data driven, and it will always find the best-fitting model for site-dependent actual traffic data; and (3) the user has great flexibility to specify the degree of model continuity and to define and add new basis functions that are parsimonious and fit the traffic data better in some regime of the speed–density relation. The proposed method and the developed computer software package MiningFlow will be beneficial to traffic operations and traffic simulation.


Author(s):  
Niall Rooney

The concept of ensemble learning has its origins in research from the late 1980s/early 1990s into combining a number of artificial neural networks (ANNs) models for regression tasks. Ensemble learning is now a widely deployed and researched topic within the area of machine learning and data mining. Ensemble learning, as a general definition, refers to the concept of being able to apply more than one learning model to a particular machine learning problem using some method of integration. The desired goal of course is that the ensemble as a unit will outperform any of its individual members for the given learning task. Ensemble learning has been extended to cover other learning tasks such as classification (refer to Kuncheva, 2004 for a detailed overview of this area), online learning (Fern & Givan, 2003) and clustering (Strehl & Ghosh, 2003). The focus of this article is to review ensemble learning with respect to regression, where by regression, we refer to the supervised learning task of creating a model that relates a continuous output variable to a vector of input variables.
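
As a minimal sketch of the idea of integrating several regression models, the Python snippet below combines base models by simple averaging. Averaging is only one possible integration method, and the three base regressors, the toy data and all function names are hypothetical, chosen purely for illustration.

```python
from statistics import fmean

def ensemble_predict(models, x):
    """Integrate the base models by averaging their predictions for x."""
    return fmean(model(x) for model in models)

def mean_squared_error(predict, data):
    """Average squared difference between predictions and targets."""
    return fmean((predict(x) - y) ** 2 for x, y in data)

# Three hypothetical base regressors, each biased differently around y = 2x.
base_models = [
    lambda x: 2.0 * x + 1.0,
    lambda x: 2.0 * x - 1.0,
    lambda x: 2.0 * x + 0.3,
]

# Toy data generated from the noiseless target y = 2x.
data = [(float(x), 2.0 * x) for x in range(5)]

member_errors = [mean_squared_error(m, data) for m in base_models]
ensemble_error = mean_squared_error(
    lambda x: ensemble_predict(base_models, x), data
)
```

In this contrived case the members' errors partly cancel, so `ensemble_error` is smaller than every entry in `member_errors`. That outcome is not guaranteed in general; it depends on the base models making sufficiently different errors.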


Author(s):  
Mohamed Salah Hamdi

Data-mining technology delivers two key benefits: (i) a descriptive function, enabling enterprises, regardless of industry or size, in the context of defined business objectives, to automatically explore, visualize, and understand their data and to identify patterns, relationships, and dependencies that impact business outcomes (i.e., revenue growth, profit improvement, cost containment, and risk management); (ii) a predictive function, enabling relationships uncovered and identified through the data-mining process to be expressed as business rules or predictive models. These outputs can be communicated in traditional reporting formats (i.e., presentations, briefs, electronic information sharing) to guide business planning and strategy. Also, these outputs, expressed as programming code, can be deployed or hard wired into business-operating systems to generate predictions of future outcomes, based on newly generated data, with higher accuracy and certainty.


2009 ◽  
Vol 419-420 ◽  
pp. 369-372 ◽  
Author(s):  
Po Tsang B. Huang ◽  
James C. Chen ◽  
Yuan Tsan Jou

The key element of the in-process surface roughness monitoring system is the decision-making model, which is used to analyze the input factors and then generate a proper output. The success of the in-process monitoring system depends on the accuracy of the decision-making model. To increase the accuracy and reliability of the model, it is important to reduce the variation of the inputs. To achieve this objective, an integration of regression and neural networks was developed as the decision-making model in this research. In this integrated model, the regression model was applied as a filter to sort the input variables into groups. The grouped data were then used to train different neural network models in order to reduce the effect of input variation and increase the accuracy of the monitoring system. The input variables were first filtered by the threshold of the regression model and then analyzed by different neural network models based on the filtered result. Finally, to evaluate the performance of the integrated model, both the regression neural network and traditional neural networks were developed for a surface roughness monitoring system in an end milling operation, and the accuracy of the systems was compared.


2019 ◽  
Vol 21 (5) ◽  
pp. 798-811 ◽  
Author(s):  
Zohreh Sheikh Khozani ◽  
Khabat Khosravi ◽  
Binh Thai Pham ◽  
Bjørn Kløve ◽  
Wan Hanna Melini Wan Mohtar ◽  
...  

Momentum exchange in the mixing region between the floodplain and the main channel is an essential hydraulic process, particularly for the estimation of discharge. The current study investigated various data mining models to estimate apparent shear stress in a symmetric compound channel with smooth and rough floodplains. The applied predictive models include random forest (RF), random tree (RT), reduced error pruning tree (REPT), M5P, and the distinguished hybrid bagging-M5P model. The models are constructed based on several correlated physical channel characteristic variables to predict the apparent shear stress. A sensitivity analysis is applied to select the best tuning parameters for each model. Results showed that an input with six variables gave the best prediction results for the RF model, while an input with four variables produced the best performance for the other models. Based on the optimised input variables for each model, the efficiency of the five predictive models discussed here was evaluated. It was found that the M5P and hybrid bagging-M5P models, with coefficients of determination (R²) of 0.905 and 0.92, respectively, in the testing stage, are superior to the RF, RT and REPT models in estimating apparent shear stress in compound channels.
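
For readers unfamiliar with the metric, the following Python sketch computes a coefficient of determination. It assumes the common definition R² = 1 - SS_res / SS_tot; the abstract does not say which definition the authors used (some studies report the squared correlation coefficient instead), and the function name and sample values are illustrative only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
```

A value of 1 means the predictions match the observations exactly, while 0 means the model does no better than always predicting the observed mean.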


2019 ◽  
Vol 1 (2) ◽  
pp. 225-230
Author(s):  
Aswan Supriyadi Sunge

Diabetes is a chronic disease caused by excess sugar in the blood. Various automated algorithms have been used to anticipate and diagnose diabetes, and data mining is one approach that can help diagnose a patient's disease. Predictions can save lives and allow prevention to begin before the disease attacks the patient. Choosing a suitable classifier clearly improves the accuracy of the system as levels continue to increase. Most diabetics know little about the risk factors they face before diagnosis. This study develops five predictive models using 9 input variables and one output variable from the dataset. The purpose of the study was to compare the performance of Naive Bayes, Decision Tree, SVM, K-NN and ANN models in predicting diabetes mellitus.


Informatics ◽  
2018 ◽  
Vol 6 (1) ◽  
pp. 1 ◽  
Author(s):  
Ioannis Livieris

In this work, a new approach for training artificial neural networks is presented which utilises techniques for solving constrained optimisation problems. More specifically, this study converts the training of a neural network into a constrained optimisation problem. Furthermore, we propose a new neural network training algorithm based on the L-BFGS-B method. Our numerical experiments illustrate the classification efficiency of the proposed algorithm and of our proposed methodology, leading to more efficient, stable and robust predictive models.

