An Expanded Assessment of Data Mining Approaches for Analyzing Actuarial Student Success Rate

Alan Olinsky; Phyllis Schumacher; John Quinn

doi:10.4018/ijban.2016010102

Assessing Data Mining Approaches for Analyzing Actuarial Student Success Rate

Data Mining ◽

10.4018/978-1-4666-2455-9.ch094 ◽

2013 ◽

pp. 1819-1834

Author(s):

Alan Olinsky ◽

Phyllis A. Schumacher ◽

John Quinn

Keyword(s):

Data Mining ◽

Neural Networks ◽

Logistic Regression ◽

Decision Tree ◽

Student Success ◽

Predictive Models ◽

Drop Out ◽

Predicting Success ◽

Best Fitting ◽

Fitting Model

One way to enhance the likelihood that more students will graduate within the specific major that they begin with is to attract the type of students who have typically (historically) done well in that field of study. This chapter details a study that utilizes data mining techniques to analyze the characteristics of students who enroll as actuarial students and then either drop out of the major or graduate as actuarial students. Several predictive models including logistic regression, neural networks and decision trees are obtained. The models are then compared and the best fitting model is determined. The regression model turns out to be the best predictor. Since this is a very well understood method, it can easily be explained. The decision tree, although its underpinnings are somewhat difficult to explain, gives a clear and well understood output. Not only is the resulting model a good one for predicting success in the major, it also allows us the ability to better counsel students.
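
The comparison step described in this abstract can be sketched as follows. This is a minimal illustration, not the chapter's procedure: the abstract does not say which fit criterion was used, so validation misclassification rate is an assumption, and the labels, predicted probabilities and the name `misclassification_rate` are hypothetical values invented for the example.

```python
def misclassification_rate(labels, probabilities, threshold=0.5):
    """Fraction of cases whose thresholded prediction disagrees with the label."""
    wrong = sum((p >= threshold) != bool(y) for y, p in zip(labels, probabilities))
    return wrong / len(labels)


# Hypothetical validation set: 1 = graduated in the major, 0 = left the major.
labels = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predicted probabilities from three already-fitted models.
model_outputs = {
    "logistic_regression": [0.9, 0.2, 0.7, 0.6, 0.6, 0.1, 0.8, 0.3],
    "decision_tree":       [0.8, 0.2, 0.8, 0.2, 0.2, 0.2, 0.8, 0.8],
    "neural_network":      [0.7, 0.4, 0.6, 0.4, 0.6, 0.3, 0.9, 0.2],
}

# Score every candidate on the same held-out cases, then keep the lowest error.
rates = {name: misclassification_rate(labels, probs)
         for name, probs in model_outputs.items()}
best_model = min(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
print("best:", best_model)
```

Scoring all candidates on the same held-out cases keeps the comparison fair. Other criteria (for example lift or area under the ROC curve) would slot into the same structure by swapping the scoring function.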

Assessing Data Mining Approaches for Analyzing Actuarial Student Success Rate

Visual Analytics and Interactive Technologies ◽

10.4018/978-1-60960-102-7.ch010 ◽

2011 ◽

pp. 169-185

Author(s):

Alan Olinsky ◽

Phyllis A. Schumacher ◽

John Quinn

Keyword(s):

Data Mining ◽

Neural Networks ◽

Logistic Regression ◽

Decision Tree ◽

Student Success ◽

Predictive Models ◽

Drop Out ◽

Predicting Success ◽

Best Fitting ◽

Fitting Model

One way to enhance the likelihood that more students will graduate within the specific major that they begin with is to attract the type of students who have typically (historically) done well in that field of study. This chapter details a study that utilizes data mining techniques to analyze the characteristics of students who enroll as actuarial students and then either drop out of the major or graduate as actuarial students. Several predictive models including logistic regression, neural networks and decision trees are obtained. The models are then compared and the best fitting model is determined. The regression model turns out to be the best predictor. Since this is a very well understood method, it can easily be explained. The decision tree, although its underpinnings are somewhat difficult to explain, gives a clear and well understood output. Not only is the resulting model a good one for predicting success in the major, it also allows us the ability to better counsel students.

Hybrid Wavelet-Neuro-Fuzzy Systems of Computational Intelligence in Data Mining Tasks

Handbook of Research on Machine Learning Innovations and Trends - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-2229-4.ch035 ◽

2017 ◽

pp. 787-825

Author(s):

Yevgeniy Bodyanskiy ◽

Olena Vynokurova ◽

Oleksii Tyshchenko

Keyword(s):

Data Mining ◽

Neural Networks ◽

Artificial Neural Networks ◽

Learning Process ◽

Computational Intelligence ◽

Group Method ◽

Learning Criterion ◽

Adjustment Procedure ◽

Artificial Neural ◽

Input Variables

This work is devoted to the synthesis of adaptive hybrid systems based on Computational Intelligence (CI) methods (especially artificial neural networks (ANNs)) and the ideas of the Group Method of Data Handling (GMDH) to obtain new qualitative results in Data Mining, Intelligent Control and other scientific areas. GMDH-artificial neural networks (GMDH-ANNs) are currently well known. Their nodes are two-input N-Adalines. However, these ANNs can require a considerable number of hidden layers to reach the necessary approximation quality. The introduced Q-neurons can provide higher quality using quadratic approximation. Their main advantage is a high learning rate. Universal approximating properties of the GMDH-ANNs can be achieved with the help of compartmental R-neurons, each representing a two-input RBFN with grid partitioning of the input variables' space. An adjustment procedure for the synaptic weights as well as for both centers and receptive fields is provided. At the same time, Epanechnikov kernels (whose derivatives are linear in the adjusted parameters) can be used instead of conventional Gaussian functions in order to speed up the learning process. More complex tasks deal with stochastic time series processing. Tasks of this kind can be solved with the help of the introduced adaptive W-neurons (wavelets). The learning algorithms have both tracking and smoothing properties and are based on the quadratic learning criterion. Robust algorithms that eliminate the influence of abnormal outliers on the learning process are introduced too. Theoretical results are illustrated by multiple experiments that confirm the proposed approach's effectiveness.
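
The remark about Epanechnikov kernels can be checked numerically. The sketch below assumes a one-dimensional kernel of the form 1 - ((x - c)/σ)² on its support, which may differ from the multidimensional form used in the chapter; the function names and sample values are hypothetical. It compares how the derivative with respect to the center c behaves for the two kernel types.

```python
import math


def epanechnikov(x, center, width):
    """Assumed 1-D Epanechnikov-style kernel: 1 - u^2 on |u| < 1, else 0."""
    u = (x - center) / width
    return max(0.0, 1.0 - u * u)


def epanechnikov_dcenter(x, center, width):
    """Derivative with respect to `center` inside the support; linear in `center`."""
    if abs(x - center) >= width:
        return 0.0
    return 2.0 * (x - center) / (width * width)


def gaussian(x, center, width):
    u = (x - center) / width
    return math.exp(-0.5 * u * u)


def gaussian_dcenter(x, center, width):
    """Derivative with respect to `center`; the exponential factor makes it nonlinear."""
    return gaussian(x, center, width) * (x - center) / (width * width)


def second_difference(derivative, x, centers, width):
    """Zero (up to rounding) when the derivative is linear over equally spaced centers."""
    d0, d1, d2 = (derivative(x, c, width) for c in centers)
    return d0 - 2.0 * d1 + d2


x, width = 0.5, 1.0
centers = (0.0, 0.2, 0.4)  # equally spaced, all inside the kernel's support
epan_curvature = second_difference(epanechnikov_dcenter, x, centers, width)
gauss_curvature = second_difference(gaussian_dcenter, x, centers, width)
print(epan_curvature, gauss_curvature)
```

A vanishing second difference for the Epanechnikov derivative reflects its linear dependence on the center, while the Gaussian derivative shows visible curvature over the same centers.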

Data mining-based adaptive regression for developing equilibrium speed–density relationships

Canadian Journal of Civil Engineering ◽

10.1139/l09-158 ◽

2010 ◽

Vol 37 (3) ◽

pp. 389-400 ◽

Cited By ~ 9

Author(s):

Lu Sun ◽

Jun Yang ◽

Hani Mahmassani ◽

Wenjun Gu ◽

Bum-Jin Kim

Keyword(s):

Data Mining ◽

Genetic Algorithm ◽

Computer Software ◽

Model Parameters ◽

Traffic Data ◽

Traffic Operations ◽

Density Relation ◽

Best Fitting ◽

Adaptive Regression ◽

Fitting Model

In this paper, we developed a methodological framework for traffic-stream modeling based on data mining, a steepest-ascent algorithm, and a genetic algorithm. The new method is adaptive in nature and has greater flexibility and generality than existing methods. It provides an optimum overall fit to the observed data. Specifically, the advantages of adaptive regression are that (1) knot positions and model parameters are estimated optimally and simultaneously using a genetic algorithm, and knot positions can be preset in terms of either density or speed; (2) the method is automatic and data driven, and it will always find the best-fitting model for site-dependent actual traffic data; and (3) the user has great flexibility to specify the degree-model continuity and to define and add new basis functions that are parsimonious and fit the traffic data better in some regime of the speed–density relation. The proposed method and the computer software package developed, MiningFlow, will be beneficial to traffic operations and traffic simulation.

Ensemble Learning for Regression

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch120 ◽

2011 ◽

pp. 777-782

Author(s):

Niall Rooney

Keyword(s):

Machine Learning ◽

Data Mining ◽

Neural Networks ◽

Ensemble Learning ◽

Learning Task ◽

General Definition ◽

Learning Tasks ◽

Continuous Output ◽

Input Variables ◽

The Given

The concept of ensemble learning has its origins in research from the late 1980s and early 1990s into combining a number of artificial neural network (ANN) models for regression tasks. Ensemble learning is now a widely deployed and researched topic within the area of machine learning and data mining. Ensemble learning, as a general definition, refers to applying more than one learning model to a particular machine learning problem using some method of integration. The desired goal, of course, is that the ensemble as a unit will outperform any of its individual members for the given learning task. Ensemble learning has been extended to cover other learning tasks such as classification (refer to Kuncheva, 2004 for a detailed overview of this area), online learning (Fern & Givan, 2003) and clustering (Strehl & Ghosh, 2003). The focus of this article is to review ensemble learning with respect to regression, where by regression we refer to the supervised learning task of creating a model that relates a continuous output variable to a vector of input variables.
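
The idea of integrating several regression models can be sketched with the simplest integration method, an unweighted average of the members' predictions. This is one illustrative choice among many, not the specific methods the article reviews, and the targets and member predictions are hypothetical numbers chosen so that the members err in different directions.

```python
def mse(targets, predictions):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def average_ensemble(member_predictions):
    """Combine members by taking the unweighted mean prediction per example."""
    n_members = len(member_predictions)
    return [sum(column) / n_members for column in zip(*member_predictions)]


# Hypothetical continuous targets and the predictions of three fitted members.
targets = [1.0, 2.0, 3.0, 4.0]
member_predictions = [
    [1.5, 2.5, 3.5, 4.5],  # member A over-predicts by 0.5 everywhere
    [0.5, 1.5, 2.5, 3.5],  # member B under-predicts by 0.5 everywhere
    [1.0, 2.0, 3.0, 5.0],  # member C is exact except on the last example
]

ensemble_predictions = average_ensemble(member_predictions)
member_errors = [mse(targets, p) for p in member_predictions]
ensemble_error = mse(targets, ensemble_predictions)

print("member MSEs:", member_errors)
print("ensemble MSE:", ensemble_error)
```

In this toy case the ensemble beats every member because their errors partly cancel. In general, averaging guarantees only that the ensemble's squared error is no worse than the members' average error; outperforming the best member depends on the members making sufficiently different mistakes.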

Employing Neural Networks in Data Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch082 ◽

2011 ◽

pp. 433-437

Author(s):

Mohamed Salah Hamdi

Keyword(s):

Data Mining ◽

Neural Networks ◽

Risk Management ◽

Operating Systems ◽

Predictive Models ◽

Mining Technology ◽

Business Outcomes ◽

Revenue Growth ◽

Future Outcomes ◽

Electronic Information Sharing

Data-mining technology delivers two key benefits: (i) a descriptive function, enabling enterprises, regardless of industry or size, in the context of defined business objectives, to automatically explore, visualize, and understand their data and to identify patterns, relationships, and dependencies that impact business outcomes (i.e., revenue growth, profit improvement, cost containment, and risk management); (ii) a predictive function, enabling relationships uncovered and identified through the data-mining process to be expressed as business rules or predictive models. These outputs can be communicated in traditional reporting formats (i.e., presentations, briefs, electronic information sharing) to guide business planning and strategy. Also, these outputs, expressed as programming code, can be deployed or hard wired into business-operating systems to generate predictions of future outcomes, based on newly generated data, with higher accuracy and certainty.

A Regression Neural Model for In-Process Surface Roughness Monitoring in End Milling Operations

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.419-420.369 ◽

2009 ◽

Vol 419-420 ◽

pp. 369-372 ◽

Cited By ~ 2

Author(s):

Po Tsang B. Huang ◽

James C. Chen ◽

Yuan Tsan Jou

Keyword(s):

Neural Network ◽

Neural Networks ◽

Decision Making ◽

Surface Roughness ◽

Regression Model ◽

Monitoring System ◽

End Milling ◽

Integrated Model ◽

Input Variables ◽

Decision Making Model

The key element of the in-process surface roughness monitoring system is the decision-making model, which analyzes the input factors and then generates a proper output. The success of the in-process monitoring system depends on the accuracy of the decision-making model. To increase the accuracy and reliability of the model, it is important to reduce the variation of the inputs. To achieve this objective, an integration of regression and neural networks was developed as a decision-making model in this research. In this integrated model, the regression model was applied as a filter to sort the input variables into groups. The grouped data were then used to train and generate different neural network models to reduce the effect of input variation and increase the accuracy of the monitoring system. The input variables were first filtered by the threshold of the regression model, and then analyzed by different neural network models based on the filtered result. Finally, to evaluate the performance of the integrated model, the regression neural network and traditional neural networks were both developed for a surface roughness monitoring system in an end milling operation to compare the accuracy of the systems.

Determination of compound channel apparent shear stress: application of novel data mining models

Journal of Hydroinformatics ◽

10.2166/hydro.2019.037 ◽

2019 ◽

Vol 21 (5) ◽

pp. 798-811 ◽

Cited By ~ 25

Author(s):

Zohreh Sheikh Khozani ◽

Khabat Khosravi ◽

Binh Thai Pham ◽

Bjørn Kløve ◽

Wan Hanna Melini Wan Mohtar ◽

...

Keyword(s):

Data Mining ◽

Shear Stress ◽

Predictive Models ◽

Coefficient Of Determination ◽

Compound Channel ◽

Momentum Exchange ◽

Testing Stage ◽

Physical Channel ◽

Input Variables ◽

Hydraulic Process

Momentum exchange in the mixing region between the floodplain and the main channel is an essential hydraulic process, particularly for the estimation of discharge. The current study investigated various data mining models to estimate apparent shear stress in a symmetric compound channel with smooth and rough floodplains. The applied predictive models include random forest (RF), random tree (RT), reduced error pruning tree (REPT), M5P, and the distinguished hybrid bagging-M5P model. The models are constructed based on several correlated physical channel characteristic variables to predict the apparent shear stress. A sensitivity analysis is applied to select the best function tuning parameters for each model. Results showed that input with six variables exhibited the best prediction results for the RF model, while input with four variables produced the best performance for the other models. Based on the optimised input variables for each model, the efficiency of the five predictive models discussed here was evaluated. It was found that the M5P and hybrid bagging-M5P models, with coefficients of determination (R²) equal to 0.905 and 0.92, respectively, in the testing stage, are superior to the RF, RT and REPT models in estimating apparent shear stress in compound channels.
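
For readers unfamiliar with the metric quoted above, the coefficient of determination can be computed as in the following sketch. It assumes the common definition 1 - SS_res/SS_tot; the paper may use a variant such as the squared correlation coefficient. The function name `r_squared` and the sample values are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming the 1 - SS_res / SS_tot definition."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical test-stage observations and model predictions (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(r_squared(observed, predicted), 3))
```

Under this definition a value of 1 means the predictions match the observations exactly, and 0 means the model does no better than always predicting the observed mean.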

Comparison Data Mining Techniques To Prediction Diabetes Mellitus

Journal of Sustainable Engineering: Proceedings Series ◽

10.35793/joseps.v1i2.31 ◽

2019 ◽

Vol 1 (2) ◽

pp. 225-230

Author(s):

Aswan Supriyadi Sunge

Keyword(s):

Diabetes Mellitus ◽

Data Mining ◽

Predictive Models ◽

Human Life ◽

Mining Method ◽

Output Variable ◽

Ann Models ◽

Input Variables ◽

Automated Algorithms ◽

Comparison Data

Diabetes is a chronic disease caused by excess sugar in the blood. Various automated algorithms have been used to anticipate and diagnose diabetes, and data mining is one approach that can help diagnose a patient's disease. Predictions can save lives and allow prevention to begin before the disease attacks the patient. Choosing a suitable classifier clearly improves the accuracy of the system as levels continue to increase. Most diabetics know little about the risk factors they face before the diagnosis. This study develops five predictive models using 9 input variables and one output variable from the dataset. The purpose of this study was to compare the performance of Naive Bayes, Decision Tree, SVM, K-NN and ANN models in predicting diabetes mellitus.

Improving the Classification Efficiency of an ANN Utilizing a New Training Methodology

Informatics ◽

10.3390/informatics6010001 ◽

2018 ◽

Vol 6 (1) ◽

pp. 1 ◽

Cited By ~ 15

Author(s):

Ioannis Livieris

Keyword(s):

Neural Network ◽

Neural Networks ◽

Predictive Models ◽

Numerical Experiments ◽

Neural Network Training ◽

Training Algorithm ◽

New Approach ◽

Training Methodology ◽

Network Training ◽

Classification Efficiency

In this work, a new approach for training artificial neural networks is presented which utilises techniques for solving constrained optimisation problems. More specifically, this study converts the training of a neural network into a constrained optimisation problem. Furthermore, we propose a new neural network training algorithm based on the L-BFGS-B method. Our numerical experiments illustrate the classification efficiency of the proposed algorithm and of our proposed methodology, leading to more efficient, stable and robust predictive models.