Feature Interaction in Terms of Prediction Performance

Sejong Oh

doi:10.3390/app9235191

Feature Interaction in Terms of Prediction Performance

Applied Sciences, 2019, Vol 9 (23), pp. 5191
doi:10.3390/app9235191

Author(s): Sejong Oh

Keyword(s): High Performance, Prediction Performance, Feature Interaction, Classification Models, Learning Models, Feature Importance, Working Principle, Specific Prediction, Interaction Measure, New Feature

There has been considerable development in machine learning in recent years, with some remarkable successes. Although many high-performance methods exist, interpreting learning models remains challenging: it is difficult to understand why a given model makes a specific prediction. Various studies have attempted to explain the working principles of learning models using techniques such as feature importance, partial dependence, feature interaction, and the Shapley value. This study introduces a new feature interaction measure. Whereas recent studies have measured feature interaction using partial dependence, this study redefines feature interaction in terms of prediction performance. The proposed measure is easy to interpret, faster to compute than partial dependence-based measures, and useful for explaining feature interactions that affect prediction performance in both regression and classification models.
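The abstract does not state the formula for the new measure. The code below is a minimal sketch only, and it assumes one permutation-based reading of "interaction in terms of prediction performance". Under that reading, a feature pair is scored by how far the accuracy drop from shuffling both features departs from the sum of the drops from shuffling each feature alone. The names `accuracy`, `perm_drop` and `interaction`, the toy XOR data, and the use of accuracy as the metric are all hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical sketch of a performance-based interaction score; not the paper's definition.
Row = List[int]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows that the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def perm_drop(model: Model, X: Sequence[Row], y: Sequence[int],
              cols: Sequence[int], seed: int = 0) -> float:
    """Accuracy lost when each column in `cols` is shuffled (independently)."""
    rng = random.Random(seed)        # fresh RNG per call keeps results reproducible
    Xp = [list(row) for row in X]    # work on a copy; the caller's data is untouched
    for c in sorted(cols):           # fixed column order, also for reproducibility
        values = [row[c] for row in Xp]
        rng.shuffle(values)
        for row, v in zip(Xp, values):
            row[c] = v
    return accuracy(model, X, y) - accuracy(model, Xp, y)


def interaction(model: Model, X: Sequence[Row], y: Sequence[int],
                i: int, j: int, seed: int = 0) -> float:
    """Hypothetical pairwise score: joint drop minus the sum of single drops.

    0.0 means shuffling the pair costs exactly as much accuracy as the two
    single-feature shuffles combined; a nonzero value signals a non-additive effect.
    """
    d_i = perm_drop(model, X, y, [i], seed)
    d_j = perm_drop(model, X, y, [j], seed)
    d_ij = perm_drop(model, X, y, [i, j], seed)
    return d_ij - (d_i + d_j)


# Toy XOR task: the label depends on both features jointly.
X = [[a, b] for a in (0, 1) for b in (0, 1)] * 25
y = [a ^ b for a, b in X]


def xor_model(row: Row) -> int:
    return row[0] ^ row[1]


print(interaction(xor_model, X, y, 0, 1))
```

A single shuffle is noisy. In practice one would probably average the score over several seeds. Unlike partial dependence, this kind of score needs only repeated predictions on shuffled copies of the data, which is consistent with the abstract's claim about speed.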


Comparison of feature importance measures as explanations for classification models

SN Applied Sciences, 2021, Vol 3 (2)
doi:10.1007/s42452-021-04148-9
Cited by ~1

Author(s): Mirka Saarela, Susanne Jauhiainen

Keyword(s): Research Direction, Classification Models, Breast Cancer Data, Learning Models, False Negatives, Cancer Data, Feature Importance, Interpretable Model, Global And Local, Machine Learning Models

Explainable artificial intelligence is an emerging research direction that helps users and developers of machine learning models understand why the models behave as they do. The most popular explanation technique is feature importance. However, there are several different approaches to measuring feature importance, most notably global and local ones. In this study, we compare different feature importance measures using both a linear method (logistic regression with L1 penalization) and a non-linear method (random forest), as well as local interpretable model-agnostic explanations (LIME) applied on top of them. These methods are applied to two datasets from the medical domain: the openly available breast cancer data from the UCI Archive and recently collected running injury data. Our results show that the most important features differ depending on the technique. We argue that combining several explanation techniques could provide more reliable and trustworthy results. In particular, local explanations should be used in the most critical cases, such as false negatives.
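The abstract reports that the most important features differ by technique. One simple way to quantify such disagreement is the Jaccard overlap of the top-k features from two importance vectors. This metric is assumed here and is not taken from the paper. It is shown as a minimal sketch; the function names and the feature scores are hypothetical.

```python
from typing import Dict, Set


def top_k(importances: Dict[str, float], k: int) -> Set[str]:
    """Names of the k features with the largest absolute importance."""
    ranked = sorted(importances, key=lambda name: abs(importances[name]), reverse=True)
    return set(ranked[:k])


def top_k_jaccard(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Jaccard similarity of two top-k sets (k >= 1): 1.0 identical, 0.0 disjoint."""
    sa, sb = top_k(a, k), top_k(b, k)
    return len(sa & sb) / len(sa | sb)


# Hypothetical scores: L1-logistic coefficients vs. random-forest importances.
l1_logistic = {"f1": 0.9, "f2": -0.7, "f3": 0.0, "f4": 0.1}
random_forest = {"f1": 0.30, "f2": 0.05, "f3": 0.40, "f4": 0.25}
print(top_k_jaccard(l1_logistic, random_forest, k=2))  # top-2 sets {f1, f2} vs {f1, f3}
```

This example prints 0.3333333333333333, because the two top-2 sets share one feature out of three distinct ones. Absolute values are used because L1-penalized logistic regression yields signed coefficients, whereas forest importances are non-negative.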


Statistical and machine learning models for optimizing energy in parallel applications

The International Journal of High Performance Computing Applications, 2019, Vol 33 (6), pp. 1079-1097
doi:10.1177/1094342019842915
Cited by ~2

Author(s): Mark Endrei, Chao Jin, Minh Ngoc Dinh, David Abramson, Heidi Poxon, ...

Keyword(s): Machine Learning, Energy Efficiency, High Performance, Large Scale, Energy Use, Parallel Applications, Learning Models, Trade Off, Time Required, Machine Learning Models

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and control. The settings that give optimum performance and those that give optimum energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that the models can be trained using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
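Suppose a model has predicted (runtime, energy) pairs for a set of candidate settings. The Pareto-optimal options are the settings that no other setting beats on both axes at once. The sketch below extracts that set with a sort-and-sweep. The setting labels and numbers are invented, and the paper's statistical and machine learning models that would produce the predictions are not shown.

```python
from typing import List, Tuple

# (hypothetical setting label, predicted runtime, predicted energy); lower is better for both.
Point = Tuple[str, float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the settings that are not dominated in (runtime, energy)."""
    # Sorting by runtime, then energy, lets a single sweep find the front.
    ordered = sorted(points, key=lambda p: (p[1], p[2]))
    front: List[Point] = []
    best_energy = float("inf")
    for p in ordered:
        if p[2] < best_energy:  # uses less energy than every faster setting kept so far
            front.append(p)
            best_energy = p[2]
    return front


settings = [
    ("16 threads", 10.0, 500.0),
    ("8 threads", 14.0, 380.0),
    ("4 threads", 22.0, 390.0),
    ("8 threads, low clock", 16.0, 300.0),
]
print([p[0] for p in pareto_front(settings)])
```

This prints `['16 threads', '8 threads', '8 threads, low clock']`. The "4 threads" setting is dropped because it is both slower and more energy-hungry than "8 threads, low clock". The sort makes the sweep O(n log n). Exact duplicate points collapse to a single entry.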


AB0652 MACHINE LEARNING TO PREDICT EARLY TNF INHIBITOR USERS IN PATIENTS WITH ANKYLOSING SPONDYLITIS

Annals of the Rheumatic Diseases, 2020, Vol 79 (Suppl 1), pp. 1620.1-1621
doi:10.1136/annrheumdis-2020-eular.3743

Author(s): J. Lee, H. Kim, S. Y. Kang, S. Lee, Y. H. Eun, ...

Keyword(s): Machine Learning, Ankylosing Spondylitis, TNF Inhibitors, TNF Inhibitor, ANN Model, Learning Models, Feature Importance, Importance Analysis, Baseline Characteristics, Machine Learning Models

Background: Tumor necrosis factor (TNF) inhibitors are important drugs for treating patients with ankylosing spondylitis (AS). However, they are not used as a first-line treatment for AS. Over 40% of patients respond insufficiently to the first-line treatment, non-steroidal anti-inflammatory drugs (NSAIDs). If we can predict at an earlier phase which patients will need TNF inhibitors, adequate treatment can be provided at an appropriate time and potential damage can be avoided. No precise predictive model exists at present. Recently, various machine learning methods have shown strong performance in predictions using clinical data.

Objectives: We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis.

Methods: The baseline demographic and laboratory data of patients who visited the Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups. The first group, early TNF inhibitor users (early-TNF users), were treated with TNF inhibitors within six months of their follow-up; all other patients formed the second group (non-early-TNF users). Machine learning models were formulated to predict the early-TNF users using the baseline data. Additionally, feature importance analysis was performed to delineate significant baseline characteristics.

Results: The numbers of early-TNF and non-early-TNF users were 90 and 509, respectively. The best-performing ANN model used 3 hidden layers with 50 hidden nodes each. Its performance (area under the curve (AUC) = 0.75) was superior to that of the logistic regression, support vector machine, and random forest models (AUC = 0.72, 0.65, and 0.71, respectively) in predicting early-TNF users. Feature importance analysis revealed erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), and height as the top significant baseline characteristics for predicting early-TNF users. Among these characteristics, height was revealed by the machine learning models but not by conventional statistical techniques.

Conclusion: Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases.

Disclosure of Interests: None declared
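AUC is the metric used to rank the models above. It equals the probability that a randomly chosen positive case (here, an early-TNF user) is scored above a randomly chosen negative case, with ties counted as one half. The minimal sketch below computes it directly from that pairwise definition. The labels and scores are made up, and this is not the study's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney statistic U / (n_pos * n_neg); O(n_pos * n_neg) time."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 0]          # 1 = hypothetical early-TNF user
scores = [0.9, 0.4, 0.5, 0.2, 0.4]
print(roc_auc(labels, scores))
```

This prints 0.75. The positive scored 0.9 beats all three negatives. The positive scored 0.4 beats one negative, ties one, and loses to one. That gives 4.5 wins out of 6 pairs. With 90 positives and about 500 negatives, the quadratic loop is still cheap. For larger datasets, a rank-based formula avoids comparing every pair.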


High-Performance Deep Learning Models for Seismic Noise Detection and Quality Control in the Processing Workflow

2021
doi:10.3997/2214-4609.202112414

Author(s): J. Liu, J. Eldridge, A. Kudarova, P. Thomas, P. Devarakota, ...

Keyword(s): Quality Control, Deep Learning, Seismic Noise, High Performance, Noise Detection, Learning Models


Towards High-Performance Deep Learning Models in Tool Wear Classification with Generative Adversarial Networks

Journal of Materials Processing Technology, 2021, pp. 117484
doi:10.1016/j.jmatprotec.2021.117484

Author(s): Dirk Alexander Molitor, Christian Kubik, Marco Becker, Ruben Helmut Hetfleisch, Fan Lyu, ...

Keyword(s): Deep Learning, Tool Wear, High Performance, Generative Adversarial Networks, Learning Models, Adversarial Networks, Tool Wear Classification


Online learning behavior analysis based on machine learning

Asian Association of Open Universities Journal, 2019, Vol 14 (2), pp. 97-106
doi:10.1108/aaouj-08-2019-0029

Author(s): Ning Yan, Oliver Tat-Sheung Au

Keyword(s): Machine Learning, Online Learning, Correlation Analysis, Prediction Accuracy, Classification Models, Limited Data, Learning Models, Learning Behavior, Content Type, Machine Learning Models

Purpose: The purpose of this paper is to analyze the correlation between students' online learning behavior features and course grade, and to attempt to build effective prediction models based on limited data.

Design/methodology/approach: The prediction label in this paper is the students' course grade. The available features are student age, student gender, connection time, hits count and days of access. The machine learning model used is a classical three-layer feedforward neural network trained with the scaled conjugate gradient algorithm. Pearson correlation analysis is used to find the relationships between course grade and the student features.

Findings: Days of access has the highest correlation with course grade, followed by hits count, while connection time is less relevant to course grade. Student age and gender have the lowest correlation with course grade. Binary classification models have much higher prediction accuracy than multi-class classification models. Data normalization and data discretization can effectively improve the prediction accuracy of machine learning models such as the ANN model in this paper.

Originality/value: This paper may help teachers use online learning behavior data to identify students with learning difficulties in advance and give them timely help. It shows that acceptable prediction models based on machine learning can be built using a small and limited data set. However, introducing external data into machine learning models to improve their prediction accuracy remains a valuable but difficult problem.
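The study relates each behavior feature to course grade using Pearson correlation. A minimal, dependency-free version of the coefficient is sketched below. The sample values for days of access and grades are invented for illustration and do not come from the paper's data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if either sample is constant


days_of_access = [3, 8, 12, 20, 25]   # hypothetical
course_grade = [55, 62, 70, 81, 90]   # hypothetical
print(pearson(days_of_access, course_grade))
```

The coefficient ranges from -1 to 1 and measures only linear association. For that reason, the paper's ranking of features by correlation complements its neural-network models rather than replacing them.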


A Comparative Assessment of Six Machine Learning Models for Prediction of Bending Force in Hot Strip Rolling Process

Metals, 2020, Vol 10 (5), pp. 685
doi:10.3390/met10050685
Cited by ~2

Author(s): Xu Li, Feng Luan, Yan Wu

Keyword(s): Prediction Accuracy, Computational Cost, Regression Tree, Prediction Performance, Learning Models, Hot Strip Rolling, Strip Rolling, Bending Force, Hot Strip, Machine Learning Models

In the hot strip rolling (HSR) process, accurate prediction of bending force can improve the control accuracy of strip crown and flatness, and thereby improve strip shape quality. In this paper, six machine learning models were applied to predict the bending force in the HSR process: Artificial Neural Network (ANN), Support Vector Regression (SVR), Classification and Regression Tree (CART), Bagging Regression Tree (BRT), Least Absolute Shrinkage and Selection Operator (LASSO), and Gaussian Process Regression (GPR). A comparative experiment was carried out on a real-life dataset, and the six models were analyzed in terms of prediction accuracy, stability, and computational cost. Prediction performance was assessed using three evaluation metrics: root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The results show that GPR is the optimal model for bending force prediction, with the best prediction accuracy, better stability, and acceptable computational cost. The prediction accuracy and stability of CART and ANN are slightly lower than those of GPR. Although BRT also shows a good combination of prediction accuracy and computational cost, its stability is the worst of the six models. SVR not only has poor prediction accuracy but also the highest computational cost, while LASSO shows the worst prediction accuracy.
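The three evaluation metrics named in this abstract have standard definitions. A minimal dependency-free sketch follows. The measured and predicted bending-force values are hypothetical, and the code is not the authors' evaluation code.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (undefined if y is constant)."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


measured = [100.0, 120.0, 140.0, 160.0]   # hypothetical bending forces
predicted = [104.0, 118.0, 137.0, 163.0]  # hypothetical model output
print(f"RMSE={rmse(measured, predicted):.3f}  MAE={mae(measured, predicted):.3f}  "
      f"R2={r2(measured, predicted):.3f}")
```

With these numbers the errors are 4, -2, -3 and 3. That gives MAE = 12/4 = 3.0, RMSE = sqrt(38/4) ≈ 3.082, and R² = 1 - 38/2000 = 0.981. RMSE penalizes large errors more heavily than MAE, so comparing both is one way to gauge how much a model's error comes from occasional large misses.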


Machine learning to predict early TNF inhibitor users in patients with ankylosing spondylitis

Scientific Reports, 2020, Vol 10 (1)
doi:10.1038/s41598-020-75352-7

Author(s): Seulkee Lee, Yeonghee Eun, Hyungjin Kim, Hoon-Suk Cha, Eun-Mi Koh, ...

Keyword(s): Machine Learning, Logistic Regression, Ankylosing Spondylitis, TNF Inhibitor, ANN Model, Learning Models, Feature Importance, Importance Analysis, Baseline Characteristics, Machine Learning Models

We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. The baseline demographic and laboratory data of patients who visited the Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early-TNF and non-early-TNF users. Machine learning models were formulated to predict the early-TNF users using the baseline data. Feature importance analysis was performed to delineate significant baseline characteristics. The numbers of early-TNF and non-early-TNF users were 90 and 505, respectively. The ANN model, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.783, outperformed the logistic regression, support vector machine, random forest, and XGBoost models (AUC = 0.719, 0.699, 0.761, and 0.713, respectively) in predicting early-TNF users. Feature importance analysis revealed C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR) as the top significant baseline characteristics for predicting early-TNF users. Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases.


Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data

Water Research, 2020, Vol 171, pp. 115454
doi:10.1016/j.watres.2019.115454
Cited by ~9

Author(s): Kangyang Chen, Hexia Chen, Chuanlong Zhou, Yichao Huang, Xiangyang Qi, ...

Keyword(s): Machine Learning, Water Quality, Big Data, Surface Water Quality, Prediction Performance, Quality Prediction, Learning Models, Water Parameters, Water Quality Prediction, Machine Learning Models


The application of Artificial Neural Network and k-Nearest Neighbour classification models in the scouting of high-performance archers from a selected fitness and motor skill performance parameters

Science & Sports, 2019, Vol 34 (4), pp. e241-e249
doi:10.1016/j.scispo.2019.02.006
Cited by ~7

Author(s): R. Muazu Musa, A.P.P. Abdul Majeed, Z. Taha, M.R. Abdullah, A.B. Husin Musawi Maliki, ...

Keyword(s): Neural Network, Artificial Neural Network, Motor Skill, High Performance, Nearest Neighbour, Classification Models, Performance Parameters, Skill Performance, Artificial Neural
