Predicting Math Student Success in the Initial Phase of College With Sparse Information Using Approaches From Statistical Learning

Frontiers in Education ◽

10.3389/feduc.2020.502698 ◽

2020 ◽

Vol 5 ◽

Author(s):

Pascal Kilian ◽

Frank Loose ◽

Augustin Kelava

Keyword(s):

Machine Learning ◽

Student Success ◽

Statistical Learning ◽

Initial Phase ◽

Feature Space ◽

Risk Groups ◽

Proof Of Concept ◽

First Semester ◽

Machine Learning Methods ◽

Extensive Variable

In math teacher education, dropout research relies mostly on frameworks that collect extensive sets of variables, which limits their practical applicability. We investigate the completion of a first-semester course as a dropout indicator and thereby provide not only good predictions but also interpretable, practicable results together with easy-to-understand recommendations. As a proof of concept, a sparse feature space is combined with machine learning methods to predict dropout, which requires identifying the most predictive features. Interpretability can be achieved by assigning students to risk groups. Implications for interventions are discussed.
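
A minimal sketch of how predicted dropout probabilities could be turned into risk groups. The logistic model, its weights, the two features, and the cut points 0.3 and 0.6 are all hypothetical; the abstract does not state which model or thresholds the authors used.

```python
import math
from bisect import bisect_right
from typing import Sequence

# Hypothetical cut points on the predicted probability of NOT completing
# the first-semester course (not taken from the paper).
CUT_POINTS = [0.3, 0.6]
GROUPS = ["low risk", "medium risk", "high risk"]


def predicted_dropout(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic model on a sparse feature vector (illustrative weights only)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def risk_group(p_dropout: float) -> str:
    """Map a probability in [0, 1] to one of three coarse risk groups."""
    if not 0.0 <= p_dropout <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return GROUPS[bisect_right(CUT_POINTS, p_dropout)]


if __name__ == "__main__":
    # Two hypothetical standardized features per student; weights are made up.
    weights, bias = [-1.2, -0.8], -0.5
    for student in ([1.0, 0.5], [0.0, 0.0], [-1.5, -1.0]):
        p = predicted_dropout(student, weights, bias)
        print(student, round(p, 2), risk_group(p))
```

Coarse groups give up some predictive resolution in exchange for recommendations that are easy to communicate, which is the trade-off the abstract emphasizes.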

treeheatr: an R package for interpretable decision tree visualizations

10.1101/2020.07.10.196352 ◽

2020 ◽

Author(s):

Trang T. Le ◽

Jason H. Moore

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Feature Space ◽

R Package ◽

Tree Structure ◽

Decision Tree Model ◽

Teaching Tool ◽

Tree Model ◽

Machine Learning Methods ◽

Link Type

Abstract Summary: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree’s leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. Availability: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.

Machine learning methods to predict child posttraumatic stress: a proof of concept study

BMC Psychiatry ◽

10.1186/s12888-017-1384-1 ◽

2017 ◽

Vol 17 (1) ◽

Cited By ~ 21

Author(s):

Glenn N. Saxe ◽

Sisi Ma ◽

Jiwen Ren ◽

Constantin Aliferis

Keyword(s):

Machine Learning ◽

Posttraumatic Stress ◽

Proof Of Concept ◽

Learning Methods ◽

Concept Study ◽

Machine Learning Methods

A post-method condition analysis of using ensemble machine learning for cancer prognosis and diagnosis: a systematic review

10.21203/rs.2.18222/v1 ◽

2019 ◽

Author(s):

Leila Mirsadeghi ◽

Ali Mohammad Banaei-Moghaddam ◽

Seyed Reza Beh-Afarin ◽

Reza Haji Hosseini ◽

Kaveh Kavousi

Keyword(s):

Machine Learning ◽

Empirical Studies ◽

Ensemble Methods ◽

Feature Space ◽

Ensemble Classifier ◽

Cancer Prognosis ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Different Types

Abstract Background: Ensemble methods are supervised learning approaches that integrate different types of data or multiple individual classifiers. It has been shown that these methods can improve performance. Methods: This study attempts to provide an in-depth review of the 45 most relevant articles and introduces 42 ensemble classifier (EC) machine learning methods used for the detection of 18 different types of cancer. Breast cancer, for which 22 ensemble methods have been introduced, is investigated more extensively than other types of cancer. The purpose of this study is to identify, map, and analyze the current academic discourse on EC machine learning methods in order to: 1. identify overarching themes emerging from empirical studies regarding EC methods, 2. determine their input data and decision-making strategies, and 3. evaluate relevant statistical procedures. Results: By comparing various approaches, we introduce a Relevance Vector Machine (RVM)-based ensemble learning method that can provide optimal solutions for problems such as the curse of dimensionality and the high dimensionality of the feature space, without missing data values. Conclusions: To obtain robust performance and achieve better results, we suggest using multi-omics data integration, which has been shown to identify cancers and their subtypes more efficiently.

A Post-Method Condition Analysis of Using Ensemble Machine Learning for Cancer Prognosis and Diagnosis: a systematic review

10.21203/rs.2.10561/v1 ◽

2019 ◽

Author(s):

Kaveh Kavousi ◽

Leila Mirsadeghi ◽

Reza Haji Hosseini ◽

Ali Mohammad Banaei-Moghaddam ◽

Seyed Reza Beh-Afarin

Keyword(s):

Machine Learning ◽

Empirical Studies ◽

Ensemble Methods ◽

Feature Space ◽

Ensemble Classifier ◽

Cancer Prognosis ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Different Types

Abstract Background: Ensemble methods are supervised learning approaches that integrate different types of data or multiple individual classifiers. It has been shown that these methods can improve performance. Methods: This study attempts to provide an in-depth review of the 45 most relevant articles and introduces 42 ensemble classifier (EC) machine learning methods used for the detection of 18 different types of cancer. Breast cancer, for which 22 ensemble methods have been introduced, is investigated more extensively than other types of cancer. The purpose of this study was to identify, map, and analyze the current academic discourse on EC machine learning methods in order to: 1. identify overarching themes emerging from empirical studies regarding EC methods, 2. determine their input data and decision-making strategies, and 3. evaluate relevant statistical procedures. Results: By comparing various approaches, we introduce a Relevance Vector Machine (RVM)-based ensemble learning method that can provide optimal solutions for problems such as the curse of dimensionality and the high dimensionality of the feature space, without missing data values. Conclusions: To obtain robust performance and achieve better results, we suggest using multi-omics data integration, which has been shown to identify cancers and their subtypes more efficiently.
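
To make "integrating multiple individual classifiers" concrete, here is a minimal sketch of hard majority voting, the simplest ensemble rule. It is not the RVM-based method the review proposes; the three threshold rules and the "positive"/"negative" labels are hypothetical stand-ins for trained base classifiers.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> str:
    """Hard voting: return the label predicted by most base classifiers.

    Counter.most_common orders equal counts by first occurrence, so ties go
    to the label predicted earliest in the classifier list.
    """
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical base classifiers: simple threshold rules on two features.
def rule_a(x: Sequence[float]) -> str:
    return "positive" if x[0] > 0.5 else "negative"


def rule_b(x: Sequence[float]) -> str:
    return "positive" if x[1] > 0.5 else "negative"


def rule_c(x: Sequence[float]) -> str:
    return "positive" if x[0] + x[1] > 1.2 else "negative"


if __name__ == "__main__":
    ensemble = [rule_a, rule_b, rule_c]
    for sample in ([0.7, 0.2], [0.7, 0.6]):
        print(sample, majority_vote(ensemble, sample))
```

Voting only helps when the base classifiers make partly independent errors; with an odd number of binary voters, ties cannot occur.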

Dimensionality reduction and clustering of time series for anomaly detection in a supermarket heating system

Journal of Physics Conference Series ◽

10.1088/1742-6596/2042/1/012027 ◽

2021 ◽

Vol 2042 (1) ◽

pp. 012027

Author(s):

Lorenzo Salmina ◽

Roberto Castello ◽

Justine Stoll ◽

Jean-Louis Scartezzini

Keyword(s):

Machine Learning ◽

Time Series ◽

Dimensionality Reduction ◽

A Priori ◽

Feature Space ◽

Heating System ◽

Energy System ◽

Industrial Building ◽

Heating Systems ◽

Machine Learning Methods

Abstract: The timely identification of anomalous operation in the energy system of an industrial building would increase the efficiency and resilience of the energy infrastructure, besides reducing economic waste. This work was motivated by the need to identify, for a series of supermarket buildings in Switzerland, the failures occurring in their heating systems over the years, in an unsupervised fashion that building managers can easily visualize. The absence of any a priori labels distinguishing typical from anomalous behavior calls for unsupervised machine learning methods to extract the relevant features describing system operation, to reduce the dimension of the feature space, and to cluster similar operating patterns together. The method is validated on a standard supermarket building, where it successfully distinguishes winter and summer operation from periods of refurbishment or heating-system malfunction.
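
A minimal sketch of the unsupervised pipeline shape described above: reduce each day's time series to a few features, then cluster. The hand-crafted (mean, range) reduction is a deliberately simple stand-in for whatever dimensionality reduction the authors used, and the readings are hypothetical; the clustering step is plain Lloyd's k-means.

```python
import math
from statistics import fmean
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def summarize(day: Sequence[float]) -> Point:
    """Hand-crafted reduction of one day's readings to (mean, range)."""
    return (fmean(day), max(day) - min(day))


def kmeans(points: List[Point], k: int, iterations: int = 20) -> Tuple[List[int], List[Point]]:
    """Plain Lloyd's k-means with a deterministic start (the first k points)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(fmean(coord) for coord in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Four hypothetical supply-temperature readings (°C) per day.
    days = [
        [55, 60, 58, 62],  # winter-like
        [30, 31, 29, 30],  # summer-like
        [54, 59, 61, 60],  # winter-like
        [28, 30, 31, 29],  # summer-like
    ]
    labels, _ = kmeans([summarize(d) for d in days], k=2)
    print(labels)
```

Days that land far from every centroid, or in a small cluster of their own, are natural candidates for a building manager to inspect.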

Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

Medical Physics ◽

10.1118/1.4944738 ◽

2016 ◽

Vol 43 (5) ◽

pp. 2040-2052 ◽

Cited By ~ 19

Author(s):

Noorazrul Yahya ◽

Martin A. Ebert ◽

Max Bulsara ◽

Michael J. House ◽

Angel Kennedy ◽

...

Keyword(s):

Machine Learning ◽

Learning Strategies ◽

Statistical Learning ◽

Predictive Models ◽

External Beam Radiotherapy ◽

Urinary Symptoms ◽

External Beam ◽

Learning Methods ◽

Machine Learning Methods

treeheatr: an R package for interpretable decision tree visualizations

Bioinformatics ◽

10.1093/bioinformatics/btaa662 ◽

2020 ◽

Author(s):

Trang T Le ◽

Jason H Moore

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Feature Space ◽

R Package ◽

Tree Structure ◽

Decision Tree Model ◽

Teaching Tool ◽

Tree Model ◽

Continuous Integration ◽

Machine Learning Methods

Abstract Summary: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree’s leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. Availability and implementation: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.
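
To illustrate what "how the tree nodes split up the feature space" means at a single node, here is a minimal sketch that picks the threshold on one feature minimizing weighted Gini impurity, as CART-style trees do. It is not part of treeheatr (an R package) and does not reproduce its plotting; the data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs: Sequence[float], ys: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini) of the best split x <= t on one feature.

    The threshold is None if no split improves on the unsplit node.
    """
    best: Tuple[Optional[float], float] = (None, gini(ys))
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


if __name__ == "__main__":
    xs = [1, 2, 3, 10, 11, 12]
    ys = ["a", "a", "a", "b", "b", "b"]
    print(best_split(xs, ys))
```

A full tree applies this search recursively to each child node; a heatmap at the leaves, as in treeheatr, then shows which samples ended up in which region.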

Automatic Anomaly Detection on In-Production Manufacturing Machines Using Statistical Learning Methods

Sensors ◽

10.3390/s20082344 ◽

2020 ◽

Vol 20 (8) ◽

pp. 2344 ◽

Cited By ~ 7

Author(s):

Federico Pittino ◽

Michael Puggl ◽

Thomas Moldaschl ◽

Christina Hirschl

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Statistical Learning ◽

Control Charts ◽

Industry 4.0 ◽

Production Environment ◽

Statistical Machine Learning ◽

Learning Methods ◽

Machine Learning Methods ◽

And Control

Anomaly detection is becoming increasingly important for enhancing reliability and resilience in the Industry 4.0 framework. In this work, we investigate different methods for anomaly detection on in-production manufacturing machines, taking into account their variability in both operating and wear conditions. We demonstrate how the nature of the available data, whether or not it contains anomalies, matters for the choice of algorithm, discussing both statistical machine learning methods and control charts. Finally, we develop methods for automatic anomaly detection that achieve a recall close to one on our data. Our methods are designed not to rely on continuous recalibration and hand-tuning by the machine user, which allows them to be deployed robustly and efficiently in an in-production environment.
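
For readers unfamiliar with control charts, here is a minimal sketch of the textbook Shewhart rule: estimate mean and standard deviation from a reference window assumed to be anomaly-free, then flag readings outside mean ± 3σ. This is the generic technique, not the authors' recalibration-free method, and the readings are hypothetical.

```python
from statistics import fmean, stdev
from typing import List, Sequence, Tuple


def control_limits(reference: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower/upper limits at mean +/- k sample standard deviations."""
    mu = fmean(reference)
    sigma = stdev(reference)
    return mu - k * sigma, mu + k * sigma


def flag_anomalies(stream: Sequence[float], limits: Tuple[float, float]) -> List[int]:
    """Indices of readings that fall outside the control limits."""
    lo, hi = limits
    return [i for i, x in enumerate(stream) if x < lo or x > hi]


if __name__ == "__main__":
    # Hypothetical sensor readings from normal operation.
    reference = [10.0, 10.2, 9.8, 10.1, 9.9]
    limits = control_limits(reference)
    print(limits, flag_anomalies([10.05, 9.95, 11.0, 10.1], limits))
```

The fixed reference window is exactly what makes plain control charts need recalibration when operating or wear conditions drift, which is the limitation the abstract says its methods avoid.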

Machine Learning Methods for "Small-n, Large-p" Problems: Understanding the Complex Drivers of Modern-Day Slavery

10.21203/rs.3.rs-296275/v1 ◽

2021 ◽

Author(s):

Rosa Lavelle-Hill ◽

Anjali Mazumder ◽

James Goulding ◽

Gavin Smith ◽

Todd Landman

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Feature Space ◽

Machine Learning Methods ◽

Effective Interventions ◽

Modern Slavery ◽

Out Of Sample ◽

Small N ◽

Modern Day Slavery ◽

Linear Machine

Abstract: An estimated 40 million people across the globe are in some form of modern slavery. Understanding the factors that make any particular individual or geographical region vulnerable to such abuse is essential for the development of effective interventions and policy. Efforts to isolate and assess the importance of individual drivers statistically are impeded by two key challenges: data scarcity and high dimensionality. The hidden nature of modern slavery restricts the available data points, and the large number of candidate variables that are potentially predictive of slavery inflates the feature space exponentially. The result is a highly problematic "small-n, large-p" setting, where overfitting and multicollinearity can render more traditional statistical approaches inapplicable. Recent advances in non-parametric computational methods, however, offer scope to overcome such challenges. We present an approach that combines non-linear machine learning models and strict cross-validation methods with novel variable importance techniques, emphasising the importance of stability of model explanations via Rashomon-set analysis. This approach is used to model the prevalence of slavery in 48 countries, with results bringing to light important predictive factors, such as a country's capacity to protect the physical security of women, which has previously been under-emphasized in the literature. Out-of-sample estimates of slavery prevalence are then made for countries where no survey data currently exist.
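
A minimal sketch of k-fold cross-validation index generation for a small-n setting such as the 48 countries mentioned above. The fold count, seed, and the reading of "strict" (every data-dependent step, including feature selection, is fit on the training indices only) are assumptions for illustration, not the authors' exact protocol.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    for train, test in k_fold_indices(48, 6):
        # In a strict setup, feature selection and model fitting would use
        # only `train`; `test` is touched once, for evaluation.
        print(len(train), len(test))
```

With so few samples, the fold assignment itself adds variance, so repeating the procedure over several seeds gives a fairer picture of out-of-sample performance.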

Low-Cost Environmental and Motion Sensor Data for Complex Activity Recognition: Proof of Concept

Engineering Proceedings ◽

10.3390/ecsa-7-08194 ◽

2020 ◽

Vol 2 (1) ◽

pp. 54

Author(s):

Rok Novak ◽

David Kocman ◽

Johanna Amalia Robinson ◽

Tjaša Kanduč ◽

Denis Sarigiannis ◽

...

Keyword(s):

Machine Learning ◽

Low Cost ◽

Sensor Data ◽

Motion Sensor ◽

Data Sets ◽

Proof Of Concept ◽

Complex Activity ◽

Machine Learning Methods ◽

Complex Activities

Combining new sensing technologies with machine learning methods can provide a tool for recognizing complex activities. A wearable particulate matter (PM) sensor, in combination with a motion tracker, was provided to 97 individuals for 7 days in two seasons. These data sets were used in three different models, constructed by the classification of activity. Using the IBk, J48 and RandomForest algorithms on hourly (per-minute) values, accuracies of 31.0 (23.1)%, 28.6 (22.0)% and 35.7 (23.0)%, respectively, were achieved. Most misclassified instances concern vaguely defined activities. The low accuracy can also be explained by differences in time scales. Accuracy could be improved by defining the activities more clearly and collecting per-minute data.
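
IBk is Weka's instance-based k-nearest-neighbour classifier. A minimal sketch of that technique and of the accuracy metric reported above follows; the two features (a PM concentration and a motion intensity), the activity labels, and all values are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                x: Sequence[float], k: int = 3) -> Hashable:
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical (PM concentration, motion intensity) pairs per hour.
    train_X = [(40, 0.2), (42, 0.3), (5, 0.0), (6, 0.1), (10, 2.0), (12, 2.2)]
    train_y = ["cooking", "cooking", "sleeping", "sleeping", "walking", "walking"]
    test_X = [(41, 0.25), (6, 0.05), (11, 2.1)]
    test_y = ["cooking", "sleeping", "walking"]
    preds: List[Hashable] = [knn_predict(train_X, train_y, x) for x in test_X]
    print(preds, accuracy(test_y, preds))
```

Because k-NN relies on raw distances, features on very different scales (as PM and motion readings are) would normally be standardized first; vaguely defined activities that overlap in feature space are exactly where such a classifier misclassifies.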