A Learning Analytics Approach to Identify Students at Risk of Dropout: A Case Study with a Technical Distance Education Course

Emanuel Marques Queiroga; João Ladislau Lopes; Kristofer Kappel; Marilton Aguiar; Ricardo Matsumura Araújo; Roberto Munoz; Rodolfo Villarroel; Cristian Cechinel

doi:10.3390/app10113998

Applied Sciences ◽

10.3390/app10113998 ◽

2020 ◽

Vol 10 (11) ◽

pp. 3998 ◽

Cited By ~ 1

Author(s):

Emanuel Marques Queiroga ◽

João Ladislau Lopes ◽

Kristofer Kappel ◽

Marilton Aguiar ◽

Ricardo Matsumura Araújo ◽

...

Keyword(s):

Genetic Algorithm ◽

At Risk ◽

Learning Analytics ◽

Prediction Models ◽

Demographic Data ◽

Characteristic Curve ◽

At Risk Students ◽

Virtual Learning Environment ◽

Learning Context ◽

Student Dropout

Contemporary education is a vast field that is concerned with the performance of education systems. In a formal e-learning context, student dropout is considered one of the main problems and has received much attention from the learning analytics research community, which has reported several approaches to the development of models for the early prediction of at-risk students. However, maximizing the results obtained by predictions is a considerable challenge. In this work, we developed a solution that uses only students’ interactions with the virtual learning environment, and features derived from them, for the early prediction of at-risk students in a Brazilian distance technical high school course that is 103 weeks in duration. To maximize results, we developed an elitist genetic algorithm based on Darwin’s theory of natural selection for hyperparameter tuning. With the application of the proposed technique, we predicted at-risk students with an Area Under the Receiver Operating Characteristic Curve (AUROC) above 0.75 in the initial weeks of a course. The results demonstrate the viability of applying interaction counts and derived features to generate prediction models in contexts where access to demographic data is restricted. The application of a genetic algorithm to the tuning of classifier hyperparameters can increase classifier performance in comparison with other techniques.
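
The abstract above names an elitist genetic algorithm for hyperparameter tuning but gives no implementation details. The following is a minimal sketch of the general technique only, not the authors' method. The search space (`BOUNDS`), the operators (uniform crossover, Gaussian mutation, tournament selection) and the toy `fitness` function are all assumptions made for illustration. In a real setting, `fitness` would train a classifier with the candidate hyperparameters and return a validation score such as AUROC.

```python
import random

# Hypothetical search space: (low, high) bounds for two real-valued hyperparameters.
BOUNDS = {"learning_rate": (0.001, 1.0), "regularization": (0.0, 10.0)}


def fitness(params):
    """Stand-in objective (an assumption). A real version would train a
    classifier with `params` and return a validation score such as AUROC."""
    return -((params["learning_rate"] - 0.1) ** 2) - ((params["regularization"] - 2.0) ** 2)


def random_individual(rng):
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal probability.
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in BOUNDS}


def mutate(ind, rng, rate=0.2, scale=0.1):
    # Gaussian perturbation, clipped back into the allowed range.
    child = dict(ind)
    for name, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            step = rng.gauss(0.0, scale * (hi - lo))
            child[name] = min(hi, max(lo, child[name] + step))
    return child


def tournament(population, scores, rng, k=3):
    # Pick the fittest of k randomly chosen individuals.
    picks = rng.sample(range(len(population)), k)
    return population[max(picks, key=lambda i: scores[i])]


def elitist_ga(pop_size=20, generations=30, n_elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness observed at the start of each generation
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        history.append(scores[ranked[0]])
        # Elitism: the top individuals are copied unchanged into the next generation.
        next_pop = [population[i] for i in ranked[:n_elite]]
        while len(next_pop) < pop_size:
            parent_a = tournament(population, scores, rng)
            parent_b = tournament(population, scores, rng)
            next_pop.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_pop
    best = max(population, key=fitness)
    return best, history
```

Because the elite individuals survive unchanged, the best fitness recorded in `history` can never decrease from one generation to the next, which is the property that distinguishes an elitist genetic algorithm from a purely generational one.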

Adopting Learning Analytics in a First-Year Veterinarian Professional Program: What We Could Know in Advance about Student Learning Progress

Journal of Veterinary Medical Education ◽

10.3138/jvme-2020-0045 ◽

2021 ◽

Vol 48 (6) ◽

pp. 720-728

Author(s):

Wenting Weng ◽

Nicola L. Ritter ◽

Karen Cornell ◽

Molly Gonzales

Keyword(s):

At Risk ◽

Academic Success ◽

Student Learning ◽

Learning Analytics ◽

Prediction Models ◽

Early Stage ◽

At Risk Students ◽

Learning Performance ◽

First Year ◽

Actual Performance

Over the past decade, the field of education has seen stark changes in the way that data are collected and leveraged to support high-stakes decision-making. Utilizing big data as a meaningful lens to inform teaching and learning can increase academic success. Data-driven research has been conducted to understand student learning performance, such as predicting at-risk students at an early stage and recommending tailored interventions to support services. However, few studies in veterinary education have adopted Learning Analytics. This article examines the adoption of Learning Analytics by using retrospective data from the first-year professional Doctor of Veterinary Medicine program. The article gives detailed examples of predictions for six courses from week 0 (i.e., before classes started) to week 14 of the Spring 2018 semester. The weekly models for each course showed how the prediction results changed over time and how they compared with students’ actual performance. With the prediction models, at-risk students were successfully identified at an early stage, which would help instructors pay more attention to them at that point.

On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining

Big Data and Cognitive Computing ◽

10.3390/bdcc6010006 ◽

2022 ◽

Vol 6 (1) ◽

pp. 6

Author(s):

Gomathy Ramaswami ◽

Teo Susnjak ◽

Anuradha Mathrani

Keyword(s):

At Risk ◽

Predictive Model ◽

Operating Characteristic ◽

Prediction Models ◽

Characteristic Curve ◽

At Risk Students ◽

Educational Data Mining ◽

Generic Model ◽

Generic Models ◽

Excellent Candidate

Poor academic performance of students is a concern in the educational sector, especially if it leads to students being unable to meet minimum course requirements. However, with timely prediction of students’ performance, educators can detect at-risk students, thereby enabling early interventions that support these students in overcoming their learning difficulties. The majority of studies have nevertheless developed individual prediction models that each target a single course. These models are tailored to specific attributes of each course amongst a very diverse set of possibilities. While this approach can yield accurate models in some instances, it has limitations. In many cases, overfitting can take place when course data is small or when new courses are devised. Additionally, maintaining a large suite of models per course is a significant overhead. This issue can be tackled by developing a generic and course-agnostic predictive model that captures more abstract patterns and is able to operate across all courses, irrespective of their differences. This study demonstrates how a generic predictive model can be developed that identifies at-risk students across a wide variety of courses. Experiments were conducted using a range of algorithms, with the generic model achieving effective accuracy. The findings showed that the CatBoost algorithm performed the best on our dataset across the F-measure, ROC (receiver operating characteristic) curve and AUC scores; therefore, it is an excellent candidate algorithm for providing solutions in this domain given its ability to seamlessly handle categorical and missing data, which are frequent in educational datasets.

Early Detection of At-Risk Undergraduate Students through Academic Performance Predictors

Higher Education Studies ◽

10.5539/hes.v7n3p42 ◽

2017 ◽

Vol 7 (3) ◽

pp. 42

Author(s):

Vikash Rowtho

Keyword(s):

At Risk ◽

Academic Performance ◽

Early Detection ◽

Undergraduate Students ◽

At Risk Students ◽

Regression Analyses ◽

Linear Discriminant ◽

Student Dropout ◽

Performance Predictors ◽

A New Technique

Undergraduate student dropout is gradually becoming a global problem and the 39 Small Island Developing States (SIDS) are no exception to this trend. The purpose of this research was to develop a method that can be used for early detection of students who are at risk of performing poorly in their undergraduate studies. A sample of 279 students participated in the study, which was conducted in a Mauritian private tertiary academic institution. Regression analyses identified the variables having a significant influence on academic performance. These variables were used in a linear discriminant analysis in which 74 percent of the students could be correctly classified into three categories: at-risk, pass or fail. In conclusion, this study has proposed a new technique that institutions can use to determine significant academic performance predictors and then identify at-risk students, for whom interventions can be implemented prior to exams to address the problem of dropouts.

Monitoring Students at the University: Design and Application of a Moodle Plugin

Applied Sciences ◽

10.3390/app10103469 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3469 ◽

Cited By ~ 1

Author(s):

María Consuelo Sáiz-Manzanares ◽

Raúl Marticorena-Sánchez ◽

César Ignacio García-Osorio

Keyword(s):

At Risk ◽

Early Detection ◽

At Risk Students ◽

Machine Learning Techniques ◽

Dropout Rates ◽

Dynamic Learning ◽

Academic Risk ◽

University Environment ◽

Student Dropout ◽

The University

Early detection of at-risk students is essential, especially in the university environment. Moreover, personalized learning has been shown to increase motivation and lower student dropout rates. At present, the average dropout rates among students following courses leading to the award of Spanish university degrees are around 18% and 42.8% for face-to-face teaching and online courses, respectively. The objectives of this study are: (1) to design and implement a Modular Object-Oriented Dynamic Learning Environment (Moodle) plugin, “eOrientation”, for the early detection of at-risk students; (2) to test the effectiveness of the “eOrientation” plugin on university students. We worked with 279 third-year students following health sciences degrees. A process for extracting information records was also implemented. In addition, a learning analytics module was developed, through which both supervised and unsupervised Machine Learning techniques can be applied. All these measures facilitated the personalized monitoring of the students and the easier detection of students at academic risk. The use of this tool could be of great importance to teachers and university governing teams, as it can assist the early detection of students at academic risk. Future studies will be aimed at testing the plugin using the Moodle environment on degree courses at other universities.

Learning to Identify At-Risk Students in Distance Education Using Interaction Counts

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.62211 ◽

2016 ◽

Vol 23 (2) ◽

pp. 124 ◽

Cited By ~ 2

Author(s):

Douglas Detoni ◽

Cristian Cechinel ◽

Ricardo Araujo Matsumura ◽

Daniela Francisco Brauner

Keyword(s):

Machine Learning ◽

At Risk ◽

At Risk Students ◽

Drop Out ◽

Support Vector ◽

Learning Models ◽

Data Set ◽

Student Dropout ◽

Vector Machines ◽

Machine Learning Models

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail in their courses. Machine Learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and AdaBoost decision trees) under different scenarios. We provide evidence that patterns from interaction counts can provide useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) in an attempt to prevent student dropout.
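
To illustrate what "using only interaction counts" can look like in practice, here is a minimal sketch that turns a raw event log into one weekly count vector per student. The log format (pairs of student identifier and week number) and the function name `weekly_interaction_counts` are assumptions made for this example; the abstract does not describe the actual Moodle export or feature layout.

```python
from collections import Counter


def weekly_interaction_counts(events, n_weeks):
    """Build per-student weekly interaction counts.

    `events` is an iterable of (student_id, week) pairs, with weeks numbered
    from 1 (a hypothetical log format). Returns a dict that maps each student
    to a list of length `n_weeks`; events outside weeks 1..n_weeks are ignored.
    """
    counts = Counter(events)
    students = sorted({student for student, _ in counts})
    return {
        student: [counts[(student, week)] for week in range(1, n_weeks + 1)]
        for student in students
    }


# Illustrative, made-up log: s1 interacts twice in week 1 and once in week 2.
log = [("s1", 1), ("s1", 1), ("s1", 2), ("s2", 3)]
features = weekly_interaction_counts(log, n_weeks=3)
```

Vectors of this kind contain no demographic or course-specific attributes, so they could in principle be fed to any of the classifiers named in the abstract.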

Machine-Learning Prediction of Comorbid Substance Use Disorders in ADHD Youth Using Swedish Registry Data

10.1101/661983 ◽

2019 ◽

Author(s):

Yanli Zhang-James ◽

Qi Chen ◽

Ralf Kuja-Halkola ◽

Paul Lichtenstein ◽

Henrik Larsson ◽

...

Keyword(s):

Machine Learning ◽

At Risk ◽

Substance Use ◽

Substance Use Disorders ◽

Prediction Models ◽

Characteristic Curve ◽

Registry Data ◽

Longitudinal Models ◽

Cross Sectional ◽

Using Data

Background: Children with attention-deficit/hyperactivity disorder (ADHD) have a high risk for substance use disorders (SUDs). Early identification of at-risk youth would help allocate scarce resources for prevention programs. Methods: Psychiatric and somatic diagnoses, family history of these disorders, measures of socioeconomic distress and information about birth complications were obtained from the national registers in Sweden for 19,787 children with ADHD born between 1989-1993. We trained 1) cross-sectional machine learning models using data available by age 17 to predict SUD diagnosis between ages 18-19; and 2) a longitudinal model to predict new diagnoses at each age. Results: The area under the receiver operating characteristic curve (AUC) was 0.73 and 0.71 for the random forest and multilayer perceptron cross-sectional models. A prior diagnosis of SUD was the most important predictor, accounting for 25% of correct predictions. However, after excluding this predictor, our model still significantly predicted the first-time diagnosis of SUD during age 18-19 with an AUC of 0.67. The average of the AUCs from longitudinal models predicting new diagnoses one, two, five and ten years in the future was 0.63. Conclusions: Significant predictions of at-risk co-morbid SUDs in individuals with ADHD can be achieved using population registry data, even many years prior to the first diagnosis. Longitudinal models can potentially monitor their risks over time. More work is needed to create prediction models based on electronic health records or linked population-registers that are sufficiently accurate for use in the clinic.
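
For readers unfamiliar with the metric reported above, the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. It is a generic illustration, not the evaluation code of the study, and the example labels and scores are made up.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 0 (negative) or 1 (positive); `scores` holds the model's
    predicted risk for each case. Runs in O(P * N) time, which is fine for
    small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up example: three of the four positive/negative pairs are ordered correctly.
example_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Under this reading, a value of 0.5 corresponds to random ranking and 1.0 to perfect separation.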

Using Heart Rate to Predict Students’ Academic Performance

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c4740.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 5916-5920

Keyword(s):

At Risk ◽

Heart Rate ◽

Academic Performance ◽

Prediction Models ◽

At Risk Students ◽

Rate Data ◽

Threshold Values ◽

Heart Rate Data ◽

Rate Fluctuation ◽

Remedial Intervention

Timeliness was a missing factor in many studies on Academic Performance Prediction to identify at-risk students. This study evaluated the feasibility of predicting students’ performance based on heart rate data collected during classes. The data were collected in the first four weeks after semester commencement to validate whether prediction at that stage is accurate enough to enable educationists to introduce remedial intervention for at-risk students. Another aim of this study was to determine the best threshold values for the different types of heart rate fluctuations that can be used in predicting academic achievement. The threshold values were tested further to verify whether prediction models for individual courses or for combined courses were more accurate. Results revealed that heart rate data alone can achieve a maximum prediction accuracy of 88% and recall of 100%. Threshold values calculated on derived heart rate fluctuation types produce the best results. Prediction models for individual courses outperform the model using average threshold values of all courses.
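
The abstract does not say how the threshold values were chosen. As one plausible reading, the sketch below searches for the cut-off on a single numeric feature that maximizes accuracy and then reports accuracy and recall at that cut-off. The feature values, the labels, the decision rule (a value at or above the threshold is flagged as at risk) and both function names are assumptions made for illustration.

```python
def evaluate_threshold(values, at_risk, threshold):
    """Return (accuracy, recall) when values >= threshold are flagged at risk."""
    predicted = [v >= threshold for v in values]
    tp = sum(1 for p, a in zip(predicted, at_risk) if p and a)
    tn = sum(1 for p, a in zip(predicted, at_risk) if not p and not a)
    fn = sum(1 for p, a in zip(predicted, at_risk) if not p and a)
    accuracy = (tp + tn) / len(values)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


def best_threshold(values, at_risk):
    """Try every observed value as a cut-off and keep the most accurate one.

    On ties, the smallest candidate wins because candidates are tried in
    ascending order.
    """
    candidates = sorted(set(values))
    return max(candidates, key=lambda t: evaluate_threshold(values, at_risk, t)[0])


# Made-up fluctuation scores and labels, for illustration only.
scores = [2, 3, 5, 8, 9]
labels = [False, False, True, True, True]
cutoff = best_threshold(scores, labels)
accuracy, recall = evaluate_threshold(scores, labels, cutoff)
```

Selecting the threshold on accuracy alone is a design choice of this sketch; when missing an at-risk student is costlier than a false alarm, recall could be used as the selection criterion instead.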

iMoodle: An Intelligent Gamified Moodle to Predict “at-risk” Students Using Learning Analytics Approaches

Smart Computing and Intelligence - Data Analytics Approaches in Educational Games and Gamification Systems ◽

10.1007/978-981-32-9335-9_6 ◽

2019 ◽

pp. 113-126 ◽

Cited By ~ 4

Author(s):

Mouna Denden ◽

Ahmed Tlili ◽

Fathi Essalmi ◽

Mohamed Jemni ◽

Maiga Chang ◽

...

Keyword(s):

At Risk ◽

Learning Analytics ◽

At Risk Students

Using Learning Analytics to Predict At-Risk Students in Online Graduate Public Affairs and Administration Education

Journal of Public Affairs Education ◽

10.1080/15236803.2015.12001831 ◽

2015 ◽

Vol 21 (2) ◽

pp. 247-262 ◽

Cited By ~ 15

Author(s):

Jay Bainbridge ◽

James Melitski ◽

Anne Zahradnik ◽

Eitel J. M. Lauría ◽

Sandeep Jayaprakash ◽

...

Keyword(s):

At Risk ◽

Learning Analytics ◽

At Risk Students ◽

Public Affairs

Using learning analytics to improve online formative quiz engagement

Irish Journal of Technology Enhanced Learning ◽

10.22554/ijtel.v3i1.25 ◽

2018 ◽

Vol 3 (1) ◽

Cited By ~ 1

Author(s):

Irene O'Dowd

Keyword(s):

Learning Analytics ◽

Teaching And Learning ◽

Virtual Learning Environment ◽

Digital Learning ◽

Learning Context ◽

Student Gender ◽

Patterns Of Use ◽

Primary Teaching ◽

Historic Data ◽

Formative Knowledge

This paper describes the findings of a small research study, conducted in a third-level online college, using learning analytics to examine the implementation of formative quizzes in a blended-learning post-primary teaching programme. Using historic data captured in a virtual learning environment (VLE) for a single cohort (n=126), patterns of use of formative Knowledge Check quizzes were analysed with particular regard to completion and retakes. Three hypotheses were tested using appropriate data correlation methods. Completion levels for quizzes were correlated with completion levels for other online tasks to see whether an increase in task workload resulted in a decrease in quiz engagement. A second test compared levels of quiz re-attempts with completion levels for other online tasks, to see whether different patterns of quiz attempts were linked to different levels of online engagement. Finally, the data was analysed to ascertain the relationship, if any, between student gender and different patterns of quiz attempts, to see if gender might be a factor in quiz engagement. The findings of this study suggested that the decrease in engagement with quizzes was not significantly related to task workload increase, and that there is a relationship between quiz re-attempts and higher module engagement. The findings are presented and discussed in the context of student engagement with online formative strategies in humanities-based subjects. Options are considered for enhancing engagement and formative value in this teaching and learning context; the potential of learning analytics in informing evidence-based improvements in digital learning design is also assessed.
