Predicting Next-Day Perceived and Physiological Stress of Pregnant Women Using Machine Learning and Explainability: Algorithm Development and Validation (Preprint)

Mapping Intimacies ◽

10.2196/preprints.33850 ◽

2021 ◽

Author(s):

Ada Ng ◽

Boyang Wei ◽

Jaya Jain ◽

Erin Ward ◽

Darius Tandon ◽

...

Keyword(s):

Machine Learning ◽

Data Collection ◽

Pregnant Women ◽

Perceived Stress ◽

Physiological Stress ◽

Behavioral Therapy ◽

Learning Models ◽

Adverse Health Effects ◽

Using Data ◽

Machine Learning Models

BACKGROUND: Cognitive behavioral therapy (CBT)-based interventions are effective in reducing prenatal stress, which can have severe adverse health effects on mother and newborn if unaddressed. Predicting next-day physiologic or perceived stress can help to inform and enable preemptive interventions for a likely physiologically and/or perceptibly stressful day. Machine learning models are useful tools that can be developed to predict next-day physiologic and perceived stress using data collected the previous day. Such models can improve our understanding of the specific factors that predict physiologic and perceived stress and will also allow researchers to develop systems that collect selected features for assessment in clinical trials in order to minimize the burden of data collection.

OBJECTIVE: To build and evaluate a machine-learned model that predicts next-day physiologic and perceived stress using sensor-based, ecological momentary assessment (EMA)-based, and intervention-based features and to explain the prediction results.

METHODS: We enrolled pregnant women into a prospective proof-of-concept study and collected electrocardiography, EMA, and CBT intervention data over 12 weeks. We used the data to train and evaluate six machine learning models to predict next-day physiologic and perceived stress. After selecting the best-performing model, we used SHapley Additive exPlanations (SHAP) to identify the importance and explainability of each feature.

RESULTS: A total of 16 pregnant women enrolled in the study. Overall, 4157.18 hours of data were collected, and participants answered 2838 EMAs. After applying feature selection, 8 and 10 features were found to positively predict next-day physiologic and perceived stress, respectively. A random forest classifier performed the best in predicting next-day physiologic (F1-score 0.84) and next-day perceived stress (F1-score 0.74) using all features. While any subset of sensor-based, EMA-based, and/or intervention-based features could reliably predict next-day physiologic stress, EMA-based features were necessary to predict next-day perceived stress. Analysis of explainability metrics showed that prolonged duration of physiologic stress was highly predictive of next-day physiologic stress and that physiologic stress and perceived stress were temporally divergent.

CONCLUSIONS: In this study we were able to build interpretable machine learning models to predict next-day physiologic and perceived stress, and we identified unique features that were highly predictive of next-day stress and that can help reduce the burden of data collection.
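The F1-scores quoted in this abstract are the harmonic mean of precision and recall for the positive class. The sketch below is only an illustration of that metric, not the authors' code: the function name `f1_score` and the toy labels are assumptions made for this example, with 1 standing in for a "stressful next day".

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = stressful next day, 0 = not stressful.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(round(f1_score(y_true, y_pred), 3))
```

Unlike plain accuracy, F1 ignores true negatives, which is one reason it is often reported when the positive class is the one of clinical interest.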


Data-Driven Approach for Predicting and Explaining the Risk of Long-Term Unemployment

E3S Web of Conferences ◽

10.1051/e3sconf/202021401023 ◽

2020 ◽

Vol 214 ◽

pp. 01023

Author(s):

Linan (Frank) Zhao

Keyword(s):

Machine Learning ◽

Age Groups ◽

Learning Models ◽

Public Authorities ◽

Ensemble Machine Learning ◽

European Public ◽

Data Driven Approach ◽

Using Data ◽

Machine Learning Models

Long-term unemployment has significant societal impact and is of particular concern to policymakers with regard to economic growth and public finances. This paper constructs advanced ensemble machine learning models to predict citizens’ risks of becoming long-term unemployed using data collected from European public authorities for employment services. The proposed model achieves 81.2% accuracy in identifying citizens at high risk of long-term unemployment. This paper also examines how to dissect black-box machine learning models by offering explanations at both a local and global level using SHAP, a state-of-the-art model-agnostic approach to explain factors that contribute to long-term unemployment. Lastly, this paper addresses an under-explored question when applying machine learning in the public domain, that is, the inherent bias in model predictions. The results show that popular models such as gradient boosted trees may produce unfair predictions against senior age groups and immigrants. Overall, this paper sheds light on the growing shift by governments toward adopting machine learning models to profile and prioritize employment resources to reduce the detrimental effects of long-term unemployment and improve public welfare.


Machine Learning for the Diagnosis of Orthodontic Extractions: A Computational Analysis Using Ensemble Learning

Bioengineering ◽

10.3390/bioengineering7020055 ◽

2020 ◽

Vol 7 (2) ◽

pp. 55

Author(s):

Yasir Suhail ◽

Madhur Upadhyay ◽

Aditya Chhibber ◽

Kshitiz

Keyword(s):

Machine Learning ◽

Human Error ◽

Ensemble Methods ◽

Treatment Decision ◽

Learning Models ◽

Error Training ◽

Treatment Plans ◽

Suitable Treatment ◽

Using Data ◽

Machine Learning Models

Extraction of teeth is an important treatment decision in orthodontic practice. An expert system that is able to arrive at suitable treatment decisions can be valuable to clinicians for verifying treatment plans, minimizing human error, training orthodontists, and improving reliability. In this work, we train a number of machine learning models for this prediction task using data for 287 patients, evaluated independently by five different orthodontists. We demonstrate why ensemble methods are particularly suited for this task. We evaluate the performance of the machine learning models and interpret the training behavior. We show that the results for our model are close to the level of agreement between different orthodontists.


Collaboration Analytics — Current State and Potential Futures

Journal of Learning Analytics ◽

10.18608/jla.2021.7447 ◽

2021 ◽

Vol 8 (1) ◽

pp. 1-12

Author(s):

Bertrand Schneider ◽

Nia Dowell ◽

Kate Thompson

Keyword(s):

Machine Learning ◽

Data Collection ◽

Theory Building ◽

Learning Models ◽

Special Issue ◽

Current State ◽

Machine Learning Models

This special issue brings together a rich collection of papers in collaboration analytics. With topics including theory building, data collection, modelling, designing frameworks, and building machine learning models, this issue represents some of the most active areas of research in the field. In this editorial, we summarize the papers; discuss the nature of collaboration analytics based on this body of work; describe the associated opportunities, challenges, and risks; and depict potential futures for the field. We conclude by discussing the implications of this special issue for collaboration analytics.


Machine learning can accurately predict pre-admission baseline hemoglobin and creatinine in intensive care patients

npj Digital Medicine ◽

10.1038/s41746-019-0192-z ◽

2019 ◽

Vol 2 (1) ◽

Cited By ~ 2

Author(s):

Antonin Dauvin ◽

Carolina Donado ◽

Patrik Bachtiger ◽

Ke-Chun Huang ◽

Christopher Martin Sauer ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care Unit ◽

Intensive Care ◽

Impaired Renal Function ◽

Kidney Injury ◽

Learning Models ◽

Intensive Care Patients ◽

Using Data ◽

Machine Learning Models ◽

Baseline Hemoglobin

Patients admitted to the intensive care unit frequently have anemia and impaired renal function, but often lack historical blood results to contextualize the acuteness of these findings. Using data available within two hours of ICU admission, we developed machine learning models that accurately (AUC 0.86–0.89) classify an individual patient’s baseline hemoglobin and creatinine levels. Compared to assuming the baseline to be the same as the admission lab value, machine learning performed significantly better at classifying acute kidney injury regardless of initial creatinine value, and significantly better at predicting baseline hemoglobin value in patients with admission hemoglobin of <10 g/dl.


A Preliminary Experimental Outline to Train Machine Learning Models for the Unobtrusive, Real-Time Detection of Acute Physiological Stress Levels during Training Exercises

The 14th PErvasive Technologies Related to Assistive Environments Conference ◽

10.1145/3453892.3461833 ◽

2021 ◽

Author(s):

Andre Jeworutzki ◽

Jan Schwarzer ◽

Kai von Luck ◽

Susanne Draheim ◽

Qi Wang

Keyword(s):

Machine Learning ◽

Real Time ◽

Physiological Stress ◽

Learning Models ◽

Stress Levels ◽

Real Time Detection ◽

Training Exercises ◽

Machine Learning Models


Using AI for Mental Health Analysis and Prediction in School Surveys

European Journal of Public Health ◽

10.1093/eurpub/ckaa165.336 ◽

2020 ◽

Vol 30 (Supplement_5) ◽

Author(s):

H S Adnan ◽

A Srsic ◽

P M Venticich ◽

D M R Townend

Keyword(s):

Public Health ◽

Mental Health ◽

Artificial Intelligence ◽

Machine Learning ◽

Data Analysis ◽

Data Collection ◽

Health Surveillance ◽

Current Paper ◽

Learning Models ◽

Machine Learning Models

Background: Childhood and adolescence are critical stages of life for mental health and well-being. Schools are a key setting for mental health promotion and illness prevention. One in five children and adolescents has a mental disorder, and about half of mental disorders begin before the age of 14. Beneficial and explainable artificial intelligence can replace current paper-based and online approaches to school mental health surveys. This can enhance data acquisition, interoperability, data-driven analysis, trust and compliance. This paper presents a model for using chatbots for non-obtrusive data collection and supervised machine learning models for data analysis, and discusses ethical considerations pertaining to the use of these models.

Methods: For data acquisition, the proposed model uses chatbots which interact with students. The conversation log acts as the source of raw data for the machine learning. Pre-processing of the data is automated by filtering for keywords and phrases. Existing survey results, obtained through current paper-based data collection methods, are evaluated by domain experts (health professionals). These can be used to create a test dataset to validate the machine learning models. Supervised learning can then be deployed to classify specific behaviour and mental health patterns.

Results: We present a model that can be used to improve upon current paper-based data collection and manual data analysis methods. An open-source GitHub repository contains necessary tools and components of this model. Privacy is respected through rigorous observance of confidentiality and data protection requirements. Critical reflection on these ethical and legal aspects is included in the project.

Conclusions: This model strengthens mental health surveillance in schools. The same tools and components could be applied to other public health data. Future extensions of this model could also incorporate unsupervised learning to find clusters and patterns of unknown effects.

Key messages: This model uses artificial intelligence to improve mental health surveillance and evaluation in school settings. Artificial intelligence can be applied more broadly in public health to harness the potential of predictive models.


Shear stress distribution prediction in symmetric compound channels using data mining and machine learning models

Frontiers of Structural and Civil Engineering ◽

10.1007/s11709-020-0634-3 ◽

2020 ◽

Vol 14 (5) ◽

pp. 1097-1109

Author(s):

Zohreh Sheikh Khozani ◽

Khabat Khosravi ◽

Mohammadamin Torabi ◽

Amir Mosavi ◽

Bahram Rezaei ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Shear Stress ◽

Stress Distribution ◽

Shear Stress Distribution ◽

Learning Models ◽

Compound Channels ◽

Using Data ◽

Machine Learning Models


Fetal birthweight prediction with measured data by a temporal machine learning method

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01388-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jing Tao ◽

Zhenming Yuan ◽

Li Sun ◽

Kai Yu ◽

Zhifen Zhang

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Pregnant Women ◽

Electronic Medical Records ◽

Gestational Age ◽

Empirical Formula ◽

Medical Records ◽

Learning Models ◽

Accuracy Rate ◽

Machine Learning Models

Background: Birthweight is an important indicator during fetal development for protecting maternal and infant safety. However, birthweight is difficult to measure directly, and in clinical practice it is usually roughly estimated using empirical formulas and the experience of the doctors.

Methods: This study attempts to combine multiple electronic medical records with the B-ultrasonic examination of pregnant women to construct a hybrid birthweight prediction classifier based on long short-term memory (LSTM) networks. The clinical data were collected from 5,759 Chinese pregnant women who have given birth, with more than 57,000 obstetric electronic medical records. We evaluated the prediction by the mean relative error (MRE) and the accuracy rate of different machine learning classifiers at different predicting periods for first delivery and multiple deliveries. Additionally, we evaluated the classification accuracies of the different classifiers for the Small-for-Gestational-Age (SGA), Large-for-Gestational-Age (LGA) and Appropriate-for-Gestational-Age (AGA) groups.

Results: The results show that the accuracy rates of the prediction model using Convolutional Neural Networks (CNN), Random Forest (RF), Linear-Regression, Support Vector Regression (SVR), Back Propagation Neural Network (BPNN), and the proposed hybrid-LSTM at the 40th pregnancy week for first delivery were 0.498, 0.662, 0.670, 0.680, 0.705 and 0.793, respectively. Among the groups of less than 39th pregnancy week, the 39th pregnancy week and more than 40th week, the hybrid-LSTM model obtained the best accuracy and almost the lowest MRE compared with the other machine learning models. Not surprisingly, all the machine learning models performed better than the empirical formula. In the SGA, LGA and AGA group experiments, the average accuracies of the empirical formula, logistic regression (LR), BPNN, CNN, RF and Hybrid-LSTM were 0.780, 0.855, 0.890, 0.906, 0.916 and 0.933, respectively.

Conclusions: The results of this study are helpful for birthweight prediction and the development of guidelines for clinical delivery treatments. They are also useful for the implementation of a decision support system using the temporal machine learning prediction model, as it can help clinicians make correct decisions during obstetric examinations and remind pregnant women to manage their weight.
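The abstract reports mean relative error (MRE) without stating its formula. A common definition, assumed here rather than taken from the paper, averages the absolute error divided by the true value. The sketch below uses that definition; the function name and the birthweights (in grams) are hypothetical.

```python
def mean_relative_error(actual, predicted):
    """Average of |predicted - actual| / |actual| over all samples.

    Assumes every actual value is non-zero, which holds for birthweights.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return total / len(actual)


# Hypothetical birthweights in grams, for illustration only.
actual = [3000.0, 4000.0]
predicted = [3300.0, 3800.0]
print(round(mean_relative_error(actual, predicted), 3))
```

Because each error is scaled by the true value, a 300 g miss on a 3000 g newborn counts twice as much as a 200 g miss on a 4000 g newborn.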


S&P 500 Stock Price Prediction Using Technical, Fundamental and Text Data

Statistics Optimization & Information Computing ◽

10.19139/soic-2310-5070-1362 ◽

2021 ◽

Vol 9 (4) ◽

pp. 769-788

Author(s):

Shan Zhong ◽

David Hitchcock

Keyword(s):

Machine Learning ◽

Stock Prices ◽

Stock Price ◽

Language Models ◽

News Item ◽

Learning Models ◽

Stock Price Prediction ◽

Price Prediction ◽

Using Data ◽

Machine Learning Models

We summarized both common and novel predictive models used for stock price prediction and combined them with technical indices, fundamental characteristics and text-based sentiment data to predict S&P stock prices. A 66.18% accuracy in S&P 500 index directional prediction and a 62.09% accuracy in individual stock directional prediction were achieved by combining different machine learning models such as Random Forest and LSTM into state-of-the-art ensemble models. The data we use contains weekly historical prices, finance reports, and text information from news items associated with 518 different common stocks issued by current and former S&P 500 large-cap companies, from January 1, 2000 to December 31, 2019. Our study's innovation includes utilizing deep language models to categorize and infer financial news item sentiment; fusing different models containing different combinations of variables and stocks to jointly make predictions; and overcoming the insufficient data problem for machine learning models in time series by using data across different stocks.
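Directional accuracy, the metric quoted above, is usually taken to be the share of periods in which the predicted direction of movement matches the actual one. The sketch below assumes that reading; the function name, the treatment of a zero return as "not up", and the weekly returns are all assumptions for illustration and are not drawn from the paper.

```python
def directional_accuracy(actual_returns, predicted_returns):
    """Fraction of periods where predicted and actual moves share a direction.

    A return greater than zero counts as "up"; zero or below counts as "not up".
    """
    if len(actual_returns) != len(predicted_returns) or not actual_returns:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for a, p in zip(actual_returns, predicted_returns)
        if (a > 0) == (p > 0)
    )
    return hits / len(actual_returns)


# Hypothetical weekly returns.
actual = [0.01, -0.02, 0.03, -0.01]
predicted = [0.02, 0.01, 0.01, -0.03]
print(directional_accuracy(actual, predicted))
```

Under this definition a coin flip would score about 50% on a balanced series, which is the baseline that figures such as 66.18% are implicitly compared against.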


Early diagnosis of gestational diabetes mellitus using circulating microRNAs

Acta Endocrinologica ◽

10.1530/eje-19-0206 ◽

2019 ◽

Vol 181 (5) ◽

pp. 565-577 ◽

Cited By ~ 13

Author(s):

Liron Yoffe ◽

Avital Polsky ◽

Avital Gilam ◽

Chen Raff ◽

Federico Mecacci ◽

...

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Gestational Diabetes ◽

Gestational Diabetes Mellitus ◽

Pregnant Women ◽

Predictive Value ◽

First Trimester ◽

Learning Models ◽

Circulating Micrornas ◽

Machine Learning Models

Design: Gestational diabetes mellitus (GDM) is one of the most common pregnancy complications and its prevalence is constantly rising worldwide. Diagnosis is commonly made in the late second or early third trimester of pregnancy, though the development of GDM starts early; hence, first-trimester diagnosis is feasible.

Objective: Our objective was to identify microRNAs that best distinguish GDM samples from those of healthy pregnant women and to evaluate the predictive value of microRNAs for GDM detection in the first trimester.

Methods: We investigated the abundance of circulating microRNAs in the plasma of pregnant women in their first trimester. Two populations were included in the study to enable population-specific as well as cross-population inspection of expression profiles. Each microRNA was tested for differential expression in GDM vs control samples, and their efficiency for GDM detection was evaluated using machine-learning models.

Results: Two upregulated microRNAs (miR-223 and miR-23a) were identified in GDM vs the control set, and validated on a new cohort of women. Using both microRNAs in a logistic-regression model, we achieved an AUC value of 0.91. We further demonstrated the overall predictive value of microRNAs using several types of multivariable machine-learning models that included the entire set of expressed microRNAs. All models achieved high accuracy when applied on the dataset (mean AUC = 0.77). The significance of the classification results was established via permutation tests.

Conclusions: Our findings suggest that circulating microRNAs are potential biomarkers for GDM in the first trimester. This warrants further examination and lays the foundation for producing a novel early non-invasive diagnostic tool for GDM.
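The abstract says the significance of the AUC values was established via permutation tests but gives no procedure. One generic way to do this, sketched below under that assumption, is to shuffle the class labels many times and count how often the shuffled AUC reaches the observed one. The function names, the number of permutations, the seed, and the toy scores are all hypothetical and are not the authors' pipeline.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Return the observed AUC and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between labels and scores
        if auc(shuffled, scores) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical classifier scores: 1 = GDM, 0 = control.
labels = [1] * 10 + [0] * 10
scores = [0.9, 0.8, 0.85, 0.7, 0.95, 0.75, 0.88, 0.92, 0.81, 0.79,
          0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.12, 0.08, 0.19, 0.21]
observed_auc, p_value = permutation_p_value(labels, scores)
print(observed_auc, p_value)
```

A small p-value here means that an AUC as high as the observed one rarely arises when labels carry no information about the scores.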
