Increasing the performance, trustworthiness and practical value of machine learning models: a case study predicting hydrogen bond network dimensionalities from molecular diagrams

Andre P. Frade; Patrick McCabe; Richard I. Cooper

doi:10.1039/d0ce00111b

CrystEngComm ◽

10.1039/d0ce00111b ◽

2020 ◽

Vol 22 (43) ◽

pp. 7186-7192

Author(s):

Andre P. Frade ◽

Patrick McCabe ◽

Richard I. Cooper

Keyword(s):

Machine Learning ◽

Hydrogen Bond ◽

Prediction Model ◽

Hydrogen Bond Network ◽

Learning Models ◽

Network Prediction ◽

Bond Network ◽

Machine Learning Models

The value of a hydrogen bond network prediction model was improved using a tool to increase prediction trust. Its accuracy could be raised to 73% or 89%, at the cost that only 34% or 8% of the test examples, respectively, could be predicted.
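
The trade-off between accuracy and the fraction of examples that can be predicted can be illustrated with a small sketch. It assumes a simple confidence threshold on class probabilities, which is one common way to abstain from untrusted predictions and is not necessarily the tool used in the paper. The function name `selective_accuracy` and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def selective_accuracy(
    probabilities: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy) when only confident predictions are kept.

    A prediction is kept if its highest class probability is >= threshold.
    Accuracy is computed over the kept predictions only.
    """
    kept = 0
    correct = 0
    for probs, label in zip(probabilities, labels):
        confidence = max(probs)
        if confidence < threshold:
            continue  # abstain: the prediction is not trusted
        kept += 1
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        if predicted == label:
            correct += 1
    coverage = kept / len(labels) if labels else 0.0
    accuracy = correct / kept if kept else float("nan")
    return coverage, accuracy


# Hypothetical toy data: 6 examples, 3 classes (not taken from the paper).
toy_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
    [0.20, 0.20, 0.60],
    [0.50, 0.45, 0.05],
]
toy_labels = [0, 1, 1, 2, 2, 1]

for t in (0.0, 0.5, 0.75):
    cov, acc = selective_accuracy(toy_probs, toy_labels, t)
    print(f"threshold={t:.2f} coverage={cov:.2f} accuracy={acc:.2f}")
```

Raising the threshold in this toy setting keeps fewer examples while the accuracy on the kept examples goes up, which is the same kind of compromise the abstract reports.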

A first look at the integration of machine learning models in complex autonomous driving systems: a case study on Apollo

Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering ◽

10.1145/3368089.3417063 ◽

2020 ◽

Author(s):

Zi Peng ◽

Jinqiu Yang ◽

Tse-Hsun (Peter) Chen ◽

Lei Ma

Keyword(s):

Machine Learning ◽

Autonomous Driving ◽

Learning Models ◽

Machine Learning Models

Enhancing the understanding of hydrological responses induced by ecological water replenishment using improved machine learning models: A case study in Yongding River

The Science of The Total Environment ◽

10.1016/j.scitotenv.2021.145489 ◽

2021 ◽

Vol 768 ◽

pp. 145489

Author(s):

Kangning Sun ◽

Litang Hu ◽

Jianli Guo ◽

Zhengqiu Yang ◽

Yuanzheng Zhai ◽

...

Keyword(s):

Machine Learning ◽

Learning Models ◽

Hydrological Responses ◽

Yongding River ◽

Machine Learning Models

Comparison of two optimized machine learning models for predicting displacement of rainfall-induced landslide: A case study in Sichuan Province, China

Engineering Geology ◽

10.1016/j.enggeo.2017.01.022 ◽

2017 ◽

Vol 218 ◽

pp. 213-222 ◽

Cited By ~ 29

Author(s):

Xing Zhu ◽

Qiang Xu ◽

Minggao Tang ◽

Wen Nie ◽

Shuqi Ma ◽

...

Keyword(s):

Machine Learning ◽

Sichuan Province ◽

Learning Models ◽

Machine Learning Models

Validation Machine Learning Models To Predict Score On Graduate Tests Based On High School Test And Other Factors, Case Study: Colombia.

10.18687/laccei2021.1.1.343 ◽

2021 ◽

Author(s):

Maryori Sabalza Mejia ◽

Carolina Campillo Jimenez ◽

Juan Carlos Martinez Santos

Keyword(s):

Machine Learning ◽

High School ◽

Learning Models ◽

Machine Learning Models

Severity Analysis of Heavy Vehicle Crashes Using Machine Learning Models: A Case Study in New Jersey

International Conference on Transportation and Development 2021 ◽

10.1061/9780784483534.025 ◽

2021 ◽

Author(s):

Ahmed Sajid Hasan ◽

Md. Asif Bin Kabir ◽

Mohammad Jalayer

Keyword(s):

Machine Learning ◽

New Jersey ◽

Heavy Vehicle ◽

Vehicle Crashes ◽

Learning Models ◽

Machine Learning Models

A Comparative Study on Machine Learning Models for Paprika Growth Prediction Model with Temperature Changes

The Journal of Korean Institute of Communications and Information Sciences ◽

10.7840/kics.2021.46.12.2393 ◽

2021 ◽

Vol 46 (12) ◽

pp. 2393-2402

Author(s):

SaravanaKumar Venkatesan ◽

Jonghyun Lim ◽

Chanagsun Shin ◽

Yongyun Cho

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Prediction Model ◽

Learning Models ◽

Growth Prediction ◽

Temperature Changes ◽

Machine Learning Models

Explainable machine learning: A case study on impedance tube measurements

INTER-NOISE and NOISE-CON Congress and Conference Proceedings ◽

10.3397/in-2021-2342 ◽

2021 ◽

Vol 263 (3) ◽

pp. 3223-3234

Author(s):

Merten Stender ◽

Mathies Wedler ◽

Norbert Hoffmann ◽

Christian Adams

Keyword(s):

Machine Learning ◽

Absorption Coefficient ◽

Specimen Thickness ◽

Learning Models ◽

Impedance Tube ◽

Frequency Regime ◽

Hidden Patterns ◽

Model Diagnosis ◽

Machine Learning Models

Machine learning (ML) techniques allow for finding hidden patterns and signatures in data. Currently, these methods are gaining increased interest in engineering in general and in vibroacoustics in particular. Although ML methods are successfully applied, it is hardly understood how these black box-type methods make their decisions. Explainable machine learning aims at overcoming this issue by deepening the understanding of the decision-making process through perturbation-based model diagnosis. This paper introduces machine learning methods and reviews recent techniques for explainability and interpretability. These methods are exemplified on sound absorption coefficient spectra of one sound absorbing foam material measured in an impedance tube. Variances of the absorption coefficient measurements as a function of the specimen thickness and the operator are modeled by univariate and multivariate machine learning models. In order to identify the driving patterns, i.e. how and in which frequency regime the measurements are affected by the setup specifications, Shapley additive explanations are derived for the ML models. It is demonstrated how explaining machine learning models can be used to discover and express complicated relations in experimental data, thereby paving the way to novel knowledge discovery strategies in evidence-based modeling.
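
As a rough illustration of the idea behind Shapley additive explanations, the sketch below computes exact Shapley values for a tiny model by enumerating all feature coalitions. It is not the authors' procedure: the toy model, its two features (loosely named after specimen thickness and operator), and the choice to replace absent features with a fixed baseline are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of each feature of x relative to a baseline input."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition keep their real value; others use the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features: Sequence[float]) -> float:
    # Hypothetical model with an interaction term (not from the paper).
    thickness, operator_bias = features
    return 2.0 * thickness + 0.5 * operator_bias + thickness * operator_bias


x = [3.0, 1.0]
baseline = [0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print("contributions:", phi)
print("sum of contributions:", sum(phi))
print("model(x) - model(baseline):", toy_model(x) - toy_model(baseline))
```

The contributions add up to the difference between the model output at `x` and at the baseline. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations.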

Validating the Early Prediction Model for COPD Patients Care through a Federated Machine Learning Architecture on FAIR Data (Preprint)

10.2196/preprints.35307 ◽

2021 ◽

Author(s):

Celia ALVAREZ-ROMERO ◽

Alicia MARTÍNEZ-GARCÍA ◽

Jara Eloisa TERNERO-VEGA ◽

Pablo DÍAZ-JIMÉNEZ ◽

Carlos JIMÉNEZ-DE-JUAN ◽

...

Keyword(s):

Machine Learning ◽

Health Care ◽

Prediction Model ◽

Prospective Study ◽

Health Research ◽

Early Prediction ◽

Learning Models ◽

Copd Patients ◽

Readmission Risk ◽

Machine Learning Models

BACKGROUND Due to the nature of health data, its sharing and reuse for research are limited by legal, technical and ethical implications. To address that challenge, and to facilitate and promote the discovery of scientific knowledge, the FAIR (Findable, Accessible, Interoperable, and Reusable) principles help organizations share research data in a way that is secure, appropriate and useful for other researchers. OBJECTIVE The objective of this study was to FAIRify existing health research datasets and to apply a federated machine learning architecture on top of the FAIRified datasets of different health research performing organizations. The whole FAIR4Health solution was validated by assessing the generated model for real-time prediction of 30-day readmission risk in patients with Chronic Obstructive Pulmonary Disease (COPD). METHODS Applying the FAIR principles to health research datasets in three different health care settings enabled a retrospective multicenter study that generated federated machine learning models, with the aim of developing an early prediction model for 30-day readmission risk in COPD patients. This prediction model was implemented upon the FAIR4Health platform and, finally, an observational prospective study with 30-day follow-up was carried out in two health care centers from different countries. The same inclusion and exclusion criteria were used in both the retrospective and prospective parts of the study. RESULTS The prediction model for 30-day hospital readmission risk was trained using the retrospective data of 4,944 COPD patients. The prediction model was assessed using the data of 100 patients recruited (22 from Spain and 78 from Serbia) out of a total of 2070 patients observed (records viewed) for the observational prospective study, which ran from April 2021 to September 2021. Significant accuracy (0.98) and precision (0.25) were observed for the prediction model generated upon the FAIR4Health platform and, as a result, the generated prediction of 30-day readmission risk was confirmed in 87% of the cases. CONCLUSIONS A clinical validation was demonstrated through the implementation of federated machine learning models on top of the FAIRified datasets from different health research performing organizations, providing an assessment for predicting 30-day readmission risk in COPD patients. This demonstration made it possible to state the relevance of, and need for, implementing a FAIR data policy to facilitate data sharing and reuse in health research.

Empirical and machine learning models for predicting daily global solar radiation from sunshine duration: A review and case study in China

Renewable and Sustainable Energy Reviews ◽

10.1016/j.rser.2018.10.018 ◽

2019 ◽

Vol 100 ◽

pp. 186-212 ◽

Cited By ~ 68

Author(s):

Junliang Fan ◽

Lifeng Wu ◽

Fucang Zhang ◽

Huanjie Cai ◽

Wenzhi Zeng ◽

...

Keyword(s):

Machine Learning ◽

Solar Radiation ◽

Sunshine Duration ◽

Global Solar Radiation ◽

Learning Models ◽

Machine Learning Models

GC-MS Fingerprints Profiling Using Machine Learning Models for Food Flavor Prediction

Processes ◽

10.3390/pr8010023 ◽

2019 ◽

Vol 8 (1) ◽

pp. 23

Author(s):

Kexin Bi ◽

Dong Zhang ◽

Tong Qiu ◽

Yizhen Huang

Keyword(s):

Machine Learning ◽

Evaluation System ◽

Gas Chromatography Mass Spectrometry ◽

Learning Models ◽

Fingerprint Image ◽

Whole Process ◽

Evaluation Problem ◽

Potential Issue ◽

Machine Learning Models

Food flavor quality evaluation is attracting continuous attention, but a suitable evaluation system is severely lacking. Gas chromatography-mass spectrometry/olfactometry (GC-MS/O) is widely used to solve the food flavor evaluation problem, but the olfactometry evaluation is not feasible in large batches and is unreliable because of potential operator or systematic laboratory effects. Thus, a novel fingerprint modeling and profiling process was proposed based on several machine learning models, including a convolutional neural network (CNN). The fingerprint template was created by analyzing an existing GC-MS spectrum dataset. A fingerprint image generation program was then applied to structure the complex instrumental data. The food olfactometry result was obtained by a CNN-based machine learning method that takes the fingerprint image as input. A case study on peanut oil samples demonstrated a model accuracy of around 93%. With structure optimization and further dataset expansion, the whole process has the potential to be used by sensory laboratories for aroma analysis in place of human assessors.
