Increasing the performance, trustworthiness and practical value of machine learning models: a case study predicting hydrogen bond network dimensionalities from molecular diagrams

CrystEngComm, 2020, Vol 22 (43), pp. 7186-7192
Author(s): Andre P. Frade, Patrick McCabe, Richard I. Cooper

The value of a hydrogen bond network prediction model was improved by using a tool that increases trust in its predictions. Its accuracy could be raised to 73% or 89%, at the cost that only 34% or 8% of the test examples, respectively, could be predicted.
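The trade-off described above, where accuracy rises as fewer test examples are predicted, can be illustrated with a simple confidence threshold. This is a sketch under stated assumptions: the function name `accuracy_at_threshold` and the toy confidences are hypothetical, and the paper's trust tool may decide which examples to predict in a different way.

```python
from typing import List, Tuple


def accuracy_at_threshold(
    confidences: List[float],
    correct: List[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (accuracy, coverage) when only predictions whose confidence
    is >= threshold are kept and the model abstains on the rest.

    Hypothetical helper for illustration; not the paper's tool."""
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(kept) / len(confidences)
    # Accuracy is undefined when nothing passes the threshold; report 0.0.
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return accuracy, coverage


# Hypothetical toy data, not taken from the paper.
confs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.50, 0.40]
hits = [True, True, True, False, True, False, False, False]

for t in (0.0, 0.6, 0.85):
    acc, cov = accuracy_at_threshold(confs, hits, t)
    print(f"threshold={t:.2f} accuracy={acc:.2f} coverage={cov:.2f}")
```

Raising the threshold in this toy example increases accuracy from 0.5 to 1.0 while coverage falls from 1.0 to 0.25, the same kind of compromise the abstract reports.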

2021, Vol 263 (3), pp. 3223-3234
Author(s): Merten Stender, Mathies Wedler, Norbert Hoffmann, Christian Adams

Machine learning (ML) techniques can uncover hidden patterns and signatures in data. These methods are currently attracting growing interest in engineering in general and in vibroacoustics in particular. Although ML methods are applied successfully, how these black-box methods reach their decisions is poorly understood. Explainable machine learning aims to overcome this issue by deepening the understanding of the decision-making process through perturbation-based model diagnosis. This paper introduces machine learning methods and reviews recent techniques for explainability and interpretability. These methods are illustrated on sound absorption coefficient spectra of a single sound-absorbing foam material measured in an impedance tube. Variances in the absorption coefficient measurements as a function of specimen thickness and operator are modeled with univariate and multivariate machine learning models. To identify the driving patterns, i.e. how and in which frequency regime the measurements are affected by the setup specifications, Shapley additive explanations (SHAP) are derived for the ML models. The paper demonstrates how explaining machine learning models can help discover and express complicated relations in experimental data, paving the way for novel knowledge discovery strategies in evidence-based modeling.
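Shapley additive explanations attribute a model's output to its input features using Shapley values from cooperative game theory. As a minimal sketch, assuming a tiny hypothetical model and a single baseline input that stands in for "absent" features, the exact values can be computed by enumerating every feature coalition. This is not how the paper or the `shap` library computes them; practical tools use faster estimators or model-specific algorithms, because exact enumeration scales as 2^n in the number of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for a single input x.

    Assumption: features outside a coalition are replaced by the
    baseline value. Cost grows as 2**n, so this suits only small n."""
    n = len(x)

    def value(coalition: set) -> float:
        # Model output with non-coalition features set to the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical model: one linear term plus one interaction term.
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)
```

For this hypothetical model the values come out at approximately 2, 3 and 3: the linear term is credited entirely to feature 0, the interaction is split equally between features 1 and 2, and the values sum to f(x) - f(baseline) = 8.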


2021
Author(s): Celia ALVAREZ-ROMERO, Alicia MARTÍNEZ-GARCÍA, Jara Eloisa TERNERO-VEGA, Pablo DÍAZ-JIMÉNEZ, Carlos JIMÉNEZ-DE-JUAN, ...

BACKGROUND Because of the nature of health data, sharing and reusing it for research is limited by legal, technical and ethical constraints. To address this challenge and to facilitate and promote scientific discovery, the FAIR (Findable, Accessible, Interoperable, and Reusable) principles help organizations share research data in a way that is secure, appropriate and useful to other researchers. OBJECTIVE The objective of this study was to FAIRify existing health research datasets and to apply a federated machine learning architecture on top of the FAIRified datasets of different health research performing organizations. The complete FAIR4Health solution was validated by assessing the generated model for real-time prediction of 30-day readmission risk in patients with Chronic Obstructive Pulmonary Disease (COPD). METHODS Applying the FAIR principles to health research datasets in three different health care settings enabled a retrospective multicenter study to generate federated machine learning models, with the aim of developing an early prediction model for 30-day readmission risk in COPD patients. This prediction model was implemented on the FAIR4Health platform, and an observational prospective study with 30-day follow-up was then carried out in two health care centers in different countries. The same inclusion and exclusion criteria were used in both the retrospective and prospective parts of the study. RESULTS The prediction model for 30-day hospital readmission risk was trained on retrospective data from 4,944 COPD patients. The model was assessed on data from 100 recruited patients (22 from Spain and 78 from Serbia) out of 2,070 patients observed (records viewed) in total during the observational prospective study, from April 2021 to September 2021.
The prediction model generated on the FAIR4Health platform achieved an accuracy of 0.98 and a precision of 0.25, and the predicted 30-day readmission risk was confirmed in 87% of the cases. CONCLUSIONS A clinical validation was demonstrated by implementing federated machine learning models on top of FAIRified datasets from different health research performing organizations, providing an assessment for predicting 30-day readmission risk in COPD patients. This demonstration underlines the relevance of, and need for, a FAIR data policy to facilitate data sharing and reuse in health research.
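The abstract does not say how the models trained at the different sites are combined. Federated averaging (FedAvg) is one widely used scheme, so the sketch below shows a single aggregation round under that assumption: each site trains locally and shares only its model parameters, and a coordinator averages them weighted by the number of local records. The function name `federated_average` and the site sizes are hypothetical and are not taken from the FAIR4Health platform.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """One FedAvg-style aggregation round (assumed scheme, hypothetical name).

    Each site contributes its parameter vector, weighted by its number of
    local training records. Only parameters are shared, not raw records."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_weights = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must share the same parameter layout")
        for k, w in enumerate(weights):
            global_weights[k] += (size / total) * w
    return global_weights


# Hypothetical example: three sites, each holding a 2-parameter model.
sites = [[0.2, 1.0], [0.4, 0.0], [0.6, 0.5]]
sizes = [100, 300, 600]
global_model = federated_average(sites, sizes)
print(global_model)
```

In a full FedAvg run this aggregation alternates with rounds of local training at each site; here the three hypothetical sites yield global parameters of about [0.5, 0.4].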


Processes, 2019, Vol 8 (1), pp. 23
Author(s): Kexin Bi, Dong Zhang, Tong Qiu, Yizhen Huang

Food flavor quality evaluation attracts continuous attention, but a suitable evaluation system is severely lacking. Gas chromatography-mass spectrometry/olfactometry (GC-MS/O) is widely used for food flavor evaluation, but olfactometry evaluation cannot feasibly be carried out in large batches and is unreliable because of potential operator or systematic laboratory effects. A novel fingerprint modeling and profiling process was therefore proposed, based on several machine learning models including a convolutional neural network (CNN). A fingerprint template was created by analyzing an existing GC-MS spectrum dataset. A fingerprint image generation program was then applied to structure the complex instrumental data. Food olfactometry results were obtained with a CNN-based machine learning method that takes the fingerprint image as input. A case study on peanut oil samples demonstrated a model accuracy of around 93%. With structural optimization and further dataset expansion, the whole process could potentially be used by sensory laboratories for aroma analysis in place of human assessors.
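The abstract does not describe how the fingerprint images are built. One plausible way to turn GC-MS peaks into a fixed-size image that a CNN can consume is to bin them on a retention-time by m/z grid and normalize the intensities. The sketch below assumes that approach; the grid ranges, the shape, the peak values and the function name `fingerprint_image` are all hypothetical and do not reproduce the authors' program.

```python
from typing import List, Sequence, Tuple

# Assumed peak format: (retention_time_min, m_over_z, intensity).
Peak = Tuple[float, float, float]


def fingerprint_image(
    peaks: Sequence[Peak],
    rt_range: Tuple[float, float],
    mz_range: Tuple[float, float],
    shape: Tuple[int, int],
) -> List[List[float]]:
    """Rasterize GC-MS peaks into a 2D grid (rows = retention-time bins,
    columns = m/z bins) and scale intensities to [0, 1].

    Hypothetical helper: peaks outside the ranges are dropped, and peaks
    falling in the same cell are summed. Both dimensions of shape must be
    positive."""
    rows, cols = shape
    (rt_lo, rt_hi), (mz_lo, mz_hi) = rt_range, mz_range
    grid = [[0.0] * cols for _ in range(rows)]
    for rt, mz, intensity in peaks:
        if not (rt_lo <= rt < rt_hi and mz_lo <= mz < mz_hi):
            continue
        # min() guards against float rounding pushing an index to the edge.
        r = min(rows - 1, int((rt - rt_lo) / (rt_hi - rt_lo) * rows))
        c = min(cols - 1, int((mz - mz_lo) / (mz_hi - mz_lo) * cols))
        grid[r][c] += intensity
    peak_max = max(max(row) for row in grid)
    if peak_max > 0:
        grid = [[v / peak_max for v in row] for row in grid]
    return grid


# Hypothetical peaks; the last one lies outside the retention-time range.
peaks = [(1.0, 50.0, 10.0), (1.2, 52.0, 30.0), (9.5, 130.0, 60.0), (20.0, 80.0, 5.0)]
img = fingerprint_image(peaks, (0.0, 10.0), (40.0, 160.0), (5, 6))
for row in img:
    print(row)
```

Each resulting grid has the same shape regardless of how many peaks a sample contains, which is what allows a CNN to treat every sample as an image of identical size.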

