Semantic Description of Explainable Machine Learning Workflows for Improving Trust

Patricia Inoue Nakagawa; Luís Ferreira Pires; João Luiz Rebelo Moreira; Luiz Olavo Bonino da Silva Santos; Faiza Bukhsh

doi:10.3390/app112210804

Applied Sciences ◽

10.3390/app112210804 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10804

Author(s):

Patricia Inoue Nakagawa ◽

Luís Ferreira Pires ◽

João Luiz Rebelo Moreira ◽

Luiz Olavo Bonino da Silva Santos ◽

Faiza Bukhsh

Keyword(s):

Machine Learning ◽

Support Vector ◽

Semantic Description ◽

Sensitive Data ◽

Holistic View ◽

Healthcare Data ◽

Domain Specific ◽

Foundational Ontology ◽

General Module

Explainable Machine Learning comprises methods and techniques that enable users to better understand the machine learning functioning and results. This work proposes an ontology that represents explainable machine learning experiments, allowing data scientists and developers to have a holistic view, a better understanding of the explainable machine learning process, and to build trust. We developed the ontology by reusing an existing domain-specific ontology (ML-SCHEMA) and grounding it in the Unified Foundational Ontology (UFO), aiming at achieving interoperability. The proposed ontology is structured in three modules: (1) the general module, (2) the specific module, and (3) the explanation module. The ontology was evaluated using a case study in the scenario of the COVID-19 pandemic using healthcare data from patients, which are sensitive data. In the case study, we trained a Support Vector Machine to predict mortality of patients infected with COVID-19 and applied existing explanation methods to generate explanations from the trained model. Based on the case study, we populated the ontology and queried it to ensure that it fulfills its intended purpose and to demonstrate its suitability.
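The populate-then-query step mentioned in the abstract can be illustrated with a toy triple store. This is only a minimal sketch: every identifier in it (`ex:run1`, `ex:hasOutput`, `ex:explains`, and so on) is a hypothetical placeholder and is not taken from the proposed ontology, ML-SCHEMA, or UFO, and a real setup would more likely use an RDF store and a query language such as SPARQL.

```python
# All identifiers below (ex:run1, ex:hasOutput, ...) are hypothetical
# placeholders, not terms taken from the paper's ontology or from ML-SCHEMA.
TRIPLES = [
    ("ex:run1", "rdf:type", "ex:Run"),
    ("ex:run1", "ex:executes", "ex:svmImplementation"),
    ("ex:run1", "ex:hasInput", "ex:patientDataset"),
    ("ex:run1", "ex:hasOutput", "ex:model1"),
    ("ex:explanation1", "rdf:type", "ex:Explanation"),
    ("ex:explanation1", "ex:explains", "ex:model1"),
    ("ex:explanation1", "ex:generatedBy", "ex:explanationMethod1"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the pattern; None is a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def explanations_for_run(triples, run):
    """Follow run -> output model -> explanation -> explanation method."""
    results = []
    for _, _, model in match(triples, subject=run, predicate="ex:hasOutput"):
        for explanation, _, _ in match(triples, predicate="ex:explains", obj=model):
            for _, _, method in match(
                triples, subject=explanation, predicate="ex:generatedBy"
            ):
                results.append((model, explanation, method))
    return results
```

Calling `explanations_for_run(TRIPLES, "ex:run1")` walks from the training run to its model and on to the explanation attached to it, which is the kind of question such a populated ontology is meant to answer.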

A One-shot Learning Approach to Image Classification using Genetic Programming

10.26686/wgtn.13150934.v1 ◽

2020 ◽

Author(s):

Harith Al-Sahaf ◽

Mengjie Zhang ◽

M Johnston

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Genetic Programming ◽

Image Classification ◽

Local Binary Patterns ◽

Support Vector ◽

Learning Approach ◽

Data Sets ◽

Domain Specific ◽

International Publishing

In machine learning, it is common to require a large number of instances to train a model for classification. In many cases, it is hard or expensive to acquire a large number of instances. In this paper, we propose a novel genetic programming (GP) based method for automatic image classification that adopts a one-shot learning approach. The proposed method combines GP and Local Binary Patterns (LBP) techniques to detect a predefined number of informative regions that aim at maximising the between-class scatter and minimising the within-class scatter. Moreover, the proposed method uses only two instances of each class to evolve a classifier. To test the effectiveness of the proposed method, four different texture data sets are used and the performance is compared against two other GP-based methods, namely Conventional GP and Two-tier GP. The experiments revealed that the proposed method outperforms these two methods on all the data sets. Moreover, Naïve Bayes, Support Vector Machine, and Decision Trees (J48) methods achieved better performance when using features extracted by the proposed method than when using domain-specific or Two-tier GP extracted features. © Springer International Publishing 2013.
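For readers unfamiliar with Local Binary Patterns, the sketch below shows the basic 8-neighbour LBP code and a histogram over an image region. It is a generic illustration, not the authors' method: the GP-driven selection of regions is not shown, and the neighbour ordering and the `>=` comparison are one common convention assumed here.

```python
def lbp_code(image, row, col):
    """8-bit LBP code of the pixel at (row, col); image is a list of rows."""
    center = image[row][col]
    # Neighbours visited clockwise from the top-left corner (assumed convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram (256 bins) of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

A histogram like this, computed over a region of the image, is the sort of texture feature vector that a downstream classifier can consume.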

Spatio-Temporal Modeling of Wind Speed Using EOF and Machine Learning

10.5194/egusphere-egu2020-9186 ◽

2020 ◽

Author(s):

Fabian Guignard ◽

Federico Amato ◽

Sylvain Robert ◽

Mikhail Kanevski

Keyword(s):

Machine Learning ◽

Renewable Energy ◽

Wind Speed ◽

Wind Field ◽

Empirical Orthogonal Functions ◽

Environmental Data ◽

Support Vector ◽

Modeling Tools ◽

Spatio Temporal

Spatio-temporal modelling of wind speed is an important issue in applied research, such as renewable energy and risk assessment. Due to its turbulent nature and its very high variability, wind speed interpolation is a challenging task. Being universal modeling tools, Machine Learning (ML) algorithms are well suited to detect and model non-linear environmental phenomena such as wind. The present research proposes a novel and general methodology for spatio-temporal interpolation with an application to hourly wind speed in Switzerland. The methodology is organized as follows. First, the dataset is decomposed through Empirical Orthogonal Functions (EOFs) into a temporal basis and spatially dependent coefficients. EOFs constitute an orthogonal basis of the spatio-temporal signal from which the original wind field can be reconstructed. Subsequently, in order to reconstruct the signal at spatial locations where measurements are unavailable, the spatial coefficients resulting from the decomposition are interpolated. To this end, several ML algorithms were used and compared, including k-Nearest Neighbors, Random Forest, Support Vector Machine, General Regression Neural Networks and Extreme Learning Machine. Finally, the wind field is reconstructed with the help of the interpolated coefficients. A case study on real data is presented. The data consist of two years of wind speed measurements at hourly frequency collected by MeteoSwiss at several hundred stations in Switzerland, which has a complex orography. After cleaning and handling of missing values, a careful exploratory data analysis was carried out, followed by the application of the proposed novel methodology. The model is validated on an independent test set of stations.
The outcome of the case study is a time series of hourly maps of the wind field at 250 meters spatial resolution, which is highly relevant for renewable energy potential assessment. In conclusion, the study introduced a new way to interpolate irregular spatio-temporal datasets. Further developments of the methodology could investigate alternative bases such as Fourier and wavelets.
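The decomposition and reconstruction steps can be sketched for a single EOF mode. This is a simplified illustration under stated assumptions, not the authors' implementation: it extracts only the leading mode by power iteration in plain Python, whereas a practical implementation would typically use an SVD and keep several modes. The function names and the `field[t][s]` layout are hypothetical. The ML step, interpolating the spatial coefficients to unmeasured locations, is not shown.

```python
import math
import random


def leading_eof(field, iterations=100, seed=0):
    """Return (station_means, temporal_basis, spatial_coeffs) for the leading mode.

    field[t][s] is the value at time step t and station s.
    """
    n_time, n_sites = len(field), len(field[0])
    # Remove each station's temporal mean so the mode describes anomalies.
    means = [sum(row[s] for row in field) / n_time for s in range(n_sites)]
    anomalies = [[row[s] - means[s] for s in range(n_sites)] for row in field]

    # Power iteration on A A^T converges to the dominant temporal basis vector.
    rng = random.Random(seed)
    basis = [rng.uniform(-1.0, 1.0) for _ in range(n_time)]
    for _ in range(iterations):
        # projection = A^T basis (one value per station)
        projection = [sum(basis[t] * anomalies[t][s] for t in range(n_time))
                      for s in range(n_sites)]
        # updated = A projection (one value per time step)
        updated = [sum(anomalies[t][s] * projection[s] for s in range(n_sites))
                   for t in range(n_time)]
        norm = math.sqrt(sum(v * v for v in updated))
        if norm == 0.0:  # the field has no variability
            break
        basis = [v / norm for v in updated]

    # Spatial coefficients: projection of the anomalies onto the temporal basis.
    coeffs = [sum(basis[t] * anomalies[t][s] for t in range(n_time))
              for s in range(n_sites)]
    return means, basis, coeffs


def reconstruct(means, basis, coeffs):
    """Rebuild the field from one mode: mean(s) + basis(t) * coeff(s)."""
    return [[means[s] + b * coeffs[s] for s in range(len(coeffs))] for b in basis]
```

In this picture, predicting the wind at a new location amounts to estimating its mean and spatial coefficients with a regression model and then calling `reconstruct` with the same temporal basis.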

Studying the Effect of Taking Statins before Infection in the Severity Reduction of COVID-19 with Machine Learning

BioMed Research International ◽

10.1155/2021/9995073 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Alireza Davoudi ◽

Mohsen Ahmadi ◽

Abbas Sharifi ◽

Roshina Hassantabar ◽

Narges Najafi ◽

...

Keyword(s):

Machine Learning ◽

Diastolic Pressure ◽

Systolic Pressure ◽

Illness Severity ◽

Support Vector ◽

Learning Approaches ◽

Angiotensin Converting Enzyme 2 ◽

Decision Tree Method ◽

The Impact

Statins can help in the treatment of COVID-19 patients because of their involvement in angiotensin-converting enzyme-2. The main objective of this study is to evaluate the impact of statins on COVID-19 severity for people who had been taking statins before COVID-19 infection. The examined patients include people who had taken three types of statins: Atorvastatin, Simvastatin, and Rosuvastatin. The case study includes 561 patients admitted to the Razi Hospital in Ghaemshahr, Iran, during February and March 2020. The illness severity was encoded based on the respiratory rate, oxygen saturation, systolic pressure, and diastolic pressure in five categories: mild, medium, severe, critical, and death. Since 69.23% of participants were in mild severity condition, the results showed the positive effect of Simvastatin on COVID-19 severity for people who take Simvastatin before being infected by the COVID-19 virus. Also, systolic pressure for this case study is 137.31, which is higher than that of the total patients. Another result of this study is that Simvastatin takers have an average of 95.77 mmHg O2Sat; however, the O2Sat is 92.42, which is medium severity for evaluating the entire case study. In the rest of this paper, we used machine learning approaches to diagnose COVID-19 patients’ severity based on clinical features. Results indicated that the decision tree method could predict patients’ illness severity with 87.9% accuracy. Other methods, including the K-nearest neighbors (KNN) algorithm, support vector machine (SVM), Naïve Bayes classifier, and discriminant analysis, showed accuracy levels of 80%, 68.8%, 61.1%, and 85.1%, respectively.

Realization of a Machine Learning Domain Specific Modeling Language: A Baseball Analytics Case Study

Proceedings of the 7th International Conference on Model-Driven Engineering and Software Development ◽

10.5220/0007245800150026 ◽

2019 ◽

Author(s):

Kaan Koseler ◽

Kelsea McGraw ◽

Matthew Stephan

Keyword(s):

Machine Learning ◽

Modeling Language ◽

Domain Specific ◽

Domain Specific Modeling ◽

Specific Modeling

Integrating Employee Value Model with Churn Prediction

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910666200213123728 ◽

2020 ◽

Vol 10 (4) ◽

pp. 484-493

Author(s):

Nguyen Thi Ngoc Anh ◽

Nguyen Danh Tu ◽

Vijender Kumar Solanki ◽

Nguyen Linh Giang ◽

Vu Hoai Thu ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Support Vector ◽

Churn Prediction ◽

Artificial Intelligent ◽

Weak Points ◽

Employee Value ◽

Value Model ◽

The Impact

Background: In recent years, human resource management has played a crucial role in the operation of every company or organization. Both loyal employees and churn employees influence the operation of the organization. The impact of churn employees differs according to their role in the organization. Objective: Thus, we define two Employee Value Models (EVMs) for organizations or companies based on employee features that are common to almost all companies. Methods: Meanwhile, with the development of artificial intelligence, machine learning can produce data-based predictive models with high accuracy. Thus, this paper proposes integrating churn prediction, EVM, and machine learning methods such as support vector machine, logistic regression, and random forest. The strong points of each model are used and the weak points are reduced to help companies or organizations avoid losing high-value employees in the future. The prediction process, which integrates churn, employee value, and machine learning, is described in detail in 6 steps. The advantage of the integrated model is that it gives more useful results for a company than a churn prediction model alone; the disadvantages are the complexity of the model and algorithms and the computing speed. Results: A case study of an organization with 1470 employee positions is carried out to demonstrate the whole process of integrating churn prediction, EVM, and machine learning. The accuracy of the integrated model is high, from 82% to 85%. Moreover, some results on churn and employee value are analyzed. Conclusion: This paper proposes upgraded models for predicting which employees may leave an organization and shows that integrating the two models, the employee value model and churn prediction, is feasible.

Machine Learning-Based Prediction of Chlorophyll-a Variations in Receiving Reservoir of World’s Largest Water Transfer Project—A Case Study in the Miyun Reservoir, North China

Water ◽

10.3390/w13172406 ◽

2021 ◽

Vol 13 (17) ◽

pp. 2406

Author(s):

Zhenmei Liao ◽

Nan Zang ◽

Xuan Wang ◽

Chunhui Li ◽

Qiang Liu

Keyword(s):

Machine Learning ◽

Water Quality ◽

Chlorophyll A ◽

Trophic State ◽

Water Transfer ◽

Support Vector ◽

Miyun Reservoir ◽

Chl A ◽

Svm Model

Although water transfer projects can alleviate the water crisis, they may pose risks to water quality safety in receiving areas. The Miyun Reservoir in northern China, one of the receiving reservoirs of the world’s largest water transfer project (South-to-North Water Transfer Project, SNWTP), was selected as a case study. Considering its potential eutrophication trend, two machine learning models, i.e., the support vector machine (SVM) model and the random forest (RF) model, were built to investigate the trophic state by predicting the variations of chlorophyll-a (Chl-a) concentrations, the typical reflection of eutrophication, in the reservoir after the implementation of SNWTP. The results showed that compared with the SVM model, the RF model had higher prediction accuracy and more robust prediction ability with abnormal data, and was thus more suitable for predicting Chl-a concentration variations in the receiving reservoir. Additionally, short-term water transfer would not cause significant variations of Chl-a concentrations. After the project implementation, the impact of transferred water on the water quality of the receiving reservoir would gradually increase. After a 10-year implementation, transferred water would cause a significant decline in the receiving reservoir’s water quality, and Chl-a concentrations would increase, especially from July to August. This creates a potential risk of trophic state change in the Miyun Reservoir and requires further attention from managers. This study can provide prediction techniques and advice on water quality security management associated with eutrophication risks resulting from water transfer projects.

Flood Stage Forecasting Using Machine-Learning Methods: A Case Study on the Parma River (Italy)

Water ◽

10.3390/w13121612 ◽

2021 ◽

Vol 13 (12) ◽

pp. 1612

Author(s):

Susanna Dazzi ◽

Renato Vacondio ◽

Paolo Mignosa

Keyword(s):

Machine Learning ◽

Short Term Memory ◽

Computational Time ◽

Support Vector ◽

Lead Times ◽

Efficiency Coefficient ◽

Forecast Horizon ◽

Training Time ◽

Forecasting System

Real-time river flood forecasting models can be useful for issuing flood alerts and reducing or preventing inundations. To this end, machine-learning (ML) methods are becoming increasingly popular thanks to their low computational requirements and to their reliance on observed data only. This work aimed to evaluate the ML models’ capability of predicting flood stages at a critical gauge station, using mainly upstream stage observations, though downstream levels should also be included to consider backwater, if present. The case study selected for this analysis was the lower stretch of the Parma River (Italy), and the forecast horizon was extended up to 9 h. The performances of three ML algorithms, namely Support Vector Regression (SVR), MultiLayer Perceptron (MLP), and Long Short-term Memory (LSTM), were compared herein in terms of accuracy and computational time. Up to 6 h ahead, all models provided sufficiently accurate predictions for practical purposes (e.g., Root Mean Square Error < 15 cm, and Nash-Sutcliffe Efficiency coefficient > 0.99), while peak levels were poorly predicted for longer lead times. Moreover, the results suggest that the LSTM model, despite requiring the longest training time, is the most robust and accurate in predicting peak values, and it should be preferred for setting up an operational forecasting system.
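The two accuracy measures quoted in the abstract, Root Mean Square Error and the Nash-Sutcliffe Efficiency coefficient, follow standard definitions and can be computed as in the sketch below. The function names are illustrative, and the inputs are assumed to be equal-length sequences of observed and predicted stages in the same unit.

```python
import math


def rmse(observed, simulated):
    """Root Mean Square Error, in the same unit as the inputs."""
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, and 0 means the model
    is no better than always predicting the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance
```

Note that `nash_sutcliffe` divides by the variance of the observations, so it is undefined for a constant observed series.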

Applying Machine Learning for Healthcare: A Case Study on Cervical Pain Assessment with Motion Capture

Applied Sciences ◽

10.3390/app10175942 ◽

2020 ◽

Vol 10 (17) ◽

pp. 5942 ◽

Cited By ~ 2

Author(s):

Juan de la Torre ◽

Javier Marin ◽

Sergio Ilarri ◽

Jose J. Marin

Keyword(s):

Machine Learning ◽

Predictive Models ◽

Gradient Boosting ◽

Support Vector ◽

Cervical Pain ◽

K Nearest Neighbors ◽

Network Algorithms ◽

Vector Machines ◽

Real Scenario

Given the exponential availability of data in health centers and the massive sensorization that is expected, there is an increasing need to manage and analyze these data in an effective way. For this purpose, data mining (DM) and machine learning (ML) techniques would be helpful. However, due to the specific characteristics of the field of healthcare, a suitable DM and ML methodology adapted to these particularities is required. The applied methodology must structure the different stages needed for data-driven healthcare, from the acquisition of raw data to decision-making by clinicians, considering the specific requirements of this field. In this paper, we focus on a case study of cervical assessment, where the goal is to predict the potential presence of cervical pain in patients affected with whiplash diseases, which is important for example in insurance-related investigations. By analyzing in detail this case study in a real scenario, we show how taking care of those particularities enables the generation of reliable predictive models in the field of healthcare. Using a database of 302 samples, we have generated several predictive models, including logistic regression, support vector machines, k-nearest neighbors, gradient boosting, decision trees, random forest, and neural network algorithms. The results show that it is possible to reliably predict the presence of cervical pain (accuracy, precision, and recall above 90%). We expect that the procedure proposed to apply ML techniques in the field of healthcare will help technologists, researchers, and clinicians to create more objective systems that provide support to objectify the diagnosis, improve test treatment efficacy, and save resources.
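The three measures reported in the abstract (accuracy, precision, and recall) can be computed for a binary pain / no-pain classifier as sketched below. The function name, the label encoding (`1` for pain present), and the choice to return 0.0 when a denominator is zero are assumptions made for this illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Precision: share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: share of true positives that the model found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Reporting all three matters in a clinical setting because a model can reach high accuracy on imbalanced data while still missing many positive cases.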

Energy consumption prediction and diagnosis of public buildings based on support vector machine learning: A case study in China

Journal of Cleaner Production ◽

10.1016/j.jclepro.2020.122542 ◽

2020 ◽

Vol 272 ◽

pp. 122542 ◽

Cited By ~ 3

Author(s):

Yang Liu ◽

Hongyu Chen ◽

Limao Zhang ◽

Xianguo Wu ◽

Xian-jia Wang

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Energy Consumption ◽

Support Vector ◽

Public Buildings ◽

Energy Consumption Prediction ◽

Consumption Prediction

A Comprehensive Survey on Data Utility and Privacy: Taking Indian Healthcare System as a Potential Case Study

Inventions ◽

10.3390/inventions6030045 ◽

2021 ◽

Vol 6 (3) ◽

pp. 45

Author(s):

Prathamesh Churi ◽

Ambika Pawar ◽

Antonio-José Moreno-Guerrero

Keyword(s):

Healthcare System ◽

Large Population ◽

Healthcare Systems ◽

Healthcare Management ◽

Healthcare Sector ◽

Sensitive Data ◽

Data Breaches ◽

Healthcare Data ◽

Privacy Issues

Background: According to the renowned and Oscar award-winning American actor and film director Marlon Brando, “privacy is not something that I am merely entitled to, it is an absolute prerequisite.” Privacy threats and data breaches occur daily, and countries are mitigating the consequences of privacy and data breaches. The Indian healthcare industry is one of the largest and most rapidly developing industries. Overall, healthcare management is changing from disease-centric to patient-centric systems. Healthcare data analysis also plays a crucial role in healthcare management, and the privacy of patient records must receive equal attention. Purpose: This paper mainly presents the utility and privacy factors of Indian healthcare data and discusses the utility aspect and privacy problems concerning Indian healthcare systems. It defines policies that reform Indian healthcare systems. The case study of the NITI Aayog report is presented to explain how reformation occurs in Indian healthcare systems. Findings: It is found that there have been numerous research studies conducted on Indian healthcare data across all dimensions; however, privacy problems in healthcare, specifically in India, are caused by prevalent complacency, culture, politics, budget limitations, large population, and existing infrastructures. This paper reviews the Indian healthcare system and the applications that drive it. Additionally, the paper maps how privacy issues arise in every healthcare sector in India. Originality/Value: To understand these factors and gain insights, it is crucial to first understand Indian healthcare systems. To the best of our knowledge, we found no recent papers that thoroughly reviewed the Indian healthcare system and its privacy issues. The paper is original in terms of its overview of the healthcare system and privacy issues. Social Implications: Privacy has been the most ignored part of the Indian healthcare system.
With India being a country with a population of about 1.3 billion, a large amount of healthcare data is generated every day. The chances of data breaches and other privacy violations on such sensitive data cannot be avoided, and they cause severe concerns for individuals. This paper segregates the healthcare system’s advances and lists the privacy issues that need to be addressed first.
