Towards Blockchain-Based Federated Machine Learning: Smart Contract for Model Inference

Vaidotas Drungilas; Evaldas Vaičiukynas; Mantas Jurgelaitis; Rita Butkienė; Lina Čeponienė

doi:10.3390/app11031010

Towards Blockchain-Based Federated Machine Learning: Smart Contract for Model Inference

Applied Sciences ◽

10.3390/app11031010 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1010

Author(s):

Vaidotas Drungilas ◽

Evaldas Vaičiukynas ◽

Mantas Jurgelaitis ◽

Rita Butkienė ◽

Lina Čeponienė

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Network Size ◽

Detection Task ◽

Supervised Machine Learning ◽

Large Dataset ◽

Model Inference ◽

Smart Contract ◽

Machine Learning Applications ◽

The Impact

Federated learning is a branch of machine learning where a shared model is created in a decentralized and privacy-preserving fashion, but existing approaches using blockchain are limited by tailored models. We consider the possibility to extend a set of supported models by introducing the oracle service and exploring the usability of blockchain-based architecture. The investigated architecture combines an oracle service with a Hyperledger Fabric chaincode. We compared two logistic regression implementations in Go language—a pure chaincode and an oracle service—at various data (2–32 k instances) and network (3–13 peers) sizes. Experiments were run to assess the performance of blockchain-based model inference using 2D synthetic and EEG eye state datasets for a supervised machine learning detection task. The benchmarking results showed that the impact on performance is acceptable with the median overhead of oracle service reaching 2–4%, depending on the dimensionality of the dataset. The overhead tends to diminish at large dataset sizes with the runtime depending on the network size linearly, where additional peers increased the runtime by 6.3 and 6.6 s for 2D and EEG datasets, respectively. Demonstrated negligible difference between implementations justifies the flexible choice of model in the blockchain-based federated learning and other machine learning applications.

Download Full-text

Credibility Detection on Twitter News Using Machine Learning Approach

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2021.03.01 ◽

2021 ◽

Vol 13 (3) ◽

pp. 1-10

Author(s):

Marina Azer ◽

◽

Mohamed Taha ◽

Hala H. Zayed ◽

Mahmoud Gadallah

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Learning System ◽

Supervised Machine Learning ◽

Support Vector ◽

Sources Of Information ◽

Supervised Machine Learning Classifiers ◽

False News ◽

The Impact

Social media presence is a crucial portion of our life. It is considered one of the most important sources of information than traditional sources. Twitter has become one of the prevalent social sites for exchanging viewpoints and feelings. This work proposes a supervised machine learning system for discovering false news. One of the credibility detection problems is finding new features that are most predictive to better performance classifiers. Both features depending on new content, and features based on the user are used. The features' importance is examined, and their impact on the performance. The reasons for choosing the final feature set using the k-best method are explained. Seven supervised machine learning classifiers are used. They are Naïve Bayes (NB), Support vector machine (SVM), Knearest neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Maximum entropy (ME), and conditional random forest (CRF). Training and testing models were conducted using the Pheme dataset. The feature's analysis is introduced and compared to the features depending on the content, as the decisive factors in determining the validity. Random forest shows the highest performance while using user-based features only and using a mixture of both types of features; features depending on content and the features based on the user, accuracy (82.2 %) in using user-based features only. We achieved the highest results by using both types of features, utilizing random forest classifier accuracy(83.4%). In contrast, logistic regression was the best as to using features that are based on contents. Performance is measured by different measurements accuracy, precision, recall, and F1_score. We compared our feature set with other studies' features and the impact of our new features. We found that our conclusions exhibit high enhancement concerning discovering and verifying the false news regarding the discovery and verification of false news, comparing it to the current results of how it is developed.

Download Full-text

Supervised machine learning and logistic regression identifies novel epistatic risk factors with PTPN22 for rheumatoid arthritis

Genes and Immunity ◽

10.1038/gene.2009.110 ◽

2010 ◽

Vol 11 (3) ◽

pp. 199-208 ◽

Cited By ~ 19

Author(s):

F B S Briggs ◽

P P Ramsay ◽

E Madden ◽

J M Norris ◽

V M Holers ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Risk Factors ◽

Logistic Regression ◽

Supervised Machine Learning

Download Full-text

Leveraging Road Characteristics and Contributor Behaviour for Assessing Road Type Quality in OSM

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070436 ◽

2021 ◽

Vol 10 (7) ◽

pp. 436

Author(s):

Amerah Alghanim ◽

Musfira Jilani ◽

Michela Bertolotto ◽

Gavin McArdle

Keyword(s):

Machine Learning ◽

Spatial Data ◽

Classification Accuracy ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set ◽

Semantic Inference ◽

Road Type ◽

The Impact

Volunteered Geographic Information (VGI) is often collected by non-expert users. This raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures which compare VGI to authoritative data sources such as National Mapping Agencies are common but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures which compare the data to heuristics or models built from the VGI data are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality where they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Specifically, using our proposed novel approach we obtained an average classification accuracy of 84.12%. This result outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is important. To address this issue we have also developed a new measure for this using direct and indirect characteristics of OSM data such as its edit history along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy within the machine learning model shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, our results demonstrate that the classification accuracy of our developed model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.

Download Full-text

Comparison of the Performance of Machine Learning Algorithms in Predicting Heart Disease

Frontiers in Health Informatics ◽

10.30699/fhi.v10i1.349 ◽

2021 ◽

Vol 10 (1) ◽

pp. 99

Author(s):

Sajad Yousefi

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Heart Disease ◽

Decision Tree ◽

Roc Curve ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Learning Models ◽

Algorithm Performance ◽

Machine Learning Models

Introduction: Heart disease is often associated with conditions such as clogged arteries due to the sediment accumulation which causes chest pain and heart attack. Many people die due to the heart disease annually. Most countries have a shortage of cardiovascular specialists and thus, a significant percentage of misdiagnosis occurs. Hence, predicting this disease is a serious issue. Using machine learning models performed on multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.Material and Methods: Several algorithms were utilized to predict heart disease among which Decision Tree, Random Forest and KNN supervised machine learning are highly mentioned. The algorithms are applied to the dataset taken from the UCI repository including 294 samples. The dataset includes heart disease features. To enhance the algorithm performance, these features are analyzed, the feature importance scores and cross validation are considered.Results: The algorithm performance is compared with each other, so that performance based on ROC curve and some criteria such as accuracy, precision, sensitivity and F1 score were evaluated for each model. As a result of evaluation, Accuracy, AUC ROC are 83% and 99% respectively for Decision Tree algorithm. Logistic Regression algorithm with accuracy and AUC ROC are 88% and 91% respectively has better performance than other algorithms. Therefore, these techniques can be useful for physicians to predict heart disease patients and prescribe them correctly.Conclusion: Machine learning technique can be used in medicine for analyzing the related data collections to a disease and its prediction. The area under the ROC curve and evaluating criteria related to a number of classifying algorithms of machine learning to evaluate heart disease and indeed, the prediction of heart disease is compared to determine the most appropriate classification. As a result of evaluation, better performance was observed in both Decision Tree and Logistic Regression models.

Download Full-text

Correction: The path to international medals: A supervised machine learning approach to explore the impact of coach-led sport-specific and non-specific practice

PLoS ONE ◽

10.1371/journal.pone.0244509 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0244509

Author(s):

Michael Barth ◽

Arne Güllich ◽

Christian Raschner ◽

Eike Emrich

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Learning Approach ◽

Machine Learning Approach ◽

Specific Practice ◽

The Impact

Download Full-text

Predicting the Mechanical Properties of RCA-Based Concrete Using Supervised Machine Learning Algorithms

Materials ◽

10.3390/ma15020647 ◽

2022 ◽

Vol 15 (2) ◽

pp. 647

Author(s):

Meijun Shang ◽

Hejun Li ◽

Ayaz Ahmad ◽

Waqas Ahmad ◽

Krzysztof Adam Ostrowski ◽

...

Keyword(s):

Machine Learning ◽

Mechanical Properties ◽

Mean Square Error ◽

Coarse Aggregate ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Environmental Damage ◽

Fine Aggregate ◽

Mean Square ◽

The Impact

Environment-friendly concrete is gaining popularity these days because it consumes less energy and causes less damage to the environment. Rapid increases in the population and demand for construction throughout the world lead to a significant deterioration or reduction in natural resources. Meanwhile, construction waste continues to grow at a high rate as older buildings are destroyed and demolished. As a result, the use of recycled materials may contribute to improving the quality of life and preventing environmental damage. Additionally, the application of recycled coarse aggregate (RCA) in concrete is essential for minimizing environmental issues. The compressive strength (CS) and splitting tensile strength (STS) of concrete containing RCA are predicted in this article using decision tree (DT) and AdaBoost machine learning (ML) techniques. A total of 344 data points with nine input variables (water, cement, fine aggregate, natural coarse aggregate, RCA, superplasticizers, water absorption of RCA and maximum size of RCA, density of RCA) were used to run the models. The data was validated using k-fold cross-validation and the coefficient correlation coefficient (R2), mean square error (MSE), mean absolute error (MAE), and root mean square error values (RMSE). However, the model’s performance was assessed using statistical checks. Additionally, sensitivity analysis was used to determine the impact of each variable on the forecasting of mechanical properties.

Download Full-text

Identification of Primary Antimicrobial Resistance Drivers in Agricultural Nontyphoidal Salmonella enterica Serovars by Using Machine Learning

mSystems ◽

10.1128/msystems.00211-19 ◽

2019 ◽

Vol 4 (4) ◽

Cited By ~ 1

Author(s):

Finlay Maguire ◽

Muhammad Attiq Rehman ◽

Catherine Carrillo ◽

Moussa S. Diarra ◽

Robert G. Beiko

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Antimicrobial Resistance ◽

Broiler Chicken ◽

Genomic Data ◽

Set Covering ◽

Learning Models ◽

Commercial Chicken ◽

The Impact ◽

Machine Learning Models

ABSTRACT Nontyphoidal Salmonella (NTS) is a leading global cause of bacterial foodborne morbidity and mortality. Our ability to treat severe NTS infections has been impaired by increasing antimicrobial resistance (AMR). To understand and mitigate the global health crisis AMR represents, we need to link the observed resistance phenotypes with their underlying genomic mechanisms. Broiler chickens represent a key reservoir and vector for NTS infections, but isolates from this setting have been characterized in only very low numbers relative to clinical isolates. In this study, we sequenced and assembled 97 genomes encompassing 7 serotypes isolated from broiler chicken in farms in British Columbia between 2005 and 2008. Through application of machine learning (ML) models to predict the observed AMR phenotype from this genomic data, we were able to generate highly (0.92 to 0.99) precise logistic regression models using known AMR gene annotations as features for 7 antibiotics (amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, streptomycin, and tetracycline). Similarly, we also trained “reference-free” k-mer-based set-covering machine phenotypic prediction models (0.91 to 1.0 precision) for these antibiotics. By combining the inferred k-mers and logistic regression weights, we identified the primary drivers of AMR for the 7 studied antibiotics in these isolates. With our research representing one of the largest studies of a diverse set of NTS isolates from broiler chicken, we can thus confirm that the AmpC-like CMY-2 β-lactamase is a primary driver of β-lactam resistance and that the phosphotransferases APH(6)-Id and APH(3″-Ib) are the principal drivers of streptomycin resistance in this important ecosystem. IMPORTANCE Antimicrobial resistance (AMR) represents an existential threat to the function of modern medicine. Genomics and machine learning methods are being increasingly used to analyze and predict AMR. This type of surveillance is very important to try to reduce the impact of AMR. Machine learning models are typically trained using genomic data, but the aspects of the genomes that they use to make predictions are rarely analyzed. In this work, we showed how, by using different types of machine learning models and performing this analysis, it is possible to identify the key genes underlying AMR in nontyphoidal Salmonella (NTS). NTS is among the leading cause of foodborne illness globally; however, AMR in NTS has not been heavily studied within the food chain itself. Therefore, in this work we performed a broad-scale analysis of the AMR in NTS isolates from commercial chicken farms and identified some priority AMR genes for surveillance.

Download Full-text

Insider Threat Detection Using Supervised Machine Learning Algorithms on an Extremely Imbalanced Dataset

International Journal of Cyber Warfare and Terrorism ◽

10.4018/ijcwt.2020040101 ◽

2020 ◽

Vol 10 (2) ◽

pp. 1-26

Author(s):

Naghmeh Moradpoor Sheykhkanloo ◽

Adam Hall

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Machine Learning Algorithms ◽

Third Party ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Insider Threat ◽

Threat Detection ◽

Imbalanced Dataset ◽

The Impact

An insider threat can take on many forms and fall under different categories. This includes malicious insider, careless/unaware/uneducated/naïve employee, and the third-party contractor. Machine learning techniques have been studied in published literature as a promising solution for such threats. However, they can be biased and/or inaccurate when the associated dataset is hugely imbalanced. Therefore, this article addresses the insider threat detection on an extremely imbalanced dataset which includes employing a popular balancing technique known as spread subsample. The results show that although balancing the dataset using this technique did not improve performance metrics, it did improve the time taken to build the model and the time taken to test the model. Additionally, the authors realised that running the chosen classifiers with parameters other than the default ones has an impact on both balanced and imbalanced scenarios, but the impact is significantly stronger when using the imbalanced dataset.

Download Full-text

A mobile robot platform for supervised machine learning applications

2017 24th International Conference on Mechatronics and Machine Vision in Practice (M2VIP) ◽

10.1109/m2vip.2017.8211472 ◽

2017 ◽

Cited By ~ 1

Author(s):

Frazer K. Noble

Keyword(s):

Machine Learning ◽

Mobile Robot ◽

Supervised Machine Learning ◽

Machine Learning Applications ◽

Robot Platform

Download Full-text

Testing and Validating Two Morphological Flare Predictors by Logistic Regression Machine Learning

Frontiers in Astronomy and Space Sciences ◽

10.3389/fspas.2020.571186 ◽

2021 ◽

Vol 7 ◽

Author(s):

M. B. Korsós ◽

R. Erdélyi ◽

J. Liu ◽

H. Morgan

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Active Regions ◽

Morphological Parameters ◽

Mixed States ◽

Separation Parameter ◽

Prediction Probability ◽

Machine Learning Applications ◽

Joint Prediction ◽

Two Parameters

Whilst the most dynamic solar active regions (ARs) are known to flare frequently, predicting the occurrence of individual flares and their magnitude, is very much a developing field with strong potentials for machine learning applications. The present work is based on a method which is developed to define numerical measures of the mixed states of ARs with opposite polarities. The method yields compelling evidence for the assumed connection between the level of mixed states of a given AR and the level of the solar eruptive probability of this AR by employing two morphological parameters: 1) the separation parameter Sl−f and 2) the sum of the horizontal magnetic gradient GS. In this work, we study the efficiency of Sl−f and GS as flare predictors on a representative sample of ARs, based on the SOHO/MDI-Debrecen Data (SDD) and the SDO/HMI - Debrecen Data (HMIDD) sunspot catalogues. In particular, we investigate about 1,000 ARs in order to test and validate the joint prediction capabilities of the two morphological parameters by applying the logistic regression machine learning method. Here, we confirm that the two parameters with their threshold values are, when applied together, good complementary predictors. Furthermore, the prediction probability of these predictor parameters is given at least 70% a day before.

Download Full-text