Statistical Inference for Clustering Results Interpretation in Clinical Practice

Mapping Intimacies ◽

10.3233/shti210580 ◽

2021 ◽

Author(s):

Alexander Kanonirov ◽

Ksenia Balabaeva ◽

Sergey Kovalchuk

Keyword(s):

Machine Learning ◽

Clinical Practice ◽

Bayesian Inference ◽

Statistical Inference ◽

Clinical Pathways ◽

Learning Models ◽

The Difference ◽

Characteristic Features ◽

Machine Learning Models

The relevance of this study lies in improving the understanding of machine learning models. We present a method for interpreting clustering results and apply it to clinical pathway modeling. The method is based on statistical inference and yields a description of each cluster by determining how much a particular feature contributes to the difference between clusters. With the proposed approach, the characteristic features of each cluster can be determined. Finally, we compare the method with a Bayesian inference explanation and with the interpretation of medical experts [1].
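The abstract does not say which statistical test the authors use, so the following is only a minimal sketch of the general idea, assuming a two-sided permutation test on the difference of feature means between one cluster and all remaining points. The function names, the feature names (`age`, `noise`) and the toy data are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of means of a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def characteristic_features(rows, labels, cluster, alpha=0.05):
    """Return (feature, p_value) pairs that separate `cluster` from the rest."""
    found = []
    for feature in rows[0].keys():
        inside = [r[feature] for r, l in zip(rows, labels) if l == cluster]
        outside = [r[feature] for r, l in zip(rows, labels) if l != cluster]
        p = permutation_p_value(inside, outside)
        if p < alpha:
            found.append((feature, p))
    return sorted(found, key=lambda item: item[1])


# Hypothetical toy data: "age" differs between clusters, "noise" does not.
ages = [30, 32, 31, 29, 33, 30, 70, 72, 69, 71, 68, 73]
noise = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
rows = [{"age": a, "noise": n} for a, n in zip(ages, noise)]
labels = [0] * 6 + [1] * 6

selected = characteristic_features(rows, labels, cluster=0)
selected_names = [name for name, _ in selected]
```

In this toy case only `age` is reported as characteristic of cluster 0. A real analysis would also need a correction for testing many features at once, which this sketch omits.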


Uncovering and Correcting Shortcut Learning in Machine Learning Models for Skin Cancer Diagnosis

Diagnostics ◽

10.3390/diagnostics12010040 ◽

2021 ◽

Vol 12 (1) ◽

pp. 40

Author(s):

Meike Nauta ◽

Ricky Walsh ◽

Adam Dubowski ◽

Christin Seifert

Keyword(s):

Machine Learning ◽

Clinical Practice ◽

Skin Cancer ◽

Cancer Diagnosis ◽

Image Inpainting ◽

Relevant Information ◽

Black Box ◽

Training Dataset ◽

Learning Models ◽

Machine Learning Models

Machine learning models have been successfully applied for analysis of skin images. However, due to the black box nature of such deep learning models, it is difficult to understand their underlying reasoning. This prevents a human from validating whether the model is right for the right reasons. Spurious correlations and other biases in data can cause a model to base its predictions on such artefacts rather than on the true relevant information. These learned shortcuts can in turn cause incorrect performance estimates and can result in unexpected outcomes when the model is applied in clinical practice. This study presents a method to detect and quantify this shortcut learning in trained classifiers for skin cancer diagnosis, since it is known that dermoscopy images can contain artefacts. Specifically, we train a standard VGG16-based skin cancer classifier on the public ISIC dataset, for which colour calibration charts (elliptical, coloured patches) occur only in benign images and not in malignant ones. Our methodology artificially inserts those patches and uses inpainting to automatically remove patches from images to assess the changes in predictions. We find that our standard classifier partly bases its predictions of benign images on the presence of such a coloured patch. More importantly, by artificially inserting coloured patches into malignant images, we show that shortcut learning results in a significant increase in misdiagnoses, making the classifier unreliable when used in clinical practice. With our results, we, therefore, want to increase awareness of the risks of using black box machine learning models trained on potentially biased datasets. Finally, we present a model-agnostic method to neutralise shortcut learning by removing the bias in the training dataset by exchanging coloured patches with benign skin tissue using image inpainting and re-training the classifier on this de-biased dataset.
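The patch-insertion half of this procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: images are tiny grayscale grids, the patch is a plain bright square rather than a colour calibration chart, and both classifiers (`shortcut_model`, `robust_model`) are hypothetical toy stand-ins for the VGG16-based model. Inpainting-based patch removal is not shown.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale pixel values in [0, 1]


def insert_patch(image: Image, top: int, left: int, size: int,
                 value: float = 1.0) -> Image:
    """Return a copy of `image` with a square patch of constant `value`."""
    patched = [row[:] for row in image]
    for r in range(top, min(top + size, len(patched))):
        for c in range(left, min(left + size, len(patched[r]))):
            patched[r][c] = value
    return patched


def flip_rate(predict: Callable[[Image], str], images: List[Image],
              top: int = 0, left: int = 0, size: int = 2) -> float:
    """Fraction of images predicted malignant that flip to benign
    once a patch is inserted."""
    malignant = [im for im in images if predict(im) == "malignant"]
    if not malignant:
        return 0.0
    flipped = sum(
        predict(insert_patch(im, top, left, size)) == "benign"
        for im in malignant
    )
    return flipped / len(malignant)


def shortcut_model(image: Image) -> str:
    """Toy classifier that learned the shortcut: any bright pixel => benign."""
    bright = any(p >= 0.99 for row in image for p in row)
    return "benign" if bright else "malignant"


def robust_model(image: Image) -> str:
    """Toy classifier that looks only at one pixel inside the lesion area."""
    return "malignant" if image[2][2] < 0.5 else "benign"


# Two hypothetical 4x4 "malignant" images without any patch.
images = [[[0.2] * 4 for _ in range(4)], [[0.3] * 4 for _ in range(4)]]

shortcut_flip = flip_rate(shortcut_model, images)
robust_flip = flip_rate(robust_model, images)
```

A high flip rate means the inserted patch alone changes the diagnosis, which is the symptom of shortcut learning the abstract describes. Here the shortcut model flips every image and the robust model flips none.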


Shapley Regressions: A Framework for Statistical Inference on Machine Learning Models

SSRN Electronic Journal ◽

10.2139/ssrn.3351091 ◽

2019 ◽

Cited By ~ 5

Author(s):

Andreas Joseph

Keyword(s):

Machine Learning ◽

Statistical Inference ◽

Learning Models ◽

Machine Learning Models


Comparative Study of Machine Learning Algorithms on Binary Dataset

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-887 ◽

2021 ◽

pp. 137-147

Author(s):

Rajat Puri ◽

Digvijay Patil

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Computation Time ◽

Machine Learning Algorithms ◽

Diabetic Patients ◽

Learning Models ◽

Great Performance ◽

The Difference ◽

The Right ◽

Machine Learning Models

Many machine learning models are available for classification and decision making. Choosing the right one requires considering metrics such as accuracy, computation time, and F1 score. This paper compares the performance of several such models on the diabetes symptoms dataset, which contains sixteen factors observed in diabetic patients, including age, gender, and obesity. The models compared include Decision Trees and Neural Networks, among others. Decision Trees gave the best results, with an accuracy of 96% and a computation time of 0.0288 seconds. Gaussian Naive Bayes was the least accurate, with an accuracy of 89% and a computation time of 0.39 seconds. The strong performance of Decision Trees can be attributed to the fact that the independent factors and output classes are binary, which makes classification easier and more accurate for decision trees. The paper highlights how the performance of machine learning models differs with the type of dataset used: each model has a type of dataset to which it is best suited.
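A comparison of this kind, reporting accuracy and computation time per model on binary features, could be set up roughly as below. This is a sketch only: the two learners are deliberately tiny stand-ins (a majority-class baseline and a one-level decision stump), not the models evaluated in the paper, and the data are made up rather than taken from the diabetes symptoms dataset.

```python
import time
from typing import Callable, List, Sequence

Row = Sequence[int]  # binary feature vector


def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    """Share of rows whose predicted label matches the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def fit_majority(X: List[Row], y: List[int]) -> Callable[[Row], int]:
    """Baseline: always predict the most frequent training label."""
    label = int(sum(y) * 2 >= len(y))
    return lambda x: label


def fit_stump(X: List[Row], y: List[int]) -> Callable[[Row], int]:
    """One-level decision tree: pick the single binary feature (optionally
    inverted) with the highest training accuracy."""
    best = None
    for j in range(len(X[0])):
        for flip in (0, 1):
            acc = sum((x[j] ^ flip) == t for x, t in zip(X, y)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, j, flip)
    _, feature, invert = best
    return lambda x: x[feature] ^ invert


def benchmark(fit, X_train, y_train, X_test, y_test):
    """Return (test accuracy, seconds spent on fitting plus prediction)."""
    start = time.perf_counter()
    model = fit(X_train, y_train)
    acc = accuracy(model, X_test, y_test)
    return acc, time.perf_counter() - start


# Made-up binary data in which the label equals the second feature.
X_train = [[0, 0], [1, 1], [0, 1], [1, 0], [0, 1], [1, 1]]
y_train = [0, 1, 1, 0, 1, 1]
X_test = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_test = [0, 1, 1, 0]

results = {
    name: benchmark(fit, X_train, y_train, X_test, y_test)
    for name, fit in {"majority": fit_majority, "stump": fit_stump}.items()
}
```

Timings from `time.perf_counter` vary between runs and machines, so in practice they would be averaged over repeated runs before being compared.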


Improving XGBoost with Imagination Sampling

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.2.1.holloway.1 ◽

2020 ◽

Vol 2 (1) ◽

pp. 3-6

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

General System ◽

Learning Models ◽

Starting Point ◽

Machine Learning Models

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system that uses Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.


Development of Machine Learning Models to Predict Student Performance in Computer Literacy Courses

International Review on Computers and Software (IRECOS) ◽

10.15866/irecos.v13i1.16863 ◽

2018 ◽

Vol 13 (1) ◽

pp. 21

Author(s):

George Anderson ◽

Oduronke T. Eyitayo

Keyword(s):

Machine Learning ◽

Student Performance ◽

Computer Literacy ◽

Learning Models ◽

Machine Learning Models


Experimental Comparison of Machine Learning Models in Malware Packing Detection

2020 21st Asia-Pacific Network Operations and Management Symposium (APNOMS) ◽

10.23919/apnoms50412.2020.9237007 ◽

2020 ◽

Author(s):

Jong-Wouk Kim ◽

Juhong Namgung ◽

Yang-Sae Moon ◽

Mi-Jung Choi

Keyword(s):

Machine Learning ◽

Experimental Comparison ◽

Learning Models ◽

Machine Learning Models


Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large set of structure-activity relationships that has not yet been exploited for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different designs, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions of up to 0.952 for the epigenetic target prediction task. Our results indicate that the reported models have considerable potential to identify small molecules with epigenetic activity. The models were therefore implemented as a freely accessible and easy-to-use web application.


A Comparative Study of Machine Learning Models for Stock Market Rate Prediction

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i6.985990 ◽

2019 ◽

Vol 7 (6) ◽

pp. 985-990

Author(s):

reeraksha M S ◽

Bhargavi M S

Keyword(s):

Machine Learning ◽

Stock Market ◽

Comparative Study ◽

Learning Models ◽

Rate Prediction ◽

Market Rate ◽

Machine Learning Models


An Intelligent Approach for Prediction of Liver Disease using Machine Learning Models

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2020/568102020 ◽

2020 ◽

Vol 8 (10) ◽

pp. 6974-6983

Keyword(s):

Machine Learning ◽

Liver Disease ◽

Learning Models ◽

Intelligent Approach ◽

Machine Learning Models


Utilizing Blockchain Technology in Social Media Bot Identification

10.36227/techrxiv.12049374 ◽

2020 ◽

Author(s):

Shreya Reddy ◽

Lisa Ewen ◽

Pankti Patel ◽

Prerak Patel ◽

Ankit Kundal ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Gold Standard ◽

The Internet ◽

Learning Models ◽

Current Time ◽

Machine Learning Methods ◽

Blockchain Technology ◽

Modern Age ◽

Machine Learning Models

As bots become more prevalent and smarter in the modern age of the internet, it becomes ever more important that they be identified and removed. Recent research has indicated that machine learning methods are accurate and are the gold standard for bot identification on social media. Unfortunately, machine learning models have drawbacks, such as lengthy training times, difficult feature selection, and heavy pre-processing tasks. To overcome these difficulties, we propose a blockchain framework for bot identification. It is currently unknown how this method will perform, but it serves to demonstrate a substantial research gap in this area.
