The Hierarchical Classifier for COVID-19 Resistance Evaluation

Data ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 6
Author(s):  
Nataliya Shakhovska ◽  
Ivan Izonin ◽  
Nataliia Melnykova

Finding dependencies in the data requires analyzing the relations between dozens of parameters of the studied process and hundreds of possible sources of influence on it. The dependencies are nondeterministic, so modeling requires statistical methods for analyzing random processes. Part of the information is often hidden from observation or not monitored, which creates many difficulties when analyzing the collected information. The paper aims to find frequent patterns and the parameters affected by COVID-19. The novelty of the paper is a hierarchical architecture that combines supervised and unsupervised methods, allowing the development of an ensemble based on k-means clustering and classification. The best classifiers in the ensemble are a random forest with 500 trees and XGBoost. Classification within the separated clusters gives accuracy 4% higher than analysis of the whole dataset. The proposed approach can also be used for personalized-medicine decision support in other domains. Feature selection identifies the features with the highest impact on COVID-19: age, sex, blood group, and prior influenza.
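The cluster-then-classify idea in this abstract can be sketched as follows: partition the data with k-means, then train a separate random-forest classifier per cluster and route each new sample to its cluster's model. The dataset here is synthetic, not the COVID-19 data used by the authors, and the per-cluster routing is an illustrative reading of the architecture rather than the paper's exact pipeline.

```python
# Sketch: k-means clustering followed by one random forest per cluster.
# Synthetic data stands in for the paper's COVID-19 dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tr)

# One 500-tree forest per cluster, mirroring the "separated clusters" step.
models = {}
for c in range(3):
    mask = kmeans.labels_ == c
    models[c] = RandomForestClassifier(n_estimators=500, random_state=0)
    models[c].fit(X_tr[mask], y_tr[mask])

# Route each test point to the model of its nearest cluster.
clusters = kmeans.predict(X_te)
preds = np.array([models[c].predict(x.reshape(1, -1))[0]
                  for c, x in zip(clusters, X_te)])
accuracy = (preds == y_te).mean()
```

A global classifier trained on the full training set would give the baseline against which the abstract's 4% improvement is measured.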

2021 ◽  
Vol 2078 (1) ◽  
pp. 012027
Author(s):  
Ze yuan Liu ◽  
Xin long Li

Abstract The remarkable advances in ensemble machine learning methods, such as random forest algorithms, have enabled significant analyses of large data. However, these algorithms use only the existing features during learning, which imposes an upper limit on accuracy no matter how well the algorithms perform. Moreover, classification accuracy is low especially when one class's proportion in the training dataset is much smaller than the others'. The aim of the present study is to design a hierarchical classifier that extracts new features using ensemble machine learning regressors and statistical methods inside the overall machine learning process. In stage 1, the categorical variables are processed by a random forest algorithm to create a new variable through regression analysis, while the remaining numerical variables serve as input to a factor analysis (FA) that computes factor scores for each observation. Then, all the features are learned by a random forest classifier in stage 2. Diversified datasets consisting of categorical and numerical variables are used to evaluate the method. The experimental results show that classification accuracy increased by 8.61%. The method also significantly improves the classification accuracy of classes with low proportions in the training dataset.
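A minimal sketch of the two-stage idea, under stated assumptions: stage 1 turns the categorical block into one engineered feature via a random-forest regression and compresses the numeric block with factor analysis; stage 2 feeds the engineered features to a random-forest classifier. The data, the choice of regression target, and the number of factors are all illustrative placeholders, not the authors' settings.

```python
# Sketch of the two-stage hierarchical classifier described above.
# Stage 1: engineer features from categorical and numerical blocks.
# Stage 2: train a random forest on the engineered features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X_num, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_cat = rng.integers(0, 4, size=(500, 3)).astype(float)  # ordinal-coded categories

# Stage 1a: regress a numeric column on the categorical block to create
# one new variable (an illustrative choice of regression target).
reg = RandomForestRegressor(n_estimators=100, random_state=0)
cat_feature = reg.fit(X_cat, X_num[:, 0]).predict(X_cat).reshape(-1, 1)

# Stage 1b: factor scores for the numerical block.
factors = FactorAnalysis(n_components=2, random_state=0).fit_transform(X_num)

# Stage 2: learn on the engineered feature set.
X_new = np.hstack([cat_feature, factors])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_new, y)
train_acc = clf.score(X_new, y)
```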


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii135-ii136
Author(s):  
John Lin ◽  
Michelle Mai ◽  
Saba Paracha

Abstract Glioblastoma multiforme (GBM), the most common form of glioma, is a malignant tumor with a high risk of mortality. By providing accurate survival estimates, prognostic models have been identified as promising tools in clinical decision support. In this study, we produced and validated two machine learning-based models to predict survival time for GBM patients. Publicly available clinical and genomic data from The Cancer Genome Atlas (TCGA) and Broad Institute GDAC Firehose were obtained through cBioPortal. Random forest and multivariate regression models were created to predict survival. Predictive accuracy was assessed and compared through mean absolute error (MAE) and root mean square error (RMSE) calculations. 619 GBM patients were included in the dataset. There were 381 (62.9%) cases of recurrence/progression and 53 (8.7%) cases of disease-free survival. The MAE and RMSE values were 0.553 and 0.887 years respectively for the random forest regression model, and 1.756 and 2.451 years respectively for the multivariate regression model. Both models accurately predicted overall survival. Comparison of the models through MAE, RMSE, and visual analysis showed higher accuracy for random forest than for multivariate linear regression. Further investigation of feature selection and model optimization may improve predictive power. These findings suggest that using machine learning in GBM prognostic modeling will improve clinical decision support. *Co-first authors.
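The model comparison described above can be sketched generically: fit a random-forest regressor and an ordinary least-squares model on the same split, then score both with MAE and RMSE. The regression data here is synthetic, not the TCGA cohort, so no conclusion about which model wins should be read from this sketch.

```python
# Sketch: compare a random-forest regressor and a linear model by
# MAE and RMSE, the two error metrics used in the abstract.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=600, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for name, model in [("random_forest", RandomForestRegressor(random_state=0)),
                    ("linear", LinearRegression())]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = {
        "mae": mean_absolute_error(y_te, pred),
        "rmse": np.sqrt(mean_squared_error(y_te, pred)),  # RMSE >= MAE always
    }
```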


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
E Watanabe ◽  
T Yamashita ◽  
H Inoue ◽  
H Atarashi ◽  
K Okumura ◽  
...  

Abstract Background Atrial fibrillation (AF) is associated with increased mortality and morbidity. Modelling the risk of thrombosis, major bleeding, and total mortality is often limited by an inadequate number of independent predictors. Purpose We compared the predictive accuracy of a decision-support tool framework and conventional risk scores in AF patients. Methods We used data on AF patients enrolled in the nationwide AF registry. A random forest model was implemented to predict each outcome, and its predictive power was tested by 5-fold cross-validation. Results We analyzed 7,937 patients with AF (age 70±10 years, female 31%). The type of AF was paroxysmal (37%), persistent (14%), and permanent (49%). The numbers of antithrombotic treatments were as follows: warfarin only (n=5,461), antiplatelet only (n=581), both warfarin and antiplatelet (n=1,471), and no antithrombotic agents (n=424). The mean CHA2DS2-VASc score was 2.8±1.6 and the mean HAS-BLED score was 2.7±1.2. We selected 20 of 50 clinical parameters and compared the model by area under the curve with the CHA2DS2-VASc score for thromboses and the HAS-BLED score for major bleeding. During the 2-year follow-up, 126 patients (1.6%) had thromboses, 140 (1.8%) had major bleeding, and 195 (2.5%) died. The random forest model had a higher area under the curve for predicting thromboses than the CHA2DS2-VASc score (0.66 vs. 0.61, P<0.05) and a significantly higher area under the curve for major bleeding than the HAS-BLED score (0.67 vs. 0.61, P<0.05). The area under the curve for all-cause mortality was 0.77. Conclusions A random forest model predicts thromboses and major bleeding, as well as total mortality, more accurately than conventional risk schemes.
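The validation step the abstract describes, a random forest scored by area under the curve over 5-fold cross-validation, can be sketched as below. The labels are synthetic and made heavily imbalanced (roughly 3% positives) to mimic the rarity of the registry's thrombosis and bleeding events; nothing here uses the actual registry data.

```python
# Sketch: 5-fold cross-validated AUC for a random-forest classifier
# on an imbalanced synthetic outcome, mirroring the abstract's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.97], random_state=0)  # ~3% positives

clf = RandomForestClassifier(n_estimators=200, random_state=0)
auc_scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
mean_auc = auc_scores.mean()
```

Comparing `mean_auc` against the AUC of a fixed risk score (such as CHA2DS2-VASc) on the same folds would reproduce the comparison reported in the Results.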


Author(s):  
Nabilah Alias ◽  
Cik Feresa Mohd Foozy ◽  
Sofia Najwa Ramli ◽  
Naqliyah Zainuddin

Nowadays, social media platforms (e.g., YouTube and Facebook) connect people and let them interact by posting comments or videos. Comments are part of a website's content and can attract spammers who spread phishing, malware, or advertising. Because such malicious users can spread malware or phishing through comments, this work proposes a technique for detecting spam-comment features on video-sharing sites. The first phase of the methodology is dataset collection; for this experiment, a dataset from the UCI Machine Learning Repository is used. The next phase is the development of the framework and the experimentation. The dataset is pre-processed using tokenization and lemmatization. After that, the features for detecting spam are selected, and the classification experiments are performed with six classifiers: Random Tree, Random Forest, Naïve Bayes, KStar, Decision Table, and Decision Stump. The results show that the highest accuracy is 90.57% and the lowest is 58.86%.
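A toy version of the comment-spam pipeline might look like the following: vectorise the text (the tokenization step), then fit two of the six classifiers named above. The five comments and their labels are invented, not taken from the UCI dataset, and the lemmatization step (e.g. via NLTK) is omitted for brevity.

```python
# Sketch: tokenise spam/ham comments and train two of the classifiers
# mentioned in the abstract. Comments below are invented examples.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

comments = ["check out my channel for free prizes",
            "subscribe here to win money now",
            "great video, thanks for sharing",
            "really enjoyed the explanation",
            "free money click this link now"]
labels = [1, 1, 0, 0, 1]  # 1 = spam, 0 = ham

X = CountVectorizer().fit_transform(comments)  # tokenisation + term counts

accuracies = {}
for name, clf in [("naive_bayes", MultinomialNB()),
                  ("random_forest", RandomForestClassifier(random_state=0))]:
    accuracies[name] = clf.fit(X, labels).score(X, labels)
```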


2021 ◽  
Vol 348 ◽  
pp. 01002
Author(s):  
Assia Najm ◽  
Abdelali Zakrani ◽  
Abdelaziz Marzak

Software cost prediction is a crucial element of a project's success because it helps project managers efficiently estimate the effort needed for any project. Many machine learning methods exist in the literature, such as decision trees, artificial neural networks (ANN), and support vector regressors (SVR). However, many studies confirm that accurate estimates depend greatly on hyperparameter optimization and on proper input feature selection, which strongly affects the accuracy of software cost prediction models (SCPM). In this paper, we propose an enhanced model using SVR and the Optainet algorithm. Optainet is used simultaneously (1) to select the best set of features and (2) to tune the parameters of the SVR model. The experimental evaluation was conducted using a 30% holdout over seven datasets. The performance of the suggested model is then compared to a tuned SVR model using Optainet without feature selection. The results are also compared to the Boruta and random-forest feature selection methods. The experiments show that, across all datasets, the Optainet-based method significantly improves the accuracy of the SVR model and outperforms the random-forest and Boruta feature selection methods.
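The Optainet immune algorithm itself is not reproduced here; as a stand-in, this sketch does what the paper's pipeline does in spirit: jointly search over a feature subset and SVR hyperparameters, scoring candidates on a 30% holdout (matching the abstract's evaluation) and keeping the best. The random search, iteration count, and parameter ranges are assumptions for illustration only.

```python
# Sketch: joint feature-subset and hyperparameter search for SVR,
# with random search standing in for the Optainet immune algorithm.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

best = {"score": -np.inf}
for _ in range(30):                    # candidate solutions ("antibodies")
    mask = rng.random(10) < 0.5        # random feature subset
    if not mask.any():
        continue
    C = 10.0 ** rng.uniform(-1, 3)     # random hyperparameters
    eps = 10.0 ** rng.uniform(-2, 0)
    model = SVR(C=C, epsilon=eps).fit(X_tr[:, mask], y_tr)
    score = model.score(X_te[:, mask], y_te)  # R^2 on the 30% holdout
    if score > best["score"]:
        best = {"score": score, "mask": mask, "C": C, "epsilon": eps}
```

Optainet would replace the random proposals with clonal selection and mutation of the best candidates, but the fitness evaluation per candidate is the same.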

