Development of a Stacking-Based Ensemble Machine Learning for Detection of Depression in Parkinson’s Disease: Preliminary Research

Haewon Byeon

doi:10.3390/eccm-10857

Biology and Life Sciences Forum ◽

2021 ◽

Vol 9 (1) ◽

pp. 5

Author(s):

Haewon Byeon

Keyword(s):

Machine Learning ◽

Parkinson’S Disease ◽

Parkinson's Disease ◽

Random Forest ◽

Naive Bayes ◽

Depression Scale ◽

Prognostic Index ◽

Predictive Performance ◽

Naïve Bayes ◽

Outcome Variable

This preliminary study used a stacking ensemble to explore the major factors that could predict depression in patients with Parkinson’s disease, and it provides baseline data for developing a nomogram prognostic index to identify groups at high risk of depression among patients with Parkinson’s disease in the future. Depression, the outcome variable, was divided into “with depression” and “without depression” using the Geriatric Depression Scale-30 (GDS-30). This study developed nine machine learning models (ANN, random forest, naive Bayes, CART, ANN+LR, random forest+LR, naive Bayes+LR, CART+LR, and random forest+naive Bayes+CART+ANN+LR). The predictive performance (e.g., RMSE, IA, Ev) of each model was validated through 10-fold cross-validation. The random forest+LR model had the best predictive performance: RMSE = 0.16, IA = 0.73, and Ev = 0.48. Analysis of the normalized variable importance of the random forest+LR model (the final model) showed that K-MMSE, K-MoCA, Global CDR, sum of boxes in CDR, total score of UPDRS, motor score of UPDRS, K-IADL, H and Y staging, Schwab and England ADL, and REM and RBD were the ten variables with the highest weights among predictors of depression in patients with Parkinson’s disease in South Korea. It is also necessary to develop interpretable machine learning models for predicting depression in patients with Parkinson’s disease that can be used in the medical field.
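
The abstract reports RMSE and IA without defining them. As a minimal sketch, assuming IA refers to Willmott's index of agreement (the abstract does not say so), both metrics can be computed from binary outcome labels and predicted probabilities. The example values below are made up and are not from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d in [0, 1]; 1.0 means perfect agreement.

    Assumption: 'IA' in the abstract is taken to mean this index.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    # A zero denominator only occurs when every value equals the observed mean.
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


# Hypothetical example: 1 = "with depression", 0 = "without depression".
observed = [0, 1, 1, 0]
predicted = [0.1, 0.8, 0.6, 0.3]  # made-up predicted probabilities
print(round(rmse(observed, predicted), 3))                # 0.274
print(round(index_of_agreement(observed, predicted), 3))  # 0.87
```

Lower RMSE and higher IA both indicate better agreement; in a stacking setup these would be computed on each held-out fold of the cross-validation.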

Application of Query Classification to Improve Search Engine Effectiveness

Seminar Nasional Official Statistics ◽

10.34123/semnasoffstat.v2021i1.914 ◽

2021 ◽

Vol 2021 (1) ◽

pp. 1012-1018

Author(s):

Handy Geraldy ◽

Lutfi Rahmatuti Maghfiroh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Naive Bayes ◽

Naïve Bayes ◽

Gradient Boosting

In carrying out its role as a data provider, Badan Pusat Statistik (BPS, Statistics Indonesia) provides data access services to the public. One of these services is the search feature on the BPS website. However, the search service has not yet met consumer expectations. To meet those expectations, one possible measure is to make search more effective, so that results are more relevant to user intent. Therefore, this study aims to build a query classification function for the search engine and to test whether the function improves search effectiveness. The query classification function was built using a machine learning model. We compared five algorithms: SVM, Random Forest, Gradient Boosting, KNN, and Naive Bayes. Of these five, the best model was obtained with SVM. The function was then implemented in the search engine, whose effectiveness was measured by precision and recall. As a result, the query classification function can narrow the search results for certain queries, thereby increasing precision. However, the query classification function does not affect recall.
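
The abstract measures search effectiveness with precision and recall and reports that query classification raised precision without changing recall. A minimal sketch of the standard set-based definitions, using made-up document IDs (hypothetical, not from the study), shows why filtering out only irrelevant results has exactly that effect:

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3"}                 # hypothetical relevant documents
unfiltered = ["d1", "d2", "d4", "d5", "d6"]   # hypothetical raw search results
filtered = ["d1", "d2", "d4"]                 # a category filter drops d5 and d6

print(precision_recall(unfiltered, relevant))  # (0.4, 0.6666666666666666)
print(precision_recall(filtered, relevant))    # (0.6666666666666666, 0.6666666666666666)
```

Because the filter removes only non-relevant documents, the number of hits is unchanged, so recall stays the same while the smaller result set raises precision.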

Preliminary Screening of COVID-19 Infection Employing Machine Learning Techniques From Simple Blood Profile

International Journal of Quantitative Structure-Property Relationships ◽

10.4018/ijqspr.2021070103 ◽

2021 ◽

Vol 6 (3) ◽

pp. 35-47

Author(s):

Anirudh Reddy Cingireddy ◽

Robin Ghosh ◽

Supratik Kar ◽

Venkata Melapu ◽

Sravanthi Joginipeli ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Naive Bayes ◽

Albert Einstein ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Support Vector ◽

Blood Profile ◽

Molecular Tests ◽

Large Populations

Frequent testing of the entire population would help identify individuals with active COVID-19 and reveal concealed carriers. Molecular tests, antigen tests, and antibody tests are widely used to confirm COVID-19 in the population. Molecular tests such as the real-time reverse transcription-polymerase chain reaction (rRT-PCR) test take from a minimum of 3 hours to a maximum of 4 days to return results. To overcome this issue, the authors suggest using machine learning and data mining tools to screen large populations at a preliminary level. The ML tools could reduce the size of the population that needs testing by 20 to 30%. In this study, the authors used a subset of features from the full blood profiles of patients at Hospital Israelita Albert Einstein in Brazil. They validated classification models, namely KNN, logistic regression, XGBoost, naive Bayes, decision tree, random forest, support vector machine, and multilayer perceptron, using k-fold cross-validation. Naïve Bayes, KNN, and random forest stood out as the most predictive, each with 88% accuracy.
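
The models above were validated with k-fold cross-validation. A minimal stdlib sketch of the procedure follows: shuffle the sample indices once, split them into k folds, and use each fold once as the test set while training on the rest. The majority-class baseline and the function names are hypothetical illustrations, not the authors' pipeline.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean test-fold accuracy of a model supplied as fit/predict callables."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(X_train, y_train):
    """Hypothetical baseline: the 'model' is just the most frequent training label."""
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

Any classifier can be plugged in through the two callables; averaging the per-fold scores gives the cross-validated accuracy.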

Machine Learning Readmission Risk Modeling: A Pediatric Case Study

BioMed Research International ◽

10.1155/2019/8532892 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9 ◽

Cited By ~ 3

Author(s):

Patricio Wolff ◽

Manuel Graña ◽

Sebastián A. Ríos ◽

Maria Begoña Yarza

Keyword(s):

Machine Learning ◽

Multilayer Perceptron ◽

Naive Bayes ◽

Class Imbalance ◽

Predictive Performance ◽

Naïve Bayes ◽

Distribution Model ◽

Training Dataset ◽

Support Vector ◽

Pediatric Hospital

Background. Hospital readmission prediction in pediatric hospitals has received little attention. Studies have focused on readmission frequency analysis stratified by disease and demographic/geographic characteristics, but there are no predictive modeling approaches, which may be useful for identifying the preventable readmissions that constitute a major portion of the cost attributed to readmissions.

Objective. To assess the all-cause readmission predictive performance achieved by machine learning techniques in the emergency department of a pediatric hospital in Santiago, Chile.

Materials. An all-cause admissions dataset was collected over six consecutive years in a pediatric hospital in Santiago, Chile. The variables collected are the same as those used to determine the administrative cost of the child’s treatment.

Methods. Retrospective predictive analysis of 30-day readmission was formulated as a binary classification problem. We report classification results achieved with various model-building approaches after data curation and preprocessing to correct class imbalance. We compute repeated cross-validation (RCV) with a decreasing number of folds to assess performance and sensitivity to the effect of imbalance in the test set and to training set size.

Results. The increase in recall due to SMOTE class imbalance correction is large and statistically significant. The Naive Bayes (NB) approach achieves the best AUC (0.65); however, the shallow multilayer perceptron has the best PPV and F-score (5.6 and 10.2, respectively). NB and support vector machines (SVM) give comparable results when AUC, PPV, and F-score rankings across all RCV experiments are considered. The high recall of the deep multilayer perceptron is due to a high false-positive ratio. There is no detectable effect of the number of folds in the RCV on the predictive performance of the algorithms.

Conclusions. We recommend Naive Bayes (NB) with a Gaussian distribution model as the most robust modeling approach for pediatric readmission prediction, as it achieves the best results across all training dataset sizes. The results show that the approach could be applied to detect preventable readmissions.
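
The study corrected class imbalance with SMOTE, which creates synthetic minority-class samples instead of duplicating existing ones. The sketch below shows the classic interpolation idea in plain Python: pick a minority sample, pick one of its k nearest minority neighbours, and place a new point at a random position on the segment between them. It is not the authors' configuration; the function name and the default k = 5 are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic
```

In a cross-validated setting such as RCV, oversampling is normally applied only to the training folds, so that synthetic points never leak into the test fold.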

Can Gut Microbiota Be a Good Predictor for Parkinson’s Disease? A Machine Learning Approach

Brain Sciences ◽

10.3390/brainsci10040242 ◽

2020 ◽

Vol 10 (4) ◽

pp. 242 ◽

Cited By ~ 3

Author(s):

Daniele Pietrucci ◽

Adelaide Teofani ◽

Valeria Unida ◽

Rocco Cerroni ◽

Silvia Biocca ◽

...

Keyword(s):

Machine Learning ◽

Parkinson’S Disease ◽

Parkinson's Disease ◽

Random Forest ◽

Gut Microbiota ◽

Biological Data ◽

Machine Learning Algorithms ◽

Support Vector ◽

Published Data ◽

Promising Tool

The involvement of the gut microbiota in Parkinson’s disease (PD) has been investigated in several studies, which identified some common alterations of the microbial community, such as a decrease in the Lachnospiraceae and an increase in the Verrucomicrobiaceae families in PD patients. However, the results for other bacterial families are often contradictory. Machine learning is a promising tool for building predictive models for the classification of biological data, such as those produced in metagenomic studies. We tested three different machine learning algorithms (random forest, neural networks, and support vector machines), analyzing 846 metagenomic samples (472 from PD patients and 374 from healthy controls), including our published data and data downloaded from public databases. Prediction performance was evaluated with the area under the curve (AUC), accuracy, precision, recall, and F-score metrics. The random forest algorithm provided the best results. Bacterial families were ranked according to their importance in the classification, and a subset of 22 families was identified for the prediction of patient status. Although the results are promising, the algorithm needs to be trained with a larger number of samples to increase the accuracy of the procedure.
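
One of the metrics used above is the area under the ROC curve. A small sketch of its rank-based interpretation, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as half), is shown below; the labels and scores are hypothetical, not the study's data.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation). O(P * N) comparisons.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores: 1 = PD patient, 0 = healthy control.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic; for large sample sets a sort-based rank computation gives the same value faster.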

Prediction of Lung Cancer Risk using Random Forest Algorithm Based on Kaggle Data Set

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7879.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1623-1630

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Random Forest ◽

Naive Bayes ◽

Early Stage ◽

Naïve Bayes ◽

Training Data ◽

Random Forest Algorithm ◽

Data Set ◽

Wide Range

As huge amounts of data accumulate, extracting the required information from the available data has become a challenge. Machine learning contributes to various fields. The fast-growing population has caused the evolution of a wide range of diseases, which in turn has created a need for machine learning models that use patients' datasets. Analyses of datasets from different sources show that cancer is the most hazardous disease and may cause the death of the patient. Surveys indicate that cancer can nearly be cured in its initial stages but may cause the death of the affected person in later stages. One of the major types of cancer is lung cancer; its prediction depends heavily on past data, and it requires detection in the early stages. The proposed work uses a machine learning algorithm to group individuals' details into categories in order to predict, at an early stage, whether they are likely to develop cancer. The random forest algorithm was implemented and achieved a higher efficiency of 97% compared with KNN and Naive Bayes. Moreover, the KNN algorithm does not learn anything from the training data but uses it directly for classification, and Naive Bayes produces less accurate predictions. The proposed system predicts the chances of lung cancer by displaying three levels: low, medium, and high. Thus, mortality rates can be reduced significantly.
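
The remark that KNN "does not learn anything from the training data" refers to KNN being a lazy learner: fitting only stores the data, and all the work happens at prediction time. A minimal sketch, using hypothetical two-feature points labelled with the abstract's three risk levels (not the Kaggle data), illustrates this:

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal k-nearest-neighbours classifier (Euclidean distance, majority vote)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the data; no parameters are learned.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # All computation happens here: rank stored points by distance to x.
        nearest = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))[: self.k]
        votes = Counter(self.y[i] for i in nearest)
        return votes.most_common(1)[0][0]


# Hypothetical feature vectors and risk levels.
X = [(0, 0), (0, 1), (1, 0), (4, 4), (4, 5), (5, 4), (8, 8), (8, 9), (9, 8)]
y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3
model = KNNClassifier(k=3).fit(X, y)
print(model.predict_one((4.5, 4.5)))  # medium
```

A random forest, by contrast, builds its trees during fitting, so prediction cost does not grow with the size of the stored training set.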

Machine Learning Algorithms for Biological Targets: Investigating the Error Tolerance in Various Computational Methods

10.31219/osf.io/zkumv ◽

2019 ◽

Author(s):

Thomas M. Kaiser ◽

Pieter B. Burger

Keyword(s):

Machine Learning ◽

Random Forest ◽

Naive Bayes ◽

Probabilistic Neural Network ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Learning Models ◽

Bayes Network ◽

Insight Into ◽

Machine Learning Models

Machine learning continues to make significant advances in predicting desired properties in drug development. Problematically, the efficacy of machine learning in these arenas relies on highly accurate and abundant data. These two requirements, high accuracy and abundance, are often considered together; however, insight into how tolerant contemporary machine learning algorithms are of dataset inaccuracy may show whether non-bench experimental sources of data can be used to generate useful machine learning models where experimental data are scarce. We took highly accurate data across six kinase types, one GPCR, one polymerase, a human protease, and HIV protease, and intentionally introduced error at varying population proportions into the dataset for each target. With the generated error in the data, we explored how the retrospective accuracy of a Naïve Bayes Network, a Random Forest model, and a Probabilistic Neural Network model decayed as a function of error. Additionally, we explored whether a training dataset with an error profile resembling that produced by the Free Energy Perturbation method (FEP+) could generate machine learning models with useful retrospective capabilities. The categorical error tolerance was quite high for the Naïve Bayes Network algorithm: on average, 39% error in the training set was required before it lost predictivity on the test set. A Random Forest also tolerated a significant degree of categorical error introduced into the training set, requiring an average error of 29% to lose predictivity. However, the Probabilistic Neural Network algorithm tolerated less categorical error, requiring an average of 20% error to lose predictivity. Finally, we found that a Naïve Bayes Network and a Random Forest could both use datasets with an error profile resembling that of FEP+. This work demonstrates that computational methods with a known error distribution, such as FEP+, may be useful for generating machine learning models that do not rely on extensive and expensive in vitro-generated datasets.
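
The experiment above introduces categorical error into a chosen proportion of each training set. Assuming "categorical error" means reassigning a sample's class label (an interpretation, not confirmed by the abstract), a minimal sketch of the corruption step looks like this; the function name and the "active"/"inactive" labels are hypothetical.

```python
import random


def inject_label_noise(labels, proportion, classes, seed=0):
    """Return a copy of labels in which round(proportion * len(labels)) randomly
    chosen entries are reassigned to a different, randomly chosen class."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    if len(set(classes)) < 2:
        raise ValueError("need at least two classes to corrupt labels")
    rng = random.Random(seed)
    noisy = list(labels)
    n_corrupt = round(proportion * len(noisy))
    # Sample distinct positions so exactly n_corrupt labels change.
    for i in rng.sample(range(len(noisy)), n_corrupt):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


# Hypothetical activity labels for one target.
labels = ["active"] * 50 + ["inactive"] * 50
noisy = inject_label_noise(labels, 0.2, ["active", "inactive"])
print(sum(a != b for a, b in zip(labels, noisy)))  # 20
```

Sweeping the proportion (for example 0.0, 0.1, 0.2, ...) and retraining at each level gives the accuracy-versus-error curve from which a tolerance threshold can be read.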

SNPs rs11240569, rs708727, and rs823156 in SLC41A1 Do Not Discriminate Between Slovak Patients with Idiopathic Parkinson’s Disease and Healthy Controls: Statistics and Machine-Learning Evidence

International Journal of Molecular Sciences ◽

10.3390/ijms20194688 ◽

2019 ◽

Vol 20 (19) ◽

pp. 4688

Author(s):

Michal Cibulka ◽

Maria Brodnanova ◽

Marian Grendar ◽

Milan Grofik ◽

Egon Kurca ◽

...

Keyword(s):

Machine Learning ◽

Parkinson’S Disease ◽

Parkinson's Disease ◽

Random Forest ◽

Susceptibility Locus ◽

Healthy Controls ◽

Diverse Populations ◽

Slovak Population ◽

Asian Populations ◽

Clinical Diagnostic

Gene SLC41A1 (A1) is located within the Parkinson’s disease (PD) susceptibility locus PARK16 and encodes the Na⁺/Mg²⁺ exchanger. The association of several A1 SNPs with PD has been studied. Two of them, rs11240569 and rs823156, have been associated with reduced PD susceptibility, primarily in Asian populations. Here, we examined the association of rs11240569, rs708727, and rs823156 with PD in the Slovak population and their power to discriminate between PD patients and healthy controls. The study included 150 PD patients and 120 controls. Genotyping was performed with the TaqMan® approach. Data were analyzed with conventional statistics and the Random Forest machine-learning (ML) algorithm. Individually, none of the three SNPs is associated with an altered risk of PD onset in Slovaks. However, the genotype combination GG(rs11240569)/AG(rs708727)/AA(rs823156) of the SNP triplet is significantly (p < 0.05) more frequent in the PD cohort (13.3%) than in the control cohort (5%). ML found that the tested SNPs, in isolation or as singlets (joined), duplets, and triplets, had zero power to discriminate between PD patients and healthy controls. Our data further substantiate differences between diverse populations regarding the association of A1 polymorphisms with PD susceptibility. The lack of power of the tested SNPs to discriminate between PD and healthy cases renders their clinical/diagnostic relevance in the Slovak population negligible.

Data Driven Approach for Eye Disease Classification with Machine Learning

Applied Sciences ◽

10.3390/app9142789 ◽

2019 ◽

Vol 9 (14) ◽

pp. 2789 ◽

Cited By ~ 3

Author(s):

Sadaf Malik ◽

Nadia Kanwal ◽

Mamoona Naveed Asghar ◽

Mohammad Ali A. Sadiq ◽

Irfan Karamat ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Multiple Features ◽

Standard Format ◽

Free Data

Medical health systems have been concentrating on artificial intelligence techniques for speedy diagnosis. However, recording health data in a standard form still requires attention so that machine learning can be more accurate and reliable by considering multiple features. The aim of this study is to develop a general framework for recording diagnostic data in an international standard format to facilitate the prediction of disease diagnoses from symptoms using machine learning algorithms. Efforts were made to ensure error-free data entry by developing a user-friendly interface. Furthermore, multiple machine learning algorithms, including Decision Tree, Random Forest, Naive Bayes, and Neural Network algorithms, were used to analyze patient data based on multiple features, including age, illness history, and clinical observations. These data were formatted according to structured hierarchies designed by medical experts, whereas diagnoses were made according to the ICD-10 coding developed by the American Academy of Ophthalmology. Furthermore, the system is designed to evolve through self-learning by adding new classifications for both diagnoses and symptoms. The classification results from tree-based methods demonstrated that the proposed framework performs satisfactorily, given a sufficient amount of data. Owing to the structured data arrangement, the prediction rate of the random forest and decision tree algorithms exceeds 90%, compared with more complex methods such as neural networks and the naïve Bayes algorithm.

Prediction of Breast Cancer Using Machine Learning

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190617160834 ◽

2020 ◽

Vol 13 (5) ◽

pp. 901-908

Author(s):

Somil Jain ◽

Puneet Kumar

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Prediction Accuracy ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

Breast Cancer Dataset

Background: Breast cancer is one of the diseases that cause a large number of deaths every year across the globe, and its early detection and diagnosis is a challenging task in reducing the number of deaths. Nowadays, various machine learning and data mining techniques are used for medical diagnosis and have proven their mettle in predicting chronic diseases such as cancer, which can save the lives of patients suffering from such diseases. The major concern of this study is to determine the prediction accuracy of classification algorithms such as Support Vector Machine, J48, Naïve Bayes, and Random Forest and to suggest the best algorithm. Objective: The objective of this study is to assess the prediction accuracy of the classification algorithms in terms of efficiency and effectiveness. Methods: This paper provides a detailed analysis of classification algorithms such as Support Vector Machine, J48, Naïve Bayes, and Random Forest in terms of their prediction accuracy by applying 10-fold cross-validation to the Wisconsin Diagnostic Breast Cancer dataset using the open-source WEKA tool. Results: Support Vector Machine achieved the highest prediction accuracy of 97.89%, with a low error rate of 0.14%. Conclusion: This paper provides a clear view of the performance of the classification algorithms in terms of their predictive ability, which can help medical practitioners diagnose chronic diseases such as breast cancer effectively.

Recognition of gasoline in fire debris using machine learning: Part I, Application of Random Forest, Gradient Boosting, Support Vector Machine and Naïve Bayes

Forensic Science International ◽

10.1016/j.forsciint.2021.111146 ◽

2021 ◽

pp. 111146 ◽

Cited By ~ 1

Author(s):

C. Bogdal ◽

R. Schellenberg ◽

O. Höpli ◽

M. Bovens ◽

M. Lory

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Naive Bayes ◽

Naïve Bayes ◽

Gradient Boosting ◽

Support Vector ◽

Fire Debris ◽

In Fire