Iterative Update of a Random Forest Classifier for Diabetic Retinopathy

Mapping Intimacies ◽

10.3233/faia210136 ◽

2021 ◽

Author(s):

Jordi Pascual-Fontanilles ◽

Aida Valls ◽

Antonio Moreno ◽

Pedro Romero-Aroca

Keyword(s):

Machine Learning ◽

Diabetic Retinopathy ◽

Random Forest ◽

Random Forests ◽

Random Forest Classifier ◽

Diabetic Patients ◽

Similar Data ◽

Machine Learning Classification ◽

Inherent Ambiguity ◽

New Iterative Method

Random Forests are well-known Machine Learning classification mechanisms based on a collection of decision trees. In recent years, they have been applied to assess the risk that diabetic patients will develop Diabetic Retinopathy. The results have been good, despite the imbalance of data between classes and the inherent ambiguity of the problem (patients with similar data may belong to different classes). In this work we propose a new iterative method to update the set of trees in the Random Forest by considering trees generated from the data of the new patients who visit the medical centre. With this method, it has been possible to improve the results obtained with standard Random Forests.
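The abstract does not say how trees are generated or which ones are kept, so the following is only a minimal sketch of one way such an update could look. It assumes one-split trees (stumps), a hypothetical `update_forest` helper, and a replacement rule that keeps the trees with the best validation accuracy; none of these details come from the paper, and the toy data is synthetic rather than patient data.

```python
import random
from collections import Counter

def train_stump(rows, labels, rng):
    """Fit a one-split tree on a bootstrap sample, using one random feature."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
    feat = rng.randrange(len(rows[0]))              # random feature choice
    best = None
    for i in idx:
        thr = rows[i][feat]
        left = [labels[j] for j in idx if rows[j][feat] <= thr]
        right = [labels[j] for j in idx if rows[j][feat] > thr]
        if not left or not right:
            continue
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, thr, l_lab, r_lab)
    if best is None:                                # no valid split: constant tree
        maj = Counter(labels[j] for j in idx).most_common(1)[0][0]
        return (feat, float("inf"), maj, maj)
    return (feat, best[1], best[2], best[3])

def predict_tree(tree, row):
    feat, thr, left_label, right_label = tree
    return left_label if row[feat] <= thr else right_label

def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_tree(t, row) for t in forest)
    return votes.most_common(1)[0][0]

def tree_accuracy(tree, rows, labels):
    return sum(predict_tree(tree, r) == y for r, y in zip(rows, labels)) / len(rows)

def update_forest(forest, new_rows, new_labels, val_rows, val_labels, n_new, rng):
    """ASSUMED rule: train n_new trees on the new batch, then keep the
    len(forest) trees (old or new) with the best validation accuracy."""
    candidates = forest + [train_stump(new_rows, new_labels, rng) for _ in range(n_new)]
    candidates.sort(key=lambda t: tree_accuracy(t, val_rows, val_labels), reverse=True)
    return candidates[:len(forest)]

rng = random.Random(0)

def make_batch(n):
    # Synthetic toy data: the label is 1 when the first feature exceeds 0.5.
    rows = [[rng.random(), rng.random()] for _ in range(n)]
    return rows, [int(r[0] > 0.5) for r in rows]

old_rows, old_labels = make_batch(40)
new_rows, new_labels = make_batch(40)
val_rows, val_labels = make_batch(40)

forest = [train_stump(old_rows, old_labels, rng) for _ in range(5)]
updated = update_forest(forest, new_rows, new_labels, val_rows, val_labels, n_new=5, rng=rng)
print(len(updated), predict_forest(updated, [0.9, 0.1]))
```

Keeping the forest size fixed means each update can only hold or raise the mean per-tree validation accuracy, at the cost of needing a held-out validation set for every round.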


ResNet Based Deep Features and Random Forest Classifier for Diabetic Retinopathy Detection

Sensors ◽

10.3390/s21113883 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3883

Author(s):

Muhammad Kashif Yaqoob ◽

Syed Farooq Ali ◽

Muhammad Bilal ◽

Muhammad Shehzad Hanif ◽

Ubaid M. Al-Saggaf

Keyword(s):

Diabetic Retinopathy ◽

Deep Learning ◽

Random Forest ◽

Macular Edema ◽

Diabetic Macular Edema ◽

State Of The Art ◽

Random Forest Classifier ◽

Eye Disease ◽

Diabetic Patients ◽

The Right

Diabetic retinopathy, an eye disease commonly afflicting diabetic patients, can result in loss of vision if it is not detected and treated promptly in the early stages. Once the symptoms are identified, the severity level of the disease needs to be classified in order to prescribe the right medicine. This study proposes a deep learning-based approach for the classification and grading of diabetic retinopathy images. The proposed approach uses the feature map of ResNet-50 and passes it to a Random Forest for classification. The proposed approach is compared with five state-of-the-art approaches using the two-category Messidor-2 and five-category EyePACS datasets. The two categories of the Messidor-2 dataset are ‘No Referable Diabetic Macular Edema Grade (DME)’ and ‘Referable DME’, while the five categories are ‘Proliferative diabetic retinopathy’, ‘Severe’, ‘Moderate’, ‘Mild’, and ‘No diabetic retinopathy’. The results show that the proposed approach outperforms the compared approaches and achieves an accuracy of 96% and 75.09% for these datasets, respectively. The proposed approach outperforms six existing state-of-the-art architectures, namely ResNet-50, VGG-19, Inception-v3, MobileNet, Xception, and VGG16.


Human activity recognition based on machine learning classification of smartwatch accelerometer dataset

FME Transactions ◽

10.5937/fme2101225r ◽

2021 ◽

Vol 49 (1) ◽

pp. 225-232

Author(s):

Dušan Radivojević ◽

Nikola Mirkov ◽

Slobodan Maletić

Keyword(s):

Machine Learning ◽

Random Forest ◽

Time Series Data ◽

Test Group ◽

Wearable Devices ◽

Random Forest Classifier ◽

Series Data ◽

Machine Learning Classification ◽

Improving Accuracy

This paper presents two Machine Learning models that classify time series data obtained from the smartwatch accelerometers of observed subjects. For the purpose of classification we use Deep Neural Network and Random Forest classifier algorithms. The comparison of both models shows that they have similar performance with regard to recognition of the subjects' activities that are used in the test group of the dataset. Training accuracy reaches approximately 95% and 100% for the Deep Learning and Random Forest models, respectively. Since validation and recognition accuracy reached about 81% and 75%, respectively, a tendency for accuracy to improve as a function of the number of participants is considered. The influence of data sample precision on the accuracy of the models is examined, since the input data could come from various wearable devices.


Predicting stroke risk by Migraine using AI

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit217656 ◽

2021 ◽

pp. 285-290

Author(s):

Anchal Singh ◽

Dr. Surabhi Thorat

Keyword(s):

Machine Learning ◽

Random Forest ◽

Early Stage ◽

Blood Clot ◽

Random Forest Classifier ◽

Machine Learning Algorithms ◽

Machine Learning Classification ◽

Life Threatening ◽

Confusion Matrices ◽

The Brain

A stroke is a blood clot or bleed in the brain that can cause permanent damage affecting mobility, cognition, sight or communication. It is the second leading cause of death worldwide and one of the most life-threatening diseases for persons above 65 years. It damages the brain much as a heart attack damages the heart. Every 4 minutes someone dies of stroke, but up to 80% of strokes can be prevented if we can identify or predict the occurrence of stroke at an early stage. In this paper, I used different types of machine learning algorithms for stroke prediction on the Healthcare Dataset Stroke data. Four types of machine learning classification algorithms were applied: Linear Regression, Confusion matrices, Random Forest Classifier, and Logistic Regression were used to build the stroke prediction model. Support, Precision, Recall, and F1-score were used as performance measures of the machine learning models. The results showed that the Random Forest Classifier achieved the best accuracy, at 94% [1].
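The performance measures named in the abstract can be derived from the counts in a binary confusion matrix. The sketch below is illustrative only: the `classification_metrics` helper and the labels are hypothetical and do not reproduce the paper's data or results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1, support and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": tp + fn,                       # number of true positive-class cases
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
    }

# Hypothetical labels: 1 = stroke, 0 = no stroke.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On a dataset where the positive class is rare, accuracy can stay high while recall for that class is low, which is why reporting precision, recall and F1 alongside accuracy is useful.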


Interrogation of Sentiment Perusing with Hash Counting Vectorizer and Term Inverse Frequency Transformer using Machine Learning Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8303.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 3895-3901

Keyword(s):

Machine Learning ◽

Random Forest ◽

Customer Satisfaction ◽

Current Trend ◽

Random Forest Classifier ◽

Gradient Boosting ◽

Svm Classifier ◽

Bayes Classifier ◽

Machine Learning Classification ◽

Tree Classifier

With fast-growing technology, businesses are moving towards increasing their profit by interpreting customer satisfaction. Customer satisfaction can be analyzed in many ways. It is the responsibility of the business to analyze customer satisfaction in order to improve its turnover and profit. With the current trend, customers give their feedback through mobile devices and the internet. With this overview, this paper attempts to analyze the sentiment of customer feedback on movies. The Sentiment Analysis on Movie Reviews dataset from the Kaggle machine learning repository is used for implementation. The sentiment classes are predicted in the following way. Firstly, the sentiment count for each class is displayed and the top feature words for each sentiment class are extracted from the dataset. Secondly, the dataset is transformed with a counting vectorizer and then fitted with classifiers such as the Logistic Regression Classifier, Linear SVM Classifier, Multinomial Naive Bayes Classifier, Gradient Boosting Classifier, Gaussian Naive Bayes Classifier, Random Forest Classifier, Decision Tree Classifier and Extra Tree Classifier. Thirdly, the dataset is transformed with a hashing vectorizer and then fitted with the above classifiers. Fourth, the dataset is transformed with a TF-IDF vectorizer and then fitted with the above classifiers. Fifth, the dataset is transformed with a TF-IDF transformer and then fitted with the above classifiers. Sixth, the performance of the classifiers is analyzed using metrics such as Precision, Recall, F-score and Accuracy. The implementation is carried out using Python code in the Spyder IPython console from Anaconda Navigator.
Experimental results show that the sentiment analysis done by the Random Forest classifier is the most effective, with an accuracy of 89% for the counting vectorizer and TF-IDF transformer, 87% for the hashing vectorizer and 88% for the TF-IDF vectorizer.
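To make the three text representations concrete, here is a small standard-library sketch of count, hashed and TF-IDF features. It is not the paper's code: the example sentences are invented, `zlib.crc32` stands in for whatever hash function a library would use, and the weighting `idf = ln(N / df) + 1` is one common variant among several (libraries differ in smoothing and normalisation).

```python
import math
import zlib
from collections import Counter

# Invented example reviews (not from the paper's dataset).
docs = ["a great movie", "a dull movie", "great acting great plot"]
tokenized = [d.split() for d in docs]

# 1) Count vectorizer: one column per vocabulary word, cell = word count.
vocab = sorted({w for doc in tokenized for w in doc})
counts = [[Counter(doc)[w] for w in vocab] for doc in tokenized]

# 2) Hashing vectorizer: fixed number of columns, no stored vocabulary.
def hashed_counts(tokens, n_buckets=8):
    vec = [0] * n_buckets
    for w in tokens:
        # crc32 is a deterministic stand-in hash (an assumption of this sketch).
        vec[zlib.crc32(w.encode("utf-8")) % n_buckets] += 1
    return vec

hashed = [hashed_counts(doc) for doc in tokenized]

# 3) TF-IDF: rescale counts so words found in many documents weigh less.
n_docs = len(tokenized)
doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
idf = [math.log(n_docs / df) + 1 for df in doc_freq]
tfidf = [[c * w for c, w in zip(row, idf)] for row in counts]

print(vocab)
print(counts[2], hashed[2])
```

The hashed representation trades interpretability for a fixed memory footprint: distinct words may collide in the same bucket, but no vocabulary has to be stored.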


Amide proton transfer weighted (APTw) imaging based radiomics allows for the differentiation of gliomas from metastases

Scientific Reports ◽

10.1038/s41598-021-85168-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Elisabeth Sartoretti ◽

Thomas Sartoretti ◽

Michael Wyss ◽

Carolin Reischauer ◽

Luuk van Smoorenburg ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Brain Tumors ◽

Proton Transfer ◽

Multilayer Perceptron ◽

Random Forest Classifier ◽

Amide Proton ◽

Low Grade ◽

Who Grade ◽

Amide Proton Transfer

We sought to evaluate the utility of radiomics for Amide Proton Transfer weighted (APTw) imaging by assessing its value in differentiating brain metastases from high- and low-grade glial brain tumors. We retrospectively identified 48 treatment-naïve patients (10 WHO grade 2, 1 WHO grade 3, 10 WHO grade 4 primary glial brain tumors and 27 metastases) with either primary glial brain tumors or metastases who had undergone APTw MR imaging. After image analysis with radiomics feature extraction and post-processing, machine learning algorithms (multilayer perceptron machine learning algorithm; random forest classifier) with stratified tenfold cross validation were trained on features and were used to differentiate the brain neoplasms. The multilayer perceptron achieved an AUC of 0.836 (receiver operating characteristic curve) in differentiating primary glial brain tumors from metastases. The random forest classifier achieved an AUC of 0.868 in differentiating WHO grade 4 from WHO grade 2/3 primary glial brain tumors. For the differentiation of WHO grade 4 tumors from grade 2/3 tumors and metastases an average AUC of 0.797 was achieved. Our results indicate that the use of radiomics for APTw imaging is feasible and the differentiation of primary glial brain tumors from metastases is achievable with a high degree of accuracy.
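Two ingredients of the evaluation described above, stratified tenfold splitting and the AUC, can be sketched with the standard library alone. The helper names `stratified_folds` and `auc` are hypothetical, the class counts (27 metastases, 21 primary glial tumors) are taken from the abstract, and the scores in the AUC call are invented for illustration.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[(offset + j) % k].append(i)   # deal indices round-robin
        offset += len(idxs)                     # continue where the last class stopped
    return folds

def auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive case gets the higher score (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Class counts from the abstract: 1 = metastasis, 0 = primary glial tumor.
labels = [1] * 27 + [0] * 21
folds = stratified_folds(labels, k=10)
print([len(f) for f in folds])

# Invented scores for four cases, only to exercise the AUC function.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(example_auc)
```

With only 48 patients each fold holds four or five cases, so stratification matters: without it, some folds could easily contain a single class, and the AUC would be undefined for them.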


Diagnosing asthma and chronic obstructive pulmonary disease with machine learning

Health Informatics Journal ◽

10.1177/1460458217723169 ◽

2017 ◽

Vol 25 (3) ◽

pp. 811-827 ◽

Cited By ~ 15

Author(s):

Dimitris Spathis ◽

Panayiotis Vlamos

Keyword(s):

Machine Learning ◽

Chronic Obstructive Pulmonary Disease ◽

Random Forest ◽

Pulmonary Disease ◽

Clinical Decision Support Systems ◽

Clinical Decision ◽

Forced Expiratory Volume ◽

Random Forest Classifier ◽

Chronic Obstructive ◽

Obstructive Pulmonary Disease

This study examines clinical decision support systems in healthcare, in particular for the prevention, diagnosis and treatment of respiratory diseases such as asthma and chronic obstructive pulmonary disease. The empirical pulmonology study of a representative sample (n = 132) attempts to identify the major factors that contribute to the diagnosis of these diseases. Machine learning results show that, for chronic obstructive pulmonary disease, the Random Forest classifier outperforms other techniques with 97.7 per cent precision, while the most prominent attributes for diagnosis are smoking, forced expiratory volume 1, age and forced vital capacity. For asthma, the best precision, 80.3 per cent, is again achieved with the Random Forest classifier, while the most prominent attribute is MEF2575.


Species-specific audio detection: A comparison of three template-based classification algorithms using random forests

10.7287/peerj.preprints.2713 ◽

2017 ◽

Author(s):

Carlos J Corrada Bravo ◽

Rafael Álvarez Berríos ◽

T. Mitchell Aide

Keyword(s):

Random Forest ◽

Random Forests ◽

Random Forest Classifier ◽

Classification Algorithms ◽

Statistical Features ◽

Web Based ◽

Average Accuracy ◽

Species Specific ◽

Web Based System

We developed a web-based cloud-hosted system that allows users to archive, listen to, visualize, and annotate recordings. The system also provides tools to convert these annotations into datasets that can be used to train a computer to detect the presence or absence of a species. The algorithm used by the system was selected after comparing the accuracy and efficiency of three variants of a template-based classification. The algorithm computes a similarity vector by comparing a template of a species call with time increments across the spectrogram. Statistical features are extracted from this vector and used as input for a Random Forest classifier that predicts presence or absence of the species in the recording. The fastest algorithm variant had the highest average accuracy and specificity; therefore, it was implemented in the ARBIMON web-based system.
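The template-matching step can be illustrated with a short sketch. The abstract does not state which similarity measure or which statistics are used, so Pearson correlation and the four summary features below are assumptions, as are the function names and the tiny toy spectrogram.

```python
import math
import statistics

def correlation(a, b):
    """Pearson correlation between two equal-length lists (0.0 if either is flat)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0

def similarity_vector(spectrogram, template):
    """Slide the template along the time axis; one similarity value per offset.
    Both inputs are lists of rows (frequency bins) by columns (time frames)."""
    width = len(template[0])
    n_frames = len(spectrogram[0])
    flat_template = [v for row in template for v in row]
    sims = []
    for start in range(n_frames - width + 1):
        window = [v for row in spectrogram for v in row[start:start + width]]
        sims.append(correlation(window, flat_template))
    return sims

def summary_features(sims):
    """ASSUMED feature set: mean, standard deviation, maximum and minimum."""
    return [statistics.fmean(sims), statistics.pstdev(sims), max(sims), min(sims)]

# Toy 2-bin x 5-frame spectrogram containing the template at time offset 2.
template = [[0, 1],
            [1, 0]]
spectrogram = [[0, 0, 0, 1, 0],
               [0, 0, 1, 0, 0]]

sims = similarity_vector(spectrogram, template)
features = summary_features(sims)
print(sims, features)
```

A feature vector of this kind, computed once per recording, would then be the input row for a presence/absence classifier such as a Random Forest.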


Classifications of Breast Cancer Diagnosis using Machine Learning

International Journal of Computers ◽

10.46300/9108.2020.14.13 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Breast Cancer Diagnosis ◽

Performance Comparison ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbors ◽

Cancer Dataset ◽

Machine Learning Classification

Breast cancer (BC) is among the most common and leading causes of death in women throughout the world. Recently, classification and data analysis tools have been widely used in the medical field for diagnosis, prognosis and decision making to help lower the risk of people dying or suffering from diseases. Advanced machine learning methods have given hope to patients, as they have helped doctors detect potentially fatal diseases such as breast cancer early and provide accurate outcomes. However, the results depend heavily on the techniques used for feature selection and classification, which together determine the strength of the machine learning model. In this paper, a performance comparison is conducted using four classifiers, namely Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest, on the Wisconsin Breast Cancer dataset to identify the most effective predictors. The main goal is to apply the best machine learning classification methods to predict breast cancer as benign or malignant, using measures such as accuracy, F-measure, precision and recall. Experimental results show that Random Forest achieves the highest accuracy of 99.26% on this dataset and features, while SVM and KNN show 97.78% and 97.04% accuracy respectively. MLP shows the lowest accuracy of 94.07%. All the experiments are conducted using RStudio as the data mining tool platform.


PSIX-15 Assessment of machine learning algorithms for prediction of Aleutian disease in American mink

Journal of Animal Science ◽

10.1093/jas/skab235.484 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 264-265

Author(s):

Duy Ngoc Do ◽

Guoyu Hu ◽

Younes Miar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Models ◽

American Mink ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Enzyme Linked Immunosorbent Assay ◽

Linear Discriminant ◽

Machine Learning Classification

American mink (Neovison vison) is the major source of fur for the fur industries worldwide, and Aleutian disease (AD) is causing severe financial losses to the mink industry. Different methods have been used to diagnose AD in mink, but the combination of several methods can be the most appropriate approach for the selection of AD-resilient mink. Iodine agglutination test (IAT) and counterimmunoelectrophoresis (CIEP) methods are commonly employed in the test-and-remove strategy; meanwhile, enzyme-linked immunosorbent assay (ELISA) and packed-cell volume (PCV) methods are complementary. However, using multiple methods is expensive and therefore hinders the correct use of AD tests in selection. This research presents assessments of AD classification based on machine learning algorithms. Aleutian disease was tested on 1,830 individuals using these tests in an AD-positive mink farm (Canadian Centre for Fur Animal Research, NS, Canada). The accuracy of classification for CIEP was evaluated based on the sex information and the IAT, ELISA and PCV test results, implemented in seven machine learning classification algorithms (Random Forest, Artificial Neural Networks, C50Tree, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) using the Caret package in R. The accuracy of prediction varied among the methods. Overall, the Random Forest was the best-performing algorithm for the current dataset, with an accuracy of 0.89 in the training data and 0.94 in the testing data. Our work demonstrated the utility and relative ease of using machine learning algorithms to assess the CIEP information, consequently reducing the cost of AD tests. However, further work requires the inclusion of production and reproduction information in the models and extension of phenotypic collection to increase the accuracy of current methods.


A Machine Learning Approach to Detect Student Dropout at University

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/041062021 ◽

2021 ◽

Vol 10 (6) ◽

pp. 3101-3107

Keyword(s):

Machine Learning ◽

Random Forest ◽

Dropout Rate ◽

Random Forest Classifier ◽

Drop Out ◽

Dropout Rates ◽

Learning Technology ◽

Student Dropout ◽

High Dropout Rate ◽

Academic Plan

In universities, student dropout is a major concern that reflects the university's quality. Some characteristics cause students to drop out of university. A high student dropout rate affects the university's reputation and the students' future careers. Therefore, student dropout analysis is required to improve academic planning and management, reduce student dropout from the university, and enhance the quality of the higher education system. Machine learning provides powerful methods for the analysis and prediction of dropout. This study uses a dataset from a representative university to develop a model for predicting student dropout. In this work, machine-learning models were used to detect dropout rates. Machine learning is being more widely used in the field of knowledge mining diagnostics. Following an examination of certain studies, we observed that dropout detection may be done using several methods. We used five dropout detection models: Decision Tree, Naïve Bayes, Random Forest Classifier, SVM and KNN. We used machine-learning technology to analyze the data, and we found that the Random Forest classifier is highly promising for predicting dropout rates, with a training accuracy of 94% and a testing accuracy of 86%.
