Predicting complications of diabetes mellitus through machine learning based on topic modeling: study design (Preprint)

Mapping Intimacies

10.2196/preprints.25550

2020

Author(s):

Benedict Han

Jinwook Choi

Keyword(s):

Diabetes Mellitus

Machine Learning

Topic Modeling

Supervised Classification

Topic Models

Support Vector

Alcoholic Fatty Liver

Complications Of Diabetes

Clinical Notes

Outpatient Departments

BACKGROUND Predicting the complications of diabetes mellitus at an early stage would be beneficial for its management. Topic modeling is a posterior procedure that estimates semantic objects in a dataset through a statistical approach. The resulting topic model can serve as a feature set for supervised classification. OBJECTIVE We performed a study to predict diabetic retinopathy (DMR), diabetic nephropathy (DMN), and non-alcoholic fatty liver disease (NAFLD) from clinical notes using semi-supervised classification based on topic modeling. METHODS We applied four types of machine learning algorithms for classification: random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), and fully connected artificial neural network (ANN). We reviewed the topic models through statistical analysis to determine whether they are clinically plausible. RESULTS F1 scores were above 0.8 for all target diseases with all classification methods, and above 0.9 with RF or GBM. Hypertension and dyslipidemia appear to be statistically associated with DMR, DMN, and NAFLD and may be important clues for predicting these conditions. CONCLUSIONS This study showed that complications of diabetes mellitus that are likely to occur later in life can be predicted from the clinical notes of outpatient departments. We believe that this kind of predictive model could serve patients and physicians in outpatient departments as a useful tool, similar to clinical decision support systems.
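The results above are reported as F1 scores per target disease (DMR, DMN, NAFLD), each of which can be treated as a separate binary label. As a rough illustration only, the sketch below shows how a binary F1 score is computed from true and predicted labels in plain Python; the labels are toy values, not data from the study, and the function name `f1_score` is our own.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall).

    Toy helper for illustration; labels are assumed to be 0/1 per complication.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for one complication (1 = develops DMR, 0 = does not).
print(f1_score([1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1]))
```

With three true positives, one false positive, and one false negative, precision and recall are both 0.75, so F1 is 0.75.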

A Comparison of Feature Selection and Forecasting Machine Learning Algorithms for Predicting Glycaemia in Type 1 Diabetes Mellitus

Applied Sciences

10.3390/app11041742

2021

Vol 11 (4)

pp. 1742

Author(s):

Ignacio Rodríguez-Rodríguez

José-Víctor Rodríguez

Wai Lok Woo

Bo Wei

Domingo-Javier Pardo-Quiles

Keyword(s):

Diabetes Mellitus

Machine Learning

Type 1 Diabetes

Feature Selection

Blood Glucose

Type 1 Diabetes Mellitus

Support Vector

Chronic Hyperglycemia

Predictive Algorithms

Type 1 diabetes mellitus (DM1) is a metabolic disease caused by a reduction in pancreatic insulin production, resulting in chronic hyperglycemia. DM1 subjects usually have to assess their blood glucose levels several times every day, using capillary glucometers to monitor blood glucose dynamics. In recent years, advances in technology have allowed for the creation of revolutionary biosensors and continuous glucose monitoring (CGM) techniques, which have enabled real-time monitoring of a subject’s blood glucose level. However, few attempts have been made to apply machine learning techniques to predicting glycaemia levels, and dealing with a database containing such a large number of variables is problematic. In this sense, to the best of the authors’ knowledge, the issues of proper feature selection (FS), the stage before applying predictive algorithms, have not been discussed and compared in depth in past research on forecasting glycaemia. Therefore, to assess how a proper FS stage could improve the accuracy of the glycaemia forecast, this work applies six FS techniques alongside four predictive algorithms to a full dataset of biomedical features related to glycaemia. The data were collected through a wide-ranging passive monitoring process involving 25 patients with DM1 in practical real-life scenarios. From the obtained results, we affirm that Random Forest (RF), used as both predictive algorithm and FS strategy, offers the best average performance (Root Mean Square Error, RMSE = 18.54 mg/dL) across the 12 predictive horizons considered (up to 60 min in steps of 5 min), while Support Vector Machines (SVM) show the best accuracy as a forecasting algorithm when averaged over the six FS techniques applied (RMSE = 20.58 mg/dL).
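The evaluation above averages RMSE over 12 prediction horizons (5 to 60 min in 5-min steps). A minimal sketch of that bookkeeping follows, assuming a CGM series sampled every 5 minutes (our assumption) and using a naive persistence forecast (predict that glucose stays at its current value) as a stand-in predictor; this baseline is not one of the paper's four algorithms, and all names are hypothetical.

```python
import math
from typing import List, Sequence

SAMPLE_MINUTES = 5                     # assumed CGM sampling interval (minutes)
HORIZONS = list(range(5, 61, 5))       # 12 horizons: 5, 10, ..., 60 min


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def persistence_rmse(series: Sequence[float], horizon_min: int) -> float:
    """RMSE of a naive forecast: glucose(t + h) is predicted as glucose(t)."""
    if horizon_min % SAMPLE_MINUTES != 0:
        raise ValueError("horizon must be a multiple of the sampling interval")
    steps = horizon_min // SAMPLE_MINUTES
    if steps < 1 or steps >= len(series):
        raise ValueError("series too short for this horizon")
    return rmse(series[steps:], series[:-steps])


def mean_rmse_over_horizons(series: Sequence[float]) -> float:
    """Average the per-horizon RMSE, as the abstract does across its 12 horizons."""
    scores: List[float] = [persistence_rmse(series, h) for h in HORIZONS]
    return sum(scores) / len(scores)


# Synthetic, steadily rising glucose trace in mg/dL (illustrative only).
trace = [100 + 2 * i for i in range(30)]
print(persistence_rmse(trace, 5), persistence_rmse(trace, 60), mean_rmse_over_horizons(trace))
```

On this synthetic trace the naive error grows linearly with the horizon, which is why averaging over horizons matters when comparing forecasters.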

CAN A MACHINE LEARNING ALGORITHM IDENTIFY SARS-COV-2 VARIANTS BASED ON CONVENTIONAL rRT-PCR? PROOF OF CONCEPT

10.1101/2021.11.12.21266286

2021

Author(s):

Jorge Cabrera Alvargonzalez

Ana Larranaga Janeiro

Sonia Perez

Javier Martinez Torres

Lucia Martinez Lamas

...

Keyword(s):

Neural Network

Machine Learning

Supervised Classification

Learning Algorithm

Support Vector

Classification Algorithms

Machine Learning Algorithm

Proof Of Concept

The Past

Number Of Cycles

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been, and remains, one of the major challenges humanity has faced. Over the past few months, large amounts of information have been collected that are only now beginning to be assimilated. The present work investigates whether residual information exists in the large number of positive rRT-PCRs among the almost half a million tests performed during the pandemic. This residual information is believed to be closely related to a pattern in the number of cycles needed to detect a sample as positive. A database of more than 20,000 positive samples was therefore collected, and two supervised classification algorithms (a support vector machine and a neural network) were trained to locate each sample in time based solely on the number of cycles determined in that individual's rRT-PCR. The classification results show that the appearance of each wave coincides with the surge of each variant present in the region of Galicia (Spain) during the SARS-CoV-2 pandemic, and that these waves are clearly identified by the classification algorithm.
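The idea above is to assign each positive sample to a time period (wave) using only its rRT-PCR cycle counts. The study used an SVM and a neural network; the sketch below swaps in a much simpler nearest-centroid classifier to show the shape of the task. It assumes each sample is a vector of cycle-threshold values for several gene targets (an assumption; the abstract only says "number of cycles"), and the values and wave labels are invented toy data.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Mean cycle-count vector per wave label (nearest-centroid training)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for i, value in enumerate(x):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Assign the wave whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Hypothetical cycle thresholds for three gene targets per sample (toy values).
X_train = [[20, 21, 22], [21, 22, 23], [30, 31, 32], [31, 32, 33]]
y_train = ["wave_1", "wave_1", "wave_2", "wave_2"]
model = fit_centroids(X_train, y_train)
print(predict(model, [22, 22, 22]), predict(model, [29, 30, 31]))
```

A nearest-centroid model draws linear boundaries between waves; an SVM or neural network, as in the study, can capture more complex patterns in the cycle counts.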

Predictive Analysis of Diabetes Mellitus Using Machine Learning Techniques

Journal of Computational and Theoretical Nanoscience

10.1166/jctn.2020.9207

2020

Vol 17 (8)

pp. 3449-3452

Author(s):

M. S. Roobini

Y. Sai Satwick

A. Anil Kumar Reddy

M. Lakshmi

D. Deepa

...

Keyword(s):

Diabetes Mellitus

Machine Learning

Early Stage

The Body

Machine Learning Techniques

Support Vector

Pima Indians

Classification Techniques

Learning Techniques

Prediction Of Diabetes

Diabetes is one of the major health challenges in India today. It is a group of syndromes that result in too much sugar in the blood: a protracted condition that affects the way the body metabolizes blood sugar. Prevention and prediction of diabetes mellitus are gaining increasing interest in the medical sciences. The aim is to predict diabetes at an early stage using different machine learning techniques. In this paper, we use well-known classifiers: Decision Tree, K-Nearest Neighbors, Support Vector Machine, and Random Forest. These classification techniques are applied to the Pima Indians diabetes dataset to predict diabetes at different stages and to analyze their performance. We also propose a conceptual model for the prediction of diabetes mellitus using different machine learning techniques, and we compare the accuracy of these techniques in detecting diabetes mellitus at an early stage.
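One of the classifiers named above is K-Nearest Neighbors. As a hedged illustration, the plain-Python sketch below shows min-max scaling followed by majority-vote KNN and an accuracy check. The two features (plasma glucose in mg/dL and BMI) are chosen as a hypothetical subset of Pima-style columns, and every row is invented toy data, not taken from the Pima Indians dataset or from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def min_max_fit(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Column-wise minima and maxima learned from the training data."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_transform(X, mins, maxs) -> List[List[float]]:
    """Scale each feature to [0, 1] using the training min/max (constant columns -> 0)."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
            for row in X]


def knn_predict(train_X, train_y, x, k: int = 3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical rows: [glucose (mg/dL), BMI]; label 1 = diabetic, 0 = not (toy data).
train_X = [[85, 22], [90, 24], [95, 23], [160, 35], [170, 33], [150, 38]]
train_y = [0, 0, 0, 1, 1, 1]
test_X, test_y = [[100, 25], [165, 36]], [0, 1]

mins, maxs = min_max_fit(train_X)
train_s = min_max_transform(train_X, mins, maxs)
test_s = min_max_transform(test_X, mins, maxs)
preds = [knn_predict(train_s, train_y, x) for x in test_s]
print(preds, accuracy(test_y, preds))
```

Scaling matters for KNN because distance would otherwise be dominated by glucose, whose numeric range is much wider than BMI's.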

Exploring Eating Disorder Topics on Twitter: Machine Learning Approach (Preprint)

10.2196/preprints.18273

2020

Author(s):

Sicheng Zhou

Yunpeng Zhao

Jiang Bian

Ann F Haynos

Rui Zhang

Keyword(s):

Machine Learning

Topic Modeling

Short Term Memory

Topic Model

Modeling Method

Mental Illnesses

Computational Method

Supervised Machine Learning

Support Vector

Domain Expert

BACKGROUND Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs are discussed on these platforms. Initial results suggest that such research offers a promising method for further understanding this group of diseases. Nevertheless, an efficient computational method is needed to further identify and analyze tweets relevant to EDs on a larger scale. OBJECTIVE This study aims to develop and validate a machine learning–based classifier to identify tweets related to EDs and to explore factors (ie, topics) related to EDs using a topic modeling method. METHODS We collected potential ED-relevant tweets using keywords from previous studies and annotated these tweets into different groups (ie, ED relevant vs irrelevant and then promotional information vs laypeople discussion). Several supervised machine learning methods, such as convolutional neural network (CNN), long short-term memory (LSTM), support vector machine, and naïve Bayes, were developed and evaluated using annotated data. We used the classifier with the best performance to identify ED-relevant tweets and applied a topic modeling method, Correlation Explanation (CorEx), to analyze the content of the identified tweets. To validate these machine learning results, we also collected a cohort of ED-relevant tweets on the basis of manually curated rules. RESULTS A total of 123,977 tweets were collected during the set period. We randomly annotated 2219 tweets for developing the machine learning classifiers. We developed a CNN-LSTM classifier to identify ED-relevant tweets published by laypeople in 2 steps: first relevant versus irrelevant (F1 score=0.89) and then promotional versus published by laypeople (F1 score=0.90).
A total of 40,790 ED-relevant tweets were identified using the CNN-LSTM classifier. We also identified another set of tweets (ie, 17,632 ED-relevant and 83,557 ED-irrelevant tweets) posted by laypeople using manually specified rules. Using CorEx on all ED-relevant tweets, the topic model identified 162 topics. Overall, the coherence rate for topic modeling was 77.07% (1264/1640), indicating a high quality of the produced topics. The topics were further reviewed and analyzed by a domain expert. CONCLUSIONS A developed CNN-LSTM classifier could improve the efficiency of identifying ED-relevant tweets compared with the traditional manual-based method. The CorEx topic model was applied on the tweets identified by the machine learning–based classifier and the traditional manual approach separately. Highly overlapping topics were observed between the 2 cohorts of tweets. The produced topics were further reviewed by a domain expert. Some of the topics identified by the potential ED tweets may provide new avenues for understanding this serious set of disorders.
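The classifier described above works in two steps: relevant versus irrelevant, then promotional versus laypeople. A minimal sketch of that cascade logic is shown below. The keyword rules are hypothetical stand-ins for the trained CNN-LSTM models (they are not the authors' rules or keywords); any function that maps a tweet to a yes/no decision could be plugged in.

```python
from typing import Callable

# A step in the cascade: takes tweet text, returns True or False.
Classifier = Callable[[str], bool]


def cascade_label(text: str, is_relevant: Classifier, is_promotional: Classifier) -> str:
    """Two-step labelling: stage 1 filters relevance, stage 2 runs only on relevant tweets."""
    if not is_relevant(text):
        return "irrelevant"
    if is_promotional(text):
        return "promotional"
    return "laypeople"


# Hypothetical keyword rules standing in for the trained classifiers (illustration only).
def keyword_relevance(text: str) -> bool:
    return any(k in text.lower() for k in ("anorexia", "bulimia", "binge"))


def keyword_promotional(text: str) -> bool:
    return any(m in text.lower() for m in ("http", "buy ", "discount"))


for tweet in ["Nice weather today",
              "Buy our anorexia recovery guide at http://example.com",
              "Struggling with binge eating today"]:
    print(cascade_label(tweet, keyword_relevance, keyword_promotional))
```

Splitting the decision this way means the second classifier only has to separate promotional from personal posts among tweets already judged relevant, which is a narrower task than a single three-way classifier would face.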

Sentiment Analysis and Topic Modeling on Tweets about Online Education during COVID-19

Applied Sciences

10.3390/app11188438

2021

Vol 11 (18)

pp. 8438

Author(s):

Muhammad Mujahid

Ernesto Lee

Furqan Rustam

Patrick Bernard Washington

Saleem Ullah

...

Keyword(s):

Machine Learning

Deep Learning

Online Education

Sentiment Analysis

Topic Modeling

Support Vector

Learning Approaches

Learning Models

E Learning

Machine Learning Models

Amid the worldwide COVID-19 pandemic lockdowns, the closure of educational institutes led to an unprecedented rise in online learning. To limit the impact of COVID-19 and curb its spread, educational institutions closed their campuses immediately and moved academic activities to e-learning platforms. The effectiveness of e-learning is a critical concern for both students and parents, specifically in terms of its suitability to students and teachers and its technical feasibility with respect to different social scenarios. Such concerns must be reviewed from several aspects before e-learning can be adopted at such a large scale. This study investigates the effectiveness of e-learning by analyzing people’s sentiments about it. Because social media has recently become an important mode of communication, people’s views can be found on platforms such as Twitter, Instagram, and Facebook. This study uses a Twitter dataset containing 17,155 tweets about e-learning. Machine learning and deep learning approaches have shown their suitability, capability, and potential for image processing, object detection, and natural language processing tasks, and text analysis is no exception. Machine learning approaches have been widely used for annotation as well as for text and sentiment analysis. Given the adequacy and efficacy of machine learning models, this study adopts TextBlob, VADER (Valence Aware Dictionary and sEntiment Reasoner), and SentiWordNet to analyze the polarity and subjectivity scores of the tweets’ text. Furthermore, because machine learning models display high classification accuracy, various machine learning models have been used for sentiment classification. Two feature extraction techniques, TF-IDF (Term Frequency-Inverse Document Frequency) and BoW (Bag of Words), have been used to build and evaluate the models.
All the models have been evaluated using important performance metrics such as accuracy, precision, recall, and F1 score. The results reveal that the random forest and support vector machine classifiers achieve the highest accuracy of 0.95 when used with BoW features. Performance is compared across the results of TextBlob, VADER, and SentiWordNet, as well as across the classification results of machine learning models and deep learning models such as CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory), CNN-LSTM, and Bi-LSTM (Bidirectional LSTM). Additionally, topic modeling is performed to find the problems associated with e-learning; it indicates that uncertainty about the campus opening date, children’s inability to grasp online education, and the lack of efficient networks for online education are the top three problems.
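The study compares two feature extraction techniques, BoW and TF-IDF. The plain-Python sketch below shows how both turn tweets into vectors. It assumes whitespace tokenization and uses one common TF-IDF variant (raw counts times a smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which mirrors scikit-learn's default `TfidfVectorizer` settings); the paper does not state which variant it used, and the example sentences are invented.

```python
import math
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Deliberately simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocab(docs: Sequence[str]) -> List[str]:
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow(docs: Sequence[str], vocab: Sequence[str]) -> List[List[int]]:
    """Bag of Words: raw term counts per document, in vocabulary order."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows


def tfidf(docs: Sequence[str], vocab: Sequence[str]) -> List[List[float]]:
    """TF-IDF with smoothed IDF and L2-normalized rows (one common variant)."""
    counts = bow(docs, vocab)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        vec = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in vec))
        rows.append([v / norm if norm else 0.0 for v in vec])
    return rows


docs = ["online classes are hard", "online exams are easy"]
vocab = build_vocab(docs)
print(vocab)
print(bow(docs, vocab))
print([[round(v, 3) for v in row] for row in tfidf(docs, vocab)])
```

Terms that appear in every document ("online", "are") receive the minimum IDF weight, so TF-IDF emphasizes the words that distinguish one tweet from another, whereas BoW weights all words by count alone.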

Ldagibbs: A Command for Topic Modeling in Stata Using Latent Dirichlet Allocation

The Stata Journal: Promoting communications on statistics and Stata

10.1177/1536867x1801800107

2018

Vol 18 (1)

pp. 101-117

Author(s):

Carlo Schwarz

Keyword(s):

Machine Learning

Probability Distribution

Topic Modeling

Latent Dirichlet Allocation

Topic Model

Topic Models

Text Documents

Text Data

Dirichlet Allocation

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
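As the name suggests, ldagibbs estimates LDA by Gibbs sampling. The sketch below is a generic collapsed Gibbs sampler for LDA written in plain Python, not a port of the Stata command; its defaults (`alpha`, `beta`, iteration count) are illustrative assumptions. Each word's topic is resampled in proportion to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), and the final counts give each document's distribution over topics (theta) and each topic's distribution over words (phi), the two distributions described in the abstract.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], n_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, vocab_size)."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                 # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]    # word counts per topic
    n_k = [0] * n_topics                                  # total words per topic
    z: List[List[int]] = []                               # topic assignment per token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in range(n_topics)]
    return theta, phi


# Toy corpus: word ids 0-2 and 3-5 tend to co-occur (invented example).
corpus = [[0, 1, 0, 1, 2], [0, 1, 1, 0], [3, 4, 3, 4, 5], [4, 3, 5, 4]]
theta, phi = lda_gibbs(corpus, n_topics=2, vocab_size=6, iterations=50)
print([[round(p, 2) for p in row] for row in theta])
```

The number of topics is fixed by the user in advance, matching the abstract's description of clustering documents into "a user-chosen number of topics".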

Skin complications of diabetes mellitus revealed by polarized hyperspectral imaging and machine learning

IEEE Transactions on Medical Imaging

10.1109/tmi.2021.3049591

2021

pp. 1-1

Author(s):

Viktor Dremin

Zbignevs Marcinkevics

Evgeny Zherebtsov

Alexey Popov

Andris Grabovskis

...

Keyword(s):

Diabetes Mellitus

Machine Learning

Hyperspectral Imaging

Complications Of Diabetes

Mathematical Models and Machine Learning Algorithms in the Diagnosis of Complications of Type 1 Diabetes Mellitus

Izvestiya of Altai State University

10.14258/izvasu(2021)1-16

2021

pp. 97-101

Author(s):

O.S. Krotova

L.A. Khvorova

A.I. Piyanzin

Keyword(s):

Diabetes Mellitus

Machine Learning

Type 1 Diabetes

Type 1 Diabetes Mellitus

Children And Adolescents

Diabetic Polyneuropathy

Medical Data

Classification Model

Complications Of Diabetes

The paper addresses the problem of diagnosing diabetic polyneuropathy, one of the earliest and most dangerous complications of diabetes in children and adolescents. The research aims to develop models for diagnosing diabetic polyneuropathy in children and adolescents based on various medical data. The developed models will make it possible to diagnose this complication without neurophysiological testing. The proposed models can therefore be used at small rural medical and obstetric stations, as well as serve as a support system for medical decision-making. The study includes a review and analysis of domestic and international publications on the research topic. A large set of textual medical data is processed, a database is created, features are analyzed, and a model is developed to detect diabetic polyneuropathy in children and adolescents with type 1 diabetes mellitus. The quality achieved by the classification model suggests that machine learning methods can be used to uncover hidden dependencies in the development and course of complications of diabetes mellitus.

Kannada morpheme segmentation using machine learning

International Journal of Engineering & Technology

10.14419/ijet.v7i2.31.13395

2018

Vol 7 (2.31)

pp. 45

Author(s):

Sachi Angle

B Ashwath Rao

S N. Muralikrishna

Keyword(s):

Machine Learning

Support Vector Machines

Supervised Classification

Morphological Structure

Support Vector

Morphological Form

Morphological Segmentation

Vector Machines

Root Word

Agglutinative Language

This paper addresses morpheme segmentation of Kannada words using supervised classification. We used a manually annotated Kannada treebank corpus that we recently developed. Kannada resembles other Dravidian languages in its morphological structure. It is an agglutinative language, so its words have complex morphological forms, each comprising a root and an optional set of suffixes. These suffixes carry meaning in context beyond that of the root word. This paper discusses extracting the morphemes of a word using Support Vector Machines for classification. Additional features representing properties of Kannada words were extracted, and the individual letters were classified into labels that yield the morphological segmentation of the word. Various evaluation methods were considered, and an accuracy of 85.97% was achieved.
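The approach above classifies each letter of a word, and the per-letter labels then determine where the word splits into morphemes. The sketch below shows only that last decoding step, under an assumed label scheme (1 means "a morpheme boundary follows this letter"); the paper's actual label set is not given here. For readability the example uses the romanized word "maragalu" (Kannada ಮರಗಳು, "trees", root mara plus plural suffix), whereas the real system operates on Kannada script, and the labels stand in for SVM predictions.

```python
from typing import List, Sequence


def segment(letters: Sequence[str], boundary_after: Sequence[int]) -> List[str]:
    """Join letters into morphemes, cutting after each letter labelled 1 (assumed scheme)."""
    if len(letters) != len(boundary_after):
        raise ValueError("need exactly one label per letter")
    morphemes: List[str] = []
    current: List[str] = []
    for letter, is_boundary in zip(letters, boundary_after):
        current.append(letter)
        if is_boundary:
            morphemes.append("".join(current))
            current = []
    if current:                      # trailing letters form the final morpheme
        morphemes.append("".join(current))
    return morphemes


# Hypothetical classifier output for romanized "maragalu": boundary after "mara".
print(segment(list("maragalu"), [0, 0, 0, 1, 0, 0, 0, 0]))
```

Framing segmentation as per-letter labelling is what lets a standard classifier such as an SVM handle it: each letter, with features describing its context, becomes one training example.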

Exploring Eating Disorder Topics on Twitter: Machine Learning Approach

JMIR Medical Informatics

10.2196/18273

2020

Vol 8 (10)

pp. e18273

Author(s):

Sicheng Zhou

Yunpeng Zhao

Jiang Bian

Ann F Haynos

Rui Zhang

Keyword(s):

Machine Learning

Topic Modeling

Short Term Memory

Topic Model

Modeling Method

Mental Illnesses

Computational Method

Supervised Machine Learning

Support Vector

Domain Expert

Background Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs are discussed on these platforms. Initial results suggest that such research offers a promising method for further understanding this group of diseases. Nevertheless, an efficient computational method is needed to further identify and analyze tweets relevant to EDs on a larger scale. Objective This study aims to develop and validate a machine learning–based classifier to identify tweets related to EDs and to explore factors (ie, topics) related to EDs using a topic modeling method. Methods We collected potential ED-relevant tweets using keywords from previous studies and annotated these tweets into different groups (ie, ED relevant vs irrelevant and then promotional information vs laypeople discussion). Several supervised machine learning methods, such as convolutional neural network (CNN), long short-term memory (LSTM), support vector machine, and naïve Bayes, were developed and evaluated using annotated data. We used the classifier with the best performance to identify ED-relevant tweets and applied a topic modeling method—Correlation Explanation (CorEx)—to analyze the content of the identified tweets. To validate these machine learning results, we also collected a cohort of ED-relevant tweets on the basis of manually curated rules. Results A total of 123,977 tweets were collected during the set period. We randomly annotated 2219 tweets for developing the machine learning classifiers. We developed a CNN-LSTM classifier to identify ED-relevant tweets published by laypeople in 2 steps: first relevant versus irrelevant (F1 score=0.89) and then promotional versus published by laypeople (F1 score=0.90). 
A total of 40,790 ED-relevant tweets were identified using the CNN-LSTM classifier. We also identified another set of tweets (ie, 17,632 ED-relevant and 83,557 ED-irrelevant tweets) posted by laypeople using manually specified rules. Using CorEx on all ED-relevant tweets, the topic model identified 162 topics. Overall, the coherence rate for topic modeling was 77.07% (1264/1640), indicating a high quality of the produced topics. The topics were further reviewed and analyzed by a domain expert. Conclusions A developed CNN-LSTM classifier could improve the efficiency of identifying ED-relevant tweets compared with the traditional manual-based method. The CorEx topic model was applied on the tweets identified by the machine learning–based classifier and the traditional manual approach separately. Highly overlapping topics were observed between the 2 cohorts of tweets. The produced topics were further reviewed by a domain expert. Some of the topics identified by the potential ED tweets may provide new avenues for understanding this serious set of disorders.