A multi-class classification model for supporting the diagnosis of type II diabetes mellitus

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9920
Author(s):  
Kuang-Ming Kuo ◽  
Paul Talley ◽  
YuHsi Kao ◽  
Chi Hsien Huang

Background Numerous studies have used machine-learning techniques to predict the early onset of type 2 diabetes mellitus. However, fewer studies have sought to predict an appropriate diagnosis code for the type 2 diabetes mellitus condition, and ensemble techniques such as bagging and boosting have been used to an even lesser extent. The present study aims to identify appropriate diagnosis codes for type 2 diabetes mellitus patients by building a multi-class prediction model that is parsimonious and uses a minimal set of features. In addition, the importance of features for predicting the diagnosis code is provided. Methods This study included 149 patients with type 2 diabetes mellitus. The sample was collected from a large hospital in Taiwan from November 2017 to May 2018. Machine learning algorithms including instance-based, decision tree, deep neural network, and ensemble algorithms were used to build the predictive models. Average accuracy, area under the receiver operating characteristic curve, Matthews correlation coefficient, macro-precision, recall, weighted average of precision and recall, and model process time were then used to assess the performance of the built models. Information gain and gain ratio were used to demonstrate feature importance. Results The results showed that most algorithms, except for the deep neural network, performed well on all performance indices for both the training and testing datasets. Ten features and their importance in determining the diagnosis code of type 2 diabetes mellitus were identified. Our proposed predictive model can be further developed into a clinical diagnosis support system or integrated into existing healthcare information systems. Both methods of application can effectively support physicians whenever they are diagnosing type 2 diabetes mellitus patients in order to foster better patient-care planning.
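
The abstract names information gain and gain ratio as its feature-importance measures but does not show how they are computed. The following is a minimal sketch of the textbook definitions, not the authors' implementation; the feature names (`hba1c_band`, `sex`) and the toy diagnosis codes are invented for illustration only.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def information_gain(feature_values, labels):
    """Label entropy minus the entropy left after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """Information gain divided by the entropy of the feature itself."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature_values, labels) / split_info


# Made-up toy data: three hypothetical diagnosis codes, two hypothetical features.
codes = ["A", "A", "B", "B", "C", "C"]
hba1c_band = ["low", "low", "mid", "mid", "high", "high"]  # separates the codes
sex = ["F", "M", "F", "M", "F", "M"]                       # uninformative here

for name, feature in [("hba1c_band", hba1c_band), ("sex", sex)]:
    print(name, information_gain(feature, codes), gain_ratio(feature, codes))
```

Gain ratio divides by the feature's own entropy, which penalizes features that split the data into many small groups; that is the usual reason for reporting it alongside plain information gain.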

Author(s):  
Xu Chen ◽  
Zhidong Chen ◽  
Daiyun Xu ◽  
Yonghui Lyu ◽  
Yongxiao Li ◽  
...  

G protein-coupled receptor 40 (GPR40), one of the G protein-coupled receptors that are available to sense glucose metabolism, is an attractive target for the treatment of type 2 diabetes mellitus (T2DM). Although many efforts have been made to discover small-molecule agonists, limited research has focused on developing peptides that act as GPR40 agonists to treat T2DM. Here, we propose a novel strategy for peptide design to generate and determine potential peptide agonists against GPR40 efficiently. A molecular fingerprint similarity (MFS) model combined with a deep neural network (DNN) and convolutional neural network was applied to predict the activity of peptides constructed from unnatural amino acids (UAAs). Site-directed mutagenesis (SDM) further optimized the peptides to form specific favorable interactions, and subsequent flexible docking showed the details of the binding mechanism between peptides and GPR40. Molecular dynamics (MD) simulations further verified the stability of the peptide–protein complex. The R-squared of the machine learning model on the training set and the test set reached 0.87 and 0.75, respectively, and the three candidate peptides showed excellent performance. The strategy based on machine learning and SDM successfully searched for an optimal design with desirable activity comparable with the model agonist in phase III clinical trials.


2018 ◽  
Vol 7 (9) ◽  
pp. 277 ◽  
Author(s):  
Meng-Hsuen Hsieh ◽  
Li-Min Sun ◽  
Cheng-Li Lin ◽  
Meng-Ju Hsieh ◽  
Kyle Sun ◽  
...  

Objectives: Observational studies suggested that patients with type 2 diabetes mellitus (T2DM) presented a higher risk of developing colorectal cancer (CRC). The current study aims to create a deep neural network (DNN) to predict the onset of CRC for patients with T2DM. Methods: We employed the national health insurance database of Taiwan to create predictive models for detecting an increased risk of subsequent CRC development in T2DM patients in Taiwan. We identified a total of 1,349,640 patients between 2000 and 2012 with newly diagnosed T2DM. All the available possible risk factors for CRC were also included in the analyses. The data were split into training and test sets with 97.5% of the patients in the training set and 2.5% of the patients in the test set. The deep neural network (DNN) model was optimized using Adam with Nesterov’s accelerated gradient descent. The recall, precision, F1 values, and the area under the receiver operating characteristic (ROC) curve were used to evaluate predictor performance. Results: The F1, precision, and recall values of the DNN model across all data were 0.931, 0.982, and 0.889, respectively. The area under the ROC curve of the DNN model across all data was 0.738, compared to the ideal value of 1. The metrics indicate that the DNN model appropriately predicted CRC. In contrast, a single-variable predictor using the adapted Diabetes Complication Severity Index showed poorer performance compared to the DNN model. Conclusions: Our results indicated that the DNN model is an appropriate tool to predict CRC risk in patients with T2DM in Taiwan.
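
To make the 97.5% / 2.5% split and the reported metrics concrete, here is a small sketch under stated assumptions. The cohort size comes from the abstract; the confusion-matrix counts are invented, and the helper names `split_sizes` and `precision_recall_f1` are hypothetical. The abstract does not say how its values were aggregated, so this only illustrates the standard definitions.

```python
def split_sizes(total, test_per_mille):
    """Return (train, test) sizes using integer arithmetic to avoid rounding."""
    test = total * test_per_mille // 1000
    return total - test, test


def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Cohort size from the abstract; 25 per mille = 2.5% held out for testing.
train_n, test_n = split_sizes(1_349_640, 25)
print(train_n, test_n)

# Invented counts, purely to exercise the formulas.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(p, r, f1)
```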


2016 ◽  
Vol 11 (4) ◽  
pp. 791-799 ◽  
Author(s):  
Rina Kagawa ◽  
Yoshimasa Kawazoe ◽  
Yusuke Ida ◽  
Emiko Shinohara ◽  
Katsuya Tanaka ◽  
...  

Background: Phenotyping is an automated technique that can be used to distinguish patients based on electronic health records. To improve the quality of medical care and advance type 2 diabetes mellitus (T2DM) research, the demand for T2DM phenotyping has been increasing. Some existing phenotyping algorithms are not sufficiently accurate for screening or identifying clinical research subjects. Objective: We propose a practical phenotyping framework using both expert knowledge and a machine learning approach to develop 2 phenotyping algorithms: one is for screening; the other is for identifying research subjects. Methods: We employ expert knowledge as rules to exclude obvious control patients and machine learning to increase accuracy for complicated patients. We developed phenotyping algorithms on the basis of our framework and performed binary classification to determine whether a patient has T2DM. To facilitate development of practical phenotyping algorithms, this study introduces new evaluation metrics: area under the precision-sensitivity curve (AUPS) with a high sensitivity and AUPS with a high positive predictive value. Results: The proposed phenotyping algorithms based on our framework show higher performance than baseline algorithms. Our proposed framework can be used to develop 2 types of phenotyping algorithms depending on the tuning approach: one for screening, the other for identifying research subjects. Conclusions: We develop a novel phenotyping framework that can be easily implemented on the basis of proper evaluation metrics, which are in accordance with users’ objectives. The phenotyping algorithms based on our framework are useful for extraction of T2DM patients in retrospective studies.
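
The two-stage idea described above (expert rules first, then a tunable machine-learning step) can be sketched as follows. This is only an illustration of the structure: the `Patient` fields, the exclusion rule, the integer scoring function standing in for a trained classifier, and both thresholds are invented and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    hba1c: float            # hypothetical lab value, in percent
    on_diabetes_drug: bool  # hypothetical medication flag
    has_t2dm_code: bool     # hypothetical diagnosis-code flag


def obvious_control(p):
    """Invented expert rule: no drug, no code, and a normal HbA1c."""
    return not p.on_diabetes_drug and not p.has_t2dm_code and p.hba1c < 5.7


def toy_score(p):
    """Stand-in for a trained classifier; returns an integer score from 2 to 10."""
    score = 2
    if p.has_t2dm_code:
        score += 3
    if p.on_diabetes_drug:
        score += 3
    if p.hba1c >= 6.5:
        score += 2
    return score


def phenotype(p, threshold):
    """Stage 1: rules exclude obvious controls. Stage 2: score against a threshold."""
    if obvious_control(p):
        return False
    return toy_score(p) >= threshold


SCREENING_THRESHOLD = 4  # low cut-off: favors sensitivity
RESEARCH_THRESHOLD = 8   # high cut-off: favors positive predictive value

patients = [
    Patient(hba1c=5.2, on_diabetes_drug=False, has_t2dm_code=False),
    Patient(hba1c=6.0, on_diabetes_drug=False, has_t2dm_code=True),
    Patient(hba1c=7.5, on_diabetes_drug=True, has_t2dm_code=True),
]
screening = [phenotype(p, SCREENING_THRESHOLD) for p in patients]
research = [phenotype(p, RESEARCH_THRESHOLD) for p in patients]
print(screening, research)
```

Using one scoring step with two cut-offs mirrors the abstract's point that the same framework yields a screening algorithm or a research-subject algorithm depending on how it is tuned.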


Author(s):  
Muhammad Younus ◽  
Md Tahsir Ahmed Munna ◽  
Mirza Mohtashim Alam ◽  
Shaikh Muhammad Allayear ◽  
Sheikh Joly Ferdous Ara

2019 ◽  
Author(s):  
Chia-Hung Kao

BACKGROUND Breast cancer incidence may be higher among patients with type 2 diabetes mellitus (T2DM) compared with the general population. OBJECTIVE This study evaluated the performance of three models for predicting breast cancer risk in patients with T2DM. METHODS In total, 1,267,867 patients with newly diagnosed T2DM between 2000 and 2012 were identified from the Taiwan National Health Insurance Research Database. Using their data, we created prediction models for detecting an increased risk of subsequent breast cancer development in T2DM patients. The available potential risk factors for breast cancer were also collected for adjustment in the analyses. The Synthetic Minority Oversampling Technique (SMOTE) was used to augment data points in the minority class. Each data point was randomly allocated to the training and test sets at a ratio of approximately 39:1. The performance of the artificial neural network (ANN), logistic regression (LR), and random forest (RF) models was determined using the recall, precision, F1 score, and area under the receiver operating characteristic curve (AUC). RESULTS The AUCs of all three models were significantly higher than the area of 0.5 for the null hypothesis (0.959, 0.865, and 0.834 for the RF, ANN, and LR models, respectively). The RF model had the largest AUC among all models; moreover, it had the highest values in all other metrics. CONCLUSIONS Although all three models could accurately predict high breast cancer risk in patients with T2DM in Taiwan, the RF model demonstrated the best performance. CLINICALTRIAL This is not a clinical trial.
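
The core idea of SMOTE is to create synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below illustrates that idea in plain Python; it is not the authors' pipeline, and the function name `smote_sketch`, the toy points, and the parameter defaults are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic points on segments between minority-class neighbors.

    minority: list of equal-length numeric tuples (at least two points).
    k: how many nearest minority neighbors to choose from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Nearest minority neighbors of the base point, excluding itself.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


# Invented two-feature minority samples.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_points, n_synthetic=10)
print(len(new_points))
```

A 39:1 allocation, as stated in the abstract, corresponds to holding out about 1 record in 40 (2.5%) for testing.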

