The comparison of machine learning methods for prediction study of type 2 diabetes mellitus’s drug design

2020
Author(s):
Nadya Asanul Husna
Alhadi Bustamam
Arry Yanuar
Devvi Sarwinda
Oky Hermansyah
2019, Vol 25 (4), pp. 248
Author(s):
Shahabeddin Abhari
Sharareh R. Niakan Kalhori
Mehdi Ebrahimi
Hajar Hasannejadasl
Ali Garavand

2021, Vol 50 (Supplement_1)
Author(s):
Tadao Ooka
Hiroshi Yokomichi
Zentaro Yamagata

Abstract Background Major barriers exist in incorporating artificial intelligence into epidemiology, particularly in data interpretation. Thus, we applied two highly interpretable machine-learning methods, Random Forest (RF) and Sparse Logistic Regression (SLR), to a large-scale health check-up dataset and examined the advantages of building prediction models with them. Methods This study involved 392,791 participants who underwent healthcare checkups in Japan from 1999 to 2018. Participants who received diabetes treatment, or had an HbA1c level of 6.5% or higher, were excluded. The objective variable was type 2 diabetes onset over five years. Each prediction model was created using 26 health status items over three consecutive years. We compared the predictive power of three analytical methods: RF, SLR, and multivariate stepwise logistic regression (MSLR) as a conventional method. Variable Importance (VI) was calculated in the RF analysis, and Standard Regression Coefficients (SRC) were calculated in the SLR and MSLR analyses. Results Predictive accuracy was highest in the SLR model (AUC: 0.955), followed by the RF model (AUC: 0.949) and then the MSLR model (AUC: 0.939). The RF model ranked blood glucose, HbA1c, height, red blood cells, and aspartate transaminase as having higher predictive power. In the SLR model, HbA1c, blood glucose, systolic blood pressure, HDL-cholesterol, and age had higher SRC. Conclusions Machine learning techniques enable more accurate diabetes risk predictions than existing methods and suggest new ways of identifying associated predictors. Key messages Applying machine-learning methods to health check-up data achieves high accuracy in predicting type 2 diabetes while maintaining data interpretability.
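The AUC figures quoted in this abstract can be read as the probability that a randomly chosen participant who developed diabetes receives a higher risk score than one who did not. The following is a minimal sketch of that pairwise definition, not the authors' code; the function name `auc` and the toy scores and labels are made up for illustration and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    scores: predicted risk scores; labels: 1 = onset, 0 = no onset.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, invented risk scores (hypothetical; not data from the study).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of participants, so for a dataset of the size described here a rank-based implementation from an established library would be the practical choice; the sketch is only meant to make the metric concrete.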


Author(s):  
Yue You
Svetlana V. Doubova
Diana Pinto-Masis
Ricardo Pérez-Cuevas
Víctor Hugo Borja-Aburto
...  

Abstract Background The study aimed to assess the performance of a multidisciplinary-team diabetes care program called DIABETIMSS on glycemic control of type 2 diabetes (T2D) patients, by using available observational patient data and machine-learning-based targeted learning methods. Methods We analyzed electronic health records and laboratory databases from 2012 to 2016 of T2D patients from six family medicine clinics (FMCs) delivering the DIABETIMSS program, and five FMCs providing routine care. All FMCs belong to the Mexican Institute of Social Security and are in Mexico City and the State of Mexico. The primary outcome was glycemic control. The study covariates included patient sex, age, anthropometric data, history of glycemic control, diabetic complications and comorbidity. We measured the effects of the DIABETIMSS program through 1) simple unadjusted mean differences; 2) adjustment via standard logistic regression; and 3) adjustment via targeted machine learning. We treated the data as a serial cross-sectional study, conducted a standard principal components analysis to explore the distribution of covariates among clinics, and performed a regression tree analysis on data transformed to use the prediction model to identify patient sub-groups in whom the program was most successful. To explore the robustness of the machine learning approaches, we conducted a set of simulations and a sensitivity analysis with process-of-care indicators as possible confounders. Results The study included 78,894 T2D patients, of whom 37,767 received care through DIABETIMSS. The impact of DIABETIMSS ranged, among clinics, from 2 to 8% improvement in glycemic control, with an overall (pooled) estimate of 5% improvement. T2D patients with fewer complications benefited more from DIABETIMSS than those with more complications. At the FMCs delivering the conventional model, the predicted impacts were similar to those observed empirically in the DIABETIMSS clinics. The sensitivity analysis did not change the overall estimate averaged across clinics. Conclusions The DIABETIMSS program produced a small but significant increase in glycemic control. The use of machine learning methods yields population-level effects and pinpoints the sub-groups of patients the program benefits the most. These methods exploit the potential of routine observational patient data within complex healthcare systems to inform decision-makers.
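The contrast between an unadjusted mean difference and a covariate-adjusted estimate can be illustrated with a deliberately simplified sketch. It uses direct standardization over a single binary covariate as a stand-in; it is not targeted learning, not logistic regression, and not the authors' analysis. The field names (`program`, `complications`, `controlled`), the helper `make`, and all records are hypothetical toy values.

```python
from collections import defaultdict


def mean(xs):
    return sum(xs) / len(xs)


def unadjusted_difference(records):
    """Proportion with glycemic control: program minus routine care."""
    treated = [r["controlled"] for r in records if r["program"] == 1]
    routine = [r["controlled"] for r in records if r["program"] == 0]
    return mean(treated) - mean(routine)


def standardized_difference(records, covariate):
    """Within-stratum differences, weighted by each stratum's share.

    Assumes every stratum contains both program and routine-care patients.
    """
    strata = defaultdict(list)
    for r in records:
        strata[r[covariate]].append(r)
    total = len(records)
    return sum(
        (len(rows) / total) * unadjusted_difference(rows)
        for rows in strata.values()
    )


def make(program, complications, outcomes):
    """Build toy records (hypothetical helper, invented data)."""
    return [
        {"program": program, "complications": complications, "controlled": o}
        for o in outcomes
    ]


# Invented toy data: program patients are more often complication-free.
toy = (
    make(1, 0, [1, 1, 1, 0]) + make(0, 0, [1, 0])
    + make(1, 1, [1, 0]) + make(0, 1, [1, 0, 0, 0])
)
raw = unadjusted_difference(toy)
adjusted = standardized_difference(toy, "complications")
print(f"unadjusted: {raw:.3f}, standardized: {adjusted:.3f}")
```

In this toy data the unadjusted difference overstates the effect because the program group contains fewer patients with complications; adjusting for that covariate shrinks the estimate.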


Author(s):  
Michela Taufer
Trilce Estrada
Travis Johnston

This paper presents a survey of three algorithms to transform atomic-level molecular snapshots from molecular dynamics (MD) simulations into metadata representations that are suitable for in situ analytics based on machine learning methods. MD simulations studying the classical time evolution of a molecular system at atomic resolution are widely recognized in the fields of chemistry, material sciences, molecular biology and drug design; these simulations are among the most common simulations on supercomputers. Next-generation supercomputers will have dramatically higher performance than current systems, generating more data that needs to be analysed (e.g. in terms of number and length of MD trajectories). In the future, the coordination of data generation and analysis can no longer rely on manual, centralized analysis traditionally performed after the simulation is completed or on current data representations that have been defined for traditional visualization tools. Powerful data preparation phases (i.e. phases in which original raw data is transformed to concise and still meaningful representations) will need to precede data analysis phases. Here, we discuss three algorithms for transforming traditionally used molecular representations into concise and meaningful metadata representations. The transformations can be performed locally. The new metadata can be fed into machine learning methods for runtime in situ analysis of larger MD trajectories supported by high-performance computing. In this paper, we provide an overview of the three algorithms and their use for three different applications: protein–ligand docking in drug design; protein folding simulations; and protein engineering based on analytics of protein functions depending on proteins' three-dimensional structures. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
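The abstract does not specify the three algorithms, so the sketch below is only a generic illustration of the underlying idea: reducing one snapshot's atomic coordinates to a few numbers that can be computed locally, frame by frame. The descriptor chosen here (geometric centroid and an unweighted radius of gyration, treating all atoms as equal mass) and the function name `frame_descriptor` are assumptions, not the paper's method.

```python
import math


def frame_descriptor(coords):
    """Reduce one snapshot (list of (x, y, z) tuples) to a small summary.

    Assumption: all atoms are weighted equally, so this is a geometric
    radius of gyration rather than a mass-weighted one.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("empty snapshot")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance of the atoms from the centroid.
    msd = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return {"centroid": (cx, cy, cz), "radius_of_gyration": math.sqrt(msd)}


# Toy snapshot: four hypothetical atoms on the unit circle in the xy-plane.
toy_frame = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
toy_desc = frame_descriptor(toy_frame)
print(toy_desc)
```

Because each frame is summarized independently, a reduction of this kind needs no communication between frames or nodes, which is the property the abstract points to when it says the transformations can be performed locally.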

