KinasepKipred: A Predictive Model for Estimating Ligand-Kinase Inhibitor Constant (pKi)

Mapping Intimacies ◽

10.1101/798561 ◽

2019 ◽

Author(s):

KC Govinda ◽

Md Mahmudulla Hassan ◽

Suman Sirimulla

Keyword(s):

Machine Learning ◽

Computational Models ◽

Kinase Inhibitor ◽

Pearson Correlation ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Learning Models ◽

Link Type ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Kinases are one of the most important classes of drug targets for therapeutic use. Algorithms that can accurately predict the drug-kinase inhibitor constant (pKi) can considerably accelerate the drug discovery process. In this study, we developed computational models, leveraging machine learning techniques, to predict ligand-kinase pKi values. Kinase-ligand inhibitor constant (Ki) data were retrieved from the Drug Target Commons (DTC) and Metz databases. Machine learning models were developed based on structural and physicochemical features of the protein and topological pharmacophore atomic triplet fingerprints of the ligands. Three machine learning models [random forest (RFR), extreme gradient boosting (XGBoost) and artificial neural network (ANN)] were tested for model development. The performance of our models was evaluated using several metrics with 95% confidence intervals. The RFR model was finally selected based on the evaluation metrics on test datasets and used for the web implementation. The selected model achieved a Pearson correlation coefficient (R) of 0.887 (0.881, 0.893), a root-mean-square error (RMSE) of 0.475 (0.465, 0.486), a concordance index (Con. Index) of 0.854 (0.851, 0.858), and an area under the receiver operating characteristic curve (AUC-ROC) of 0.957 (0.954, 0.960) during the internal 5-fold cross-validation. Availability: GitHub https://github.com/sirimullalab/KinasepKipred, Docker sirimullalab/kinasepkipred. Implementation: https://drugdiscovery.utep.edu/pki/
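
The regression metrics named in this abstract can be computed from paired measured and predicted pKi values. The following is a minimal sketch in plain Python, not the authors' implementation: the function names are hypothetical, and the concordance index uses the common pairwise definition in which tied predictions count as 0.5, which is an assumption about how the metric was defined in the study.

```python
import math
from itertools import combinations


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5
    (an assumed convention).
    """
    concordant = 0.0
    comparable = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue
        comparable += 1
        if p1 == p2:
            concordant += 0.5
        elif (p1 > p2) == (t1 > t2):
            concordant += 1.0
    return concordant / comparable
```

A concordance index of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking.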

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Estimation of Chlorophyll-a Concentrations in Small Water Bodies: Comparison of Fused Gaofen-6 and Sentinel-2 Sensors

Remote Sensing ◽

10.3390/rs14010229 ◽

2022 ◽

Vol 14 (1) ◽

pp. 229

Author(s):

Jiarui Shi ◽

Qian Shen ◽

Yue Yao ◽

Junsheng Li ◽

Fu Chen ◽

...

Keyword(s):

Machine Learning ◽

Chlorophyll A ◽

Water Bodies ◽

Gradient Boosting ◽

Learning Models ◽

Small Water ◽

Extreme Gradient Boosting ◽

Machine Learning Models ◽

Sentinel 2 ◽

Small Water Bodies

Chlorophyll-a concentration in water bodies is one of the most important environmental evaluation indicators in monitoring the water environment. Small water bodies include headwater streams, springs, ditches, flushes, small lakes, and ponds, which represent important freshwater resources. However, the relatively narrow and fragmented nature of small water bodies makes it difficult to monitor chlorophyll-a via medium-resolution remote sensing. In the present study, we first fused Gaofen-6 (a new Chinese satellite) images to obtain 2 m resolution images with 8 bands, which proved to be as good a data source for chlorophyll-a monitoring in small water bodies as Sentinel-2. Further, we compared five semi-empirical and four machine learning models to estimate chlorophyll-a concentrations via simulated reflectance using the fused Gaofen-6 and Sentinel-2 spectral response functions. The results showed that the extreme gradient boosting tree model (one of the machine learning models) is the most accurate. The mean relative error (MRE) was 9.03% and the root-mean-square error (RMSE) was 4.5 mg/m³ for the Sentinel-2 sensor, while for the fused Gaofen-6 image, the MRE was 6.73% and the RMSE was 3.26 mg/m³. Thus, both fused Gaofen-6 and Sentinel-2 could estimate the chlorophyll-a concentrations in small water bodies, with the fused Gaofen-6 exhibiting a higher spatial resolution and Sentinel-2 a higher temporal resolution.

Machine learning techniques to predict daily rainfall amount

Journal Of Big Data ◽

10.1186/s40537-021-00545-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Chalachew Muluken Liyew ◽

Haileyesus Amsaya Melese

Keyword(s):

Machine Learning ◽

Pearson Correlation ◽

Daily Rainfall ◽

Learning Model ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Correlation Technique ◽

Learning Techniques ◽

Machine Learning Model ◽

Extreme Gradient Boosting

Abstract Predicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several studies have applied data mining and machine learning techniques to the environmental datasets of different countries. An erratic rainfall distribution in the country affects the agriculture on which the country's economy depends. The use of rainwater should be planned and practiced wisely to minimize the problems of drought and flooding in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and to predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables, which were used as input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia, to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and mean absolute error were used to measure the performance of the machine learning models. The results of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than the others.
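
Selecting input variables by Pearson correlation, as described in this abstract, can be sketched as follows. This is an illustrative outline only: the threshold value, the function names, and the feature names in the usage note are hypothetical, since the abstract does not state which variables or cutoff the authors used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.2):
    """Keep features whose absolute correlation with the target meets the threshold.

    `columns` maps a feature name to its list of values; `threshold` is an
    assumed cutoff, not a value taken from the study.
    """
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]
```

For example, with made-up data `{"humidity": [60, 70, 80, 90], "noise": [1, -1, -1, 1]}` and a rainfall target of `[0, 2, 4, 6]`, only `humidity` is retained because `noise` is uncorrelated with the target.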

Machine Learning Models of Acute Kidney Injury Prediction in Acute Pancreatitis Patients

Gastroenterology Research and Practice ◽

10.1155/2020/3431290 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8

Author(s):

Cheng Qu ◽

Lin Gao ◽

Xian-qiang Yu ◽

Mei Wei ◽

Guo-quan Fang ◽

...

Keyword(s):

Machine Learning ◽

Acute Kidney Injury ◽

Acute Pancreatitis ◽

Logistic Regression ◽

Kidney Injury ◽

Gradient Boosting ◽

Support Vector ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Background. Acute kidney injury (AKI) has long been recognized as a common and important complication of acute pancreatitis (AP). In this study, machine learning (ML) techniques were used to establish predictive models for AKI in AP patients during hospitalization. This is a retrospective review of prospectively collected data of AP patients admitted to our department within one week after the onset of abdominal pain from January 2014 to January 2019. Eighty patients developed AKI after admission (AKI group) and 254 patients did not (non-AKI group). Using additional information such as demographic characteristics and laboratory data, support vector machine (SVM), random forest (RF), classification and regression tree (CART), and extreme gradient boosting (XGBoost) were used to build models of AKI prediction, and their predictive performance was compared with that of the classic model using logistic regression (LR). Among the machine learning models, XGBoost performed best in predicting AKI, with an AUC of 91.93%. The AUC of the logistic regression analysis was 87.28%. The present findings suggest that, compared with the classical logistic regression model, machine learning models using features that can be easily obtained at admission performed better in predicting AKI in AP patients.

Integration of extreme gradient boosting feature selection approach with machine learning models: application of weather relative humidity prediction

Neural Computing and Applications ◽

10.1007/s00521-021-06362-3 ◽

2021 ◽

Author(s):

Hai Tao ◽

Salih Muhammad Awadh ◽

Sinan Q. Salih ◽

Shafik S. Shafik ◽

Zaher Mundher Yaseen

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Relative Humidity ◽

Gradient Boosting ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Selection Approach ◽

Feature Selection Approach ◽

Machine Learning Models

The prediction of asymptomatic carotid atherosclerosis with electronic health records: a comparative study of six machine learning models

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01480-3 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jiaxin Fan ◽

Mengying Chen ◽

Jian Luo ◽

Shusen Yang ◽

Jinming Shi ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Carotid Atherosclerosis ◽

Characteristic Curve ◽

Gradient Boosting ◽

Learning Models ◽

Health Records ◽

Extreme Gradient Boosting ◽

Electronic Health ◽

Machine Learning Models

Abstract Background Screening carotid B-mode ultrasonography is a frequently used method to detect subjects with carotid atherosclerosis (CAS). Because CAS progresses asymptomatically in most patients, early identification is challenging for clinicians, and the condition may trigger ischemic stroke. Recently, machine learning has shown a strong ability to classify data and a potential for prediction in the medical field. The combined use of machine learning and the electronic health records of patients could provide clinicians with a more convenient and precise method to identify asymptomatic CAS. Methods This was a retrospective cohort study using routine clinical data of medical check-up subjects from April 19, 2010 to November 15, 2019. Six machine learning models (logistic regression [LR], random forest [RF], decision tree [DT], eXtreme Gradient Boosting [XGB], Gaussian Naïve Bayes [GNB], and K-Nearest Neighbour [KNN]) were used to predict asymptomatic CAS, and their predictive performance was compared in terms of the area under the receiver operating characteristic curve (AUCROC), accuracy (ACC), and F1 score (F1). Results Of the 18,441 subjects, 6553 were diagnosed with asymptomatic CAS. Compared to DT (AUCROC 0.628, ACC 65.4%, and F1 52.5%), the other five models improved prediction: KNN + 7.6% (0.704, 68.8%, and 50.9%, respectively), GNB + 12.5% (0.753, 67.0%, and 46.8%, respectively), XGB + 16.0% (0.788, 73.4%, and 55.7%, respectively), RF + 16.6% (0.794, 74.5%, and 56.8%, respectively) and LR + 18.1% (0.809, 74.7%, and 59.9%, respectively). The highest achieving model, LR, predicted 1045/1966 cases (sensitivity 53.2%) and 3088/3566 non-cases (specificity 86.6%). A tenfold cross-validation scheme further verified the predictive ability of the LR. Conclusions Among machine learning models, LR showed optimal performance in predicting asymptomatic CAS. Our findings set the stage for an early automatic alert system, allowing a more precise allocation of CAS prevention measures to the individuals likely to benefit most.
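
The sensitivity and specificity quoted for the LR model follow directly from the counts given in the abstract. A short worked check in Python (the helper name is hypothetical):

```python
def percent(hits, total):
    """Return hits / total as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


# Sensitivity: correctly predicted cases out of all true cases.
sensitivity = percent(1045, 1966)  # 53.2

# Specificity: correctly predicted non-cases out of all true non-cases.
specificity = percent(3088, 3566)  # 86.6
```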

A Comparative Analysis of Machine Learning Models for Prediction of Insurance Uptake in Kenya

10.20944/preprints202010.0186.v1 ◽

2020 ◽

Author(s):

Nelson Yego ◽

Juma Kasozi ◽

Joseph Nkrunziza

Keyword(s):

Machine Learning ◽

Random Forest ◽

Characteristic Curve ◽

Confusion Matrix ◽

Gradient Boosting ◽

Support Vector ◽

Sampled Data ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

The role of insurance in financial inclusion as well as in economic growth is immense. However, low uptake seems to impede the growth of the sector, hence the need for a model that robustly predicts uptake of insurance among potential clients. In this research, we compared the performance of eight (8) machine learning models in predicting the uptake of insurance. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines and Extreme Gradient Boosting. The data used in the classification were from the 2016 Kenya FinAccess Household Survey. Because of data imbalance, performance was compared for both upsampled and downsampled data. For upsampled data, the Random Forest classifier showed the highest accuracy and precision compared to other classifiers, but for downsampled data, gradient boosting was optimal. It is noteworthy that for both upsampled and downsampled data, tree-based classifiers were more robust than others in insurance uptake prediction. However, in spite of hyper-parameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest as compared to other tree-based models. Also, the confusion matrix for Random Forest showed the fewest false positives and the most true positives, hence it could be construed as the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product, hence bancassurance could be said to be a plausible channel of distribution of insurance products.
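
Upsampling and downsampling are two ways of balancing an imbalanced dataset before training a classifier. The abstract does not say which resampling procedure the authors used, so the sketch below assumes simple random resampling; the function names and the fixed seed are hypothetical choices for illustration.

```python
import random


def upsample(minority, majority, seed=0):
    """Grow the minority class to the majority size by sampling with replacement."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority


def downsample(minority, majority, seed=0):
    """Shrink the majority class to the minority size by sampling without replacement."""
    rng = random.Random(seed)
    return minority, rng.sample(majority, len(minority))
```

Upsampling keeps every original record but repeats minority records, while downsampling discards majority records, which may explain why different classifiers came out ahead under the two schemes.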

Machine learning models for screening carotid atherosclerosis in asymptomatic adults

Scientific Reports ◽

10.1038/s41598-021-01456-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jian Yu ◽

Yan Zhou ◽

Qiong Yang ◽

Xiaoling Liu ◽

Lili Huang ◽

...

Keyword(s):

Machine Learning ◽

Physical Examination ◽

Carotid Atherosclerosis ◽

Gradient Boosting ◽

Support Vector ◽

Learning Models ◽

Cerebrovascular Events ◽

Extreme Gradient Boosting ◽

Routine Physical Examination ◽

Machine Learning Models

Abstract Carotid atherosclerosis (CAS) is a risk factor for cardiovascular and cerebrovascular events, but duplex ultrasonography is not recommended in routine screening for asymptomatic populations according to medical guidelines. We aimed to develop machine learning models to screen for CAS in asymptomatic adults. A total of 2732 asymptomatic subjects undergoing routine physical examination in our hospital were included in the study. We developed machine learning models to classify subjects with or without CAS using decision tree, random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM) and multilayer perceptron (MLP) with 17 candidate features. The performance of the models was assessed on the testing dataset. The model using MLP achieved the highest accuracy (0.748), positive predictive value (0.743), F1 score (0.742), area under the receiver operating characteristic curve (AUC) (0.766) and Kappa score (0.445) among all classifiers, followed by the models using XGBoost and SVM. In conclusion, the model using MLP is the best one to screen for CAS in asymptomatic adults based on the results of routine physical examination, followed by those using XGBoost and SVM. These models may provide an effective and applicable method for physicians and primary care doctors to screen for asymptomatic CAS without risk factors in the general population, and may improve risk prediction and prevention of cardiovascular and cerebrovascular events in asymptomatic adults.

Prognostic Machine Learning Models for First-Year Mortality in Incident Hemodialysis Patients: Development and Validation Study (Preprint)

10.2196/preprints.20578 ◽

2020 ◽

Author(s):

Kaixiang Sheng ◽

Ping Zhang ◽

Xi Yao ◽

Jiawei Li ◽

Yongchun He ◽

...

Keyword(s):

Machine Learning ◽

High Risk ◽

Dialysis Initiation ◽

Gradient Boosting ◽

First Year ◽

Learning Models ◽

High Risk Patients ◽

Extreme Gradient Boosting ◽

Risk Patients ◽

Machine Learning Models

BACKGROUND The first-year survival rate among patients undergoing hemodialysis remains poor. Current mortality risk scores for patients undergoing hemodialysis employ regression techniques and have limited applicability and robustness. OBJECTIVE We aimed to develop a machine learning model utilizing clinical factors to predict first-year mortality in patients undergoing hemodialysis that could assist physicians in classifying high-risk patients. METHODS Training and testing cohorts consisted of 5351 patients from a single center and 5828 patients from 97 renal centers undergoing hemodialysis (incident only). The outcome was all-cause mortality during the first year of dialysis. Extreme gradient boosting was used for algorithm training and validation. Two models were established based on the data obtained at dialysis initiation (model 1) and data 0-3 months after dialysis initiation (model 2), and 10-fold cross-validation was applied to each model. The area under the curve (AUC), sensitivity (recall), specificity, precision, balanced accuracy, and F1 score were used to assess the predictive ability of the models. RESULTS In the training and testing cohorts, 585 (10.93%) and 764 (13.11%) patients, respectively, died during the first-year follow-up. Of 42 candidate features, the 15 most important features were selected. The performance of model 1 (AUC 0.83, 95% CI 0.78-0.84) was similar to that of model 2 (AUC 0.85, 95% CI 0.81-0.86). CONCLUSIONS We developed and validated 2 machine learning models to predict first-year mortality in patients undergoing hemodialysis. Both models could be used to stratify high-risk patients at the early stages of dialysis.

Evaluating the Effect of Topical Atropine Use for Myopia Control on Intraocular Pressure by Using Machine Learning

Journal of Clinical Medicine ◽

10.3390/jcm10010111 ◽

2020 ◽

Vol 10 (1) ◽

pp. 111

Author(s):

Tzu-En Wu ◽

Hsin-An Chen ◽

Mao-Jhen Jhou ◽

Yen-Ning Chen ◽

Ting-Jen Chang ◽

...

Keyword(s):

Machine Learning ◽

Intraocular Pressure ◽

Total Duration ◽

Multivariate Adaptive Regression Splines ◽

Classification And Regression Tree ◽

Gradient Boosting ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models ◽

Myopia Control

Atropine is a common treatment used in children with myopia. However, it probably affects intraocular pressure (IOP) under some conditions. Our research aims to analyze clinical data by using machine learning models to evaluate the effect of 19 important factors on IOP in children with myopia treated with topical atropine. The data were collected on 1545 eyes with spherical equivalent (SE) less than −10.0 diopters (D) treated with atropine for myopia control. Four machine learning models, namely multivariate adaptive regression splines (MARS), classification and regression tree (CART), random forest (RF), and eXtreme gradient boosting (XGBoost), were used. Linear regression (LR) was used for benchmarking. The 10-fold cross-validation method was used to estimate the performance of the five methods. The main outcome measure is the evaluation, using machine learning models, of the 19 important factors associated with atropine use that may affect IOP. Endpoint IOP at the last visit was set as the target variable. The results show that five variables, namely baseline IOP, recruitment duration, age, total duration and previous cumulative dosage, were identified as most significant for evaluating the effect of atropine use for treating myopia on IOP. We can conclude that the use of machine learning methods to evaluate factors that affect IOP in children with myopia treated with topical atropine is promising. XGBoost is the best predictive model, and baseline IOP is the most accurate predictive factor for endpoint IOP among all machine learning approaches.
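
The 10-fold cross-validation mentioned in this abstract partitions the samples into ten folds and uses each fold once as the test set while training on the other nine. The following is a minimal sketch of the index bookkeeping in plain Python; the function name, the shuffling, and the seed are assumptions, as the abstract does not describe how the folds were formed.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

With the 1545 eyes reported here, `k_fold_indices(1545)` produces five test folds of 155 samples and five of 154, and every sample appears in exactly one test fold.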
