A Self-Care Prediction Model for Children with Disability Based on Genetic Algorithm and Extreme Gradient Boosting

Muhammad Syafrudin; Ganjar Alfian; Norma Latif Fitriyani; Muhammad Anshari; Tony Hadibarata; Agung Fatwanto; Jongtae Rhee

doi:10.3390/math8091590

A Self-Care Prediction Model for Children with Disability Based on Genetic Algorithm and Extreme Gradient Boosting

Mathematics ◽

10.3390/math8091590 ◽

2020 ◽

Vol 8 (9) ◽

pp. 1590

Author(s):

Muhammad Syafrudin ◽

Ganjar Alfian ◽

Norma Latif Fitriyani ◽

Muhammad Anshari ◽

Tony Hadibarata ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Prediction Model ◽

Prediction Models ◽

Self Care ◽

Gradient Boosting ◽

Children With Disability ◽

Study Results ◽

Extreme Gradient Boosting ◽

Care Problems

Detecting self-care problems is one of important and challenging issues for occupational therapists, since it requires a complex and time-consuming process. Machine learning algorithms have been recently applied to overcome this issue. In this study, we propose a self-care prediction model called GA-XGBoost, which combines genetic algorithms (GAs) with extreme gradient boosting (XGBoost) for predicting self-care problems of children with disability. Selecting the feature subset affects the model performance; thus, we utilize GA to optimize finding the optimum feature subsets toward improving the model’s performance. To validate the effectiveness of GA-XGBoost, we present six experiments: comparing GA-XGBoost with other machine learning models and previous study results, a statistical significant test, impact analysis of feature selection and comparison with other feature selection methods, and sensitivity analysis of GA parameters. During the experiments, we use accuracy, precision, recall, and f1-score to measure the performance of the prediction models. The results show that GA-XGBoost obtains better performance than other prediction models and the previous study results. In addition, we design and develop a web-based self-care prediction to help therapist diagnose the self-care problems of children with disabilities. Therefore, appropriate treatment/therapy could be performed for each child to improve their therapeutic outcome.

Download Full-text

Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database

Diagnostics ◽

10.3390/diagnostics11060943 ◽

2021 ◽

Vol 11 (6) ◽

pp. 943

Author(s):

Joung Ouk (Ryan) Kim ◽

Yong-Suk Jeong ◽

Jin Ho Kim ◽

Jong-Weon Lee ◽

Dougho Park ◽

...

Keyword(s):

Machine Learning ◽

Health Insurance ◽

Prediction Model ◽

National Health Insurance ◽

National Health ◽

Prediction Models ◽

Characteristic Curve ◽

Health Screening ◽

Gradient Boosting ◽

Extreme Gradient Boosting

Background: This study proposes a cardiovascular diseases (CVD) prediction model using machine learning (ML) algorithms based on the National Health Insurance Service-Health Screening datasets. Methods: We extracted 4699 patients aged over 45 as the CVD group, diagnosed according to the international classification of diseases system (I20–I25). In addition, 4699 random subjects without CVD diagnosis were enrolled as a non-CVD group. Both groups were matched by age and gender. Various ML algorithms were applied to perform CVD prediction; then, the performances of all the prediction models were compared. Results: The extreme gradient boosting, gradient boosting, and random forest algorithms exhibited the best average prediction accuracy (area under receiver operating characteristic curve (AUROC): 0.812, 0.812, and 0.811, respectively) among all algorithms validated in this study. Based on AUROC, the ML algorithms improved the CVD prediction performance, compared to previously proposed prediction models. Preexisting CVD history was the most important factor contributing to the accuracy of the prediction model, followed by total cholesterol, low-density lipoprotein cholesterol, waist-height ratio, and body mass index. Conclusions: Our results indicate that the proposed health screening dataset-based CVD prediction model using ML algorithms is readily applicable, produces validated results and outperforms the previous CVD prediction models.

Download Full-text

Modelling treatment benefit for bexmarilimab (an anti-Clever-1 antibody and a novel macrophage checkpoint inhibitor) using phase I first-in-man trial data.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.e14530 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. e14530-e14530

Author(s):

Petri Bono ◽

Jussi Ekström ◽

Matti K Karvonen ◽

Jami Mandelin ◽

Jussi Koivunen

Keyword(s):

Feature Selection ◽

Prediction Model ◽

Phase I ◽

Missing Values ◽

Prediction Models ◽

Checkpoint Inhibitor ◽

Prediction Performance ◽

Gradient Boosting ◽

Treatment Benefit ◽

Extreme Gradient Boosting

e14530 Background: Bexmarilimab, an investigational immunotherapeutic antibody targeting Clever-1, is currently investigated in phase I/II MATINS study (NCT03733990) for advanced solid tumors. Machine learning (ML) based models combining extensive data could be generated to predict treatment responses to this first-in-class macrophage checkpoint inhibitor. Methods: 58 baseline features from 30 patients included in the part 1 of phase I/II MATINS trial were included in ML modelling. Seven patients were classified as benefitting from the therapy by RECIST 1.1 (PR or SD response in target or non-target lesions). Initial feature selection was done using a combination of domain knowledge and removal of features with several missing values resulting in 20 clinically relevant features from 25 patients. The remaining data was standardized and feature selection using variance analysis (ANOVA) based on F-values between response and features was performed. With this approach, the number of features could be further reduced as the prediction performance increased until the most important features were included in the model. Several prediction models were trained, and prediction performance evaluated using leave-one-out cross-validation (LOOCV), with and without SMOTE oversampling of the positive class of the training data inside each LOOCV fold. In LOOCV the prediction model was trained 25 times. Stacked meta classifier with SMOTE oversampling combining three classifiers: elastic-net logistic regression, random forest and extreme gradient boosting was chosen as the best performing prediction model. Results: Seven baseline features were associated with bexmarilimab treatment benefit. Increasing bexmarilimab dose and high tumor FoxP3 cells showed positive benefit. On contrary, high baseline blood neutrophils, CD4, T-cells, B-cells, and CXCL10 indicated negative relationship to the treatment benefit. The ML model trained with these seven features performed well in LOOCV as 6/7 benefitting and 16/18 non-benefitting were classified correctly, and all considered classification performance metrics were good. In feature importance analysis, low baseline CXCL10 and neutrophils were characterized as the most important predictors for treatment benefit with values of 0.19 and 0.16. Conclusions: This study highlights possibility of using ML models in predicting treatment benefit for novel cancer drugs such as bexmarilimab and boost the clinical development. These findings are in line of expected immune activation of bexmarilimab treatment. The generated ML models should be further validated in a larger patient cohort. Clinical trial information: NCT03733990.

Download Full-text

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyper parameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.

Download Full-text

Evaluating Variable Selection and Machine Learning Algorithms for Estimating Forest Heights by Combining Lidar and Hyperspectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9090507 ◽

2020 ◽

Vol 9 (9) ◽

pp. 507

Author(s):

Sanjiwana Arjasakusuma ◽

Sandiaga Swahyu Kusuma ◽

Stuart Phinn

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithms ◽

Principal Component ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Forest Height ◽

Extreme Gradient Boosting

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving high- spatial and spectral dimensionality data, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with trees (XGbtree and XGBdart) and linear (XGBlin) classifiers were evaluated. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (18.4% of nRMSE and 0.046 m of bias) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (15.8% of nRMSE and −0.244 m of bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variables selection; it could reduce 95% of the data to select the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.

Download Full-text

Property Rental Price Prediction Using the Extreme Gradient Boosting Algorithm

IJIIS: International Journal of Informatics and Information Systems ◽

10.47738/ijiis.v3i2.65 ◽

2020 ◽

Vol 3 (2) ◽

pp. 54-59

Author(s):

Marco Febriadi Kokasih ◽

Adi Suryaputra Paramita

Keyword(s):

Prediction Model ◽

Prediction Models ◽

Gradient Boosting ◽

Fair Price ◽

Price Prediction ◽

Online Marketplace ◽

Extreme Gradient Boosting ◽

Property Owners ◽

Boosting Algorithm

Online marketplace in the field of property renting like Airbnb is growing. Many property owners have begun renting out their properties to fulfil this demand. Determining a fair price for both property owners and tourists is a challenge. Therefore, this study aims to create a software that can create a prediction model for property rent price. Variable that will be used for this study is listing feature, neighbourhood, review, date and host information. Prediction model is created based on the dataset given by the user and processed with Extreme Gradient Boosting algorithm which then will be stored in the system. The result of this study is expected to create prediction models for property rent price for property owners and tourists consideration when considering to rent a property. In conclusion, Extreme Gradient Boosting algorithm is able to create property rental price prediction with the average of RMSE of 10.86 or 13.30%.

Download Full-text

Machine Learning-Based Three-Month Outcome Prediction in Acute Ischemic Stroke: A Single Cerebrovascular-Specialty Hospital Study in South Korea

Diagnostics ◽

10.3390/diagnostics11101909 ◽

2021 ◽

Vol 11 (10) ◽

pp. 1909

Author(s):

Dougho Park ◽

Eunhwan Jeong ◽

Haejong Kim ◽

Hae Wook Pyun ◽

Haemin Kim ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Ischemic Stroke ◽

Acute Ischemic Stroke ◽

Functional Outcome ◽

Outcome Prediction ◽

Prediction Models ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

Background: Functional outcomes after acute ischemic stroke are of great concern to patients and their families, as well as physicians and surgeons who make the clinical decisions. We developed machine learning (ML)-based functional outcome prediction models in acute ischemic stroke. Methods: This retrospective study used a prospective cohort database. A total of 1066 patients with acute ischemic stroke between January 2019 and March 2021 were included. Variables such as demographic factors, stroke-related factors, laboratory findings, and comorbidities were utilized at the time of admission. Five ML algorithms were applied to predict a favorable functional outcome (modified Rankin Scale 0 or 1) at 3 months after stroke onset. Results: Regularized logistic regression showed the best performance with an area under the receiver operating characteristic curve (AUC) of 0.86. Support vector machines represented the second-highest AUC of 0.85 with the highest F1-score of 0.86, and finally, all ML models applied achieved an AUC > 0.8. The National Institute of Health Stroke Scale at admission and age were consistently the top two important variables for generalized logistic regression, random forest, and extreme gradient boosting models. Conclusions: ML-based functional outcome prediction models for acute ischemic stroke were validated and proven to be readily applicable and useful.

Download Full-text

A Predictive Model Based on Machine Learning for the Early Detection of Late-Onset Neonatal Sepsis: Development and Observational Study (Preprint)

10.2196/preprints.15965 ◽

2019 ◽

Author(s):

Wongeun Song ◽

Se Young Jung ◽

Hyunyoung Baek ◽

Chang Won Choi ◽

Young Hwa Jung ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Intensive Care ◽

High Resolution ◽

Prediction Model ◽

Neonatal Sepsis ◽

Vital Sign ◽

Prediction Models ◽

Late Onset ◽

Model Based

BACKGROUND Neonatal sepsis is associated with most cases of mortalities and morbidities in the neonatal intensive care unit (NICU). Many studies have developed prediction models for the early diagnosis of bloodstream infections in newborns, but there are limitations to data collection and management because these models are based on high-resolution waveform data. OBJECTIVE The aim of this study was to examine the feasibility of a prediction model by using noninvasive vital sign data and machine learning technology. METHODS We used electronic medical record data in intensive care units published in the Medical Information Mart for Intensive Care III clinical database. The late-onset neonatal sepsis (LONS) prediction algorithm using our proposed forward feature selection technique was based on NICU inpatient data and was designed to detect clinical sepsis 48 hours before occurrence. The performance of this prediction model was evaluated using various feature selection algorithms and machine learning models. RESULTS The performance of the LONS prediction model was found to be comparable to that of the prediction models that use invasive data such as high-resolution vital sign data, blood gas estimations, blood cell counts, and pH levels. The area under the receiver operating characteristic curve of the 48-hour prediction model was 0.861 and that of the onset detection model was 0.868. The main features that could be vital candidate markers for clinical neonatal sepsis were blood pressure, oxygen saturation, and body temperature. Feature generation using kurtosis and skewness of the features showed the highest performance. CONCLUSIONS The findings of our study confirmed that the LONS prediction model based on machine learning can be developed using vital sign data that are regularly measured in clinical settings. Future studies should conduct external validation by using different types of data sets and actual clinical verification of the developed model.

Download Full-text

Prediction of Masked Hypertension and Masked Uncontrolled Hypertension Using Machine Learning

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2021.778306 ◽

2021 ◽

Vol 8 ◽

Author(s):

Ming-Hui Hung ◽

Ling-Chieh Shih ◽

Yu-Ching Wang ◽

Hsin-Bang Leu ◽

Po-Hsun Huang ◽

...

Keyword(s):

Machine Learning ◽

Clinical Characteristics ◽

Prediction Models ◽

External Validation ◽

Uncontrolled Hypertension ◽

Gradient Boosting ◽

Masked Hypertension ◽

Internal Validation ◽

Hypertensive Patients ◽

Extreme Gradient Boosting

Objective: This study aimed to develop machine learning-based prediction models to predict masked hypertension and masked uncontrolled hypertension using the clinical characteristics of patients at a single outpatient visit.Methods: Data were derived from two cohorts in Taiwan. The first cohort included 970 hypertensive patients recruited from six medical centers between 2004 and 2005, which were split into a training set (n = 679), a validation set (n = 146), and a test set (n = 145) for model development and internal validation. The second cohort included 416 hypertensive patients recruited from a single medical center between 2012 and 2020, which was used for external validation. We used 33 clinical characteristics as candidate variables to develop models based on logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGboost), and artificial neural network (ANN).Results: The four models featured high sensitivity and high negative predictive value (NPV) in internal validation (sensitivity = 0.914–1.000; NPV = 0.853–1.000) and external validation (sensitivity = 0.950–1.000; NPV = 0.875–1.000). The RF, XGboost, and ANN models showed much higher area under the receiver operating characteristic curve (AUC) (0.799–0.851 in internal validation, 0.672–0.837 in external validation) than the LR model. Among the models, the RF model, composed of 6 predictor variables, had the best overall performance in both internal and external validation (AUC = 0.851 and 0.837; sensitivity = 1.000 and 1.000; specificity = 0.609 and 0.580; NPV = 1.000 and 1.000; accuracy = 0.766 and 0.721, respectively).Conclusion: An effective machine learning-based predictive model that requires data from a single clinic visit may help to identify masked hypertension and masked uncontrolled hypertension.

Download Full-text

Dementia risks identified by vocal features via telephone conversations: A novel machine learning prediction model

PLoS ONE ◽

10.1371/journal.pone.0253988 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0253988

Author(s):

Akihiro Shimoda ◽

Yue Li ◽

Hana Hayashi ◽

Naoki Kondo

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Predictive Performance ◽

Gradient Boosting ◽

Validation Data ◽

Audio File ◽

Extreme Gradient Boosting ◽

Audio Data ◽

Data Files ◽

Audio Files

Due to difficulty in early diagnosis of Alzheimer’s disease (AD) related to cost and differentiated capability, it is necessary to identify low-cost, accessible, and reliable tools for identifying AD risk in the preclinical stage. We hypothesized that cognitive ability, as expressed in the vocal features in daily conversation, is associated with AD progression. Thus, we have developed a novel machine learning prediction model to identify AD risk by using the rich voice data collected from daily conversations, and evaluated its predictive performance in comparison with a classification method based on the Japanese version of the Telephone Interview for Cognitive Status (TICS-J). We used 1,465 audio data files from 99 Healthy controls (HC) and 151 audio data files recorded from 24 AD patients derived from a dementia prevention program conducted by Hachioji City, Tokyo, between March and May 2020. After extracting vocal features from each audio file, we developed machine-learning models based on extreme gradient boosting (XGBoost), random forest (RF), and logistic regression (LR), using each audio file as one observation. We evaluated the predictive performance of the developed models by describing the receiver operating characteristic (ROC) curve, calculating the areas under the curve (AUCs), sensitivity, and specificity. Further, we conducted classifications by considering each participant as one observation, computing the average of their audio files’ predictive value, and making comparisons with the predictive performance of the TICS-J based questionnaire. Of 1,616 audio files in total, 1,308 (81.0%) were randomly allocated to the training data and 308 (19.1%) to the validation data. For audio file-based prediction, the AUCs for XGboost, RF, and LR were 0.863 (95% confidence interval [CI]: 0.794–0.931), 0.882 (95% CI: 0.840–0.924), and 0.893 (95%CI: 0.832–0.954), respectively. For participant-based prediction, the AUC for XGboost, RF, LR, and TICS-J were 1.000 (95%CI: 1.000–1.000), 1.000 (95%CI: 1.000–1.000), 0.972 (95%CI: 0.918–1.000) and 0.917 (95%CI: 0.918–1.000), respectively. There was difference in predictive accuracy of XGBoost and TICS-J with almost approached significance (p = 0.065). Our novel prediction model using the vocal features of daily conversations demonstrated the potential to be useful for the AD risk assessment.

Download Full-text

Integration of extreme gradient boosting feature selection approach with machine learning models: application of weather relative humidity prediction

Neural Computing and Applications ◽

10.1007/s00521-021-06362-3 ◽

2021 ◽

Author(s):

Hai Tao ◽

Salih Muhammad Awadh ◽

Sinan Q. Salih ◽

Shafik S. Shafik ◽

Zaher Mundher Yaseen

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Relative Humidity ◽

Gradient Boosting ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Selection Approach ◽

Feature Selection Approach ◽

Machine Learning Models

Download Full-text