Decision Tree‐PLS (DT‐PLS) algorithm for the development of process: Specific local prediction models

2019 ◽  
Vol 35 (4) ◽  
Author(s):  
Harini Narayanan ◽  
Michael Sokolov ◽  
Alessandro Butté ◽  
Massimo Morbidelli
2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury performed moderately at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
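To make the reported areas under the curve (0.54 and 0.66) easier to interpret, the following minimal sketch computes the area under the receiver operating characteristic curve as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not taken from the LIFE study or from the authors' tutorial.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical, not from the LIFE study).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.6]
auc = roc_auc(labels, scores)
print(auc)
```

A value of 0.5 corresponds to ranking cases no better than chance and 1.0 to a perfect ranking, which is why 0.54 and 0.66 are described as moderate at best.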


2015 ◽  
Vol 54 (06) ◽  
pp. 560-567 ◽  
Author(s):  
K. Zhu ◽  
Z. Lou ◽  
J. Zhou ◽  
N. Ballester ◽  
P. Parikh ◽  
...  

Summary. Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Big Data and Analytics in Healthcare”. Background: Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. Objectives: Explore the use of conditional logistic regression to increase the prediction accuracy. Methods: We analyzed an HCUP statewide in-patient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. Results: The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to improve sensitivity by more than 10% over the standard classification models, which can be translated to correct labeling of an additional 400 to 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. Conclusions: It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
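The stratify-then-fit idea described in the Methods can be sketched as follows: a simple decision rule splits the records into strata, and a separate logistic regression is fitted within each stratum. This is only an illustration under stated assumptions. The rule in `stratum_of`, the threshold of five chronic conditions, the single predictor `x`, and the toy records are all hypothetical, and the plain gradient-descent fit stands in for whatever estimation procedure the authors used.

```python
import math


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit y ~ sigmoid(b0 + b1 * x) by plain batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y          # gradient of the log-loss w.r.t. b0
            g1 += (p - y) * x    # gradient of the log-loss w.r.t. b1
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict(model, x):
    """Predicted probability of the positive outcome for predictor value x."""
    b0, b1 = model
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def stratum_of(record):
    """Hypothetical decision rule standing in for one taken from a tree."""
    return "many_chronic" if record["chronic"] >= 5 else "few_chronic"


# Toy records (invented for illustration): predictor x, outcome y.
records = [
    {"chronic": 2, "x": 0.0, "y": 0}, {"chronic": 3, "x": 1.0, "y": 0},
    {"chronic": 1, "x": 2.0, "y": 1}, {"chronic": 4, "x": 3.0, "y": 1},
    {"chronic": 6, "x": 0.0, "y": 1}, {"chronic": 7, "x": 1.0, "y": 1},
    {"chronic": 8, "x": 2.0, "y": 0}, {"chronic": 9, "x": 3.0, "y": 0},
]

# Split the data by the rule, then fit one model per stratum.
strata = {}
for r in records:
    strata.setdefault(stratum_of(r), []).append(r)

models = {
    name: fit_logistic([r["x"] for r in rows], [r["y"] for r in rows])
    for name, rows in strata.items()
}
```

In this toy data the association between `x` and the outcome reverses between the two strata, so a single pooled model would see no overall trend while the per-stratum models each recover a clear slope.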


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5777
Author(s):  
Esraa Eldesouky ◽  
Mahmoud Bekhit ◽  
Ahmed Fathalla ◽  
Ahmad Salah ◽  
Ahmed Ali

The use of underwater wireless sensor networks (UWSNs) for collaborative monitoring and marine data collection tasks is rapidly increasing. One of the major challenges associated with building these networks is handover prediction; this is because the mobility model of the sensor nodes is different from that of ground-based wireless sensor network (WSN) devices. Therefore, handover prediction is the focus of the present work. There have been limited efforts in addressing the handover prediction problem in UWSNs and in the use of ensemble learning in handover prediction for UWSNs. Hence, we propose the simulation of the sensor node mobility using real marine data collected by the Korea Hydrographic and Oceanographic Agency. These data include the water current speed and direction between data. The proposed simulation consists of a large number of sensor nodes and base stations in a UWSN. Next, we collected the handover events from the simulation, which were utilized as a dataset for the handover prediction task. Finally, we utilized four machine learning prediction algorithms (i.e., gradient boosting, decision tree (DT), Gaussian naive Bayes (GNB), and K-nearest neighbor (KNN)) to predict handover events based on historically collected handover events. The obtained prediction accuracy rates were above 95%. The best prediction accuracy rate achieved by the state-of-the-art method was 56% for any UWSN. Moreover, when the proposed models were evaluated on performance metrics, the measured evaluation scores emphasized the high quality of the proposed prediction models. While the ensemble learning model outperformed the GNB and KNN models, the performance of ensemble learning and decision tree models was almost identical.


2020 ◽  
Vol 12 (23) ◽  
pp. 9790
Author(s):  
Sanghoon Lee ◽  
Keunho Choi ◽  
Donghee Yoo

The government makes great efforts to maintain the soundness of policy funds raised by the national budget and lent to corporations. In general, previous research on the prediction of company insolvency has dealt with large and listed companies using financial information with conventional statistical techniques. However, small- and medium-sized enterprises (SMEs) do not have to undergo mandatory external audits, and the quality of accounting information is low due to weak internal control. To overcome this problem, we developed an insolvency prediction model for SMEs using data mining techniques and technological feasibility assessment information as non-financial information. We divided the dataset into two types of data based on three years of corporate age. The synthetic minority over-sampling technique (SMOTE) was used to solve the data imbalance that occurred at this time. Six insolvency prediction models were created using logistic regression, a decision tree, an artificial neural network, and an ensemble (i.e., boosting) of each algorithm. By applying a boosted decision tree, the best accuracies of 69.1% and 82.7% were derived, and by applying a decision tree, nine and seven influential factors were found to affect the insolvency of SMEs established for fewer than three years and more than three years, respectively. In addition, we derived several insolvency rules for the two types of SMEs from the decision tree-based prediction model and proposed ways to enhance the health of loans given to potentially insolvent companies using these derived rules. The results of this study show that it is possible to predict SMEs’ insolvency using data mining techniques with technological feasibility assessment information and find meaningful rules related to insolvency.
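The core step of SMOTE, creating synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration and not the configuration used in the study: the function name `smote`, the choice of `k=2`, the fixed seed, and the two-dimensional toy points are all assumptions.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority points by squared Euclidean distance.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class points (hypothetical feature vectors).
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.8)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating records exactly.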


2014 ◽  
Vol 931-932 ◽  
pp. 1467-1471 ◽  
Author(s):  
Jaree Thongkam ◽  
Vatinee Sukmak

A psychiatric readmission is argued to be an adverse outcome because it is costly and occurs when a relapse of the illness is severe. An analysis of systematic models in readmission data can provide useful insight into the quicker and sicker patients with schizophrenia. This research aims to develop and investigate schizophrenia readmission prediction models using data mining techniques including decision tree, Random Tree, Random Forests, AdaBoost, Bagging and a combination of AdaBoost with decision tree, AdaBoost with Random Tree, AdaBoost with Random Forests, Bagging with decision tree, Bagging with Random Tree and Bagging with Random Forests. The experimental results showed that AdaBoost with decision tree has the highest precision, recall and F-measure, at up to 98.11%, 98.79% and 98.41%, respectively.
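For readers unfamiliar with the three reported metrics, the sketch below computes precision, recall and the F-measure (here the F1 score, the harmonic mean of precision and recall) from true and predicted labels. The function name and the toy labels are assumptions for illustration and have no connection to the schizophrenia readmission data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels (hypothetical): 1 = readmitted, 0 = not readmitted.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```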


2020 ◽  
Vol 6 ◽  
pp. e275
Author(s):  
Binti Solihah ◽  
Azhari Azhari ◽  
Aina Musdholifah

Background: A conformational B-cell epitope is one of the main components of vaccine design. It contains separate segments in its sequence, which are spatially close in the antigen chain. The availability of Ag-Ab complex data on the Protein Data Bank allows for the development of predictive methods. Several epitope prediction models have also been developed, including learning-based methods. However, the performance of these models is still not optimal. The main problem in learning-based prediction models is class imbalance. Methods: This study proposes CluSMOTE, which is a combination of a cluster-based undersampling method and the Synthetic Minority Oversampling Technique. The approach is used to generate other sample data to ensure that the dataset of the conformational epitope is balanced. The Hierarchical DBSCAN algorithm is performed to identify the clusters in the majority class. Some randomly selected data are taken from each cluster, considering the oversampling degree, and combined with the minority class data. The balanced data are utilized as the training dataset to develop a conformational epitope prediction model. Furthermore, two binary classification methods, Support Vector Machine and Decision Tree, are separately used to develop prediction models and to evaluate the performance of CluSMOTE in predicting conformational B-cell epitopes. The experiment is focused on determining the best parameters for optimal CluSMOTE. Two independent datasets are used to compare the proposed prediction model with state-of-the-art methods. The first and the second datasets represent the general protein and the glycoprotein antigens, respectively. Results: The experimental results show that CluSMOTE Decision Tree outperformed the Support Vector Machine in terms of AUC and Gmean as performance measurements. The mean AUCs of CluSMOTE Decision Tree on the Kringelum and the SEPPA 3 test sets are 0.83 and 0.766, respectively. This shows that CluSMOTE Decision Tree is better than other methods on the general protein antigen, though comparable with SEPPA 3 on the glycoprotein antigen.
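The cluster-based undersampling step, in which samples are drawn from every majority-class cluster before being combined with the minority class, might look like the sketch below. It assumes cluster labels have already been produced by some clustering algorithm and are simply passed in. The proportional per-cluster quota, the function name, and the toy data are assumptions, since the abstract does not state how many samples are taken from each cluster.

```python
import random
from collections import defaultdict


def undersample_by_cluster(majority, cluster_ids, n_keep, seed=0):
    """Keep about n_keep majority samples, drawing from every cluster in
    proportion to its size (the proportional quota is an assumption)."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for sample, cid in zip(majority, cluster_ids):
        clusters[cid].append(sample)
    kept = []
    for cid in sorted(clusters):
        members = clusters[cid]
        # At least one sample per cluster, never more than the cluster holds.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


# Toy majority class: 12 samples in three clusters of sizes 6, 4 and 2.
majority = list(range(12))
cluster_ids = [0] * 6 + [1] * 4 + [2] * 2
kept = undersample_by_cluster(majority, cluster_ids, n_keep=6)
```

Sampling within clusters, rather than from the majority class as a whole, keeps every region of the majority class represented in the reduced training set.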


2018 ◽  
Vol 1 (1) ◽  
pp. 16-24
Author(s):  
Ni Wayan Wardani ◽  
Gede Rasben Dantes ◽  
Gede Indrawan

Customers are a very important asset for retail companies, which is why retail companies should plan and follow a clear strategy for treating them. With a large number of customers, the challenge is to identify the characteristics of all customers and to retain existing customers so that they do not stop buying and move to a competing retailer. By applying the concept of customer relationship management (CRM), a company can identify customers by segmenting them and can also implement customer retention programs by predicting potential churn in each customer class. The data used come from UD. Mawar Sari. The customer segmentation process uses the RFM model to obtain customer classes. The UD. Mawar Sari customer classes are dormant, everyday, golden and superstar. The prediction models are constructed using the C4.5 decision tree algorithm. The prediction models achieve the following performance. Dormant: Recall 97.51%, Precision 75.18%, Accuracy 76.18%. Everyday: Recall 100%, Precision 99.04%, Accuracy 99.04%. Golden: Recall 100%, Precision 98.84%, Accuracy 98.84%. Superstar: Recall 96.15%, Precision 99.43%, Accuracy 95.63%. From the evaluation with the confusion matrix, it can be concluded that the dormant customer class is a customer class at risk of churn.
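The RFM model summarizes each customer by recency (time since the last purchase), frequency (number of purchases) and monetary value (total spend). The sketch below derives these three values from a transaction list. The function name, the transaction layout, and the toy data are assumptions, and the step that maps RFM values to the dormant, everyday, golden and superstar classes is deliberately omitted because the abstract does not describe it.

```python
from datetime import date


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in transactions:
        last, freq, total = table.get(customer, (None, 0, 0.0))
        # Track the most recent purchase date seen so far.
        last = day if last is None or day > last else last
        table[customer] = (last, freq + 1, total + amount)
    return {
        c: ((today - last).days, freq, total)
        for c, (last, freq, total) in table.items()
    }


# Toy transactions (hypothetical): customer id, purchase date, amount.
transactions = [
    ("A", date(2018, 1, 5), 120.0),
    ("A", date(2018, 3, 1), 80.0),
    ("B", date(2017, 11, 20), 40.0),
]
rfm = rfm_table(transactions, today=date(2018, 3, 31))
```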


Author(s):  
Mochammad Agus Afrianto ◽  
Meditya Wasesa

Background: Literature on peer-to-peer accommodation has put a substantial focus on accommodation listings' price determinants. Developing prediction models related to the demand for accommodation listings is vital in revenue management because accurate price and demand forecasts will help determine the best revenue management responses. Objective: This study aims to develop prediction models to determine the booking likelihood of accommodation listings. Methods: Using an Airbnb dataset, we developed four machine learning models, namely Logistic Regression, Decision Tree, K-Nearest Neighbor (KNN), and Random Forest Classifiers. We assessed the models using the AUC-ROC score and the model development time by using the ten-fold three-way split and the ten-fold cross-validation procedures. Results: In terms of average AUC-ROC score, the Random Forest Classifiers outperformed the other evaluated models. In the three-way split procedure, it had a 15.03% higher AUC-ROC score than Decision Tree, 2.93% higher than KNN, and 2.38% higher than Logistic Regression. In the cross-validation procedure, it had a 26.99% higher AUC-ROC score than Decision Tree, 4.41% higher than KNN, and 3.31% higher than Logistic Regression. It should be noted that the Decision Tree model has the lowest AUC-ROC score, but it has the smallest model development time. Conclusion: Random forest models performed best among the evaluated models in predicting the booking likelihood of accommodation listings. The model can be used by peer-to-peer accommodation owners to improve their revenue management responses.
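The ten-fold cross-validation procedure partitions the data into ten folds, and each fold serves once as the test set while the remaining nine are used for training. The sketch below generates the index splits only. The function name and the sample count of 25 are assumptions, and in practice the indices would usually be shuffled first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical dataset of 25 listings split into ten folds.
folds = list(k_fold_indices(25, k=10))
```

A model's score, such as AUC-ROC, is then computed on each test fold and averaged over the ten folds.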


2020 ◽  
Vol 28 (6) ◽  
pp. 1273-1291
Author(s):  
Nesreen El-Rayes ◽  
Ming Fang ◽  
Michael Smith ◽  
Stephen M. Taylor

Purpose: The purpose of this study is to develop tree-based binary classification models to predict the likelihood of employee attrition based on firm cultural and management attributes. Design/methodology/approach: A data set of resumes anonymously submitted through Glassdoor’s online portal is used in tandem with public company review information to fit decision tree, random forest and gradient boosted tree models to predict the probability of an employee leaving a firm during a job transition. Findings: Random forest and decision tree methods are found to be the strongest attrition prediction models. In addition, compensation, company culture and senior management performance play a primary role in an employee’s decision to leave a firm. Practical implications: This study may be used by human resources staff to better understand factors which influence employee attrition. In addition, techniques developed in this study may be applied to company-specific data sets to construct customized attrition models. Originality/value: This study contains several novel contributions which include exploratory studies such as industry job transition percentages, distributional comparisons between factors strongly contributing to employee attrition between those who left or stayed with the firm and the first comprehensive search over binary classification models to identify which provides the strongest predictive performance of employee attrition.

