Logistic regression and classification tree methods as elements of diagnosis in cardiology

2016 ◽  
Vol 70 ◽  
pp. 154-162
Author(s):  
Anna Spychała ◽  
Michał Skrzypek ◽  
Ewa Niewiadomska
2008 ◽  
Vol 4 (2) ◽  
pp. 77-83 ◽  
Author(s):  
Howell Sasser ◽  
Marcy Nussbaum ◽  
Michael Beuhler ◽  
Marsha Ford

2018 ◽  
Author(s):  
Παντελής Σταυρούλιας

Reliable predictions of financial crises have always safeguarded the stability of the financial system as a whole and of the banking sector in particular. This dissertation predicts systemic banking crises in the EU-14 countries several quarters before they become apparent, using the most widely used variables (macroeconomic, banking and market variables) under two approaches: a binary approach and a multi-level approach. Under the binary approach, classification models are derived with Discriminant Analysis, Linear Regression, Logistic Regression and Probit Regression to provide early prediction of crises 12 to 7 quarters before their onset. The performance of this analysis is also compared with that of the newer and more promising Classification Tree, Random Forest and C5 methods. In addition, a new measure for threshold selection and goodness of fit (GoF) of the prediction models is proposed, together with a new combined classification method. To assess performance, out-of-sample testing is carried out with country-blocked cross-validation. Under this method, the analysis is performed and the prediction models are derived using thirteen of the fourteen countries in the sample (in-sample); the resulting models are then applied to the fourteenth country, which had been excluded from the sample (out-of-sample), and the predictions are checked against that country's actual data. The procedure is repeated fourteen times, leaving a different country out each time, and the results are finally averaged across the repetitions.
Using out-of-sample testing, this dissertation achieves 82.4% correct classification (accuracy), a true positive rate (TPR) of 78.4% and a positive predictive value (PPV) of 80.6%. Under the multi-level approach, two prediction levels (periods) for systemic banking crises are distinguished. The first level, called early warning, covers the period 12 to 7 quarters before the onset of the crisis; the second level, called late warning, covers the period 6 to 1 quarters before the onset. For this multi-level classification, Neural Networks, Multinomial Logistic Regression and Multinomial Discriminant Analysis are used. Applying the same out-of-sample testing as in the first approach, 85.7% correct classification is achieved with the best-performing method, which proves to be Multinomial Linear Discriminant Analysis. With the proposed models, policy makers can detect an upcoming crisis up to three years in advance using only freely available public data, and can thus apply the appropriate macroprudential policy in each case.
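The country-blocked scheme in this abstract is leave-one-group-out cross-validation with countries as the groups: fit on thirteen countries, predict the fourteenth, repeat for every country and average. The Python sketch below illustrates that loop together with the three reported metrics (accuracy, TPR, PPV). It is not the thesis's code: the toy classifier (`fit_threshold`/`predict_threshold`, one indicator thresholded at the midpoint of the class means), the three-country data and all function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def confusion_metrics(y_true, y_pred):
    """Accuracy, true positive rate (TPR) and positive predictive value (PPV)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn) if tp + fn else math.nan,  # undefined without crisis quarters
        "ppv": tp / (tp + fp) if tp + fp else math.nan,  # undefined if nothing is flagged
    }


def country_blocked_cv(countries, X, y, fit, predict):
    """Leave one country out per fold, train on the rest, average the fold metrics."""
    rows_by_country = defaultdict(list)
    for i, country in enumerate(countries):
        rows_by_country[country].append(i)
    per_fold = []
    for held_out, test_idx in rows_by_country.items():
        train_idx = [i for i, c in enumerate(countries) if c != held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        per_fold.append(confusion_metrics([y[i] for i in test_idx], preds))
    summary = {}
    for key in ("accuracy", "tpr", "ppv"):
        defined = [m[key] for m in per_fold if not math.isnan(m[key])]
        summary[key] = mean(defined) if defined else math.nan
    return summary


# Hypothetical toy model: threshold one indicator at the midpoint of the class means.
def fit_threshold(X, y):
    pos = [x[0] for x, t in zip(X, y) if t == 1]
    neg = [x[0] for x, t in zip(X, y) if t == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x[0] >= threshold else 0


# Made-up data: three "countries", four quarters each, label 1 = pre-crisis quarter.
countries = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
X = [[0.1], [0.2], [0.9], [0.8], [0.15], [0.3], [0.7], [0.95], [0.05], [0.6], [0.85], [0.55]]
y = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0]
result = country_blocked_cv(countries, X, y, fit_threshold, predict_threshold)
print(result)
```

Averaging per fold weights every country equally, which matches the averaging described in the abstract; pooling all out-of-sample predictions before computing the metrics is a common alternative that instead weights countries by their number of quarters. TPR and PPV are skipped for folds where they are undefined, for example a held-out country with no crisis quarters.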


2021 ◽  
Author(s):  
Christian A Betancourt ◽  
Panagiota Kitsantas ◽  
Deborah G Goldberg ◽  
Beth A Hawks

Abstract

Introduction: Military veterans continue to struggle with addiction even after receiving treatment for substance use disorders (SUDs). Identifying factors that may influence SUD relapse after treatment in veteran populations is crucial for intervention and prevention efforts. The purpose of this study was to examine risk factors that contribute to SUD relapse after treatment completion in a sample of U.S. veterans using logistic regression and classification tree analysis.

Materials and Methods: Data from the 2017 Treatment Episode Data Set: Discharges (TEDS-D) included 40,909 veteran episode observations. Descriptive statistics and multivariable logistic regression analysis were used to determine factors associated with SUD relapse after treatment discharge. Classification trees were constructed to identify subgroups at high risk of substance use after discharge from SUD treatment.

Results: Approximately 94% of the veterans relapsed after discharge from outpatient or residential SUD treatment. Veterans aged 18-34 years had significantly lower odds of relapse than those aged 35-64 (odds ratio [OR] 0.73, 95% confidence interval [CI]: 0.66, 0.82), while males had higher odds of relapse than females (OR 1.55, 95% CI: 1.34, 1.79). Unemployed veterans (OR 1.92, 95% CI: 1.67, 2.22) and veterans not in the labor force (OR 1.29, 95% CI: 1.13, 1.47) had higher odds of relapse than employed veterans. Homeless veterans had 3.26 times the odds of relapse of independently housed veterans (95% CI: 2.55, 4.17). Veterans with one arrest had higher odds of relapse than those with none (OR 1.52, 95% CI: 1.19, 1.95). Treatment completion was critical to maintaining sobriety: every other type of discharge was associated with more than double the odds of relapse. Veterans who received care at 24-hour detox facilities had 1.49 times the odds of relapse (95% CI: 1.23, 1.80) of those treated at rehabilitative/residential facilities.
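An odds ratio from logistic regression is the exponentiated coefficient, OR = exp(β), and a Wald 95% interval is exp(β ± 1.96·SE), which is symmetric on the log scale. Assuming the reported intervals are of this Wald form (the abstract does not say which method was used), the sketch below converts in both directions; for the males vs females estimate, the geometric mean of the reported bounds 1.34 and 1.79 is about 1.549, consistent with the reported OR of 1.55. Function names are illustrative.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_reported_ci(lower, upper, z=Z_95):
    """Log-odds coefficient and SE implied by a reported Wald CI for an odds ratio."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * z)


# Reported for males vs females: OR 1.55, 95% CI 1.34 to 1.79.
beta, se = coef_from_reported_ci(1.34, 1.79)
or_hat, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f} se={se:.3f} OR={or_hat:.2f} CI=({lo:.2f}, {hi:.2f})")
```

Because relapse is very common in this sample (about 94%), these odds ratios are much larger than the corresponding risk ratios and should not be read as "times more likely".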
Classification tree analysis indicated that homelessness at discharge was the most important predictor of SUD relapse among veterans.

Conclusion: Beyond the numerous challenges veterans face after leaving military service, SUD relapse is intensified by risk factors such as homelessness, unemployment, and insufficient SUD treatment. Because treatment and preventive care for SUD relapse remain an active field of study, further research on SUD relapse among homeless veterans is needed to better understand the epidemiology of substance addiction in this vulnerable population. The findings of this study can inform healthcare policies and practices for veteran-tailored treatment programs that aim to improve SUD treatment completion and reduce substance use after treatment.
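A classification tree chooses each split to maximise the drop in node impurity, and the variable selected at the root is one common reading of "most important predictor". The abstract does not name the tree algorithm, so this sketch assumes a CART-style Gini criterion on binary (0/1) predictors; the episodes, feature names and function names are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_root_split(rows, labels):
    """Binary feature whose 1/0 split gives the largest weighted Gini decrease."""
    n = len(labels)
    parent = gini(labels)
    best_feature, best_gain = None, 0.0
    for feature in rows[0]:
        left = [y for r, y in zip(rows, labels) if r[feature] == 1]
        right = [y for r, y in zip(rows, labels) if r[feature] == 0]
        if not left or not right:
            continue  # split does not separate anything
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        if parent - child > best_gain:
            best_feature, best_gain = feature, parent - child
    return best_feature, best_gain


# Made-up episodes: label 1 = relapse after discharge.
rows = [
    {"homeless": 1, "unemployed": 1}, {"homeless": 1, "unemployed": 0},
    {"homeless": 1, "unemployed": 1}, {"homeless": 1, "unemployed": 0},
    {"homeless": 0, "unemployed": 1}, {"homeless": 0, "unemployed": 1},
    {"homeless": 0, "unemployed": 0}, {"homeless": 0, "unemployed": 0},
]
labels = [1, 1, 1, 1, 1, 0, 0, 0]
print(best_root_split(rows, labels))  # ('homeless', 0.28125)
```

A full tree repeats the same search recursively inside each child node until a stopping rule (minimum node size, maximum depth or minimum impurity decrease) is met.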


2021 ◽  
Author(s):  
Li Lu Wei ◽  
Yu jian

Abstract

Background: Hypertension is a common chronic disease worldwide and a common underlying condition for cardiovascular and cerebrovascular complications. Overweight and obesity are major risk factors for hypertension. In this study, three statistical methods (a classification tree model, a logistic regression model and a back-propagation (BP) neural network) were used to screen risk factors for hypertension in an overweight and obese population, and interactions between the risk factors were analyzed. The results are of clinical relevance for the early detection, diagnosis and treatment of hypertension and for reducing the risk of hypertensive complications.

Methods: The classification tree, logistic regression and BP neural network models were used to screen risk factors for hypertension in overweight and obese people. The sensitivity, specificity and accuracy of the three models were evaluated using receiver operating characteristic (ROC) curves. Finally, the classification tree CRT model was used to screen risk factors for hypertension in overweight and obese people, and an unconditional logistic regression model with a multiplicative interaction term was used to quantify the interaction.

Results: For the classification tree, logistic regression and BP neural network models, respectively, the Youden index was 39.20%, 37.02% and 34.85%; the sensitivity was 61.63%, 76.59% and 82.85%; the specificity was 77.58%, 60.44% and 52.00%; and the area under the curve (AUC) was 0.721, 0.734 and 0.733. AUC did not differ significantly among the three models (P > 0.05). Both the classification tree CRT model and the multiplicative logistic regression model indicated that the interaction between non-alcoholic fatty liver disease (NAFLD) and fasting plasma glucose (FPG) was closely associated with the prevalence of hypertension in overweight and obese people.

Conclusion: NAFLD, FPG, age, triglycerides (TG), uric acid (UA) and low-density lipoprotein cholesterol (LDL-C) were risk factors for hypertension in overweight and obese people. The interaction between NAFLD and FPG increased the risk of hypertension.
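Youden's index is J = sensitivity + specificity − 1, and it is commonly maximised along the ROC curve to choose a cutoff. The sketch below (hypothetical function names, made-up scores) finds the J-maximising cutoff among the observed scores and recomputes J from the three sensitivity/specificity pairs reported in this abstract; the recomputed values (39.21%, 37.03% and 34.85%) agree with the reported indices to within 0.01 percentage points.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    return tp / pos, tn / neg


def youden_cutoff(scores, labels):
    """Observed score maximising Youden's J = sensitivity + specificity - 1."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, cutoff)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Made-up risk scores (label 1 = hypertensive).
scores = [0.1, 0.2, 0.3, 0.55, 0.6, 0.7, 0.8]
labels = [0, 0, 0, 1, 0, 1, 1]
cut = youden_cutoff(scores, labels)

# J recomputed from the sensitivity/specificity pairs reported in the abstract.
reported = {"tree": (0.6163, 0.7758), "logistic": (0.7659, 0.6044), "bp_nn": (0.8285, 0.5200)}
youden = {name: se + sp - 1 for name, (se, sp) in reported.items()}
print(cut, youden)
```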


Worldwide, breast cancer is the leading type of cancer in women, accounting for 25% of all cancer cases in women. Survival rates are comparatively higher in developed countries than in developing countries. This has increased the importance of computer-aided diagnostic methods for the early detection of breast cancer, which can in turn reduce mortality. This paper explores the potential of biomarkers that can be used to predict breast cancer from anthropometric data. This experimental study computes and compares twenty classification models (Binary Logistic Regression, Ball Vector Machine (BVM), C4.5, Partial Least Squares (PLS) for Classification, Classification Tree, Cost-sensitive Classification Tree, Cost-sensitive Decision Tree, Support Vector Machine for Classification, Core Vector Machine, ID3, K-Nearest Neighbor, Linear Discriminant Analysis (LDA), Log-Reg TRIRLS, Multilayer Perceptron (MLP), Multinomial Logistic Regression (MLR), Naïve Bayes (NB), PLS for Discriminant Analysis, PLS for LDA, Random Tree (RT) and Support Vector Machine (SVM)) on the UCI Coimbra breast cancer dataset. Feature selection algorithms (Backward Logit, Fisher Filtering, Forward Logit, ReliefF and StepDisc) were applied to find the minimal set of attributes that achieves better accuracy. The accuracy results were validated using jackknife (leave-one-out) cross-validation. The Core Vector Machine classifier outperformed the other nineteen algorithms, with an accuracy of 82.76%, a sensitivity of 76.92% and a specificity of 87.50% on the three attributes selected by ReliefF: Age, Glucose and Resistin.
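ReliefF, which selected Age, Glucose and Resistin in this study, scores a feature higher when it differs between an instance and its nearest neighbour of the other class (the "miss") more than between the instance and its nearest neighbour of the same class (the "hit"). The sketch below is the original two-class Relief with one hit and one miss per instance, Manhattan distance and min-max scaling; ReliefF generalises this to k neighbours, several classes and missing values. Data and names are illustrative only.

```python
def relief_weights(X, y):
    """Two-class Relief: reward differences at the nearest miss, penalise them at the nearest hit."""
    n, n_feat = len(X), len(X[0])
    lo = [min(row[f] for row in X) for f in range(n_feat)]
    span = [(max(row[f] for row in X) - lo[f]) or 1.0 for f in range(n_feat)]
    Z = [[(row[f] - lo[f]) / span[f] for f in range(n_feat)] for row in X]  # min-max scaling

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))  # Manhattan distance

    w = [0.0] * n_feat
    for i, zi in enumerate(Z):  # every instance is used once (no random sampling)
        hit = min((j for j in range(n) if j != i and y[j] == y[i]), key=lambda j: dist(zi, Z[j]))
        miss = min((j for j in range(n) if y[j] != y[i]), key=lambda j: dist(zi, Z[j]))
        for f in range(n_feat):
            w[f] += (abs(zi[f] - Z[miss][f]) - abs(zi[f] - Z[hit][f])) / n
    return w


# Made-up data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [1.1, 3.0], [3.0, 4.9], [3.2, 1.1], [3.1, 3.2]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
print(weights)
```

Features are then ranked by weight and the top-scoring subset is kept, which is how a filter of this kind can reduce the attribute set before a classifier is trained.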


Author(s):  
Michaela Staňková ◽  
David Hampel

This article addresses the binary classification of 902 small and medium-sized engineering companies active in the EU, together with an additional 51 companies that went bankrupt in 2014. Logistic regression, a basic statistical method, was selected for classification, together with two representatives of machine learning (support vector machines and classification trees), to construct bankruptcy prediction models. Different settings were tested for each method. The models were estimated both on the complete data and on identified artificial factors. To evaluate prediction quality, we consider not only total accuracy with type I and type II errors but also the area under the ROC curve. The results clearly show that the predictive ability of all models decreases as the distance to bankruptcy increases. The classification tree method yields relatively simple models. The best classification results were achieved by logistic regression based on artificial factors; moreover, this procedure gives good and stable results regardless of the other settings. Artificial factors also appear to be suitable inputs for support vector machine models, whereas classification trees achieved better results with the original data.
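With 51 bankrupt firms among 953, a model that labels every firm as healthy is already about 94.6% accurate, which is why a threshold-free criterion such as the area under the ROC curve is informative here. AUC equals the probability that a randomly chosen bankrupt firm receives a higher score than a randomly chosen healthy one (the Mann-Whitney statistic). A quadratic-time Python sketch with made-up scores and an illustrative function name:

```python
def roc_auc(scores, labels):
    """AUC as P(random positive outscores random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up model scores (label 1 = bankrupt).
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(auc)
```

For larger samples, a rank-based computation gives the same value in O(n log n) time.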


Author(s):  
Dennis Foung

The use of algorithms and data mining approaches is not new to Industry 4.0. However, these approaches may not be common among students and educators in higher education. This chapter compares three classification techniques: classification trees, logistic regression and artificial neural networks (ANN). The comparison focuses on each method's accuracy, algorithm and practicality in higher education. The study used a dataset of more than 5,000 records from two academic writing courses at a university in Hong Kong. The results suggest that classification trees and logistic regression can easily be used in higher education, whereas ANN may not be applicable in higher-education settings. The research team suggests that higher education administrators take this research forward and design platforms that implement these classification algorithms to predict at-risk students.

