Risk assessment of European banks and early warning systems

Mapping Intimacies ◽

10.12681/eadd/44525 ◽

2018 ◽

Author(s):

Παντελής Σταυρούλιας

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Cross Validation ◽

Multinomial Logistic Regression ◽

Classification Tree ◽

Probit Regression ◽

Macroprudential Policy ◽

Policy Makers ◽

Out Of Sample ◽

Sample Testing

Valid forecasts of financial crises have always safeguarded the stability of the financial system as a whole and of the banking sector in particular. This thesis predicts systemic banking crises in the EU-14 countries several quarters before they become apparent, using the most widely used variables (macroeconomic, banking and market) under two approaches, binary and multilevel. Under the binary approach, classification models are derived by applying Discriminant Analysis, Linear Regression, Logistic Regression and Probit Regression for the early prediction of crises 12 to 7 quarters before their onset. In addition, the performance of these models is compared with that of the newer and most promising methods: Classification Tree, Random Forest and C5. At the same time, a new measure for threshold selection and goodness of fit (GoF) of the prediction models is proposed, together with a new combined classification method. To assess the performance of the above analysis, out-of-sample testing is carried out by country-blocked cross-validation. Under this method, the analysis is performed and the prediction models are derived using thirteen of the fourteen countries in the sample (in-sample); the resulting models are applied to the fourteenth country, which was excluded from the initial sample (out-of-sample), and the predictions are checked against that country's actual data. This procedure is repeated fourteen times, leaving one country out of the sample each time, and the results of the repetitions are finally averaged.
In this thesis, using out-of-sample testing, the models achieve 82.4% correct classification (Accuracy), a True Positive Rate (TPR) of 78.4% and a Positive Predictive Value (PPV) of 80.6%. Under the multilevel approach, two levels (periods) of prediction of systemic banking crises are distinguished. The first level, called early warning, covers the period 12 to 7 quarters before the onset of the crisis, while the second level, called late warning, covers the period 6 to 1 quarters before the onset of the crisis. For this multilevel classification, Neural Networks, Multinomial Logistic Regression and Multinomial Discriminant Analysis are used. Applying the same out-of-sample testing as in the first approach yields 85.7% correct classification with the best method, which proves to be Multinomial (multilevel linear) Discriminant Analysis. By applying the above analysis, interested policy makers can use the proposed models to detect a crisis up to three years ahead, using only publicly available data, and thereby exercise the appropriate macroprudential policy in each case.
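The country-blocked cross-validation and the three reported measures (Accuracy, TPR, PPV) can be sketched as follows. This is a minimal illustration, not the thesis's code: the function names, the single-indicator threshold model (`fit_threshold`) and the three-country toy data are all hypothetical stand-ins for the fourteen-country data set and the actual classifiers.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), once per distinct group."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for g, idxs in by_group.items() if g != held_out for i in idxs]
        yield held_out, sorted(train_idx), test_idx


def binary_metrics(y_true, y_pred):
    """Accuracy, true positive rate and positive predictive value for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "tpr": tp / (tp + fn) if tp + fn else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
    }


def fit_threshold(xs, ys):
    """Hypothetical stand-in model: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def country_blocked_cv(xs, ys, countries):
    """Fit on all countries but one, test on the held-out one, average the accuracy."""
    accuracies = []
    for _, train_idx, test_idx in leave_one_group_out(countries):
        threshold = fit_threshold([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [1 if xs[i] > threshold else 0 for i in test_idx]
        truth = [ys[i] for i in test_idx]
        accuracies.append(binary_metrics(truth, preds)["accuracy"])
    return sum(accuracies) / len(accuracies)


# Toy data (made up for illustration): one indicator, three countries.
countries = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"]
xs = [0.1, 0.2, 0.8, 0.9, 0.2, 0.3, 0.7, 0.9, 0.1, 0.4, 0.6, 0.8]
ys = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
mean_accuracy = country_blocked_cv(xs, ys, countries)
print(mean_accuracy)
```

Blocking by country keeps every observation of the held-out country out of the fitting step, so the averaged score reflects performance on a country the model has never seen.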


Identification of Bio-Markers for Breast Cancer Detection through Data Mining Methods

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1141.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 763-769

Keyword(s):

Breast Cancer ◽

Support Vector Machine ◽

Logistic Regression ◽

Feature Selection ◽

Discriminant Analysis ◽

Classification Tree ◽

Partial Least Square ◽

Diagnostic Methods ◽

Support Vector ◽

Breast Cancer Dataset

Worldwide, breast cancer is the leading type of cancer in women, accounting for 25% of all cases. Survival rates in developed countries are higher than in developing countries. This has raised the importance of computer-aided diagnostic methods for early detection of breast cancer, which can in turn reduce the death rate. This paper examines the scope of biomarkers that can be used to predict breast cancer from anthropometric data. This experimental study computes and compares various classification models (Binary Logistic Regression, Ball Vector Machine (BVM), C4.5, Partial Least Squares (PLS) for Classification, Classification Tree, Cost-sensitive Classification Tree, Cost-sensitive Decision Tree, Support Vector Machine for Classification, Core Vector Machine, ID3, K-Nearest Neighbor, Linear Discriminant Analysis (LDA), Log-Reg TRIRLS, Multilayer Perceptron (MLP), Multinomial Logistic Regression (MLR), Naïve Bayes (NB), PLS for Discriminant Analysis, PLS for LDA, Random Tree (RT), Support Vector Machine (SVM)) on the UCI Coimbra breast cancer dataset. Feature selection algorithms (Backward Logit, Fisher Filtering, Forward Logit, ReliefF, Step disc) are applied to find the minimum set of attributes that achieves better accuracy. To verify the accuracy results, jackknife cross-validation is carried out for the algorithms. The Core Vector Machine classification algorithm outperforms the other nineteen algorithms, with an accuracy of 82.76%, sensitivity of 76.92% and specificity of 87.50% for three selected attributes (Age, Glucose and Resistin) chosen with the ReliefF feature selection algorithm.
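As a rough illustration of jackknife (leave-one-out) validation and of how sensitivity and specificity are read off the resulting predictions, the sketch below holds out each case in turn. The one-feature nearest-neighbour classifier and the six toy values are assumptions made for the example; they are not the Core Vector Machine or the Coimbra data used in the paper.

```python
def nearest_neighbor_label(train_x, train_y, query):
    """1-nearest-neighbour on one numeric feature (hypothetical stand-in classifier)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[best]


def jackknife_predictions(xs, ys):
    """Predict each case from a model built on all the other cases."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(nearest_neighbor_label(train_x, train_y, xs[i]))
    return preds


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy values (made up): 1 = patient, 0 = control.
xs = [1.0, 1.2, 1.9, 3.0, 3.1, 2.1]
ys = [0, 0, 0, 1, 1, 1]
preds = jackknife_predictions(xs, ys)
sens, spec = sensitivity_specificity(ys, preds)
print(preds, sens, spec)
```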


Derivation of Large Sample Efficiency of Multinomial Logistic Regression Compared to Multiple Group Discriminant Analysis

Biostatistics ◽

10.1007/978-94-009-4794-8_10 ◽

1987 ◽

pp. 177-197 ◽

Cited By ~ 1

Author(s):

Shelley B. Bull ◽

Allan Donner

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Multinomial Logistic Regression ◽

Large Sample ◽

Multiple Group


A COMPARISON OF THE PREDICTIVE POWERS OF TENURE CHOICES BETWEEN PROPERTY OWNERSHIP AND RENTING

International Journal of Strategic Property Management ◽

10.3846/ijspm.2019.7064 ◽

2019 ◽

Vol 23 (2) ◽

pp. 130-141 ◽

Cited By ~ 4

Author(s):

Chun-Chang Lee ◽

Chih-Min Liang ◽

Yang-Tung Liu

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Performance Prediction ◽

Predictive Power ◽

Cross Validation ◽

Binary Logistic Regression ◽

Analysis Model ◽

Linear Modeling ◽

Binary Logistic Regression Model ◽

Hit Rate

This paper compares the predictive powers of hierarchical generalized linear modeling (HGLM), logistic regression, and discriminant analysis with regard to tenure choices between buying property and renting property by sampling the residents of the Greater Taipei area. The results imply that the hit rate and other indicators included in HGLM have better predictive power with regard to tenure choices than the binary logistic regression model and the discriminant analysis model. That is, using HGLM to process nested data can increase prediction accuracy regarding household tenure choices. Furthermore, cross-validation is performed to analyze hit rate stability. The hit rate sequencing from this cross-validation is found to be consistent with the HGLM results, implying that the comparison of the three models in terms of hit rate performance prediction in this study is stable and reliable.


COMPARISON OF MULTINOMIAL LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS

Journal of Islamic Tourism, Halal Food, Islamic Traveling, and Creative Economy ◽

10.21274/ar-rehla.v1i2.4820 ◽

2021 ◽

Vol 1 (2) ◽

pp. 135-150

Author(s):

Rifda Nabila ◽

Risdiana Himmati ◽

Rendra Erdkhadifa

Keyword(s):

Logistic Regression ◽

Regression Analysis ◽

Discriminant Analysis ◽

Logistic Regression Analysis ◽

Multinomial Logistic Regression ◽

Regression Method ◽

Destination Image ◽

Tourism Destination ◽

Multinomial Logistic Regression Analysis ◽

Logistic Regression Method

Abstract: The purpose of this study is to compare multinomial logistic regression analysis and discriminant analysis for classifying decisions on halal tourism visits in Central Java, based on classification accuracy. Both analyses can be used to group objects, so they can be compared on the accuracy of their groupings. The data used are worship facilities, halalness, general Islamic morality, and tourism destination image. The multinomial logistic regression results show that the factors that significantly influence the grouping of halal tourism visit decisions are the tourism destination image, halalness, and general Islamic morality variables. Discriminant analysis, in contrast, shows that all predictor variables (worship facilities, halalness, general Islamic morality, and tourism destination image) significantly influence the classification of decisions to visit halal tourist destinations. The study finds that multinomial logistic regression classifies halal tourism visit decisions better than discriminant analysis, with a classification accuracy of 59.5% for multinomial logistic regression against 53.5% for discriminant analysis. Multinomial logistic regression is also easier to use for this grouping because it does not depend on assumptions that must first be met.
Keywords: Discriminant Analysis; Multinomial Logistic Regression; Visiting Decision.


Cross-validation and out-of-sample testing of physical activity intensity predictions with a wrist-worn accelerometer

Journal of Applied Physiology ◽

10.1152/japplphysiol.00760.2017 ◽

2018 ◽

Vol 124 (5) ◽

pp. 1284-1293 ◽

Cited By ~ 9

Author(s):

Alexander H. K. Montoye ◽

Bradford S. Westgate ◽

Morgan R. Fonley ◽

Karin A. Pfeiffer

Keyword(s):

Physical Activity ◽

Machine Learning ◽

Cross Validation ◽

Learning Models ◽

Data Set ◽

Feature Sets ◽

Activity Intensity ◽

Out Of Sample ◽

Sample Testing ◽

Machine Learning Models

Wrist-worn accelerometers are gaining popularity for measurement of physical activity. However, few methods for predicting physical activity intensity from wrist-worn accelerometer data have been tested on data not used to create the methods (out-of-sample data). This study utilized two previously collected data sets [Ball State University (BSU) and Michigan State University (MSU)] in which participants wore a GENEActiv accelerometer on the left wrist while performing sedentary, lifestyle, ambulatory, and exercise activities in simulated free-living settings. Activity intensity was determined via direct observation. Four machine learning models (plus two combination methods) and six feature sets were used to predict activity intensity (30-s intervals) with the accelerometer data. Leave-one-out cross-validation and out-of-sample testing were performed to evaluate accuracy in activity intensity prediction, and classification accuracies were used to determine differences among feature sets and machine learning models. In out-of-sample testing, the random forest model (77.3–78.5%) had higher accuracy than other machine learning models (70.9–76.4%) and accuracy similar to combination methods (77.0–77.9%). Feature sets utilizing frequency-domain features had improved accuracy over other feature sets in leave-one-out cross-validation (92.6–92.8% vs. 87.8–91.9% in MSU data set; 79.3–80.2% vs. 76.7–78.4% in BSU data set) but similar or worse accuracy in out-of-sample testing (74.0–77.4% vs. 74.1–79.1% in MSU data set; 76.1–77.0% vs. 75.5–77.3% in BSU data set). All machine learning models outperformed the Euclidean norm minus one/GGIR method in out-of-sample testing (69.5–78.5% vs. 53.6–70.6%). From these results, we recommend out-of-sample testing to confirm generalizability of machine learning models. Additionally, random forest models and feature sets with only time-domain features provided the best accuracy for activity intensity prediction from a wrist-worn accelerometer.
NEW & NOTEWORTHY This study includes in-sample and out-of-sample cross-validation of an alternate method for deriving meaningful physical activity outcomes from accelerometer data collected with a wrist-worn accelerometer. This method uses machine learning to directly predict activity intensity. By so doing, this study provides a classification model that may avoid high errors present with energy expenditure prediction while still allowing researchers to assess adherence to physical activity guidelines.
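For context on the baseline named in the abstract, the Euclidean norm minus one (ENMO) metric can be sketched as below. The sketch assumes the commonly used definition (acceleration in units of g, negative values truncated to zero) and averages over a window of samples; the function names are illustrative, and the intensity cut-points applied on top of ENMO in the study are not reproduced here.

```python
import math


def enmo(x, y, z):
    """Euclidean norm of a triaxial sample (in g) minus 1 g, floored at zero."""
    return max(math.sqrt(x * x + y * y + z * z) - 1.0, 0.0)


def mean_enmo(samples):
    """Average ENMO over a window of (x, y, z) samples, e.g. one 30-s interval."""
    return sum(enmo(*s) for s in samples) / len(samples)


# A device at rest measures about 1 g (gravity only), so ENMO is 0.
print(enmo(0.0, 0.0, 1.0))
# A made-up window: one resting sample and one sample with 2 g magnitude.
print(mean_enmo([(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]))
```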


A Meta-Learning Approach to Predicting Financial Statement Fraud

Journal of Emerging Technologies in Accounting ◽

10.2308/jeta.2009.6.1.5 ◽

2009 ◽

Vol 6 (1) ◽

pp. 5-26 ◽

Cited By ~ 3

Author(s):

Thomas E. McKee

Keyword(s):

Neural Network ◽

Logistic Regression ◽

Cross Validation ◽

Learning Algorithm ◽

Classification Tree ◽

Algorithm Design ◽

Financial Fraud ◽

Public Data ◽

Meta Learning ◽

Percent Accuracy

An “ultimate learning algorithm” is one that produces models that closely match the real world’s underlying distribution of functions. To try to create such an algorithm, researchers typically employ manual algorithm design with cross-validation. It has been shown that cross-validation is not a viable way to construct an ultimate learning algorithm. For machine learning researchers, “meta-learning” should be more desirable than manual algorithm design with cross-validation. Meta-learning is concerned with gaining knowledge about learning methodologies. One meta-learning approach involves evaluating the suitability of various algorithms for a learning task in order to select an appropriate algorithm. An alternative approach is to incorporate predictions from base algorithms as features to be evaluated by subsequent algorithms. This paper reports on exploratory research that implemented the latter approach as a three-layer stacked generalization model using neural networks, logistic regression, and classification tree algorithms to predict all categories of financial fraud. The purpose was to see if this form of meta-learning offered significant benefits for financial fraud prediction. Fifteen possible financial fraud predictors were identified based on a theoretical fraud model from prior research. Only public data for these possible predictors were obtained from U.S. Securities and Exchange Commission filings from the period 1995–2002 for a sample of 50 fraud and 50 non-fraud companies. These data were selected for the year prior to when the fraud was initiated. These variables were used to create a variety of neural network, logistic regression, and classification tree models while using holdout sample and cross-validation techniques. A 71.4 percent accurate neural network model was then stacked into a logistic regression model, increasing the prediction accuracy to 76.5 percent. The logistic regression model was subsequently stacked into a classification tree model to achieve an 83 percent accuracy rate. These results compared favorably to two prior neural network studies, also employing only public data, which achieved 63 percent accuracy rates. Model results were also analyzed via probability-adjusted overall error rates, relative misclassification costs, and receiver operating characteristics. The increase in classification accuracy from 71 percent to 83 percent, the decline in estimated overall error rate from 0.0057 to 0.0035, and the decline in relative misclassification costs from 2.79 to 0.58 suggest that benefits were achieved by the meta-learning stacking approach. Further research into the meta-learning stacking approach appears warranted.
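The stacking idea described in the abstract, where each model's prediction becomes an extra input feature for the next model, can be sketched in a few lines. Everything here is a simplified assumption for illustration: a nearest-centroid rule and a decision stump stand in for the paper's neural network, logistic regression and classification tree, and the toy rows are invented. The sketch also feeds in-sample predictions forward, whereas practical stacking usually uses held-out predictions to limit leakage.

```python
def fit_centroid(rows, labels):
    """Hypothetical base learner: assign the class whose centroid is nearest."""
    def centroid(cls):
        members = [r for r, y in zip(rows, labels) if y == cls]
        return [sum(col) / len(members) for col in zip(*members)]

    c0, c1 = centroid(0), centroid(1)

    def predict(row):
        d0 = sum((a - b) ** 2 for a, b in zip(row, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(row, c1))
        return 1 if d1 < d0 else 0

    return predict


def fit_stump(rows, labels):
    """Hypothetical later learner: best single-feature threshold on the training rows."""
    best = None
    for j in range(len(rows[0])):
        for thr in sorted({r[j] for r in rows}):
            errors = sum((1 if r[j] >= thr else 0) != y for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, j, thr)
    _, j, thr = best
    return lambda row: 1 if row[j] >= thr else 0


def fit_stack(rows, labels, learners):
    """Fit learners in order; each later learner also sees all earlier predictions."""
    fitted = []
    current = [list(r) for r in rows]
    for fit in learners:
        predict = fit(current, labels)
        fitted.append(predict)
        current = [r + [predict(r)] for r in current]

    def predict_stack(row):
        features = list(row)
        output = None
        for predict in fitted:
            output = predict(features)
            features = features + [output]
        return output

    return predict_stack


# Toy rows (made up): two predictors per company, 1 = fraud, 0 = non-fraud.
rows = [[0.1, 1.0], [0.3, 0.8], [0.2, 0.9], [0.8, 0.2], [0.9, 0.1], [0.7, 0.3]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_stack(rows, labels, [fit_centroid, fit_stump])
print([model(r) for r in rows])
```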


Linear Discriminant Analysis for Prediction of Group Membership: A User-Friendly Primer

Advances in Methods and Practices in Psychological Science ◽

10.1177/2515245919849378 ◽

2019 ◽

Vol 2 (3) ◽

pp. 250-263 ◽

Cited By ~ 2

Author(s):

Peter Boedeker ◽

Nathan T. Kearns

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Best Practice ◽

Nearest Neighbor ◽

Multinomial Logistic Regression ◽

Support Vector ◽

K Nearest Neighbor ◽

Linear Discriminant ◽

Prior Specification

In psychology, researchers are often interested in the predictive classification of individuals. Various models exist for such a purpose, but which model is considered a best practice is conditional on attributes of the data. Under certain conditions, linear discriminant analysis (LDA) has been shown to perform better than other predictive methods, such as logistic regression, multinomial logistic regression, random forests, support-vector machines, and the K-nearest neighbor algorithm. The purpose of this Tutorial is to provide researchers who already have a basic level of statistical training with a general overview of LDA and an example of its implementation and interpretation. Decisions that must be made when conducting an LDA (e.g., prior specification, choice of cross-validation procedures) and methods of evaluating case classification (posterior probability, typicality probability) and overall classification (hit rate, Huberty’s I index) are discussed. LDA for prediction is described from a modern Bayesian perspective, as opposed to its original derivation. A step-by-step example of implementing and interpreting LDA results is provided. All analyses were conducted in R, and the script is provided; the data are available online.
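The Bayesian reading of LDA mentioned in the abstract, where a case's posterior probability of group membership combines a prior with a normal likelihood that shares one variance across groups, can be illustrated for a single predictor. This is a sketch under those textbook assumptions, written in Python rather than the tutorial's R; the function names and toy scores are hypothetical and it is not the authors' script.

```python
import math


def fit_lda_1d(xs, ys, priors=None):
    """Estimate group means, the pooled within-group variance and the priors."""
    classes = sorted(set(ys))
    means = {}
    for c in classes:
        members = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(members) / len(members)
    pooled_ss = sum((x - means[y]) ** 2 for x, y in zip(xs, ys))
    variance = pooled_ss / (len(xs) - len(classes))
    if priors is None:
        # Default assumption: priors proportional to the observed group sizes.
        priors = {c: sum(1 for y in ys if y == c) / len(ys) for c in classes}
    return means, variance, priors


def posterior(x, means, variance, priors):
    """Posterior membership probabilities: prior times shared-variance normal density."""
    scores = {c: math.log(priors[c]) - (x - means[c]) ** 2 / (2 * variance) for c in means}
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Toy scores (made up) for two groups.
xs = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
ys = [0, 0, 0, 1, 1, 1]
means, variance, priors = fit_lda_1d(xs, ys)
print(posterior(4.0, means, variance, priors))
print(posterior(4.0, means, variance, {0: 0.9, 1: 0.1}))
```

At the midpoint between the two group means the likelihoods are equal, so the posterior simply reproduces the prior; this is why the prior specification the tutorial discusses can change how borderline cases are classified.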


Comparing performance of multinomial logistic regression and discriminant analysis for monitoring access to care for acute myocardial infarction

Journal of Clinical Epidemiology ◽

10.1016/s0895-4356(01)00505-4 ◽

2002 ◽

Vol 55 (4) ◽

pp. 400-406 ◽

Cited By ~ 11

Author(s):

Monir Hossain ◽

Steven Wright ◽

Laura A. Petersen

Keyword(s):

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Discriminant Analysis ◽

Access To Care ◽

Multinomial Logistic Regression


The use of discriminant analysis, logistic regression and classification tree analysis in the development of classification models for human health effects

Journal of Molecular Structure THEOCHEM ◽

10.1016/s0166-1280(02)00622-x ◽

2003 ◽

Vol 622 (1-2) ◽

pp. 97-111 ◽

Cited By ~ 48

Author(s):

Andrew P Worth ◽

Mark T.D Cronin

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Human Health ◽

Health Effects ◽

Classification Tree ◽

Classification Models ◽

Classification Tree Analysis ◽

Tree Analysis ◽

Human Health Effects


The Efficiency of Multinomial Logistic Regression Compared with Multiple Group Discriminant Analysis

Journal of the American Statistical Association ◽

10.1080/01621459.1987.10478548 ◽

1987 ◽

Vol 82 (400) ◽

pp. 1118-1122 ◽

Cited By ~ 20

Author(s):

Shelley B. Bull ◽

Allan Donner

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Multinomial Logistic Regression ◽

Multiple Group
