Risk assessment of European banks and early warning systems

2018 ◽  
Author(s):  
Παντελής Σταυρούλιας

Reliable predictions of financial crises have always safeguarded the stability of the financial system as a whole and of the banking sector in particular. This thesis predicts systemic banking crises for EU-14 countries several quarters before they become apparent, using the most widely used variables (macroeconomic, banking and market) under two approaches, binary and multilevel. Under the binary approach, classification models are derived by applying Discriminant Analysis, Linear Regression, Logistic Regression and Probit Regression for the early prediction of crises 12 to 7 quarters before their onset. In addition, the performance of this analysis is compared with the newer and more promising methods of the Classification Tree, Random Forest and C5. A new measure for threshold selection and goodness of fit (GoF) of the prediction models is also proposed, together with a new combined classification method. To assess the performance of the analysis, out-of-sample testing is used with country-blocked cross-validation. Under this method, the analysis is carried out and the prediction models are derived using thirteen of the fourteen countries in the sample (in-sample); the derived models are then applied to the fourteenth country, which had been excluded from the initial sample (out-of-sample), and the predictions are checked against that country's actual data. This procedure is repeated fourteen times, leaving one country out of the sample each time, and the results of the repetitions are finally averaged.
In this thesis, using out-of-sample testing, 82.4% correct classification (Accuracy) is achieved, with a True Positive Rate (TPR) of 78.4% and a Positive Predictive Value (PPV) of 80.6%. Under the multilevel approach, two prediction levels (periods) for systemic banking crises are distinguished. The first level is called early warning and covers the period 12 to 7 quarters before the onset of the crisis, while the second level is called late warning and covers the period 6 to 1 quarters before the onset of the crisis. For this multilevel classification, Neural Networks, Multinomial Logistic Regression and Multinomial Discriminant Analysis are used. Applying the same out-of-sample testing as in the first approach, 85.7% correct classification is achieved with the best method, which proves to be Multinomial Linear Discriminant Analysis. Using this analysis, interested policy makers can detect the presence of a crisis up to three years ahead with the proposed models, using only publicly available data, and can thereby apply the macroprudential policy appropriate to each case.
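The country-blocked cross-validation and the three reported metrics (Accuracy, TPR, PPV) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the single-feature threshold classifier and the three-country toy data are hypothetical stand-ins, not the thesis's models or data.

```python
from collections import defaultdict
from statistics import mean


def confusion_metrics(y_true, y_pred):
    """Return accuracy, TPR (recall) and PPV (precision) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


def country_blocked_cv(rows, fit, predict):
    """rows: list of (country, x, y). Hold out one whole country per fold."""
    by_country = defaultdict(list)
    for country, x, y in rows:
        by_country[country].append((x, y))

    fold_accuracy = {}
    for held_out, test in by_country.items():
        # In-sample: every observation from all the other countries.
        train = [xy for c, items in by_country.items() if c != held_out
                 for xy in items]
        model = fit(train)
        y_true = [y for _, y in test]
        y_pred = [predict(model, x) for x, _ in test]
        fold_accuracy[held_out] = confusion_metrics(y_true, y_pred)["accuracy"]

    # Final figure: the average over the held-out countries.
    return fold_accuracy, mean(fold_accuracy.values())


# Hypothetical stand-in model: threshold halfway between the class means.
def fit_threshold(train):
    positives = [x for x, y in train if y == 1]
    negatives = [x for x, y in train if y == 0]
    return (mean(positives) + mean(negatives)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Toy data (invented): one indicator value and a pre-crisis flag per row.
toy_rows = [
    ("A", 0.1, 0), ("A", 0.9, 1),
    ("B", 0.2, 0), ("B", 0.8, 1),
    ("C", 0.3, 0), ("C", 0.7, 1),
]
per_country, average = country_blocked_cv(toy_rows, fit_threshold, predict_threshold)
```

Blocking by country rather than by random rows keeps all quarters of the held-out country out of the training data, which is what makes each fold a genuine out-of-sample test.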

Worldwide, breast cancer is the leading type of cancer in women, accounting for 25% of all cases. Survival rates in developed countries are higher than in developing countries. This has increased the importance of computer-aided diagnostic methods for early detection of breast cancer, which can ultimately reduce the death rate. This paper examines the scope of biomarkers that can be used to predict breast cancer from anthropometric data. This experimental study computes and compares various classification models (Binary Logistic Regression, Ball Vector Machine (BVM), C4.5, Partial Least Square (PLS) for Classification, Classification Tree, Cost sensitive Classification Tree, Cost sensitive Decision Tree, Support Vector Machine for Classification, Core Vector Machine, ID3, K-Nearest Neighbor, Linear Discriminant Analysis (LDA), Log-Reg TRIRLS, Multi Layer Perceptron (MLP), Multinomial Logistic Regression (MLR), Naïve Bayes (NB), PLS for Discriminant Analysis, PLS for LDA, Random Tree (RT), Support Vector Machine (SVM)) on the UCI Coimbra breast cancer dataset. Feature selection algorithms (Backward Logit, Fisher Filtering, Forward Logit, ReliefF, Step disc) are applied to find the minimum set of attributes that achieves better accuracy. To verify the accuracy results, jackknife cross-validation is carried out for the algorithms. The Core Vector Machine classification algorithm outperforms the other nineteen algorithms, with an accuracy of 82.76%, sensitivity of 76.92% and specificity of 87.50% for the three selected attributes (Age, Glucose and Resistin) chosen with the ReliefF feature selection algorithm.
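As a rough illustration of the validation step, the sketch below treats jackknife cross-validation as leave-one-out: each case is predicted from a model built on all the other cases, and sensitivity and specificity are computed from the pooled predictions. This reading of "jackknife", the nearest-neighbour classifier and the six toy cases are assumptions for illustration only; they are not the paper's Core Vector Machine or the Coimbra data.

```python
def jackknife_predictions(samples, classify):
    """samples: list of (features, label). Predict each case from the rest."""
    predictions = []
    for i, (features, _) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]  # leave case i out
        predictions.append(classify(training, features))
    return predictions


def nearest_neighbour(training, features):
    """Hypothetical stand-in classifier: label of the closest training case."""
    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    _, label = min(training, key=lambda item: squared_distance(item[0], features))
    return label


def sensitivity_specificity(y_true, y_pred):
    """Assumes both classes occur in y_true, so neither denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy cases (invented): two features each, label 1 = patient, 0 = control.
toy_samples = [
    ((1.0, 1.0), 0), ((1.2, 0.9), 0), ((0.9, 1.1), 0),
    ((3.0, 3.1), 1), ((3.2, 2.9), 1), ((1.1, 1.2), 1),
]
labels = [label for _, label in toy_samples]
predictions = jackknife_predictions(toy_samples, nearest_neighbour)
sensitivity, specificity = sensitivity_specificity(labels, predictions)
```

In the toy data the last patient sits among the controls and is missed, so sensitivity falls below specificity, the same direction as in the reported figures.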


2019 ◽  
Vol 23 (2) ◽  
pp. 130-141 ◽  
Author(s):  
Chun-Chang Lee ◽  
Chih-Min Liang ◽  
Yang-Tung Liu

This paper compares the predictive powers of hierarchical generalized linear modeling (HGLM), logistic regression, and discriminant analysis with regard to tenure choices between buying property and renting property by sampling the residents of the Greater Taipei area. The results imply that the hit rate and other indicators included in HGLM have better predictive power with regard to tenure choices than the binary logistic regression model and the discriminant analysis model. That is, using HGLM to process nested data can increase prediction accuracy regarding household tenure choices. Furthermore, cross-validation is performed to analyze hit rate stability. The hit rate sequencing from this cross-validation is found to be consistent with the HGLM results, implying that the comparison of the three models in terms of hit rate performance prediction in this study is stable and reliable.


Author(s):  
Rifda Nabila ◽  
Risdiana Himmati ◽  
Rendra Erdkhadifa

Abstract: The purpose of this study is to compare multinomial logistic regression and discriminant analysis for classifying decisions to visit halal tourism destinations in Central Java, based on classification accuracy. Both analyses can be used as methods for grouping objects, so they can be compared on the accuracy of their groupings. The data used are the variables worship facilities, halalness, general Islamic morality, and tourism destination image. The multinomial logistic regression results show that the factors that significantly influence the classification of halal tourism visit decisions are tourism destination image, halalness, and general Islamic morality. Discriminant analysis, by contrast, shows that all predictor variables (worship facilities, halalness, general Islamic morality, and tourism destination image) have a significant influence on the classification of decisions to visit halal tourist destinations. This study shows that multinomial logistic regression is better than discriminant analysis for classifying halal tourism visit decisions, with a classification accuracy of 59.5% for multinomial logit regression and 53.5% for discriminant analysis. Multinomial logistic regression is also easier to use for this classification because it does not require assumptions to be met.
Keywords: Discriminant Analysis; Multinomial Logistic Regression; Visiting Decision


2018 ◽  
Vol 124 (5) ◽  
pp. 1284-1293 ◽  
Author(s):  
Alexander H. K. Montoye ◽  
Bradford S. Westgate ◽  
Morgan R. Fonley ◽  
Karin A. Pfeiffer

Wrist-worn accelerometers are gaining popularity for measurement of physical activity. However, few methods for predicting physical activity intensity from wrist-worn accelerometer data have been tested on data not used to create the methods (out-of-sample data). This study utilized two previously collected data sets [Ball State University (BSU) and Michigan State University (MSU)] in which participants wore a GENEActiv accelerometer on the left wrist while performing sedentary, lifestyle, ambulatory, and exercise activities in simulated free-living settings. Activity intensity was determined via direct observation. Four machine learning models (plus 2 combination methods) and six feature sets were used to predict activity intensity (30-s intervals) with the accelerometer data. Leave-one-out cross-validation and out-of-sample testing were performed to evaluate accuracy in activity intensity prediction, and classification accuracies were used to determine differences among feature sets and machine learning models. In out-of-sample testing, the random forest model (77.3–78.5%) had higher accuracy than other machine learning models (70.9–76.4%) and accuracy similar to combination methods (77.0–77.9%). Feature sets utilizing frequency-domain features had improved accuracy over other feature sets in leave-one-out cross-validation (92.6–92.8% vs. 87.8–91.9% in MSU data set; 79.3–80.2% vs. 76.7–78.4% in BSU data set) but similar or worse accuracy in out-of-sample testing (74.0–77.4% vs. 74.1–79.1% in MSU data set; 76.1–77.0% vs. 75.5–77.3% in BSU data set). All machine learning models outperformed the Euclidean norm minus one/GGIR method in out-of-sample testing (69.5–78.5% vs. 53.6–70.6%). From these results, we recommend out-of-sample testing to confirm generalizability of machine learning models. Additionally, random forest models and feature sets with only time-domain features provided the best accuracy for activity intensity prediction from a wrist-worn accelerometer.
NEW & NOTEWORTHY This study includes in-sample and out-of-sample cross-validation of an alternate method for deriving meaningful physical activity outcomes from accelerometer data collected with a wrist-worn accelerometer. This method uses machine learning to directly predict activity intensity. By so doing, this study provides a classification model that may avoid high errors present with energy expenditure prediction while still allowing researchers to assess adherence to physical activity guidelines.


2009 ◽  
Vol 6 (1) ◽  
pp. 5-26 ◽  
Author(s):  
Thomas E. McKee

ABSTRACT: An “ultimate learning algorithm” is one that produces models that closely match the real world’s underlying distribution of functions. To try to create such an algorithm, researchers typically employ manual algorithm design with cross-validation. It has been shown that cross-validation is not a viable way to construct an ultimate learning algorithm. For machine learning researchers, “meta-learning” should be more desirable than manual algorithm design with cross-validation. Meta-learning is concerned with gaining knowledge about learning methodologies. One meta-learning approach involves evaluating the suitability of various algorithms for a learning task in order to select an appropriate algorithm. An alternative approach is to incorporate predictions from base algorithms as features to be evaluated by subsequent algorithms. This paper reports on exploratory research that implemented the latter approach as a three-layer stacked generalization model using neural networks, logistic regression, and classification tree algorithms to predict all categories of financial fraud. The purpose was to see if this form of meta-learning offered significant benefits for financial fraud prediction. Fifteen possible financial fraud predictors were identified based on a theoretical fraud model from prior research. Only public data for these possible predictors were obtained from U.S. Securities and Exchange Commission filings from the period 1995–2002 for a sample of 50 fraud and 50 non-fraud companies. These data were selected for the year prior to when the fraud was initiated. These variables were used to create a variety of neural network, logistic regression, and classification tree models while using holdout sample and cross-validation techniques. A 71.4 percent accurate neural network model was then stacked into a logistic regression model, increasing the prediction accuracy to 76.5 percent. 
The logistic regression model was subsequently stacked into a classification tree model to achieve an 83 percent accuracy rate. These results compared favorably to two prior neural network studies, also employing only public data, which achieved 63 percent accuracy rates. Model results were also analyzed via probability-adjusted overall error rates, relative misclassification costs, and receiver operating characteristics. The increase in classification accuracy from 71 percent to 83 percent, the decline in estimated overall error rate from 0.0057 to 0.0035, and the decline in relative misclassification costs from 2.79 to 0.58 suggest that benefits were achieved by the meta-learning stacking approach. Further research into the meta-learning stacking approach appears warranted.
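The data flow of the three-layer stack described above, in which each layer's prediction becomes an extra input feature for the next layer, can be sketched as follows. This is only an illustration under stated assumptions: every layer here is the same hypothetical decision stump rather than the paper's neural network, logistic regression and classification tree, the four training rows are invented, and each layer is fitted on in-sample predictions (a practical stack would normally use held-out predictions to limit leakage).

```python
def fit_stump(rows, labels):
    """Pick the (feature index, threshold) split with the best training accuracy."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            preds = [1 if row[j] >= threshold else 0 for row in rows]
            accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or accuracy > best[0]:
                best = (accuracy, j, threshold)
    _, j, threshold = best
    return j, threshold


def predict_stump(stump, row):
    j, threshold = stump
    return 1 if row[j] >= threshold else 0


def fit_stack(rows, labels, n_layers=3):
    """Fit layers in sequence; each layer sees all earlier layers' predictions."""
    layers = []
    current = [list(row) for row in rows]
    for _ in range(n_layers):
        stump = fit_stump(current, labels)
        layers.append(stump)
        # Append this layer's prediction as a new feature for the next layer.
        current = [row + [predict_stump(stump, row)] for row in current]
    return layers


def predict_stack(layers, row):
    """Replay the same feature augmentation; the last layer's output is the answer."""
    current = list(row)
    prediction = None
    for stump in layers:
        prediction = predict_stump(stump, current)
        current.append(prediction)
    return prediction


# Toy data (invented): two predictors per company, label 1 = fraud.
toy_rows = [[0.1, 5], [0.4, 3], [0.6, 4], [0.9, 1]]
toy_labels = [0, 0, 1, 1]
layers = fit_stack(toy_rows, toy_labels)
stacked_predictions = [predict_stack(layers, row) for row in toy_rows]
```

Because the toy layers are identical learners, the later layers add nothing here; the sketch shows only how predictions are passed forward. The gains reported in the abstract come from stacking different model families.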


2019 ◽  
Vol 2 (3) ◽  
pp. 250-263 ◽  
Author(s):  
Peter Boedeker ◽  
Nathan T. Kearns

In psychology, researchers are often interested in the predictive classification of individuals. Various models exist for such a purpose, but which model is considered a best practice is conditional on attributes of the data. Under certain conditions, linear discriminant analysis (LDA) has been shown to perform better than other predictive methods, such as logistic regression, multinomial logistic regression, random forests, support-vector machines, and the K-nearest neighbor algorithm. The purpose of this Tutorial is to provide researchers who already have a basic level of statistical training with a general overview of LDA and an example of its implementation and interpretation. Decisions that must be made when conducting an LDA (e.g., prior specification, choice of cross-validation procedures) and methods of evaluating case classification (posterior probability, typicality probability) and overall classification (hit rate, Huberty’s I index) are discussed. LDA for prediction is described from a modern Bayesian perspective, as opposed to its original derivation. A step-by-step example of implementing and interpreting LDA results is provided. All analyses were conducted in R, and the script is provided; the data are available online.
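For orientation, the Bayesian view of LDA mentioned above is usually written as follows. This is the standard textbook formulation under the assumption of Gaussian classes with a shared covariance matrix, not a transcription of the Tutorial's own notation; the symbols G, K, p, \pi_k, \mu_k and \Sigma are chosen here for illustration.

```latex
% Posterior probability of class k for a case x, with prior \pi_k,
% class mean \mu_k and a covariance matrix \Sigma shared by all K classes.
% (In the density, 2\pi is the numerical constant, not a prior.)
P(G = k \mid x) = \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)},
\qquad
f_k(x) = \frac{1}{(2\pi)^{p/2} \, |\Sigma|^{1/2}}
         \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^{\top} \Sigma^{-1} (x - \mu_k) \right)

% Taking logs and dropping terms common to all classes gives a score
% that is linear in x; the case is assigned to the class with the largest score.
\delta_k(x) = x^{\top} \Sigma^{-1} \mu_k
              - \tfrac{1}{2} \, \mu_k^{\top} \Sigma^{-1} \mu_k
              + \log \pi_k
```

The prior \pi_k enters only through the additive term \log \pi_k, which is why the choice of prior specification shifts the classification boundaries without changing their direction.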

