Feature Selection in a Credit Scoring Model

Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 746
Author(s):  
Juan Laborda ◽  
Seyong Ryoo

This paper applies several classification algorithms (logistic regression, support vector machine, K-nearest neighbors, and random forest) to identify which candidates are likely to default in a credit scoring model. Three feature selection methods are used to mitigate the overfitting that the curse of dimensionality causes in these classification algorithms: one filter method (Chi-squared test and correlation coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection). The performance of the three methods is discussed using two measures: the mean absolute error and the number of selected features. The methodology is applied to a valuable database from Taiwan. The results suggest that forward stepwise selection yields superior performance for each of the classification algorithms used. The conclusions obtained are related to those in the literature, and their managerial implications are analyzed.
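The forward stepwise wrapper named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the toy data, and the names `centroid_predict` and `forward_stepwise` are assumptions made only to keep the example self-contained. The loop greedily adds the feature that most reduces the validation mean absolute error and stops when no candidate improves it.

```python
def centroid_predict(train_X, train_y, test_X, cols):
    """Nearest-centroid classifier using only the columns in `cols` (stand-in model)."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    preds = []
    for x in test_X:
        dist = {
            label: sum((x[c] - m) ** 2 for c, m in zip(cols, centre))
            for label, centre in centroids.items()
        }
        # sorted() makes ties resolve to the smallest label deterministically
        preds.append(min(sorted(dist), key=dist.get))
    return preds


def mean_absolute_error(y_true, y_pred):
    """For 0/1 labels this equals the misclassification rate."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def forward_stepwise(train_X, train_y, val_X, val_y):
    """Greedily add the feature that lowers validation error the most."""
    n_features = len(train_X[0])
    selected, best_err = [], float("inf")
    while len(selected) < n_features:
        scored = []
        for c in range(n_features):
            if c in selected:
                continue
            preds = centroid_predict(train_X, train_y, val_X, selected + [c])
            scored.append((mean_absolute_error(val_y, preds), c))
        err, c = min(scored)
        if err >= best_err:  # no candidate improves the error: stop
            break
        selected.append(c)
        best_err = err
    return selected, best_err


# Hypothetical toy data: column 0 is informative, columns 1 and 2 are noise.
train_X = [[0.0, 5.0, 1.0], [0.2, 1.0, 9.0], [1.0, 4.0, 2.0], [0.9, 0.5, 8.0]]
train_y = [0, 0, 1, 1]
val_X = [[0.1, 0.7, 3.0], [0.8, 4.5, 7.0]]
val_y = [0, 1]

selected, err = forward_stepwise(train_X, train_y, val_X, val_y)
print(selected, err)  # [0] 0.0
```

Backward stepwise selection follows the same pattern in reverse: start from all features and repeatedly drop the one whose removal lowers the error most.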

2019 ◽  
Vol 35 (2) ◽  
pp. 371-394 ◽  
Author(s):  
Diwakar Tripathi ◽  
Damodar Reddy Edla ◽  
Ramalingaswamy Cheruku ◽  
Venkatanareshbabu Kuppili

2012 ◽  
Vol 235 ◽  
pp. 419-422 ◽  
Author(s):  
Bo Tang ◽  
Sai Bing Qiu

The general credit scoring model addresses a two-class (binary) classification problem, but in practice multi-class problems are often encountered. This paper proposes a multi-class support vector machine that can solve multi-class classification problems in the behavior assessment model.
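One common way to extend a binary support vector machine to several classes is one-vs-rest decomposition. The abstract above does not say which multi-class strategy the paper uses, so the sketch below is only an assumption-laden illustration: a linear SVM trained by plain subgradient descent on the hinge loss, with hypothetical function names, hyper-parameters, and toy data.

```python
def train_binary_svm(X, y, epochs=500, lr=0.01, lam=0.01):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    y must contain -1 / +1. Returns (weights, bias). Hyper-parameters are
    illustrative defaults, not tuned values.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 shrinkage applies on every step
            w = [wi * (1.0 - lr * lam) for wi in w]
            # the hinge term contributes only when the margin is violated
            if margin < 1.0:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def train_one_vs_rest(X, y):
    """Train one binary SVM per class: that class (+1) against all others (-1)."""
    models = {}
    for label in sorted(set(y)):
        targets = [1 if t == label else -1 for t in y]
        models[label] = train_binary_svm(X, targets)
    return models


def predict(models, x):
    """Pick the class whose binary SVM gives the largest decision score."""
    scores = {
        label: sum(wi * xi for wi, xi in zip(w, x)) + b
        for label, (w, b) in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: three well-separated behavior classes in 2-D.
X = [[0, 0], [1, 0], [0, 1],   # class 0
     [4, 0], [5, 0], [4, 1],   # class 1
     [0, 4], [1, 4], [0, 5]]   # class 2
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_one_vs_rest(X, y)
predictions = [predict(models, p) for p in ([0.5, 0.5], [4.5, 0.5], [0.5, 4.5])]
print(predictions)
```

One-vs-rest needs only as many binary models as there are classes; one-vs-one and direct multi-class formulations are alternatives that the paper may equally have used.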


Author(s):  
Wirot Yotsawat ◽  
Pakaket Wattuya ◽  
Anongnart Srivihok

Several credit-scoring models have been developed using ensemble classifiers in order to improve the accuracy of assessment. However, among ensemble models, little attention has been paid to tuning the hyper-parameters of the base learners, although these are crucial to constructing ensemble models. This study proposes an improved credit scoring model based on the extreme gradient boosting (XGB) classifier using Bayesian hyper-parameter optimization (XGB-BO). The model comprises two steps. First, data pre-processing is used to handle missing values and scale the data. Second, Bayesian hyper-parameter optimization is applied to tune the hyper-parameters of the XGB classifier, which is then used to train the model. The model is evaluated on four widely used public datasets, i.e., the German, Australian, Lending Club, and Polish datasets. Several state-of-the-art classification algorithms are implemented for predictive comparison with the proposed method. The proposed model showed promising results, with an improvement in accuracy of 4.10%, 3.03%, and 2.76% on the German, Lending Club, and Australian datasets, respectively. According to the evaluation results, the proposed model outperformed commonly used techniques, e.g., decision tree, support vector machine, neural network, logistic regression, random forest, and bagging. The experimental results confirmed that the XGB-BO model is suitable for assessing the creditworthiness of applicants.
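The first step described above, handling missing values and scaling, might look like the following sketch. The abstract does not name the imputation or scaling method, so column-mean imputation and min-max scaling are assumptions here, as are the function name `impute_and_scale` and the toy rows. The Bayesian optimization step is not shown.

```python
def impute_and_scale(rows):
    """Replace None with the column mean, then min-max scale each column to [0, 1].

    Both choices are assumptions for illustration; the paper may use others.
    Each column must contain at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    out = [list(r) for r in rows]
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        mean = sum(observed) / len(observed)
        col = [mean if r[c] is None else r[c] for r in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        for i, v in enumerate(col):
            # a constant column carries no information; map it to 0.0
            out[i][c] = (v - lo) / span if span else 0.0
    return out


# Hypothetical applicants: (loan amount, age), with one missing age.
raw = [[1000.0, 25.0], [3000.0, None], [2000.0, 35.0]]
clean = impute_and_scale(raw)
print(clean)  # [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]
```

In practice the column means and ranges would be computed on the training split only and reused on the test split, so that no information leaks from test data into the model.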


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Dayu Xu ◽  
Xuyao Zhang ◽  
Junguo Hu ◽  
Jiahao Chen

This paper discusses the combined use of ensemble learning, classification, and feature selection (FS) algorithms, together with training data balancing, to make the proposed credit scoring model more effective. The model comprises three major stages. First, the collected credit data are preprocessed. Second, an efficient feature selection algorithm based on the adaptive elastic net is employed to remove weakly related or uncorrelated variables and obtain high-quality training data. Third, a novel ensemble strategy is proposed to balance the imbalanced training data set for each extreme learning machine (ELM) classifier. Finally, a new weighting method for the single ELM classifiers in the ensemble model is established with respect to their classification accuracy, based on generalized fuzzy soft sets (GFSS) theory. A novel cosine-based distance measurement algorithm for GFSS is also proposed to calculate the weight of each ELM classifier. To confirm the efficiency of the proposed ensemble credit scoring model, experiments were run on real-world credit data sets for comparison. The analysis, outcomes, and statistical tests indicate that the proposed model improves classification effectiveness in average accuracy, area under the curve (AUC), H-measure, and Brier’s score compared to all other single classifiers and ensemble approaches.
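The idea of weighting ensemble members by their classification accuracy can be sketched as follows. This is not the GFSS cosine-based weighting proposed in the paper, whose details the abstract does not give. As a deliberately simple stand-in, each member's weight is its validation accuracy divided by the sum of all accuracies. The function names and the toy predictions are hypothetical.

```python
def accuracy_weights(member_val_preds, val_y):
    """Weight each classifier in proportion to its validation accuracy.

    Stand-in for the paper's GFSS-based weighting, which is not reproduced here.
    Assumes at least one member has non-zero accuracy.
    """
    accs = [
        sum(p == t for p, t in zip(preds, val_y)) / len(val_y)
        for preds in member_val_preds
    ]
    total = sum(accs)
    return [a / total for a in accs]


def weighted_vote(weights, member_preds):
    """Combine predictions: member_preds[k][i] is classifier k's label for sample i."""
    combined = []
    for i in range(len(member_preds[0])):
        tally = {}
        for w, preds in zip(weights, member_preds):
            tally[preds[i]] = tally.get(preds[i], 0.0) + w
        # sorted() makes ties resolve to the smallest label deterministically
        combined.append(max(sorted(tally), key=tally.get))
    return combined


# Hypothetical validation predictions from three classifiers.
val_y = [1, 0, 1, 0]
val_preds = [
    [1, 0, 1, 0],  # accuracy 1.00
    [1, 0, 0, 1],  # accuracy 0.50
    [0, 1, 1, 1],  # accuracy 0.25
]
weights = accuracy_weights(val_preds, val_y)

# Predictions on two new applicants; the accurate member is outvoted on the first.
new_preds = [[1, 0], [0, 0], [0, 1]]
decision = weighted_vote(weights, new_preds)
print(decision)  # [1, 0]
```

An unweighted majority vote would return `[0, 0]` here; the weighting lets the most accurate member override two weaker ones on the first applicant.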

