Credit Scoring Using Ensemble Machine Learning

Algorithmic fairness in credit scoring

Oxford Review of Economic Policy ◽

10.1093/oxrep/grab020 ◽

2021 ◽

Vol 37 (3) ◽

pp. 585-617

Author(s):

Teresa Bono ◽

Karen Croxson ◽

Adam Giles

Keyword(s):

Machine Learning ◽

Credit Scoring ◽

Large Data ◽

Error Rates ◽

The Past ◽

Ensemble Machine Learning ◽

Hidden Patterns ◽

Credit Scoring Model ◽

Distributional Impacts ◽

Modelling Approach

Abstract: The use of machine learning as an input into decision-making is on the rise, owing to its ability to uncover hidden patterns in large data and improve prediction accuracy. Questions have been raised, however, about the potential distributional impacts of these technologies, with one concern being that they may perpetuate or even amplify human biases from the past. Exploiting detailed credit file data for 800,000 UK borrowers, we simulate a switch from a traditional (logit) credit scoring model to ensemble machine-learning methods. We confirm that machine-learning models are more accurate overall. We also find that they do as well as the simpler traditional model on relevant fairness criteria, where these criteria pertain to overall accuracy and error rates for population subgroups defined along protected or sensitive lines (gender, race, health status, and deprivation). We do observe some differences in the way credit-scoring models perform for different subgroups, but these manifest under a traditional modelling approach and switching to machine learning neither exacerbates nor eliminates these issues. The paper discusses some of the mechanical and data factors that may contribute to statistical fairness issues in the context of credit scoring.
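
The fairness criteria named in this abstract (error rates compared across population subgroups) can be illustrated with a minimal sketch. It is not the authors' code: the function name `subgroup_error_rates`, the label coding (1 = default, 0 = repaid), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Assumed coding: 1 = borrower defaulted, 0 = borrower repaid.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        if truth == 1:
            c["pos"] += 1
            if pred == 0:
                c["fn"] += 1  # missed default
        else:
            c["neg"] += 1
            if pred == 1:
                c["fp"] += 1  # good borrower flagged as a default
    rates = {}
    for group, c in counts.items():
        # A rate is undefined (NaN) when the group has no cases of that class.
        fpr = c["fp"] / c["neg"] if c["neg"] else float("nan")
        fnr = c["fn"] / c["pos"] if c["pos"] else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Made-up toy data, not taken from the paper.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = subgroup_error_rates(y_true, y_pred, groups)
```

Running the same function on the predictions of two different models (for example a logit model and an ensemble) and comparing the per-group rates is one simple way to check whether a model switch widens or narrows subgroup gaps.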

Fintech Credit Scoring Techniques for Evaluating P2P Loan Applications – A Python Machine Learning Ensemble Approach

International Journal of Smart Business and Technology ◽

10.21742/ijsbt.2018.6.1.04 ◽

2018 ◽

Vol 6 (1) ◽

Keyword(s):

Machine Learning ◽

Credit Scoring ◽

Ensemble Approach

Prediction of Pipe Performance with Ensemble Machine Learning Based Approaches

2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC) ◽

10.1109/sdpc.2017.84 ◽

2017 ◽

Author(s):

Fang Shi ◽

Zheng Liu ◽

Eric Li

Keyword(s):

Machine Learning ◽

Ensemble Machine Learning

A Novel Ensemble Machine Learning Method to Detect Phishing Attack

2020 IEEE 23rd International Multitopic Conference (INMIC) ◽

10.1109/inmic50486.2020.9318210 ◽

2020 ◽

Author(s):

Abdul Basit ◽

Maham Zafar ◽

Abdul Rehman Javed ◽

Zunera Jalil

Keyword(s):

Machine Learning ◽

Machine Learning Method ◽

Learning Method ◽

Ensemble Machine Learning

A novel multi-stage ensemble model with multiple K-means-based selective undersampling: An application in credit scoring

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201954 ◽

2021 ◽

Vol 40 (5) ◽

pp. 9471-9484

Author(s):

Yilun Jin ◽

Yanan Liu ◽

Wenyu Zhang ◽

Shuai Zhang ◽

Yu Lou

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Credit Scoring ◽

Imbalanced Data ◽

Ensemble Model ◽

Selective Sampling ◽

Machine Learning Methods ◽

Multi Stage ◽

Proposed Model ◽

New Feature

Advances in machine learning have improved the performance of credit scoring. As one of the widely recognized machine learning methods, ensemble learning has demonstrated significant improvements in predictive accuracy over individual machine learning models for credit scoring. This study proposes a novel multi-stage ensemble model with multiple K-means-based selective undersampling for credit scoring. First, a new multiple K-means-based undersampling method is proposed to deal with imbalanced data. Then, a new selective sampling mechanism is proposed to adaptively select the better-performing base classifiers. Finally, a new feature-enhanced stacking method is proposed to construct an effective ensemble model by combining the shortlisted base classifiers. In the experiments, four datasets and four evaluation indicators are used to evaluate the proposed model, and the experimental results show that it outperforms the other benchmark models.
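
The general idea of K-means-based undersampling for imbalanced data can be sketched as follows. This is a generic centroid-based variant, not the paper's "multiple K-means-based selective undersampling" method, whose details are not given in the abstract. The function names, the deterministic initialisation, and the toy points are assumptions.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic start (evenly spaced points)."""
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if empty.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids


def kmeans_undersample(majority, minority):
    """Shrink the majority class to the size of the minority class.

    Clusters the majority class into len(minority) groups and keeps the
    real sample closest to each centroid. Assumes the minority class is
    the smaller of the two.
    """
    centroids = kmeans(majority, len(minority))
    kept = []
    for c in centroids:
        candidates = [p for p in majority if p not in kept]
        kept.append(min(candidates, key=lambda p: math.dist(p, c)))
    return kept + list(minority)


# Made-up 2-D feature vectors: six majority-class points, two minority-class.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
minority = [(2.0, 2.0), (3.0, 3.0)]
balanced = kmeans_undersample(majority, minority)
```

Keeping one representative per cluster preserves the spread of the majority class better than random undersampling, which is the usual motivation for clustering before discarding samples.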

Ensemble Machine Learning Assisted Reservoir Characterization Using Field Production Data–An Offshore Field Case Study

Energies ◽

10.3390/en14041052 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1052

Author(s):

Baozhong Wang ◽

Jyotsna Sharma ◽

Jianhua Chen ◽

Patricia Persaud

Keyword(s):

Machine Learning ◽

Random Forest ◽

Reservoir Characterization ◽

Time Lapse ◽

Production Data ◽

Oil Saturation ◽

Ensemble Machine Learning ◽

Input Parameters ◽

Saturation Profiles ◽

Field Production

Estimation of fluid saturation is an important step in dynamic reservoir characterization. Machine learning techniques have been increasingly used in recent years for reservoir saturation prediction workflows. However, most of these studies require input parameters derived from cores, petrophysical logs, or seismic data, which may not always be readily available. Additionally, very few studies incorporate production data, which is an important reflection of the dynamic reservoir properties and typically the most frequently and reliably measured quantity throughout the life of a field. In this research, a random forest ensemble machine learning algorithm is implemented that uses field-wide production and injection data (both measured at the surface) as the only input parameters to predict the time-lapse oil saturation profiles at well locations. The algorithm is optimized using feature selection based on feature importance score and Pearson correlation coefficient, in combination with geophysical domain knowledge. The workflow is demonstrated using actual field data from a structurally complex, heterogeneous, and heavily faulted offshore reservoir. The random forest model captures the trends from three and a half years of historical field production, injection, and simulated saturation data to predict future time-lapse oil saturation profiles at four deviated well locations with over 90% R-square, less than 6% Root Mean Square Error, and less than 7% Mean Absolute Percentage Error in each case.
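
One way to combine feature importance scores with the Pearson correlation coefficient for feature selection is sketched below. This is an illustrative greedy filter, not the authors' workflow. The column names, the importance values (which would normally come from a fitted random forest), and the 0.9 threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def select_features(columns, importance, max_corr=0.9):
    """Greedy filter over features, from most to least important.

    A feature is kept only if its absolute correlation with every feature
    already kept is below max_corr, so each redundant group is represented
    by its most important member.
    """
    kept = []
    for name in sorted(importance, key=importance.get, reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical surface measurements and made-up importance scores.
columns = {
    "oil_rate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "liquid_rate": [2.0, 4.0, 6.0, 8.0, 10.5],   # nearly collinear with oil_rate
    "water_injection": [5.0, 1.0, 4.0, 2.0, 3.0],
}
importance = {"oil_rate": 0.5, "liquid_rate": 0.3, "water_injection": 0.2}
selected = select_features(columns, importance)
```

In this toy case `liquid_rate` is dropped because it is almost perfectly correlated with the more important `oil_rate`, while `water_injection` is retained.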
