Logistic Regression Model Using Scheimpflug-Placido Cornea Topographer Parameters to Diagnose Keratoconus

Journal of Ophthalmology ◽

10.1155/2021/5528927 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Emre Altinkurt ◽

Ozkan Avci ◽

Orkun Muftuoglu ◽

Adem Ugurlu ◽

Zafer Cebeci ◽

...

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Roc Curve ◽

Logistic Regression Model ◽

Corneal Thickness ◽

Computer Algorithms ◽

Roc Curve Analysis ◽

Data Set ◽

Prediction Ability ◽

Keratorefractive Surgery

Purpose. To diagnose keratoconus by establishing an effective logistic regression model from data obtained with a Scheimpflug-Placido cornea topographer. Methods. Topographical parameters of 125 eyes of 70 patients diagnosed with keratoconus by clinical or topographical findings were compared with those of 120 eyes of 63 patients who were defined as keratorefractive surgery candidates. Receiver operating characteristic (ROC) curve analysis was performed to determine the diagnostic ability of the topographic parameters. The data set of parameters with an AUROC (area under the ROC curve) value greater than 0.9 was analyzed with logistic regression analysis (LRA) to determine the most predictive model that could diagnose keratoconus. A logit formula of the model was built, and the logit value of every eye in the study was calculated according to this formula. Then, an ROC analysis of the logit values was performed. Results. The Baiocchi Calossi Versaci front index (BCVf) had the highest AUROC value (0.976) in the study. The LRA model with the highest prediction ability had 97.5% accuracy, 96.8% sensitivity, and 99.2% specificity. The most significant parameters were found to be BCVf (p = 0.001), BCVb (Baiocchi Calossi Versaci back) (p = 0.002), posterior rf (apical radius of the flattest meridian of the aspherotoric surface in 4.5 mm diameter of the cornea) (p = 0.005), central corneal thickness (p = 0.072), and minimum corneal thickness (p = 0.494). Conclusions. The LRA model can distinguish keratoconus corneas from normal ones with high accuracy without the need for complex computer algorithms.
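The screening step described in the abstract above (keep only parameters whose AUROC exceeds 0.9) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `auroc` and `screen_parameters`, the parameter names, and all the toy values are assumptions made up for the example. The AUROC is computed as the fraction of (positive, negative) pairs ranked correctly, and it is made direction-agnostic because some parameters (such as corneal thickness) are lower in diseased eyes.

```python
def auroc(pos, neg):
    """AUROC as the probability that a random positive scores above a random
    negative, with ties counted as one half (pairwise rank definition)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def screen_parameters(params, labels, threshold=0.9):
    """Keep parameters whose direction-agnostic AUROC exceeds `threshold`."""
    kept = {}
    for name, values in params.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        auc = auroc(pos, neg)
        auc = max(auc, 1.0 - auc)  # a parameter that is lower in disease still discriminates
        if auc > threshold:
            kept[name] = auc
    return kept


# Hypothetical toy data: 1 = keratoconus, 0 = control. Values are invented.
labels = [1, 1, 1, 0, 0, 0]
params = {
    "index_a": [2.1, 1.8, 2.5, 0.3, 0.4, 0.2],   # separates the groups perfectly
    "thickness": [480, 530, 470, 540, 520, 550],  # lower in positives, one overlap
    "noise": [1, 5, 3, 2, 4, 6],                  # weak discriminator
}
selected = screen_parameters(params, labels)
print(selected)
```

The parameters that survive this screen would then be the candidate predictors for the logistic regression step.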


Combined pattern of childhood psycho-behavioral characteristics in patients with schizophrenia: a retrospective study in Japan

BMC Psychiatry ◽

10.1186/s12888-021-03049-w ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yukiko Hamasaki ◽

Takao Nakayama ◽

Takatoshi Hikida ◽

Toshiya Murai

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Roc Curve ◽

Child Behavior Checklist ◽

Logistic Regression Model ◽

Scientific Evidence ◽

Behavioral Characteristics ◽

Roc Curve Analysis ◽

Genetic Studies ◽

Retrospective Assessment

Abstract Background: Although epidemiological and genetic studies have provided scientific evidence that places schizophrenia into the framework of early neurodevelopmental disorders, the psycho-behavioral characteristics of children who later go on to develop schizophrenia have not been sufficiently clarified. This study aimed to retrospectively identify characteristics specific to patients with schizophrenia during childhood via their guardians’ reporting of these characteristics. Methods: Participants included 54 outpatients with schizophrenia in their twenties who fulfilled DSM-IV-TR criteria. Additionally, 192 normal healthy subjects participated as sex- and age-matched controls. The guardians of all participants were recruited to rate participants’ childhood characteristics from 6 to 8 years of age on a modified version of the Child Behavior Checklist (CBCL), which was used as a retrospective assessment questionnaire. Using t-tests, logistic regression, and Receiver Operating Characteristic (ROC) curve analysis, we estimated the psycho-behavioral characteristics specific to schizophrenia during childhood. Using the obtained logistic regression model, we prototyped a risk-predicting algorithm based on the CBCL scores. Results: Among the eight CBCL subscale t-scores, “withdrawn” (p = 0.002), “thought problems” (p = 0.001), and “lack of aggressive behavior” (p = 0.002) were each significantly associated with the later diagnosis of schizophrenia, although none of these mean scores were in the clinical range at the time of childhood. The algorithm of the logistic regression model, based on eight CBCL subscales, had an area under the ROC curve of 82.8% (95% CI: 76–89%), which indicated that this algorithm’s prediction of the later development of schizophrenia has moderate accuracy. Conclusions: The results suggest that, according to guardian reports, participants showed psycho-behavioral characteristics during childhood that differed from those of healthy controls and could be predictive of the later development of schizophrenia. Our newly developed algorithm is available for use in future studies to further test its validity.


Cancer classification and biomarker selection via a penalized logsum network-based logistic regression model

Technology and Health Care ◽

10.3233/thc-218026 ◽

2021 ◽

Vol 29 ◽

pp. 287-295

Author(s):

Zhiming Zhou ◽

Haihui Huang ◽

Yong Liang

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Gene Selection ◽

Simulated Data ◽

Biological Data ◽

Cancer Classification ◽

High Dimensional ◽

Data Set ◽

Biomarker Selection

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical explanation and yields class probabilities for the classification labels. However, on its own it cannot perform biomarker selection. OBJECTIVE: The aim of this paper is to give the model efficient gene selection capability. METHODS: In this paper, we propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. For a large data set, the proposed method achieved 89.66% (training) and 90.02% (testing) AUC performances, which are, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.
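The abstract above does not state the objective function, so the following is only a sketch of the general shape that a log-sum penalty combined with a network (graph Laplacian) term commonly takes in penalized logistic regression. Every symbol here is an assumption introduced for illustration: $\beta$ is the coefficient vector over $d$ genes, $\pi_i$ is the predicted class probability for sample $i$, $L$ is a Laplacian built from a gene network, and $\lambda_1$, $\lambda_2$, $\varepsilon > 0$ are tuning constants. The paper's actual formulation may differ.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log \pi_i + (1 - y_i)\log(1 - \pi_i) \Big]
  \;+\; \lambda_1 \sum_{j=1}^{d} \log\!\big(|\beta_j| + \varepsilon\big)
  \;+\; \lambda_2\, \beta^{\top} L\, \beta,
\qquad
\pi_i = \frac{1}{1 + e^{-x_i^{\top}\beta}}
```

In a form like this, the log-sum term pushes small coefficients to zero more aggressively than an L1 penalty would, which is what provides gene selection, while the quadratic Laplacian term encourages genes that are linked in the network to receive similar coefficients.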


Meteorological and hydrological conditions triggering rockfall events in Germany

10.5194/egusphere-egu21-5367 ◽

2021 ◽

Author(s):

Katrin Nissen ◽

Stefan Rupp ◽

Björn Guse ◽

Uwe Ulbrich ◽

Sergiy Vorogushyn ◽

...

Keyword(s):

Logistic Regression ◽

Soil Moisture ◽

Statistical Model ◽

Regression Model ◽

Logistic Regression Model ◽

Daily Precipitation ◽

Hydrologic Model ◽

Data Set ◽

Hydrological Conditions ◽

Landslide Database

In this study we present the results of a logistic regression model aimed at describing changes in probabilities for rockfall events in Germany in response to changes in meteorological and hydrological conditions. The rockfall events for this study are taken from the landslide database for Germany (Damm and Klose, 2015). The meteorological variables we tested as predictors for the logistic regression model are daily precipitation from the REGNIE data set (Rauthe et al., 2013), hourly precipitation from the RADKLIM radar climatology (Winterrath et al., 2018) and temperature from the E-OBS data set (Cornes et al., 2018). As there is no observational soil moisture data set covering the entire country, we used soil moisture modelled with the state-of-the-art hydrological model mHM (Samaniego et al., 2010), which was calibrated using gauge measurements. In order to select the best statistical model we tested a large number of physically plausible combinations of meteorological and hydrological predictors. Each model was checked using cross-validation. The decision on the final model was based on the value of the logarithmic skill score and on expert judgement. The final statistical model includes the local percentile of daily precipitation, total relative soil moisture and freeze-thaw cycles in the previous weeks as predictors. It was found that daily precipitation is the most important parameter in the model. An increase of daily precipitation from its median to its 80th percentile approximately doubles the probability for a rockfall event. Higher soil moisture and the occurrence of freeze-thaw cycles also increase the probability for rockfall events.

Cornes, R. C. et al., 2018: An ensemble version of the E-OBS temperature and precipitation data sets. Journal of Geophysical Research: Atmospheres, 123, 9391–9409.
Damm, B., Klose, M., 2015: The landslide database for Germany: Closing the gap at national level. Geomorphology, 249, 82–93.
Rauthe, M. et al., 2013: A Central European precipitation climatology – Part I: Generation and validation of a high-resolution gridded daily data set (HYRAS), Vol. 22(3), p. 235–256.
Samaniego, L. et al., 2010: Multiscale parameter regionalization of a grid-based hydrologic model at the mesoscale. Water Resour. Res., 46, W05523.
Winterrath, T. et al., 2018: RADKLIM Version 2017.002: Reprocessed gauge-adjusted radar data, one-hour precipitation sums (RW), DOI: 10.5676/DWD/RADKLIM_RW_V2017.002.
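The statement that moving daily precipitation from its median to its 80th percentile roughly doubles the rockfall probability can be translated into a logit-scale slope, which is how a logistic regression would encode it. The sketch below is a worked illustration under loud assumptions: the baseline probability `P_MEDIAN` is hypothetical (the abstract gives none), the precipitation percentile is assumed to enter the model linearly in percentile points, and the helper names are invented for the example.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-z))


def slope_for_probability_ratio(p0, ratio, delta):
    """Logit-scale slope per predictor unit that turns probability p0 into
    ratio * p0 when the predictor increases by `delta` units."""
    return (logit(ratio * p0) - logit(p0)) / delta


P_MEDIAN = 0.001        # hypothetical daily rockfall probability at median precipitation
DELTA = 80 - 50         # percentile points between the median and the 80th percentile

beta = slope_for_probability_ratio(P_MEDIAN, 2.0, DELTA)
odds_ratio = math.exp(beta * DELTA)
p_80 = inverse_logit(logit(P_MEDIAN) + beta * DELTA)

print(f"slope per percentile point: {beta:.4f}")
print(f"odds ratio over 30 points:  {odds_ratio:.3f}")
print(f"probability at 80th pct:    {p_80:.4f}")
```

For rare events the odds ratio and the probability ratio nearly coincide, which is why a doubling of probability corresponds to an odds ratio only slightly above 2 in this sketch.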


Prognostic CT Features and Prediction Model of Patients With Primary Hepatocellular Carcinomas Undergoing Partial Hepatectomy

10.21203/rs.3.rs-949721/v1 ◽

2021 ◽

Author(s):

Cuiping Zhou ◽

Xiaohua Ban ◽

Huijun Hu ◽

Qiuxia Yang ◽

Rong Zhang ◽

...

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Clinical Outcomes ◽

Partial Hepatectomy ◽

Logistic Regression Model ◽

Univariate Analysis ◽

Binary Logistic Regression ◽

Roc Curve Analysis ◽

Hepatocellular Carcinomas ◽

Ct Features

Abstract Background: Hepatocellular carcinoma (HCC) is the most common primary malignant tumor of the liver. Partial hepatectomy is one of the most effective therapies for HCC but suffers from a high recurrence rate. At present, studies of the association between clinical outcomes and CT features of patients with HCCs undergoing partial hepatectomy are still limited. The purpose of this study is to determine the predictive CT features and establish a model for predicting relapse or metastasis in patients with primary hepatocellular carcinomas (HCCs) undergoing partial hepatectomy. Methods: The clinical data and CT features of 112 patients with histopathologically confirmed primary HCCs were retrospectively reviewed. The clinical outcomes were categorized into two groups according to whether relapse or metastasis occurred within 2 years after partial hepatectomy. The association between clinical outcomes and CT features, including tumor size, margin, shape, vascular invasion (VI), arterial phase hyperenhancement, washout appearance, capsule appearance, satellite lesion, involvement segment, cirrhosis, peritumoral enhancement, and necrosis, was analyzed using univariate analysis and binary logistic regression. A logistic regression model was then established, followed by receiver operating characteristic (ROC) curve analysis. Results: CT features including tumor size, margin, shape, VI, washout appearance, satellite lesion, involvement segment, peritumoral enhancement, and necrosis were associated with clinical outcomes, as determined by univariate analysis (P<0.05). Only tumor margin and VI remained independent risk factors in binary logistic regression analysis (OR=6.41 and 10.92, respectively). The logistic regression model was logit(p) = -1.55 + 1.86 margin + 2.39 VI. ROC curve analysis showed that the area under the curve of the obtained logistic regression model was 0.887 (95% CI: 0.827-0.947). Conclusion: Ill-defined margin and VI of HCCs were independent risk predictors of poor clinical outcome after partial hepatectomy. The model logit(p) = -1.55 + 1.86 margin + 2.39 VI was a good predictor of the clinical outcomes.
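To make the reported formula logit(p) = -1.55 + 1.86 margin + 2.39 VI concrete, the sketch below converts it into predicted probabilities and odds ratios. The coefficients come from the abstract; the 0/1 coding (margin = 1 for an ill-defined margin, VI = 1 when vascular invasion is present) is an assumption, since the abstract does not state how the predictors were coded, and the function name is invented for the example.

```python
import math

# Coefficients as reported in the abstract.
INTERCEPT = -1.55
COEF_MARGIN = 1.86
COEF_VI = 2.39


def relapse_probability(margin, vi):
    """Predicted probability of relapse or metastasis within 2 years.

    Assumed coding (not stated in the abstract): margin = 1 for an
    ill-defined margin, vi = 1 when vascular invasion is present, else 0.
    """
    z = INTERCEPT + COEF_MARGIN * margin + COEF_VI * vi
    return 1.0 / (1.0 + math.exp(-z))


for margin in (0, 1):
    for vi in (0, 1):
        print(f"margin={margin} VI={vi} -> p={relapse_probability(margin, vi):.3f}")

# Exponentiating a coefficient gives the odds ratio for that predictor.
print(f"OR margin ~ {math.exp(COEF_MARGIN):.2f}, OR VI ~ {math.exp(COEF_VI):.2f}")
```

The exponentiated coefficients land close to the reported odds ratios of 6.41 and 10.92; the small differences are consistent with the coefficients having been rounded to two decimals.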


Prognostic Value of Pre-operative Plasma NT-proBNP Combined With Creatinine in Early Outcomes After Adult Cardiac Valve Surgery

10.21203/rs.3.rs-541308/v1 ◽

2021 ◽

Author(s):

Tianyuan Li ◽

Hanjun Cao ◽

Liangchao Qu ◽

Dingde Long ◽

Xiaoping Zhu

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Prognostic Value ◽

Logistic Regression Model ◽

Valve Surgery ◽

Curve Analysis ◽

Cardiac Valve ◽

Plasma Creatinine ◽

Roc Curve Analysis ◽

Cardiac Valve Surgery

Abstract Objective: To assess the prognostic value of pre-operative plasma NT-proBNP combined with creatinine in early outcomes after adult cardiac valve surgery. Methods: A total of 125 patients who underwent cardiac valve surgery at the First Affiliated Hospital of Nanchang University between October 2016 and October 2018 were retrospectively reviewed. The data reviewed included age, gender, weight, height, pre-operative plasma creatinine, pre-operative plasma NT-proBNP, number of valves involved, pre-operative EF, and early post-operative outcomes. The independent pre-operative factors that have a significant impact on early post-operative outcomes after adult cardiac valve surgery were investigated. The prognostic value for early outcomes after adult cardiac valve surgery was analyzed by ROC curve analysis. Results: Pre-operative plasma creatinine, pre-operative plasma NT-proBNP, and the number of valves involved were significantly higher in the complication group than in the non-complication group; BMI and pre-operative EF were lower in the complication group than in the non-complication group, with statistically significant differences (P<0.05). Factors with a P-value < 0.15 in the bivariable logistic regression model were entered into a multivariable logistic regression model. The multivariable logistic regression analysis indicated that pre-operative plasma creatinine, pre-operative plasma NT-proBNP, BMI, and the number of valves involved were correlated with early post-operative outcomes, and the differences were statistically significant (P < 0.05). ROC curve analysis was used to explore the predictive performance. In the ROC curve analysis, the AUC for pre-operative plasma NT-proBNP was 0.806 (95% CI 0.712-0.900, P<0.00). The logistic regression model showed that the predictive value increased after adding pre-operative plasma creatinine: the joint prediction AUC was 0.843, and the sensitivity and specificity were 85.0% and 72.4%, respectively. Conclusion: Elevated NT-proBNP and creatinine levels were independently correlated with early post-operative outcomes and are two promising prognostic predictors of worse clinical outcomes. The combination of pre-operative plasma NT-proBNP and plasma creatinine was found to help identify high-risk patients and support appropriate clinical decisions.


Exploring the Value of Lung Texture Features in Distinguishing Usual and Non-specific Interstitial Pneumonia

10.21203/rs.3.rs-533242/v1 ◽

2021 ◽

Author(s):

Xinhui Chen ◽

Ge Cheng ◽

Xinguan Yang ◽

Yuting Liao ◽

Zhipeng Zhou

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Interstitial Pneumonia ◽

Roc Curve ◽

Logistic Regression Model ◽

Usual Interstitial Pneumonia ◽

Texture Features ◽

Feature Reduction ◽

Multivariate Logistic Regression ◽

Lung Segmentation

Abstract Background: At present, the most common types of interstitial pneumonia are usual interstitial pneumonia (UIP) and non-specific interstitial pneumonia (NSIP), and different types have different prognoses. In addition, a mixture of different types is difficult for radiologists to diagnose and complicates clinical treatment. Therefore, clinicians urgently need new imaging methods to solve such problems. This article aims to explore the CT lung texture images of UIP and NSIP to provide evidence for distinguishing UIP from NSIP. Methods: A retrospective analysis was performed on 96 cases of interstitial pneumonia diagnosed by the Department of Pathology and the Affiliated Hospital of Guilin Medical College. Among them, there were 40 cases of UIP and 56 cases of NSIP. All patients were scanned by CT. Lung Intelligence Kit (LK) was used to perform lung segmentation and texture feature extraction. Variance analysis, least absolute shrinkage and selection operator (Lasso), and multivariate logistic regression were used to select effective features. Finally, a multivariate logistic regression model was constructed to distinguish the two kinds of interstitial pneumonia. The receiver operating characteristic (ROC) curve, area under the curve (AUC), sensitivity, and specificity were used to evaluate the performance of the constructed model. We used the LK software to segment the two sets of lungs. Feature calculation and selection were performed on the data of the two groups of interstitial pneumonia after lung segmentation, the logistic regression model was established for the selected features, and the ROC curve was drawn. Results: A total of 100 texture features were extracted from the whole lung segmented by LK, and 8 features remained after feature reduction. The above-mentioned values of UIP and NSIP of the training group are greater than those of the test group. Conclusions: It is possible to distinguish UIP and NSIP by using LK software to extract lung texture in CT images.


An Artificial Neural Network–Based Pediatric Mortality Risk Score: Development and Performance Evaluation Using Data From a Large North American Registry (Preprint)

10.2196/preprints.24079 ◽

2020 ◽

Author(s):

Niema Ghanad Poor ◽

Nicholas C West ◽

Rama Syamala Sreepada ◽

Srinivas Murthy ◽

Matthias Görges

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Mortality Risk ◽

Regression Models ◽

Logistic Regression Model ◽

Single Layer ◽

Ann Model ◽

Data Set ◽

Using Data ◽

Better Than

BACKGROUND In the pediatric intensive care unit (PICU), quantifying illness severity can be guided by risk models to enable timely identification and appropriate intervention. Logistic regression models, including the pediatric index of mortality 2 (PIM-2) and pediatric risk of mortality III (PRISM-III), produce a mortality risk score using data that are routinely available at PICU admission. Artificial neural networks (ANNs) outperform regression models in some medical fields. OBJECTIVE In light of this potential, we aim to examine ANN performance, compared to that of logistic regression, for mortality risk estimation in the PICU. METHODS The analyzed data set included patients from North American PICUs whose discharge diagnostic codes indicated evidence of infection and included the data used for the PIM-2 and PRISM-III calculations and their corresponding scores. We stratified the data set into training and test sets, with approximately equal mortality rates, in an effort to replicate real-world data. Data preprocessing included imputing missing data through simple substitution and normalizing data into binary variables using PRISM-III thresholds. A 2-layer ANN model was built to predict pediatric mortality, along with a simple logistic regression model for comparison. Both models used the same features required by PIM-2 and PRISM-III. Alternative ANN models using single-layer or unnormalized data were also evaluated. Model performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC) and their empirical 95% CIs. RESULTS Data from 102,945 patients (including 4068 deaths) were included in the analysis. 
The highest performing ANN (AUROC 0.871, 95% CI 0.862-0.880; AUPRC 0.372, 95% CI 0.345-0.396) that used normalized data performed better than PIM-2 (AUROC 0.805, 95% CI 0.801-0.816; AUPRC 0.234, 95% CI 0.213-0.255) and PRISM-III (AUROC 0.844, 95% CI 0.841-0.855; AUPRC 0.348, 95% CI 0.322-0.367). The performance of this ANN was also significantly better than that of the logistic regression model (AUROC 0.862, 95% CI 0.852-0.872; AUPRC 0.329, 95% CI 0.304-0.351). The performance of the ANN that used unnormalized data (AUROC 0.865, 95% CI 0.856-0.874) was slightly inferior to our highest performing ANN; the single-layer ANN architecture performed poorly and was not investigated further. CONCLUSIONS A simple ANN model performed slightly better than the benchmark PIM-2 and PRISM-III scores and a traditional logistic regression model trained on the same data set. The small performance gains achieved by this two-layer ANN model may not offer clinically significant improvement; however, further research with other or more sophisticated model designs and better imputation of missing data may be warranted.
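The AUPRC values in this abstract look low next to the AUROC values, and a short calculation helps put them in context. With 4068 deaths among 102,945 patients, mortality is rare, and a classifier that ranks patients at random has an expected AUPRC roughly equal to the event prevalence (whereas its AUROC would be 0.5). The sketch below uses only numbers reported in the abstract; the variable names are illustrative, and it assumes the test-set mortality rate is close to the overall rate, as the stratified split suggests.

```python
# Counts and AUPRC values as reported in the abstract.
n_patients = 102945
n_deaths = 4068

# Expected AUPRC of a random ranking is approximately the event prevalence.
prevalence = n_deaths / n_patients

reported_auprc = {
    "ANN (normalized)": 0.372,
    "PRISM-III": 0.348,
    "logistic regression": 0.329,
    "PIM-2": 0.234,
}

# Lift over the chance-level baseline for each model.
lift = {name: value / prevalence for name, value in reported_auprc.items()}

print(f"prevalence (chance-level AUPRC): {prevalence:.4f}")
for name, value in lift.items():
    print(f"{name}: {value:.1f}x chance level")
```

Read this way, every model sits well above the chance-level baseline, and the differences between the ANN, PRISM-III, and logistic regression are modest, which matches the abstract's own cautious conclusion.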


Improving Geospatial Agreement by Hybrid Optimization in Logistic Regression-Based Landslide Susceptibility Modelling

Frontiers in Earth Science ◽

10.3389/feart.2021.713803 ◽

2021 ◽

Vol 9 ◽

Author(s):

Deliang Sun ◽

Haijia Wen ◽

Jiahui Xu ◽

Yalan Zhang ◽

Danzhou Wang ◽

...

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Landslide Susceptibility ◽

Logistic Regression Model ◽

Cross Validation ◽

Dominant Factor ◽

Mountainous Area ◽

Prediction Ability ◽

Before And After ◽

Fold Cross Validation

This study aims to develop a logistic regression model of landslide susceptibility based on GeoDetector for dominant-factor screening and 10-fold cross validation for training sample optimization. First, Fengjie county, a typical mountainous area, was selected as the study area since it experienced 1,522 landslides from 2001 to 2016. Second, 22 factors were selected as the initial conditioning factors, and a geospatial database was established with a grid of 30 m precision. Factor detection with the geographic detector and the stepwise regression method included in logistic regression were used to screen the dominant factors from the database. Then, based on the sample dataset with a 1:10 ratio of landslides to nonlandslides, 10-fold cross validation was used to select the optimized sample to train the logistic regression model of landslide susceptibility in the study area. Finally, the accuracy and efficiency of the two models before and after screening out the dominant factors were evaluated and compared. The results showed that the overall accuracy of both models exceeded 0.9 and the area under the receiver operating characteristic curve exceeded 0.8, indicating that the models both before and after factor screening had high reliability and good prediction ability. In addition, the screened factors played a leading role in the geospatial distribution of historical landslides, indicating that the screened dominant factors are individually reasonable. By improving the geospatial agreement between landslide susceptibility and actual landslide-prone areas through the screening of dominant factors and the optimization of the training samples, a simple, efficient, and reliable logistic-regression-based landslide susceptibility model can be constructed.
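The 10-fold cross validation mentioned in the abstract rests on partitioning the sample into ten disjoint folds, each of which serves once as the validation set. The sketch below shows only that generic partitioning step in plain Python. It is not the authors' sample-optimization procedure, which the abstract does not detail; the function name, the seed, and the sample size of 110 (chosen to echo a 1:10 ratio) are all hypothetical.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds so
    that every sample lands in exactly one validation fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical sample: e.g. 10 landslide cells and 100 non-landslide cells.
splits = list(k_fold_splits(110, k=10))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training, {len(validation)} validation")
```

With a class ratio as skewed as 1:10, a stratified variant that deals landslide and non-landslide samples into folds separately would keep the ratio stable across folds; that refinement is left out here for brevity.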


Early Detection of Severe Functional Impairment Among Adolescents With Major Depression Using Logistic Classifier

Frontiers in Public Health ◽

10.3389/fpubh.2020.622007 ◽

2021 ◽

Vol 8 ◽

Author(s):

I.-Ming Chiu ◽

Wenhua Lu ◽

Fangming Tian ◽

Daniel Hart

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Logistic Model ◽

Logistic Regression Model ◽

Age Groups ◽

Recall Rate ◽

Training Data ◽

Statistical Tool ◽

Data Set ◽

Severe Impairment

Machine learning is about finding patterns and making predictions from raw data. In this study, we aimed to achieve two goals by utilizing the modern logistic regression model as a statistical tool and classifier. First, we analyzed the associations between Major Depressive Episode with Severe Impairment (MDESI) in adolescents and a list of broadly defined sociodemographic characteristics. Second, using findings from the logistic model, the ultimate goal was to identify potential MDESI cases using the logistic model as a classifier (i.e., a predictive mechanism). Data on adolescents aged 12–17 years who participated in the National Survey on Drug Use and Health (NSDUH), 2011–2017, were pooled and analyzed. The logistic regression model revealed that, compared with males and adolescents aged 12-13, females and those in the age groups of 14-15 and 16-17 had a higher risk of MDESI. Blacks and Asians had a lower risk of MDESI than Whites. Living in a single-parent household, having less authoritative parents, and having negative school experiences further increased adolescents' risk of having MDESI. The predictive model successfully identified 66% of the MDESI cases (recall rate) and accurately identified 72% of the MDESI and MDESI-free cases (accuracy rate) in the training data set. The rates of both recall and accuracy remained about the same (66 and 72%) using the test data. Results from this study confirmed that the logistic model, when used as a classifier, can identify potential cases of MDESI in adolescents with acceptable recall and reasonable accuracy rates. The algorithmic identification of adolescents at risk for depression may improve prevention and intervention.


Analysis of Individual Loan Defaults Using Logit under Supervised Machine Learning Approach

Asian Journal of Probability and Statistics ◽

10.9734/ajpas/2019/v3i430100 ◽

2019 ◽

pp. 1-12

Author(s):

Dominic M. Obare ◽

Gladys G. Njoroge ◽

Moses M. Muraya

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Test Data ◽

Logistic Regression Model ◽

Functional Form ◽

Supervised Machine Learning ◽

Data Set ◽

Machine Learning Approach ◽

Loan Defaults

Financial institutions have a large amount of data on their borrowers, which can be used to predict the probability that a borrower will default on a loan. Some of the models that have been used to predict individual loan defaults include linear discriminant analysis models and extreme value theory models. These models are parametric in nature since they assume that the response being investigated takes a particular functional form. However, there is a possibility that the functional form used to estimate the response is very different from the actual functional form of the response. The purpose of this research was to analyze individual loan defaults in Kenya using the logistic regression model. The data used in this study were obtained from Equity Bank of Kenya for the period from 2006 to 2016. A random sample of 1000 loan applicants whose loans had been approved by Equity Bank of Kenya during this period was obtained. The data covered credit history, purpose of the loan, loan amount, nature of the saving account, employment status, sex of the applicant, age of the applicant, security used when acquiring the loan, and the area of residence of the applicant (rural or urban). This study employed a quantitative research design; it deals with individual loan defaults as group characteristics of a borrower. The data were pre-processed by seeding in R and then split into a training data set and a test data set. The training data were used to train the logistic regression model using a supervised machine learning approach. The R statistical software was used for the analysis of the data. The test data set was used for cross-validation of the developed logistic model, which was later used to predict individual loan defaults. This study focused on the analysis of individual loan defaults in Kenya using the logistic regression model in machine learning. On the training data set, the logistic regression model correctly predicted 303 defaults and 122 non-defaults, and the misclassified loans numbered 56 and 69. The model had an accuracy of 0.7727 with the training data and 0.7333 with the test data. The logistic regression model showed a precision of 0.8440 and 0.8244 with the training and test data, respectively. The performance of the model with both the training and test data was illustrated using a plot of training errors and test errors against sample size on the same axes. The plot showed that the performance of the model increases with an increase in sample size. The study recommended the use of logistic regression in conjunction with a supervised machine learning approach for loan default prediction in financial institutions, and further research on ensemble methods of loan default prediction in order to increase prediction accuracy.
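The training-set figures in the abstract can be checked with a few lines of arithmetic. The sketch below treats "default" as the positive class and reads the four reported counts as a confusion matrix. Which of the two misclassification counts (56 and 69) is the false-positive count is not stated in the abstract; assigning 56 to false positives is an inference, made because it is the assignment that reproduces the reported precision of 0.8440. The recall value is derived here and is not reported in the source.

```python
# Training-set counts from the abstract, with "default" as the positive class.
true_positives = 303    # defaults predicted as defaults
true_negatives = 122    # non-defaults predicted as non-defaults
false_positives = 56    # inferred: non-defaults predicted as defaults
false_negatives = 69    # inferred: defaults predicted as non-defaults

total = true_positives + true_negatives + false_positives + false_negatives

accuracy = (true_positives + true_negatives) / total
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)  # not reported in the abstract

print(f"n = {total}")
print(f"accuracy  = {accuracy:.4f}")
print(f"precision = {precision:.4f}")
print(f"recall    = {recall:.4f}")
```

Both the accuracy and the precision computed this way agree with the values reported for the training data, which supports reading the four counts as a single confusion matrix.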
