OUSL Research Repository

ANALYSIS OF LOAN DELINQUENCY PREDICTION BASED ON MULTINOMIAL LOGISTIC REGRESSION AND RANDOM FOREST


dc.contributor.author Hettiarachchi, H. A. P. L. J.
dc.contributor.author Punchi-Manage, Ruwan
dc.date.accessioned 2025-12-02T06:51:49Z
dc.date.available 2025-12-02T06:51:49Z
dc.date.issued 2025
dc.identifier.uri http://repository.ou.ac.lk/handle/94ousl/3664
dc.description.abstract The banking sector plays a critical role in fostering economic growth by extending credit to individuals and businesses; however, effective loan portfolio management remains a persistent challenge due to the risks associated with Non-Performing Assets (NPAs). Rising NPAs, often driven by economic downturns, inadequate risk assessment, and external shocks, pose a significant threat to financial stability. To address these challenges, this study explores predictive modeling approaches for loan delinquency classification by applying a multinomial logistic regression model and a Random Forest model to 43,644 loan records from a Sri Lankan bank. The analysis categorizes loan performance into four levels, ranging from A0 (performing loans) to D0 (severely delinquent loans), using several financial and demographic variables. Multinomial logistic regression shows that interest rate and loan age are the most influential predictors. The model indicates that a 1% rise in interest rate increases the risk of delinquency by 5.7–21.1%, while each additional month of loan age amplifies the likelihood by 27–76%. Delays in recovery also significantly elevate risk for severely delinquent loans, with each additional day of delay associated with a 1.2% increase in default probability. The model demonstrates excellent discriminatory power at the performance extremes (AUC: 0.987 for A0 and 0.996 for D0). The Random Forest model treats loan status as a binary (current vs. delinquent) variable. An 80:20 training-testing split was used for data analysis. The performance of the model was evaluated on the testing set using a confusion matrix, AUC-ROC curves, and accuracy metrics. The Random Forest model performs better in overall predictive accuracy, with a low 1.55% out-of-bag error rate. It achieves 99.4% accuracy in classifying current loans and 96.6% for delinquent loans. 
Variable importance analysis confirms loan age, recovery date, and interest rate as dominant predictors. Our study is cross-sectional, with predictor variables measured at a defined observation point for each loan. Class imbalance is a limitation; to address this, we plan to apply class weighting and evaluate model performance using accuracy metrics in future work. Collectively, the models underscore the predictive strength of time-dependent variables (loan age and recovery delays) and financial indicators (interest rates, outstanding amounts). While multinomial logistic regression offers nuanced insight into risk progression across multiple categories, Random Forest delivers robust binary classification performance. en_US
dc.language.iso en en_US
dc.publisher The Open University of Sri Lanka en_US
dc.subject loan delinquency en_US
dc.subject multinomial logistic regression en_US
dc.title ANALYSIS OF LOAN DELINQUENCY PREDICTION BASED ON MULTINOMIAL LOGISTIC REGRESSION AND RANDOM FOREST en_US
dc.type Article en_US
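
The abstract reports the logistic regression effects as percentage increases per unit of a predictor. A minimal sketch of how such figures are commonly derived from a fitted coefficient follows, assuming (the record does not state this) that the percentages are changes in the odds of a delinquency category relative to a reference category, computed as exp(beta) - 1. The function names and the back-computed coefficients are hypothetical and for illustration only; they are not the authors' fitted values.

```python
import math


def pct_change_in_odds(beta: float, delta: float = 1.0) -> float:
    """Percent change in odds (relative to the reference category)
    for a `delta`-unit increase in a predictor with log-odds coefficient `beta`."""
    return (math.exp(beta * delta) - 1.0) * 100.0


def beta_from_pct(pct: float) -> float:
    """Inverse mapping: the log-odds coefficient implied by a given
    percent change in odds per one-unit increase."""
    return math.log(1.0 + pct / 100.0)


if __name__ == "__main__":
    # Hypothetical coefficients back-computed from the endpoints of the
    # interest-rate range quoted in the abstract (assumed to be odds changes).
    for pct in (5.7, 21.1):
        beta = beta_from_pct(pct)
        three_units = pct_change_in_odds(beta, delta=3.0)
        print(f"{pct}% per unit: beta = {beta:.4f}, "
              f"change over 3 units = {three_units:.1f}%")
```

Under this reading, effects compound multiplicatively rather than additively: a per-unit increase of 10% in the odds corresponds to 21% over two units, not 20%.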

