| dc.description.abstract |
The banking sector plays a critical role in fostering economic growth by extending
credit to individuals and businesses; however, effective loan portfolio management
remains a persistent challenge due to the risks associated with Non-Performing
Assets (NPAs). Rising NPAs, often driven by economic downturns, inadequate
risk assessment, and external shocks, pose a significant threat to financial stability.
To address these challenges, this study explores predictive modeling approaches
for loan delinquency classification by applying multinomial logistic regression
and Random Forest models to 43,644 loan records
from a Sri Lankan bank. The analysis categorizes loan performance into four
levels, ranging from A0 (performing loans) to D0 (severely delinquent loans),
using several financial and demographic variables. Multinomial logistic regression
shows that interest rate and loan age are the most influential predictors. The model
indicates that a 1% rise in interest rate increases the risk of delinquency by 5.7% to
21.1%, while each additional month of loan age amplifies the likelihood by 27% to
76%. Delays in recovery also significantly elevate risk for severely delinquent
loans, with each additional day of delay associated with a 1.2% increase in default
probability. The model demonstrates excellent discriminatory power at the
performance extremes (AUC: 0.987 for A0 and 0.996 for D0). The Random Forest
model considers the loan status as a binary (current vs. delinquent) variable. An
80:20 training-testing split was used for data analysis. The performance of the
model was evaluated using a confusion matrix, AUC-ROC curves, and accuracy
metrics on a testing set. The Random Forest model performs better in overall
predictive accuracy, with a low 1.55% out-of-bag error rate. It achieves 99.4%
accuracy in classifying current loans and 96.6% for delinquent loans. Variable
importance analysis confirms loan age, recovery date, and interest rate as dominant
predictors. Our study is cross-sectional, with predictor variables measured at a
defined observation point for each loan. Class imbalance is a limitation; in future
work, we plan to address it by applying class weighting and evaluating model performance
using accuracy metrics. Collectively, the models underscore the predictive
strength of time-dependent variables (loan age and recovery delays) and financial
indicators (interest rates, outstanding amounts). While multinomial logistic
regression offers nuanced insight into risk progression across multiple categories,
Random Forest delivers robust binary classification performance. |
en_US |