Bank Credit Scoring
Overview
A credit risk assessment system that predicts loan default probability using machine learning, enabling data-driven lending decisions for financial institutions.
Problem
Traditional credit scoring relies on a small set of features and assumes linear relationships between those features and default risk. Banks need a more accurate, explainable model that can assess default risk across diverse customer profiles while complying with fairness requirements.
Dataset
The dataset consists of structured loan application data: demographic features, credit history, income, employment status, and loan characteristics. Preprocessing covers missing-value handling, categorical encoding, and outlier treatment.
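As a rough illustration of the three preprocessing steps, the dependency-free sketch below imputes a missing income with the median, clips outliers to a percentile range, and one-hot encodes employment status. The field names (`income`, `employment_status`), the median imputation, and the 5th/95th percentile bounds are assumptions made for the example; the project's actual pipeline is not described at this level of detail.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, lower_pct=5, upper_pct=95):
    """Winsorize: clip values to the given percentile range."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return {f"is_{c}": [int(v == c) for v in values] for c in categories}


# Hypothetical toy records; field names are illustrative only.
income = [42_000, None, 58_000, 1_500_000, 39_000, 61_000]
employment_status = ["employed", "self_employed", "employed",
                     "employed", "unemployed", "employed"]

# Impute first so the percentile bounds are computed on complete data.
income_clean = clip_outliers(impute_median(income))
employment_encoded = one_hot(employment_status)
```

In a real pipeline the imputation value and clipping bounds would be fitted on the training split only and then reused on validation and holdout data, so that no information leaks across splits.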
Architecture
A LightGBM gradient boosting model predicts default probability, and SHAP (SHapley Additive exPlanations) provides model interpretability. Feature importance analysis identifies the top predictors of default. The model's probability output is calibrated, which enables risk-tiered decision making.
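SHAP attributes a prediction to individual features using Shapley values, which have the property that the attributions sum to the difference between the prediction and a baseline prediction. The sketch below computes exact Shapley values by enumerating feature coalitions for a toy scoring function. It illustrates the definition only: the scoring function, feature names, and baseline are hypothetical, and for tree models the `shap` library uses a much faster tree-specific algorithm instead of this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def toy_risk_score(features):
    """Hypothetical scoring function standing in for a trained model."""
    score = 0.5 * features["credit_utilization"]
    score += 0.25 * features["late_payments"]
    # Interaction term: high utilization is worse with low income.
    score += 0.25 * features["credit_utilization"] * (1 - features["income_level"])
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    result = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        result[name] = total
    return result


# Hypothetical applicant and reference profile.
applicant = {"credit_utilization": 0.75, "late_payments": 1.0, "income_level": 0.5}
baseline = {"credit_utilization": 0.25, "late_payments": 0.0, "income_level": 0.5}

contributions = shapley_values(toy_risk_score, applicant, baseline)
explained = sum(contributions.values())
gap = toy_risk_score(applicant) - toy_risk_score(baseline)
```

Here `explained` matches `gap` up to floating-point error, and `income_level` receives a zero contribution because the applicant and the baseline share the same value for it. This additivity is what lets a per-prediction explanation be read as "how much each feature moved the score".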
Training
The model is trained with 5-fold stratified cross-validation, with class weights adjusted to handle class imbalance. SHAP values are computed on the test set to provide per-prediction explanations, supporting regulatory explainability requirements.
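The two training ideas above can be sketched without any libraries: stratified folds keep the default rate roughly equal in every fold, and class weights make the rare class count for more. The weighting formula used here, n_samples / (n_classes * class_count), is the common "balanced" heuristic and is an assumption; the source does not say how the weights were chosen. The labels and the 10% default rate are made up for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample index to a fold, keeping class ratios similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 1 = default, 0 = repaid (10% default rate).
labels = [1] * 10 + [0] * 90

folds = stratified_folds(labels)
weights = balanced_class_weights(labels)

for fold_id, validation_idx in enumerate(folds):
    # Train on the other four folds, validate on this one.
    train_idx = [i for f in folds if f is not validation_idx for i in f]
    defaults = sum(labels[i] for i in validation_idx)
    print(f"fold {fold_id}: {len(train_idx)} train, "
          f"{len(validation_idx)} validation, {defaults} defaults")
```

With these toy labels the weights make each default count nine times as much as a repaid loan during training, and every validation fold contains the same share of defaults.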
Results
The model achieved an AUC-ROC of 0.89 on the holdout set. SHAP analysis revealed credit utilization ratio and payment history as the strongest predictors of default. The per-prediction explanations helped loan officers understand and trust the model's decisions.
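To make the headline metric concrete: AUC-ROC equals the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter, with ties counted as half. The sketch below computes it by direct pairwise comparison. The labels and scores are made up for illustration and are not the project's data.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # ties count as half
    return correct / (len(positives) * len(negatives))


# Made-up predicted default probabilities, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.75, 0.5, 0.25, 0.5, 0.125, 0.125, 0.0, 0.375]

auc = auc_roc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, so on a real holdout set a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.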