Bank Credit Scoring

A credit risk assessment system that predicts loan default probability using machine learning, enabling data-driven lending decisions for financial institutions.

Python, Scikit-learn, LightGBM, Pandas, Seaborn, SHAP

Problem

Traditional credit scoring relies on limited features and linear assumptions. Banks need a more accurate, explainable model that can assess default risk across diverse customer profiles while complying with fairness requirements.

Dataset

Structured loan application data including demographic features, credit history, income, employment status, and loan characteristics. Preprocessing handles missing values, categorical encoding, and outlier treatment.
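The three preprocessing steps named above could be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the column names, the quantile-clipping rule for outliers, and the median and most-frequent imputation strategies are all assumptions, since the text does not specify them.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical columns; the real dataset's schema is not given in the text.
NUMERIC = ["income", "credit_utilization"]
CATEGORICAL = ["employment_status"]


def clip_outliers(df, cols, lower=0.01, upper=0.99):
    """Winsorize: clip each numeric column to its [lower, upper] quantiles."""
    out = df.copy()
    for col in cols:
        lo, hi = out[col].quantile([lower, upper])
        out[col] = out[col].clip(lo, hi)
    return out


preprocessor = ColumnTransformer(
    transformers=[
        # Numeric features: fill gaps with the column median.
        ("num", SimpleImputer(strategy="median"), NUMERIC),
        # Categorical features: fill gaps with the mode, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]), CATEGORICAL),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Tiny made-up sample with missing values and one extreme income.
applications = pd.DataFrame({
    "income": [42_000, 58_000, np.nan, 1_500_000, 37_000],
    "credit_utilization": [0.35, 0.80, 0.10, np.nan, 0.55],
    "employment_status": ["employed", "self-employed", np.nan,
                          "employed", "unemployed"],
})

features = preprocessor.fit_transform(clip_outliers(applications, NUMERIC))
print(features.shape)
```

In a real workflow the clipping bounds and imputation statistics would be learned on the training split only and then reused on validation and test data, to avoid leakage.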

Architecture

The model is a LightGBM gradient-boosted classifier, paired with SHAP (SHapley Additive exPlanations) for interpretability. Feature importance analysis identifies the top predictors of default, and a calibrated probability output enables risk-tiered decision making.
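To illustrate what risk-tiered decision making on a calibrated probability might look like, here is a small sketch. The tier names and the 5% and 20% cut-offs are hypothetical; the text does not state the thresholds actually used.

```python
import numpy as np

# Hypothetical tier boundaries on the calibrated default probability:
# low < 0.05 <= medium < 0.20 <= high
TIER_EDGES = [0.05, 0.20]
TIER_LABELS = np.array(["low", "medium", "high"])


def assign_risk_tier(default_probability):
    """Map calibrated default probabilities to risk tiers."""
    probs = np.asarray(default_probability, dtype=float)
    if np.any((probs < 0) | (probs > 1)):
        raise ValueError("probabilities must lie in [0, 1]")
    # np.digitize returns the index of the bin each probability falls into.
    return TIER_LABELS[np.digitize(probs, TIER_EDGES)]


tiers = assign_risk_tier([0.01, 0.05, 0.12, 0.20, 0.64])
print(tiers.tolist())
```

Fixed cut-offs like these are only meaningful when the probabilities are calibrated, which is why the calibration step matters for tiering.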

Training

The model was trained with 5-fold stratified cross-validation, with class weights adjusted to handle class imbalance. SHAP values were computed on the test set to provide per-prediction explanations, supporting regulatory explainability requirements.

Results

The model achieved an AUC-ROC of 0.89 on the holdout set. SHAP analysis revealed credit utilization ratio and payment history as the strongest default predictors. The model explanations enabled loan officers to understand and trust the model's decisions.
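For readers unfamiliar with how a SHAP-based ranking is usually obtained, the two standard relations are given below. The text does not say how features were ranked, so ranking by mean absolute SHAP value is an assumption here. The symbols are illustrative: M is the number of features, n the number of explained applicants, and \phi_0 the expected model output over the background data.

```latex
\begin{align}
  % Additivity: the model output for applicant x decomposes into a base
  % value plus one contribution per feature.
  f(x) &= \phi_0 + \sum_{j=1}^{M} \phi_j(x) \\
  % Global importance of feature j: mean absolute contribution over
  % the n explained applicants.
  I_j &= \frac{1}{n} \sum_{i=1}^{n} \left| \phi_j\!\left(x^{(i)}\right) \right|
\end{align}
```

For tree models explained with SHAP's tree explainer, f is by default the raw log-odds margin rather than the predicted probability, so individual contributions are read on the log-odds scale.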
