Fraud Detection System
Overview
An end-to-end machine learning pipeline for detecting fraudulent transactions using ensemble methods and advanced feature engineering on imbalanced financial datasets.
Problem
Financial fraud causes billions in losses annually. Traditional rule-based systems struggle with evolving fraud patterns and generate high false-positive rates that frustrate legitimate customers. The challenge is building a model that catches fraud with high recall while keeping false positives low on a severely imbalanced dataset.
Dataset
The dataset consists of transaction records with features including transaction amount, time, merchant category, and anonymized behavioral signals. It exhibits severe class imbalance (~0.17% fraud rate, or roughly 17 fraudulent transactions per 10,000), so the training data requires careful resampling.
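To make the imbalance concrete, the sketch below works through the arithmetic for a hypothetical sample of 100,000 transactions at the stated ~0.17% fraud rate. The sample size is an assumption chosen for illustration, not a property of the dataset. It shows why plain accuracy is a poor metric here: a model that never flags fraud still scores very high accuracy while catching nothing.

```python
# Illustrative arithmetic only; n_transactions is a hypothetical sample size.
fraud_rate = 0.0017        # ~0.17% fraud rate, as stated in this section
n_transactions = 100_000   # assumed for illustration

n_fraud = round(n_transactions * fraud_rate)
n_legit = n_transactions - n_fraud

# A trivial "model" that labels every transaction as legitimate:
# every legitimate case is a true negative, every fraud case a false negative.
true_positives = 0
true_negatives = n_legit

accuracy = (true_positives + true_negatives) / n_transactions
recall = true_positives / n_fraud

print(f"fraud cases: {n_fraud}, legitimate cases: {n_legit}")
print(f"always-legitimate accuracy: {accuracy:.4f}, fraud recall: {recall:.1f}")
```

This is why the rest of the pipeline is evaluated on fraud-class recall and precision rather than overall accuracy.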
Architecture
The model is a stacked ensemble: XGBoost and Random Forest base classifiers feed a logistic regression meta-learner. SMOTE oversampling is applied to the training set to address class imbalance. Feature engineering extracts velocity features, time-based patterns, and statistical aggregates per user.
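The idea behind SMOTE can be sketched in a few lines: each synthetic fraud sample is placed at a random point on the line segment between an existing fraud sample and one of its nearest fraud-class neighbours. The version below is a simplified, library-free illustration and not the project's implementation. The function name `smote_sketch`, the feature layout `(amount, hour_of_day)`, and the toy values are all hypothetical. A real pipeline would more likely use a library implementation such as `SMOTE` from imbalanced-learn.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base point (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies on the segment between the base point and the neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical training split with (amount, hour_of_day) feature vectors.
train_legit = [(12.5, 9.0), (40.0, 13.0), (8.0, 18.0),
               (23.0, 11.0), (55.0, 20.0), (31.0, 15.0)]
train_fraud = [(900.0, 2.0), (1200.0, 3.0), (750.0, 1.0)]

# Oversample fraud until the classes are balanced. Only the training split is
# resampled; validation and test data keep their original class distribution.
new_fraud = smote_sketch(train_fraud, len(train_legit) - len(train_fraud), k=2)
balanced_fraud = train_fraud + new_fraud
print(len(train_legit), len(balanced_fraud))
```

Resampling only the training split matters because synthetic samples in the evaluation data would make the reported precision and recall unrepresentative of real traffic.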
Training
Models are trained with stratified k-fold cross-validation so that each fold preserves the class distribution. Hyperparameters are tuned via Bayesian optimization. After training, the decision threshold is tuned to optimize the precision-recall trade-off for the fraud class.
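Threshold tuning can be sketched as a sweep over candidate cut-offs on held-out predicted probabilities. The example below picks the threshold with the highest precision among those that still meet a minimum recall. That selection rule, the `min_recall` value, the function names, and the toy scores are assumptions for illustration, since the exact objective used by the project is not stated here.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall for the positive (fraud) class at a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def tune_threshold(scores, labels, min_recall):
    """Return (threshold, precision, recall) with the best precision at >= min_recall."""
    best = None
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall_at(threshold, scores, labels)
        if recall >= min_recall and (best is None or precision > best[1]):
            best = (threshold, precision, recall)
    return best


# Hypothetical validation-set fraud probabilities and true labels (1 = fraud).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

best = tune_threshold(scores, labels, min_recall=0.8)
print(best)
```

The sweep should run on validation data rather than the final test set, so that the reported metrics are not biased by the threshold choice.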
Results
The ensemble achieved 94% recall and 87% precision on the fraud class, significantly outperforming the baseline logistic regression. It also reduced false negatives by 31% compared to a single XGBoost model.
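The reported figures can be unpacked with some simple arithmetic. The sketch below derives the F1 score from the stated precision and recall and expresses the results per 1,000 actual fraud cases, a scale chosen only for readability. The implied recall of the single XGBoost model is an inference that assumes both models were evaluated on the same fraud cases. It is not a reported result.

```python
recall = 0.94        # reported fraud-class recall
precision = 0.87     # reported fraud-class precision
fn_reduction = 0.31  # reported reduction in false negatives vs. single XGBoost

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Counts per 1,000 actual fraud cases (hypothetical scale for readability).
n_fraud = 1000
caught = recall * n_fraud        # true positives
missed = n_fraud - caught        # false negatives
flagged = caught / precision     # every transaction flagged as fraud
false_alarms = flagged - caught  # false positives

# Assumption: both models were scored on the same fraud cases, so the
# ensemble's misses equal (1 - fn_reduction) times the XGBoost misses.
implied_xgb_missed = missed / (1 - fn_reduction)
implied_xgb_recall = 1 - implied_xgb_missed / n_fraud

print(f"F1: {f1:.3f}")
print(f"per {n_fraud} fraud cases: caught {caught:.0f}, missed {missed:.0f}, "
      f"false alarms {false_alarms:.0f}")
print(f"implied single-XGBoost recall: {implied_xgb_recall:.3f}")
```

Under these assumptions, the ensemble misses about 60 fraud cases per 1,000 and raises roughly 140 false alarms while catching 940.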