Financial Services / ML

Bank Customer Churn Prediction

A classification system that identifies banking customers at risk of churning, providing relationship managers with early warning signals and the feature-level insight needed to design effective retention interventions.

Type: Classification
Domain: Retail Banking
Methods: Ensemble, SHAP
Status: Completed

The Challenge

Customer acquisition in retail banking is significantly more expensive than retention, yet many institutions lack systematic early-warning capability for identifying at-risk customers. By the time a customer closes their account, the retention window has passed.

The challenge is not just predicting who will leave, but understanding why, so that relationship managers can tailor their intervention to the specific drivers of each customer's dissatisfaction.

Approach

01 Data Exploration
Analysed customer demographic, transactional, and product-holding data. Identified the features most strongly correlated with churn and measured the degree of class imbalance.

02 Feature Engineering
Created behavioural features, including product usage trends, changes in transaction frequency, and tenure-adjusted engagement metrics.

03 Model Development
Built and compared multiple classifiers, with particular attention to handling class imbalance through SMOTE, class weighting, and threshold optimisation.

04 Interpretability
Applied SHAP analysis to provide feature-level explanations for individual churn predictions, enabling targeted retention strategies rather than generic interventions.
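The behavioural features described in step 02 can be illustrated with a short pandas sketch. It is not the project's actual pipeline: the table layout, the column names (`n_transactions`, `tenure_months`), the three-month window, and the particular definition of a tenure-adjusted metric are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical monthly activity log: one row per customer per month.
activity = pd.DataFrame({
    "customer_id": [1] * 6 + [2] * 6,
    "month": [1, 2, 3, 4, 5, 6] * 2,
    "n_transactions": [20, 22, 18, 10, 8, 6, 5, 6, 5, 6, 5, 6],
})
# Hypothetical customer table with tenure in months.
customers = pd.DataFrame({"customer_id": [1, 2], "tenure_months": [48, 6]})


def behavioural_features(activity, customers, window=3):
    """Compare the most recent `window` months with the `window` months before."""
    activity = activity.sort_values(["customer_id", "month"])
    grouped = activity.groupby("customer_id")["n_transactions"]
    recent = grouped.apply(lambda s: s.iloc[-window:].mean())
    prior = grouped.apply(lambda s: s.iloc[-2 * window:-window].mean())
    feats = pd.DataFrame({
        "txn_recent_mean": recent,
        # Relative change in transaction frequency (negative = declining).
        "txn_change_ratio": recent / prior - 1.0,
    }).rename_axis("customer_id").reset_index()
    feats = feats.merge(customers, on="customer_id")
    # One possible tenure adjustment: recent activity per year of tenure.
    feats["txn_per_tenure_year"] = feats["txn_recent_mean"] / (
        feats["tenure_months"] / 12
    )
    return feats


features = behavioural_features(activity, customers)
print(features)
```

In this toy data, customer 1's transaction frequency falls by 60% between the two windows (`txn_change_ratio` of -0.6), which is the kind of declining-engagement signal the features are meant to capture. A real pipeline would also need to handle customers with fewer than six months of history and a prior-window mean of zero.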
[Figure: Churn prediction (SHAP + ensemble). Stages shown: all customers, engagement declining, high churn risk, predicted churn.]
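Two of the imbalance techniques named in step 03, class weighting and threshold optimisation, can be sketched with scikit-learn alone. This is a minimal illustration under stated assumptions, not the project's model: the data is synthetic (about 10% churners), logistic regression stands in for the ensemble, F1 is assumed as the selection metric, and SMOTE is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer data: roughly 10% positives (churners).
X, y = make_classification(
    n_samples=4000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" weights each class inversely to its frequency,
# so errors on the rare churn class cost more during training.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)
churn_prob = model.predict_proba(X_val)[:, 1]

# Sweep candidate thresholds and keep the one with the best validation F1.
precision, recall, thresholds = precision_recall_curve(y_val, churn_prob)
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
best = int(np.argmax(f1[:-1]))  # the final precision/recall pair has no threshold
best_threshold = float(thresholds[best])

# Customers at or above the chosen threshold are flagged for follow-up.
flagged = churn_prob >= best_threshold
print(f"threshold={best_threshold:.3f}, flagged={int(flagged.sum())}")
```

In practice the threshold would more likely be chosen against a business cost, such as the relative cost of a missed churner versus an unnecessary retention call, and it should be selected on validation data that is kept separate from the final test set.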

Results

Predictive: Early identification of at-risk customers
Explainable: SHAP-driven feature importance per customer
Strategic: Retention recommendations by churn driver

The model identified customers at elevated churn risk early enough for relationship managers to intervene. SHAP analysis showed that the primary churn drivers varied significantly across customer segments, which indicates that a one-size-fits-all retention approach would be poorly targeted and supports personalised intervention strategies.
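The idea of a per-customer primary driver can be made concrete with a small sketch. It deliberately swaps in a simpler setting than the project's: a logistic regression on synthetic data, where SHAP values have a closed form if features are treated as independent (each feature's contribution is its coefficient times the feature's deviation from the mean, in log-odds units). The feature names are hypothetical. For a tree ensemble, the shap package's `TreeExplainer` would be used instead of this formula.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical feature names, for illustration only.
feature_names = ["txn_change_ratio", "n_products", "complaints_12m"]

# Synthetic data: churn is likelier with falling activity, fewer products,
# and more complaints.
X = rng.normal(size=(2000, 3))
logit = -2.0 - 1.5 * X[:, 0] - 0.8 * X[:, 1] + 1.2 * X[:, 2]
y = rng.random(2000) < 1.0 / (1.0 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, y)

# Linear SHAP under feature independence:
# phi[i, j] = w[j] * (x[i, j] - mean[j]), measured in log-odds.
baseline = X.mean(axis=0)
shap_values = model.coef_[0] * (X - baseline)

# Additivity: the expected log-odds plus a customer's contributions
# recovers that customer's predicted log-odds.
expected_value = model.decision_function(baseline.reshape(1, -1))[0]
reconstructed = expected_value + shap_values.sum(axis=1)

# Primary driver per customer: the feature pushing churn risk up the most.
top_driver = np.array(feature_names)[shap_values.argmax(axis=1)]
drivers, counts = np.unique(top_driver, return_counts=True)
print(dict(zip(drivers.tolist(), counts.tolist())))
```

Grouping customers by `top_driver` gives one possible basis for driver-specific retention actions, for example a service-recovery call where complaints dominate versus a product review where engagement is falling.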

Technology Stack

Python, scikit-learn, XGBoost, SHAP, SMOTE, pandas, Matplotlib