Decision Tree & Random Forest Implementation for Customer Retention
| Model | Accuracy | ROC AUC | F1 Score | Precision | Recall |
|---|---|---|---|---|---|
| Random Forest | 0.861 | 0.740 | 0.609 | 0.736 | 0.530 |
| Best Tuned Tree | 0.846 | 0.760 | 0.589 | 0.694 | 0.695 |
| Simple Tree (Gini) | 0.787 | 0.678 | 0.490 | 0.490 | 0.490 |
The 79:21 class imbalance between retained and churned customers significantly affected model performance, particularly the ability to identify churners. Applying balanced class weighting improved detection of the minority (churn) class.
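The effect of balanced class weighting can be sketched with scikit-learn. The data below is a synthetic stand-in (the report's actual features are not shown here), but the 79:21 imbalance is reproduced:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic placeholder for the churn dataset (real columns not shown here).
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 5))
y = (rng.random(n) < 0.21).astype(int)  # ~79:21 retained:churned split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# class_weight="balanced" scales each class's weight inversely to its
# frequency, raising the penalty for misclassifying the minority churn class.
rf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
rf.fit(X_train, y_train)
preds = rf.predict(X_test)
```

With real data, the same `class_weight="balanced"` argument applies unchanged to `DecisionTreeClassifier` as well.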
NumOfProducts_2.0 emerged as the most critical predictor (35% importance), followed by Age (25%) and Balance (10%).
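Importances like these are typically read off a fitted forest's `feature_importances_` attribute. A minimal sketch, with synthetic data wired so the named predictors actually carry signal:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: churn is made to depend on the report's top predictors,
# so the computed importances are non-trivial. Values are illustrative.
rng = np.random.default_rng(0)
n = 1500
X = pd.DataFrame({
    "NumOfProducts_2.0": rng.integers(0, 2, n),
    "Age": rng.integers(18, 80, n),
    "Balance": rng.normal(60_000, 20_000, n),
})
y = ((X["NumOfProducts_2.0"] == 0) & (X["Age"] > 45)).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Mean impurity-based importance per feature, normalised to sum to 1.
importances = pd.Series(rf.feature_importances_, index=X.columns)
ranked = importances.sort_values(ascending=False)
```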
Random Forest excelled in precision and overall accuracy, whilst the Best Tuned Tree showed markedly superior recall (69.5% vs 53.0%) for identifying actual churners.
A maximum depth of 6 and a minimum of 4 samples per leaf balanced model complexity against generalisation capability.
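Hyperparameters like these would usually come out of a cross-validated grid search. A hedged sketch (the parameter grid and scoring choice are assumptions, not the report's exact search):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data; substitute the real churn features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (rng.random(600) < 0.21).astype(int)

param_grid = {
    "max_depth": [4, 6, 8, 10],
    "min_samples_leaf": [2, 4, 8],
}
search = GridSearchCV(
    DecisionTreeClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="recall",  # assumed: optimising for churner recall
    cv=5,
)
search.fit(X, y)
best = search.best_params_  # e.g. the report's max_depth=6, min_samples_leaf=4
```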
Recommendation: Deploy Random Forest model for general churn prediction due to its superior precision (73.6%) and balanced performance across metrics.
Alternative: Use Best Tuned Tree for high-risk customer identification where maximising recall (69.5%) is critical to capture more potential churners.
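One common way to push recall further in the high-risk use case (a technique not explicitly described in the report) is to lower the decision threshold on the tree's predicted churn probability:

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data with a learnable signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0.8).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
tree = DecisionTreeClassifier(max_depth=6, min_samples_leaf=4, random_state=1)
tree.fit(X_tr, y_tr)

proba = tree.predict_proba(X_te)[:, 1]
# Flag anyone above 0.3 instead of the default 0.5 cut-off: recall rises,
# at the cost of more false positives (more retention offers sent).
recall_default = recall_score(y_te, proba >= 0.5)
recall_low = recall_score(y_te, proba >= 0.3)
```

The 0.3 threshold is illustrative; in practice it would be chosen from a precision-recall curve against campaign-cost constraints.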
Age-Based Targeting: Develop specialised retention programmes for customers aged 45+, who show the highest churn propensity.
Product Portfolio Optimisation: Encourage customers towards the 2-product sweet spot through targeted cross-selling campaigns.
Data Quality Enhancement: Implement robust data collection processes to reduce missing values and improve model accuracy.
Real-Time Monitoring: Deploy models in production with continuous monitoring and monthly retraining schedules.
Ensemble Enhancement: Explore gradient boosting methods (XGBoost, LightGBM) for potentially improved performance on imbalanced datasets.
Feature Engineering: Develop interaction features between Age and Product holdings to capture more nuanced customer behaviour patterns.
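A minimal sketch of such interaction features with pandas (column names mirror the predictors discussed above; the 45+ threshold and sample values are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [32, 51, 47, 29],
    "NumOfProducts": [1, 2, 3, 2],
})

# Multiplicative interaction between age and product holdings.
df["Age_x_Products"] = df["Age"] * df["NumOfProducts"]

# Binned interaction: older customers outside the 2-product sweet spot,
# the segment the report flags as highest churn risk (45+ cut-off assumed).
df["Senior_OffSweetSpot"] = (
    (df["Age"] >= 45) & (df["NumOfProducts"] != 2)
).astype(int)
```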
ROI Calculation: Estimate that reducing false negatives by 20% could retain an additional 100+ customers annually, worth approximately £2.5M in lifetime value.
Cost-Benefit Analysis: Balance precision vs recall based on cost of retention campaigns (£50-200 per customer) versus lifetime customer value (£25,000 average).
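The arithmetic behind these two figures can be laid out explicitly; the contacts-per-retention ratio below is an assumption for illustration, not a figure from the report:

```python
# Figures cited above: ~100 extra customers retained per year,
# £25,000 average lifetime value, £50-200 campaign cost per contact.
retained_per_year = 100
avg_lifetime_value = 25_000
gross_value = retained_per_year * avg_lifetime_value  # £2.5M, as cited

# Assume (illustratively) 10 customers must be contacted per retention.
contacts = retained_per_year * 10
worst_case_cost = contacts * 200  # high end of the £50-200 range
best_case_cost = contacts * 50
net_worst = gross_value - worst_case_cost
net_best = gross_value - best_case_cost
```

Even at the high end of the campaign-cost range, retention spend is small relative to the lifetime value recovered, which is why the recall-oriented model is attractive for the high-risk segment.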
High Priority (0-3 months): Deploy Random Forest model, implement age-based customer segmentation
Medium Priority (3-6 months): Enhance data collection processes, develop product portfolio strategies
Long-term (6-12 months): Advanced ensemble methods, comprehensive feature engineering, automated retraining pipeline