Student Retention Prediction System
A supervised learning system that identifies students at risk of dropping out by combining academic performance data with demographic and engagement indicators, enabling institutions to intervene before it is too late.
The Challenge
Universities lose talented students every year because at-risk cases are not identified early enough for intervention. By the time a student formally withdraws, the decision has usually been building for months, signalled by patterns in attendance, grades, engagement, and personal circumstances.
The data to predict these outcomes often exists across fragmented systems but is rarely synthesised into a unified early-warning signal that student support teams can act on proactively.
Approach
Results
The core challenge in this dataset was severe class imbalance: dropout cases formed a small minority of total records, making them inherently difficult to detect. Initial models achieved a recall of just 0.44 on the dropout class, meaning more than half of the students who went on to drop out were missed entirely.
Through iterative feature engineering, resampling strategies, and threshold optimisation, XGBoost improved dropout recall from 0.44 to 0.81 (80.89%), correctly flagging four in five students who would ultimately withdraw. The F1 score rose more modestly, from 0.12 to 0.21, because precision remained low: on a heavily imbalanced target, catching more dropouts also means flagging more students who would not have withdrawn.
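To make the threshold optimisation step concrete, the following is a minimal sketch of one way such a search could work: sweep the decision threshold over the model's predicted dropout probabilities and keep the highest threshold that still meets a recall target. The scores, labels, function names, and the 0.8 target are all hypothetical; the project's actual procedure and data are not shown in this write-up.

```python
from typing import List, Optional, Tuple


def recall_precision(scores: List[float], labels: List[int],
                     threshold: float) -> Tuple[float, float]:
    """Recall and precision on the dropout class (label 1) at one threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def pick_threshold(scores: List[float], labels: List[int],
                   min_recall: float) -> Optional[float]:
    """Highest threshold whose dropout recall is at least min_recall.

    Recall can only grow as the threshold is lowered, so the first
    candidate that meets the target flags the fewest students.
    Returns None if no candidate reaches the target.
    """
    for candidate in sorted(set(scores), reverse=True):
        recall, _ = recall_precision(scores, labels, candidate)
        if recall >= min_recall:
            return candidate
    return None


# Hypothetical validation data: predicted dropout probability and true outcome.
scores = [0.92, 0.81, 0.67, 0.55, 0.48, 0.41, 0.33, 0.27, 0.18, 0.09]
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

default_recall, default_precision = recall_precision(scores, labels, 0.5)
chosen = pick_threshold(scores, labels, min_recall=0.8)
tuned_recall, tuned_precision = recall_precision(scores, labels, chosen)

print(f"threshold 0.50: recall={default_recall:.2f} precision={default_precision:.2f}")
print(f"threshold {chosen:.2f}: recall={tuned_recall:.2f} precision={tuned_precision:.2f}")
```

In this toy data, lowering the threshold catches the third dropout case but also flags more students who stayed, which is the same recall and precision trade-off described above. In practice the threshold would be chosen on a held-out validation set rather than on the data used for final reporting.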
SHAP values provided clear, per-student explanations of which factors were driving each risk prediction, giving support teams actionable insight rather than opaque scores. The neural network captured additional non-linear patterns but at the cost of reduced interpretability, making XGBoost the recommended production model for institutional deployment.
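As an illustration of what a per-student explanation contains, the sketch below computes exact Shapley values for a small hand-written risk function by averaging each feature's marginal contribution over every feature ordering. This is the quantity SHAP estimates; the real project would rely on the `shap` library with the trained XGBoost model instead of this brute-force loop. The risk function, feature names, baseline, and student record are all hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict

Record = Dict[str, float]


def exact_shapley(predict: Callable[[Record], float],
                  student: Record, baseline: Record) -> Dict[str, float]:
    """Exact Shapley values relative to a baseline record.

    For every ordering of the features, switch them one at a time from the
    baseline value to the student's value and credit the change in the
    prediction to the feature that was switched. Only feasible for a
    handful of features, since the number of orderings grows factorially.
    """
    features = list(student)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = student[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x: Record) -> float:
    """Hypothetical dropout-risk score standing in for the trained model."""
    risk = 0.10
    risk += 0.30 * (1.0 - x["attendance"])
    risk += 0.20 * (1.0 - x["grade_avg"])
    # Interaction term: low attendance combined with unpaid fees adds risk.
    if x["attendance"] < 0.6 and x["fees_unpaid"] == 1:
        risk += 0.15
    return risk


# Hypothetical reference student and the student being explained.
baseline = {"attendance": 0.9, "grade_avg": 0.7, "fees_unpaid": 0}
student = {"attendance": 0.5, "grade_avg": 0.6, "fees_unpaid": 1}

contributions = exact_shapley(toy_risk, student, baseline)
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} {value:+.3f}")
```

The contributions sum to the difference between this student's risk score and the baseline score, so a support team can read them as a breakdown of why one student is rated riskier than a typical one. Here the interaction between low attendance and unpaid fees is split between those two features.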