Exploring the intersection of machine learning, data science, and visual storytelling through comprehensive analytical projects
Comprehensive time series forecasting demonstrating classical and modern approaches
An in-depth analysis of Nielsen BookScan data comparing SARIMA, LSTM, XGBoost, and hybrid models for forecasting book sales. The Sequential Hybrid model performed best on stable patterns (MAE: 144.99), while classical SARIMA performed best on volatile data (MAE: 415.45).
Diverse analytical approaches across multiple domains
Supervised and unsupervised learning applications
Neural networks and advanced architectures
Time series and statistical thinking
Interactive learning demonstrations
Processed 951,668 e-commerce orders from 2012 to 2016 across five continents
Engineered RFM-style features: Frequency, Recency, CLV, Average Unit Cost, Customer Age
Applied K-means (k=5) and hierarchical clustering with PCA/t-SNE visualisation
Identified 5 distinct segments with tailored marketing strategies
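As an illustration of the feature-engineering step above, the sketch below derives Recency and Frequency per customer from raw order records. The order tuples, the `recency_frequency` helper, and the snapshot date are hypothetical and stand in for the real dataset and pipeline, which are not shown here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order records: (customer_id, order_date). Not the real dataset.
orders = [
    ("c1", date(2016, 1, 5)),
    ("c1", date(2016, 3, 1)),
    ("c2", date(2015, 11, 20)),
]

def recency_frequency(orders, snapshot):
    """Return {customer_id: (days_since_last_order, order_count)}."""
    last_order = {}
    counts = defaultdict(int)
    for customer_id, order_date in orders:
        counts[customer_id] += 1
        # Keep the most recent order date seen for each customer.
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date
    return {
        cid: ((snapshot - last_order[cid]).days, counts[cid])
        for cid in counts
    }

# Recency is measured against an assumed snapshot date.
features = recency_frequency(orders, snapshot=date(2016, 4, 1))
```

Features like these would typically be scaled before clustering, since K-means is sensitive to differences in feature magnitude.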
Analysed student data across application, engagement, and academic stages
Implemented XGBoost and Neural Networks with stratified sampling
Identified UnauthorisedAbsenceCount as top predictor in Stage 2
Tuned models to reach 95% recall for engagement-based predictions
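Recall is the share of actual positive cases that the model catches. One common way to raise it is to lower the decision threshold applied to predicted probabilities. The sketch below shows that effect on made-up labels and scores. Whether the project tuned its models this way is an assumption, and `recall_at` is a hypothetical helper.

```python
def recall_at(threshold, y_true, scores):
    """Recall = TP / (TP + FN) when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Made-up labels (1 = positive class) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.6, 0.35, 0.4, 0.1, 0.8]

recall_default = recall_at(0.5, y_true, scores)  # one positive is missed
recall_lowered = recall_at(0.3, y_true, scores)  # all positives are caught
```

Lowering the threshold usually costs precision, so the two metrics are best read together.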
Analysed 19,535 samples of engine functionality metrics
Applied the IQR method, flagging 2.16% of the data as outliers across features
Implemented One-Class SVM and Isolation Forest algorithms
Selected Isolation Forest (5% contamination) for real-time monitoring
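The IQR step in the list above can be sketched with the standard library alone. The `iqr_outliers` helper, the conventional 1.5 multiplier, and the sample readings are assumptions for illustration. The project's actual code and engine data are not shown here.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up readings for a single feature, with one obvious outlier.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
outliers = iqr_outliers(readings)
share = len(outliers) / len(readings)
```

The IQR rule looks at one feature at a time, whereas One-Class SVM and Isolation Forest score each sample across all features jointly.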
Processed 4,601 emails with 57 features from Spambase dataset
Built 64-32-1 sequential model with ReLU and Sigmoid activations
Trained with the Adam optimiser using binary cross-entropy loss
Achieved 92.3% accuracy on the test set
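To give a sense of scale for the 64-32-1 architecture on 57 input features, the sketch below counts trainable parameters and writes out the binary cross-entropy loss by hand. It assumes fully connected layers with bias terms, which is the usual default but is not confirmed by the project description. Both helper functions are hypothetical.

```python
import math

# 57 input features, then layers of 64, 32, and 1 units.
layer_sizes = [57, 64, 32, 1]

def dense_param_count(sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each dense layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

n_params = dense_param_count(layer_sizes)  # 3712 + 2080 + 33
```

Under these assumptions the network has 5,825 trainable parameters, which is small relative to the 4,601 available emails.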
Real-time adjustment of learning rate, batch size, and epochs
Dynamic visualisation of loss curves and accuracy trends
Side-by-side comparison of different parameter configurations
Clear explanations of parameter effects on model performance
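The effect of the learning rate can be seen even on a toy problem. The sketch below runs plain gradient descent on f(w) = w², whose gradient is 2w, with two learning rates. It is an illustrative stand-in, not the code behind the interactive demo, and `gradient_descent` is a hypothetical helper.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimise f(w) = w**2 from w0 using a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of w**2 is 2*w
    return w

w_small_lr = gradient_descent(lr=0.1)  # shrinks towards the minimum at 0
w_large_lr = gradient_descent(lr=1.1)  # overshoots and grows in magnitude
```

Each step multiplies w by (1 - 2·lr), so the iteration converges only when that factor has magnitude below 1.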
Interactive visualisations of common statistical distributions
Visual demonstrations of p-values and confidence intervals
Animated explanations of sampling distributions and CLT
Interactive Bayesian updating and prior/posterior visualisations
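Bayesian updating has a closed form in the Beta-Binomial case: a Beta(α, β) prior combined with s successes and f failures gives a Beta(α + s, β + f) posterior. The sketch below shows that update. Whether the visualisations use this particular model is an assumption, and the function names are hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1), then observe 7 successes and 3 failures.
posterior = update_beta(1, 1, successes=7, failures=3)
posterior_mean = beta_mean(*posterior)
```

With more data, the posterior mean moves from the prior mean of 0.5 towards the observed success rate.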