My Data Voyage | Data Science Portfolio

My Data Voyage

QA Lead Transitioning to Data Science | Cambridge Level 7 Certificate 2025 | Python + ML/NLP Portfolio

Featured Projects

Capstone Project (University of Cambridge 2025) - Prototype G-SIB Risk Assessment System

Built an end-to-end prototype analysing 81 public quarterly financial reports & earnings-call transcripts of three Global Systemically Important Banks (2023–2025). Combined advanced NLP (FinBERT, FinLLaMA, BERTopic, VADER) with structured financial metrics extraction and ARIMA forecasting. Produced an interactive dashboard presenting regulatory-style risk insights. Technologies: Python, FinBERT, VADER, BERTopic, GPT-2, ARIMA modelling, Sentence Transformers, BART, HDBSCAN clustering. Prototype built using only publicly available data – no affiliation with or delivery to the Bank of England.

NLP Customer Review Sentiment Analysis for Wellness Centre

Developed an NLP solution to analyse customer feedback sentiment. Implemented text preprocessing techniques and trained a BERT-based model to classify sentiment with 92% accuracy. Technologies: Python, NLTK, Transformers, PyTorch.

Technologies: Python, NLTK, Transformers, PyTorch.

Project Portfolio

Customer Segmentation

Analysed a dataset through exploration and preprocessing, conducted feature engineering, determined the optimal number of clusters (k), and applied machine learning models to segment customers effectively.

Technologies: Python, Scikit-learn, Pandas, Clustering Algorithms

Student Dropout Prediction

Conducted phased data exploration, preprocessing, and feature engineering. Built and compared predictive models using XGBoost and a neural network to forecast student dropout rates with high accuracy.

Technologies: Python, XGBoost, TensorFlow, Pandas

Statistical Hypothesis Testing

Applied statistical hypothesis testing to evaluate organisational data scenarios. Explored the differences between correlation and causation in data analysis.

Technologies: Python, Statistical Methods

Anomaly Detection

Explored a dataset to identify patterns, preprocessed data, and performed feature engineering. Applied statistical techniques and machine learning algorithms to detect anomalies, followed by a detailed report summarising findings and recommendations.

Technologies: Python, Pandas, Scikit-learn, Statistical Methods

Time Series Forecasting

Analysed historical sales data using time series decomposition, feature engineering, and ARIMA modeling to forecast future demand. Achieved 15% improvement in forecast accuracy over baseline methods.

Technologies: Python, Statsmodels, Prophet, Pandas

Neural Network Project

Designed and implemented a deep neural network architecture from scratch. Applied forward and backward propagation algorithms, optimised hyperparameters, and achieved state-of-the-art performance on classification tasks.

Technologies: Python, TensorFlow, Keras, NumPy, Matplotlib

Explore More

Movie Review Sentiment Classification System

Advanced Neural Network Architecture Visualisation

Interactive Neural Network Learning Demonstrator

Hyperparameter Optimisation Visual Analytics

Foundation Neural Network Model Explorer

Deep Learning Network Implementation Framework

Automated Hyperparameter Tuning Pipeline

Comprehensive Model Evaluation Metrics Suite

Gradient Descent Optimiser Comparative Analysis

Titanic Survival Prediction Neural Network

Supervised Learning Algorithm Implementation

Advanced Time Series Forecasting Models

Customer Behavioural Segmentation Analytics

Maritime Engine Anomaly Detection System

Statistical Hypothesis Testing Framework

Student Retention Predictive Analytics

Advanced NLP Sentiment Classification Engine

Custom Deep Learning Architecture Design

Baltimore Police ARIMA Crime Forecasting System

Baltimore Crime Patterns Time Series Analysis

RNN Model Comparison for Text Classification

Decision Tree Analysis with SHAP Interpretation

Bank Customer Churn Prediction System

Neural Network Optimiser Performance Analysis

Titanic Survivor Prediction Optimisation Study

Neural Network Manual Propagation Framework

Regression vs Classification Decision Framework

Automobile Price Prediction with PCA Analysis

Advanced Dimensionality Reduction Visualisation

Comprehensive Automobile Price Analysis Guide

Customer Loyalty Predictive Analytics System

Medical Insurance Cost Correlation Analysis

Statistical Hypothesis Testing Analysis Dashboard

Technical Skills

Python & Data Science Stack

Pandas • NumPy • Scikit-learn • TensorFlow • PyTorch • Jupyter • Git • Production-ready ML pipelines • Automated/scalable workflows

Machine Learning & Deep Learning

Supervised & unsupervised learning • XGBoost • Random Forests • SVM • Neural networks (custom architectures, forward/backward propagation, gradient descent) • Ensemble methods • Clustering (K-means, DBSCAN, HDBSCAN, hierarchical)

Natural Language Processing & Generative AI

Hugging Face Transformers • FinBERT • FinLLaMA • BERT • BERTopic • VADER • GPT-2 • BART • Sentence Transformers • spaCy • NLTK • Text classification & sentiment analysis (92 % accuracy on customer-review dataset)

Time-Series Analysis & Forecasting

ARIMA/SARIMA • Prophet • LSTM • Statsmodels • Decomposition techniques • Demand & financial forecasting (15 % accuracy improvement vs baseline on book-sales project)

Anomaly Detection

Isolation Forests • Autoencoders • Statistical methods • Real-time maritime/engine anomaly detection project

Model Evaluation & Optimisation

Hyperparameter tuning (Grid, Random, Bayesian) • ROC-AUC • Precision-Recall • Custom business metrics • SHAP interpretability • A/B testing

Feature Engineering & Dimensionality Reduction

Feature creation/selection • PCA • t-SNE • UMAP • Autoencoders • High-dimensional data processing

Data Visualisation & BI

Matplotlib • Seaborn • Plotly • Power BI • Interactive dashboards • Business intelligence reporting

Statistical Analysis & Hypothesis Testing

Parametric & non-parametric tests • Correlation & causal inference • Model validation

MLOps & Deployment Fundamentals

Experiment tracking • Model versioning • Drift detection concepts • Automated retraining basics (academic & portfolio exposure)

MLOps & Model Deployment

Model versioning • Experiment tracking • Deployment pipelines • Drift detection • Automated retraining

Customer & Business Analytics

RFM analysis • Cohort analysis • Behavioural segmentation • Retention optimisation • Targeted marketing insights

Visualisation Gallery

A selection of my data visualisation techniques

Optimal cluster count determination using the elbow method for anomaly detection

Principal Component Analysis for anomaly identification

Multi-feature boxplot visualisation for examining distribution patterns across customer metrics

Plot showing k-means clustering

Violin plot illustrating SHAP values

Visual representation of hierarchical clustering showing cluster distances

Contact Me

Interested in working together? Fill out the form below, and I'll get back to you promptly.

Form was sent successfully!

Location

Based in London, UK

old homepage (below) removded on 9.12.25

My Data Voyage | Data Science Portfolio

My Data Voyage

QA Lead Transitioning to Data Science | Cambridge Level 7 Certificate 2025 | Python + ML/NLP Portfolio

Featured Projects

Capstone Project (University of Cambridge 2025) - Prototype G-SIB Risk Assessment System

NLP Customer Review Sentiment Analysis for Wellness Centre

Technologies: Python, NLTK, Transformers, PyTorch.

Project Portfolio

Customer Segmentation

Technologies: Python, Scikit-learn, Pandas, Clustering Algorithms

Student Dropout Prediction

Technologies: Python, XGBoost, TensorFlow, Pandas

Statistical Hypothesis Testing

Applied statistical hypothesis testing to evaluate organisational data scenarios. Explored the differences between correlation and causation in data analysis.

Technologies: Python, Statistical Methods

Anomaly Detection

Technologies: Python, Pandas, Scikit-learn, Statistical Methods

Time Series Forecasting

Analysed historical sales data using time series decomposition, feature engineering, and ARIMA modeling to forecast future demand. Achieved 15% improvement in forecast accuracy over baseline methods.

Technologies: Python, Statsmodels, Prophet, Pandas

Neural Network Project

Technologies: Python, TensorFlow, Keras, NumPy, Matplotlib

Explore More

Movie Review Sentiment Classification System

Advanced Neural Network Architecture Visualisation

Interactive Neural Network Learning Demonstrator

Hyperparameter Optimisation Visual Analytics

Foundation Neural Network Model Explorer

Deep Learning Network Implementation Framework

Automated Hyperparameter Tuning Pipeline

Comprehensive Model Evaluation Metrics Suite

Gradient Descent Optimiser Comparative Analysis

Titanic Survival Prediction Neural Network

Supervised Learning Algorithm Implementation

Advanced Time Series Forecasting Models

Customer Behavioural Segmentation Analytics

Maritime Engine Anomaly Detection System

Statistical Hypothesis Testing Framework

Student Retention Predictive Analytics

Advanced NLP Sentiment Classification Engine

Custom Deep Learning Architecture Design

Baltimore Police ARIMA Crime Forecasting System

Baltimore Crime Patterns Time Series Analysis

RNN Model Comparison for Text Classification

Decision Tree Analysis with SHAP Interpretation

Bank Customer Churn Prediction System

Neural Network Optimiser Performance Analysis

Titanic Survivor Prediction Optimisation Study

Neural Network Manual Propagation Framework

Regression vs Classification Decision Framework

Automobile Price Prediction with PCA Analysis

Advanced Dimensionality Reduction Visualisation

Comprehensive Automobile Price Analysis Guide

Customer Loyalty Predictive Analytics System

Medical Insurance Cost Correlation Analysis

Statistical Hypothesis Testing Analysis Dashboard

Technical Skills

Python & Data Science Stack

Pandas • NumPy • Scikit-learn • TensorFlow • PyTorch • Jupyter • Git • Production-ready ML pipelines • Automated/scalable workflows

Machine Learning & Deep Learning

Natural Language Processing & Generative AI

Time-Series Analysis & Forecasting

ARIMA/SARIMA • Prophet • LSTM • Statsmodels • Decomposition techniques • Demand & financial forecasting (15 % accuracy improvement vs baseline on book-sales project)

Anomaly Detection

Isolation Forests • Autoencoders • Statistical methods • Real-time maritime/engine anomaly detection project

Model Evaluation & Optimisation

Hyperparameter tuning (Grid, Random, Bayesian) • ROC-AUC • Precision-Recall • Custom business metrics • SHAP interpretability • A/B testing

Feature Engineering & Dimensionality Reduction

Feature creation/selection • PCA • t-SNE • UMAP • Autoencoders • High-dimensional data processing

Data Visualisation & BI

Matplotlib • Seaborn • Plotly • Power BI • Interactive dashboards • Business intelligence reporting

Statistical Analysis & Hypothesis Testing

Parametric & non-parametric tests • Correlation & causal inference • Model validation

MLOps & Deployment Fundamentals

Experiment tracking • Model versioning • Drift detection concepts • Automated retraining basics (academic & portfolio exposure)

MLOps & Model Deployment

Model versioning • Experiment tracking • Deployment pipelines • Drift detection • Automated retraining

Customer & Business Analytics

RFM analysis • Cohort analysis • Behavioural segmentation • Retention optimisation • Targeted marketing insights

Visualisation Gallery

A selection of my data visualisation techniques

Optimal cluster count determination using the elbow method for anomaly detection

Principal Component Analysis for anomaly identification

Multi-feature boxplot visualisation for examining distribution patterns across customer metrics

Plot showing k-means clustering

Violin plot illustrating SHAP values

Visual representation of hierarchical clustering showing cluster distances

Contact Me

Interested in working together? Fill out the form below, and I'll get back to you promptly.

Form was sent successfully!

Location

Based in London, UK