My Data Voyage | Data Science Portfolio

My Data Voyage

QA Lead Transitioning to Data Science | Cambridge Level 7 Certificate 2025 | Python + ML/NLP Portfolio

Project Portfolio

Customer Segmentation

Analysed a dataset through exploration and preprocessing, conducted feature engineering, determined the optimal number of clusters (k), and applied machine learning models to segment customers effectively.

Technologies: Python, Scikit-learn, Pandas, Clustering Algorithms
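
One common way to determine k is to compare silhouette scores across candidate values. The sketch below is illustrative only and is not the project's actual code: the synthetic blobs, the candidate range, and the helper name choose_k are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def choose_k(X, k_values, seed=0):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for customer features: three well-separated groups.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)
best_k, scores = choose_k(X, range(2, 7))
```

On real customer data the peak is rarely this clear, so the silhouette score is usually read alongside an elbow plot and business interpretability of the segments.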

Student Dropout Prediction

Conducted phased data exploration, preprocessing, and feature engineering. Built and compared predictive models using XGBoost and a neural network to forecast student dropout rates with high accuracy.

Technologies: Python, XGBoost, TensorFlow, Pandas

Statistical Hypothesis Testing

Applied statistical hypothesis testing to evaluate organisational data scenarios. Explored the differences between correlation and causation in data analysis.

Technologies: Python, Statistical Methods
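
As a small illustration of hypothesis testing, the sketch below runs a two-sample permutation test on the difference in means. It is a generic example rather than the analysis from this project: the two groups are made-up numbers and permutation_test is a hypothetical helper.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements for two groups (illustrative only).
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [13.0, 13.4, 12.9, 13.2, 13.5, 13.1]
p_value = permutation_test(group_a, group_b)
```

A small p-value here only indicates that the group labels are associated with the measurements; it says nothing about why, which is where the correlation versus causation distinction comes in.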

Anomaly Detection

Explored a dataset to identify patterns, preprocessed data, and performed feature engineering. Applied statistical techniques and machine learning algorithms to detect anomalies, followed by a detailed report summarising findings and recommendations.

Technologies: Python, Pandas, Scikit-learn, Statistical Methods
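
A simple statistical baseline for anomaly detection is the modified z-score, which uses the median and the median absolute deviation (MAD) so that the outliers do not distort the threshold. The sketch below assumes a single numeric feature and the commonly used cutoff of 3.5; the readings and the helper name mad_outliers are invented for the example and do not come from the project.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median, so nothing can be scored.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Invented sensor-style readings with one obvious spike.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2, 10.1]
flagged = mad_outliers(readings)
```

Model-based detectors such as Isolation Forests are typically compared against a baseline like this before being preferred.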

Time Series Forecasting

Analysed historical sales data using time series decomposition, feature engineering, and ARIMA modelling to forecast future demand. Achieved 15% improvement in forecast accuracy over baseline methods.

Technologies: Python, Statsmodels, Prophet, Pandas
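
An improvement over a baseline is usually reported as a relative reduction in forecast error. The sketch below shows one way such a figure could be computed against a naive "previous value" baseline using mean absolute error. The series, the model forecasts, and the choice of metric are assumptions for illustration; they are not the project's data or its evaluation method.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def naive_forecast(series):
    """Forecast each point as the previous observed value."""
    return series[:-1]


def relative_improvement(baseline_error, model_error):
    """Fractional error reduction of the model relative to the baseline."""
    return (baseline_error - model_error) / baseline_error


# Toy sales history and hypothetical model output (illustrative only).
sales = [90, 100, 110, 120, 130]
actual = sales[1:]
model_forecast = [98, 108, 123, 128]

baseline_mae = mean_absolute_error(actual, naive_forecast(sales))
model_mae = mean_absolute_error(actual, model_forecast)
improvement = relative_improvement(baseline_mae, model_mae)
```

For seasonal sales data a seasonal-naive baseline (same period last year) is often the fairer comparison.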

Neural Network Project

Designed and implemented a deep neural network architecture from scratch. Applied forward and backward propagation algorithms, optimised hyperparameters, and achieved state-of-the-art performance on classification tasks.

Technologies: Python, TensorFlow, Keras, NumPy, Matplotlib
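
To make forward and backward propagation concrete, the sketch below trains a one-hidden-layer network with plain NumPy and full-batch gradient descent. It is a minimal illustration under assumed settings (tanh hidden layer, sigmoid output, binary cross-entropy, the XOR toy problem) and is not the architecture or code used in the project.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, params):
    """Forward pass: hidden tanh layer, then sigmoid output."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    P = sigmoid(H @ W2 + b2)
    return H, P


def bce_loss(P, y):
    """Mean binary cross-entropy, with a small epsilon for stability."""
    eps = 1e-12
    return float(-np.mean(y * np.log(P + eps) + (1 - y) * np.log(1 - P + eps)))


def backward(X, y, H, P, params):
    """Backward pass: gradients of the mean loss for every parameter."""
    _, _, W2, _ = params
    n = X.shape[0]
    dZ2 = (P - y) / n              # sigmoid + cross-entropy gradient
    dW2 = H.T @ dZ2
    db2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * (1 - H ** 2)  # chain rule through tanh
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)
    return dW1, db1, dW2, db2


def train(X, y, hidden=4, lr=0.5, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    params = [
        rng.normal(0, 0.5, (X.shape[1], hidden)),
        np.zeros(hidden),
        rng.normal(0, 0.5, (hidden, 1)),
        np.zeros(1),
    ]
    losses = []
    for _ in range(steps):
        H, P = forward(X, params)
        losses.append(bce_loss(P, y))
        grads = backward(X, y, H, P, params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, losses


# XOR toy problem: not linearly separable, so the hidden layer matters.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params, losses = train(X, y)
```

Frameworks such as TensorFlow and Keras compute the same gradients automatically; writing them by hand once is mainly useful for understanding what the framework does.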

Technical Skills

Python & Data Science Stack

Pandas • NumPy • Scikit-learn • TensorFlow • PyTorch • Jupyter • Git • Production-ready ML pipelines • Automated/scalable workflows

Machine Learning & Deep Learning

Supervised & unsupervised learning • XGBoost • Random Forests • SVM • Neural networks (custom architectures, forward/backward propagation, gradient descent) • Ensemble methods • Clustering (K-means, DBSCAN, HDBSCAN, hierarchical)

Natural Language Processing & Generative AI

Hugging Face Transformers • FinBERT • FinLLaMA • BERT • BERTopic • VADER • GPT-2 • BART • Sentence Transformers • spaCy • NLTK • Text classification & sentiment analysis (92% accuracy on customer-review dataset)

Time-Series Analysis & Forecasting

ARIMA/SARIMA • Prophet • LSTM • Statsmodels • Decomposition techniques • Demand & financial forecasting (15% accuracy improvement vs baseline on book-sales project)

Anomaly Detection

Isolation Forests • Autoencoders • Statistical methods • Real-time maritime/engine anomaly detection project

Model Evaluation & Optimisation

Hyperparameter tuning (Grid, Random, Bayesian) • ROC-AUC • Precision-Recall • Custom business metrics • SHAP interpretability • A/B testing

Feature Engineering & Dimensionality Reduction

Feature creation/selection • PCA • t-SNE • UMAP • Autoencoders • High-dimensional data processing

Data Visualisation & BI

Matplotlib • Seaborn • Plotly • Power BI • Interactive dashboards • Business intelligence reporting

Statistical Analysis & Hypothesis Testing

Parametric & non-parametric tests • Correlation & causal inference • Model validation

MLOps & Deployment Fundamentals

Experiment tracking • Model versioning • Deployment pipelines • Drift detection concepts • Automated retraining basics (academic & portfolio exposure)

Customer & Business Analytics

RFM analysis • Cohort analysis • Behavioural segmentation • Retention optimisation • Targeted marketing insights

Visualisation Gallery

A selection of my data visualisation techniques

Contact Me

Interested in working together? Fill out the form below, and I'll get back to you promptly.

Location

Based in London, UK
