QA Lead Transitioning to Data Science | Cambridge Level 7 Certificate 2025 | Python + ML/NLP Portfolio
Built an end-to-end prototype analysing 81 public quarterly financial reports and earnings-call transcripts from three Global Systemically Important Banks (2023–2025). Combined NLP models (FinBERT, FinLLaMA, BERTopic, VADER) with structured financial-metric extraction and ARIMA forecasting, and produced an interactive dashboard presenting regulatory-style risk insights. Prototype built using only publicly available data; no affiliation with or delivery to the Bank of England.
Technologies: Python, FinBERT, VADER, BERTopic, GPT-2, ARIMA modelling, Sentence Transformers, BART, HDBSCAN clustering
Developed an NLP solution to analyse customer feedback sentiment. Implemented text preprocessing techniques and trained a BERT-based model to classify sentiment with 92% accuracy.
Technologies: Python, NLTK, Transformers, PyTorch.
Explored and preprocessed a customer dataset, engineered features, determined the optimal number of clusters (k), and applied clustering models to segment customers.
Technologies: Python, Scikit-learn, Pandas, Clustering Algorithms
Conducted phased data exploration, preprocessing, and feature engineering. Built and compared XGBoost and neural-network models to predict student dropout.
Technologies: Python, XGBoost, TensorFlow, Pandas
Applied statistical hypothesis testing to evaluate organisational data scenarios. Explored the differences between correlation and causation in data analysis.
Technologies: Python, Statistical Methods
Explored a dataset to identify patterns, preprocessed data, and performed feature engineering. Applied statistical techniques and machine learning algorithms to detect anomalies, followed by a detailed report summarising findings and recommendations.
Technologies: Python, Pandas, Scikit-learn, Statistical Methods
Analysed historical sales data using time series decomposition, feature engineering, and ARIMA modelling to forecast future demand. Achieved a 15% improvement in forecast accuracy over baseline methods.
Technologies: Python, Statsmodels, Prophet, Pandas
Designed and implemented a deep neural network architecture from scratch. Applied forward and backward propagation algorithms, optimised hyperparameters, and achieved state-of-the-art performance on classification tasks.
Technologies: Python, TensorFlow, Keras, NumPy, Matplotlib
Pandas • NumPy • Scikit-learn • TensorFlow • PyTorch • Jupyter • Git • Production-ready ML pipelines • Automated/scalable workflows
Supervised & unsupervised learning • XGBoost • Random Forests • SVM • Neural networks (custom architectures, forward/backward propagation, gradient descent) • Ensemble methods • Clustering (K-means, DBSCAN, HDBSCAN, hierarchical)
Hugging Face Transformers • FinBERT • FinLLaMA • BERT • BERTopic • VADER • GPT-2 • BART • Sentence Transformers • spaCy • NLTK • Text classification & sentiment analysis (92% accuracy on customer-review dataset)
ARIMA/SARIMA • Prophet • LSTM • Statsmodels • Decomposition techniques • Demand & financial forecasting (15% accuracy improvement vs baseline on book-sales project)
Isolation Forests • Autoencoders • Statistical methods • Real-time maritime/engine anomaly detection project
Hyperparameter tuning (Grid, Random, Bayesian) • ROC-AUC • Precision-Recall • Custom business metrics • SHAP interpretability • A/B testing
Feature creation/selection • PCA • t-SNE • UMAP • Autoencoders • High-dimensional data processing
Matplotlib • Seaborn • Plotly • Power BI • Interactive dashboards • Business intelligence reporting
Parametric & non-parametric tests • Correlation & causal inference • Model validation
Experiment tracking • Model versioning • Deployment pipelines • Drift detection concepts • Automated retraining basics (academic & portfolio exposure)
RFM analysis • Cohort analysis • Behavioural segmentation • Retention optimisation • Targeted marketing insights
A selection of my data visualisation techniques
Interested in working together? Fill out the form below, and I'll get back to you promptly.
Based in London, UK