Data Scientist | Machine Learning Expert | Insight Generator
Explored a dataset to identify patterns, preprocessed data, and performed feature engineering. Applied statistical techniques and machine learning algorithms to detect anomalies, followed by a detailed report summarising findings and recommendations.
Technologies: Python, Pandas, Scikit-learn, Statistical Methods
Analysed a dataset through exploration and preprocessing, conducted feature engineering, determined the optimal number of clusters (k), and applied machine learning models to segment customers effectively.
Technologies: Python, Scikit-learn, Pandas, Clustering Algorithms
Conducted phased data exploration, preprocessing, and feature engineering. Built and compared predictive models using XGBoost and a neural network to forecast student dropout rates with high accuracy.
Technologies: Python, XGBoost, TensorFlow, Pandas
Applied statistical hypothesis testing to evaluate organizational data scenarios.
Explored the differences between correlation and causation in data analysis.
Developed and interpreted machine learning models for data-driven insights.
Utilized non-parametric statistical tests on a dataset.
Applied dimensionality reduction techniques such as PCA and t-SNE.
Compared regression and classification approaches in machine learning.
Implemented manual forward and backward propagation in neural networks.
Constructed a foundational neural network for predictive modeling.
Explored techniques for optimizing model hyperparameters.
Optimal cluster count determination using the elbow method for anomaly detection.
Principal Component Analysis with SVM outlier detection for advanced anomaly identification.
Multi-feature boxplot visualisation for examining distribution patterns across customer metrics.
Plot showing the sum of squared errors (SSE) to determine the optimal number of clusters.
Histogram illustrating the frequency distribution of engine RPM values.
Principal Component Analysis scatter plot with One-Class SVM outlier detection.
Boxplot showing the distribution of average unit costs with outliers.
Visual representation of hierarchical clustering showing cluster distances.
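
The elbow-method and SSE captions above refer to a standard way of choosing k for k-means. The project code is not shown on this page, so the following is only a minimal sketch of the idea: it uses synthetic data from `make_blobs` as a stand-in for the real dataset, and the helper name `sse_by_k` is hypothetical. It fits k-means for a range of k values and records the sum of squared errors (scikit-learn exposes this as `inertia_`); the "elbow" is the k after which the SSE stops dropping sharply.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption): 300 points around 4 centres.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)


def sse_by_k(data, k_values, random_state=42):
    """Fit k-means for each k and return {k: sum of squared errors}."""
    sse = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(data)
        # inertia_ is the sum of squared distances to the nearest centroid.
        sse[k] = model.inertia_
    return sse


sse = sse_by_k(X, range(1, 9))
for k, value in sse.items():
    print(f"k={k}: SSE={value:.1f}")
```

Plotting these values against k (for example with Matplotlib) gives the kind of SSE curve described in the caption; the plotting step is left out here so the sketch runs without a display.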