Deep Learning

Neural Network Design & Hyperparameter Optimisation

A binary classification neural network designed, trained, and systematically optimised using TensorFlow and Keras on the UCI Spambase dataset (4,601 emails across 57 frequency-based features), with grid search over training hyperparameters achieving 94.6% test accuracy.

Type
Deep Learning
Domain
Binary Classification
Methods
TensorFlow / Keras, Grid Search, Binary Cross-Entropy
Dataset
UCI Spambase — 4,601 emails, 57 features

The Challenge

Spam classification is a canonical binary classification problem, but the more durable goal here was to develop genuine architectural intuition: how do layer depth, neuron count, epoch budget and batch size interact? Which choices matter most in practice, and can that be demonstrated empirically rather than asserted from reading documentation?

The UCI Spambase benchmark provides a clean, well-understood target — 4,601 emails labelled spam or legitimate, each described by 57 word-frequency and email-characteristic features — making it ideal for controlled architecture exploration where performance differences are attributable to design choices, not data quality.

Approach

01
Data Preparation
Loaded and preprocessed the UCI Spambase dataset — 4,601 labelled emails with 57 word-frequency and character-frequency features. Applied StandardScaler normalisation, then split 80/20 into training and held-out test sets with stratification to preserve class proportions.
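The preparation step can be sketched as follows. This is a minimal sketch, not the project's actual code: synthetic data stands in for the Spambase matrix (same 4,601 × 57 shape and roughly the real ~39% spam proportion), so the pipeline runs without downloading the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the Spambase shape: 4,601 rows x 57 features,
# ~39% positive class (approximately the real spam proportion).
rng = np.random.default_rng(42)
X = rng.random((4601, 57))
y = (rng.random(4601) < 0.394).astype(int)

# Zero-mean / unit-variance scaling, then a stratified 80/20 split.
# (Fitting the scaler on the training split alone would avoid any leakage.)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```

Stratification matters here because the classes are imbalanced: without it, a random 20% slice could over- or under-represent spam and skew the test estimate.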
02
Architecture Design
Designed a sequential deep network with two hidden layers — 64 then 32 neurons — both with ReLU activation, and a single sigmoid output node for binary classification. Compiled with Adam optimiser and binary cross-entropy loss, then validated the baseline architecture against the held-out set.
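The baseline network described above is a few lines of Keras; this sketch reproduces the stated topology (57 inputs → 64 → 32 → 1) and compile settings:

```python
from tensorflow import keras

# Baseline architecture: 57 inputs -> Dense(64, ReLU)
# -> Dense(32, ReLU) -> Dense(1, sigmoid).
model = keras.Sequential([
    keras.Input(shape=(57,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# 57*64+64 + 64*32+32 + 32*1+1 = 5,825 trainable parameters
print(model.count_params())
```

At under six thousand parameters the model is small enough to train in seconds per configuration, which is what makes an exhaustive grid search practical.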
03
Hyperparameter Grid Search
Conducted systematic grid search over epoch count and batch size configurations, tracking validation accuracy across all combinations. Identified the optimal epoch budget (~14) and batch size (~17) that maximised generalisation without overfitting, with training and validation curves plotted for each configuration.
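The sweep can be sketched as a plain loop over the two grids, tracking the best validation accuracy per configuration. The grids and data below are illustrative stand-ins, not the project's actual sweep, which covered more values around the reported optimum (~14 epochs, batch size ~17):

```python
import numpy as np
from tensorflow import keras

def build_model(n_features: int) -> keras.Model:
    """Baseline architecture: n_features -> 64 -> 32 -> 1 (sigmoid)."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Illustrative grids; the real sweep covered more values.
epoch_grid = [5, 14]
batch_grid = [17, 32]

rng = np.random.default_rng(0)                 # synthetic stand-in data
X = rng.random((500, 57)).astype("float32")
y = (rng.random(500) < 0.4).astype("float32")

results = {}
for epochs in epoch_grid:
    for batch_size in batch_grid:
        model = build_model(57)                # fresh weights per config
        history = model.fit(X, y, epochs=epochs, batch_size=batch_size,
                            validation_split=0.2, verbose=0)
        # Track the best validation accuracy seen for this configuration.
        results[(epochs, batch_size)] = max(history.history["val_accuracy"])

best = max(results, key=results.get)
print(f"best config: epochs={best[0]}, batch_size={best[1]}")
```

Rebuilding the model inside the loop is deliberate: reusing trained weights across configurations would let earlier runs contaminate later scores.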
04
Evaluation and Analysis
Evaluated the tuned model on the held-out test set, reporting accuracy, loss, and classification metrics. Analysed the distribution of results across grid search configurations to understand performance sensitivity to each hyperparameter independently.
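The evaluation step can be sketched end to end; synthetic stand-in data replaces the real train/test split so the block is self-contained, and the short training run here is illustrative rather than the tuned schedule:

```python
import numpy as np
from tensorflow import keras
from sklearn.metrics import classification_report

# Synthetic stand-in data with the Spambase feature count (57).
rng = np.random.default_rng(1)
X_train = rng.random((400, 57)).astype("float32")
y_train = (rng.random(400) < 0.4).astype(int)
X_test = rng.random((100, 57)).astype("float32")
y_test = (rng.random(100) < 0.4).astype(int)

model = keras.Sequential([
    keras.Input(shape=(57,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, batch_size=17, verbose=0)

# Held-out evaluation: overall accuracy/loss plus per-class metrics.
loss, acc = model.evaluate(X_test, y_test, verbose=0)
y_pred = (model.predict(X_test, verbose=0).ravel() > 0.5).astype(int)
print(f"test loss {loss:.3f}, test accuracy {acc:.3f}")
print(classification_report(y_test, y_pred,
                            target_names=["legitimate", "spam"],
                            zero_division=0))
```

The per-class report matters for spam filtering because the two error types are not symmetric: a false positive buries a legitimate email, while a false negative merely lets one spam through.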

Results

94.6%
Test accuracy on 920 held-out emails (80/20 split)
Grid Search
Epoch and batch size systematically explored — optimal: ~14 epochs, batch size ~17
94.5% avg
Average best accuracy across all grid search configurations — consistent, stable performance

The baseline architecture (two hidden layers of 64 and 32 neurons with ReLU activation, Adam optimiser) achieved 94.57% test accuracy from the outset, demonstrating that the architecture was well matched to the problem. Systematic grid search confirmed this result was reproducible rather than an artefact of a single lucky training run: the average best accuracy across all configurations was 94.50%, with optimal performance clustering around 14 epochs and a batch size of 17.

The grid search exercise established that epoch count and batch size both influence the accuracy-overfitting trade-off, but that the architecture's fundamental capacity was sufficient — diminishing returns set in quickly beyond the identified optimum. The validation curves for each configuration were plotted to make this sensitivity landscape visible rather than abstract.
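The curve-plotting step can be sketched with a hypothetical history dict in the shape Keras's `History.history` produces; the per-epoch values below are illustrative, not the project's actual numbers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical per-epoch accuracies in the shape of Keras's History.history;
# the real curves came from model.fit(..., validation_split=...) per config.
history = {
    "accuracy":     [0.80, 0.88, 0.91, 0.93, 0.94],
    "val_accuracy": [0.82, 0.89, 0.92, 0.93, 0.93],
}

epochs = range(1, len(history["accuracy"]) + 1)
fig, ax = plt.subplots()
ax.plot(epochs, history["accuracy"], label="training")
ax.plot(epochs, history["val_accuracy"], label="validation")
ax.set_xlabel("epoch")
ax.set_ylabel("accuracy")
ax.set_title("Training vs validation accuracy (one grid configuration)")
ax.legend()
fig.savefig("curves.png", dpi=120)
```

Overlaying the two curves per configuration is what makes overfitting visible: the onset is the epoch where training accuracy keeps climbing while validation accuracy flattens or falls.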

Technology Stack

Python TensorFlow Keras Scikit-learn GridSearchCV Matplotlib Pandas