Building a Basic Neural Network

Overview

Problem Statement: A company needs a TensorFlow neural network to classify 4,601 emails from the Spambase dataset as spam or non-spam using 57 features.

Approach: I preprocessed the data, built a sequential model with two hidden layers, trained it with Adam optimizer, and evaluated its performance for spam detection.

Project Overview

  • 4,601 email samples: binary classification task
  • 57 features: word and character frequencies
  • 3 layers: 2 hidden layers with ReLU plus an output layer
  • TensorFlow: Keras Sequential API

Data Preparation

1. Loading & Splitting

Loaded 4,601 emails with 57 features and binary spam/non-spam labels. Split the dataset into 80% training and 20% test sets to ensure proper model evaluation; 10% of the training data is later held out for validation during fitting.

  • X: 57 numerical features (word frequencies, character frequencies)
  • y: Binary spam label (1 = spam, 0 = non-spam)
  • Train: 3,680 samples
  • Test: 921 samples
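A minimal sketch of this step, assuming the Spambase data has been downloaded as a local file named spambase.data with 57 feature columns followed by a label column (the file path and random seed are assumptions, not taken from the original report):

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Assumed local copy of the UCI Spambase data: 57 feature columns + 1 label column.
    data = pd.read_csv("spambase.data", header=None)
    X = data.iloc[:, :57].values   # word/character frequency features
    y = data.iloc[:, 57].values    # 1 = spam, 0 = non-spam

    # 80% train / 20% test split (3,680 / 921 samples).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )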
2. Standardization

Applied StandardScaler to normalize features to mean=0 and standard deviation=1. This ensures consistent scaling across features for optimal neural network training.

The scaler was fitted on the training data only, and the same transformation was then applied to the test data to prevent data leakage (see the sketch after the list below).

Why Standardize?
  • Ensures gradient descent converges more quickly
  • Prevents features with larger scales from dominating the model
  • Improves numerical stability during training
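A minimal sketch of this step, reusing the variable names from the splitting sketch above:

    from sklearn.preprocessing import StandardScaler

    # Fit the scaler on the training features only, then reuse it for the test set.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # mean=0, std=1 per feature
    X_test = scaler.transform(X_test)        # same transformation, no refitting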

Workflow

1. Start: Neural Network Build
   Begin the project to build a basic neural network for spam detection.

2. Import Libraries & Data
   Load TensorFlow, Keras, scikit-learn, numpy, pandas, and the Spambase dataset.

3. Prepare & Split Data
   Split into training and test sets, apply standardization to normalize features.

4. Define Sequential Model
   Create a Sequential model with two hidden layers (64 and 32 neurons) and an output layer.

5. Compile Model
   Configure the model with the Adam optimizer, binary cross-entropy loss, and an accuracy metric.

6. Train Model
   Train for 10 epochs with a batch size of 64, using validation data to monitor performance.

7. Evaluate Performance
   Test the model's accuracy and loss on the held-out test dataset.

8. End Activity
   Complete the neural network development with performance insights.

Model Architecture

[Architecture diagram: Input Layer (57 units) → Hidden Layer 1 (64 units) → Hidden Layer 2 (32 units) → Output Layer (1 unit)]
Model Structure

Implemented a Sequential neural network with three layers: two hidden layers with ReLU activation and an output layer with Sigmoid activation for binary classification.

Layer Configuration:
  • Input Layer: 57 features (implicitly defined)
  • Hidden Layer 1: 64 neurons with ReLU activation
  • Hidden Layer 2: 32 neurons with ReLU activation
  • Output Layer: 1 neuron with Sigmoid activation
Total Parameters:

5,825 trainable parameters, counting weights and biases: (57 × 64 + 64) + (64 × 32 + 32) + (32 × 1 + 1) = 3,712 + 2,080 + 33
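A minimal sketch of this architecture in the Keras Sequential API (the explicit Input layer is an illustrative choice; the report defines the input shape implicitly):

    import tensorflow as tf
    from tensorflow import keras

    # 57 input features -> 64 ReLU -> 32 ReLU -> 1 sigmoid (spam probability)
    model = keras.Sequential([
        keras.layers.Input(shape=(57,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    model.summary()  # should report 5,825 trainable parameters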

Model Compilation

Configured the model with appropriate loss function, optimizer, and metrics for binary classification.

Compilation Settings:
  • Optimizer: Adam (adaptive learning rate optimizer)
  • Loss Function: Binary Cross-Entropy (appropriate for binary classification)
  • Metrics: Accuracy (percentage of correctly classified examples)
Why Adam Optimizer?

Adam combines the benefits of two other extensions of stochastic gradient descent: AdaGrad and RMSProp, making it well-suited for a wide range of problems with noisy data.
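A sketch of the corresponding compile call under the settings listed above, continuing from the model sketch:

    model.compile(
        optimizer="adam",              # adaptive learning-rate optimizer
        loss="binary_crossentropy",    # matches the sigmoid output for binary labels
        metrics=["accuracy"],
    )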

ReLU Activation (Hidden Layers)

ReLU (Rectified Linear Unit) returns x for positive values and 0 for negative values.

Benefits: Prevents vanishing gradient problem, computationally efficient, produces sparse activations.

Sigmoid Activation (Output Layer)

Sigmoid squashes input values to range between 0 and 1, ideal for binary classification.

Benefits: Smooth, differentiable function that outputs probabilities for binary classification.
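A quick numeric illustration of the two activations (plain NumPy, for intuition only; not part of the model code):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)        # x for positive inputs, 0 otherwise

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))  # squashes any input into (0, 1)

    print(relu(np.array([-2.0, 0.5])))     # [0.  0.5]
    print(sigmoid(np.array([-2.0, 2.0])))  # approx. [0.119 0.881]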

Training Process

Training Configuration

The model was trained using the following parameters to optimize performance while balancing computational efficiency.

  • Epochs: 10
  • Batch size: 64
  • Validation split: 10%

[Chart: training accuracy and validation accuracy by epoch]
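A sketch of the training call with these settings, continuing the variable names from the earlier sketches (the verbose flag is an illustrative choice):

    history = model.fit(
        X_train, y_train,
        epochs=10,
        batch_size=64,
        validation_split=0.1,  # 10% of the training data held out for validation
        verbose=1,
    )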

Evaluation Insights

Performance Metrics

The model was evaluated on the 20% test set that was held out during training to assess its generalization capability.

  • Test accuracy: 0.923
  • Test loss: 0.218
Performance Analysis:
  • High accuracy (92.3%) indicates strong spam detection capability
  • Low loss value (0.218) shows confident predictions
  • Close training and validation performance suggests good generalization
  • Model effectively distinguishes between spam and non-spam emails
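A sketch of the evaluation step, continuing from the earlier sketches (the figures above are from the original run; a re-run would vary slightly):

    test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
    print(f"Test loss: {test_loss:.3f}, test accuracy: {test_accuracy:.3f}")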

Architectural Rationale

The architectural choices were deliberately made to optimize performance for this binary classification task.

ReLU for Hidden Layers:
  • Prevents vanishing gradient problem during backpropagation
  • Introduces non-linearity to capture complex patterns
  • Computationally efficient compared to tanh or sigmoid
  • Produces sparse activations, making the model more robust
Sigmoid for Output Layer:
  • Squashes output to [0,1] range, ideal for binary classification
  • Directly interpretable as probability of being spam
  • Works well with binary cross-entropy loss function
Layer Sizing (64→32→1):
  • Progressive narrowing captures hierarchical feature abstractions
  • Sufficient capacity to learn email patterns without overfitting
  • Balanced computational efficiency and model expressiveness

Conclusion

Successfully built a TensorFlow neural network with a 64-32-1 architecture for spam detection, achieving 92.3% accuracy on the held-out Spambase test set. The model effectively leverages 57 email features to distinguish between spam and legitimate emails with high confidence.

Business Implications

Benefits

  • Improved User Experience: Effective filtering reduces exposure to spam emails
  • Increased Engagement: Clean inboxes lead to higher user satisfaction and platform trust
  • Resource Efficiency: Automated filtering reduces need for manual moderation
  • Scalability: Model can handle large volumes of emails efficiently

Future Enhancements

  • Regularization: Add dropout layers to further prevent overfitting (see the sketch after this list)
  • Hyperparameter Tuning: Optimize batch size, learning rate, and epochs
  • Additional Layers: Experiment with deeper architectures for complex patterns
  • Feature Engineering: Create additional features to improve discrimination
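A hedged sketch of how dropout could be added to the existing architecture; the dropout rates are illustrative placeholders, not tuned values from this project:

    from tensorflow import keras

    model_with_dropout = keras.Sequential([
        keras.layers.Input(shape=(57,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.3),  # illustrative rate; zeroes 30% of activations during training
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(1, activation="sigmoid"),
    ])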

Final Assessment

This basic neural network demonstrates the power of even relatively simple deep learning architectures for classification tasks. With just two hidden layers, the model achieves high accuracy on email spam detection, providing a solid foundation for more sophisticated enhancements in future iterations.