Building a Basic Neural Network

Overview

Problem Statement: A company needs a TensorFlow neural network to classify 4,601 emails from the Spambase dataset as spam or non-spam using 57 features.

Approach: I preprocessed the data, built a sequential model with two hidden layers, trained it with Adam optimizer, and evaluated its performance for spam detection.

Project Overview

  • 4,601 email samples: binary classification task
  • 57 features: word and character frequencies
  • 3 layers: 2 hidden layers with ReLU plus an output layer
  • TensorFlow: Keras Sequential API

Data Preparation

1. Loading & Splitting

Loaded 4,601 emails with 57 features and binary spam/non-spam labels. Split the dataset into 80% training and 20% test sets to ensure proper model evaluation; 10% of the training data is later held out for validation during fitting.

  • X: 57 numerical features (word frequencies, character frequencies)
  • y: Binary spam label (1 = spam, 0 = non-spam)
  • Train: 3,680 samples
  • Test: 921 samples
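A minimal sketch of this step, assuming the Spambase data has been downloaded as a local file named spambase.data with 57 feature columns followed by a label column (the file path and random seed are assumptions, not taken from the original report):

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Assumed local copy of the UCI Spambase data: 57 feature columns + 1 label column.
    data = pd.read_csv("spambase.data", header=None)
    X = data.iloc[:, :57].values   # word/character frequency features
    y = data.iloc[:, 57].values    # 1 = spam, 0 = non-spam

    # 80% train / 20% test split (3,680 / 921 samples).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )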
2. Standardization

Applied StandardScaler to normalize features to mean=0 and standard deviation=1. This ensures consistent scaling across features for optimal neural network training.

The scaler was fitted on the training data only, and the same transformation was then applied to the test data to prevent data leakage (see the sketch after the list below).

Why Standardize?
  • Ensures gradient descent converges more quickly
  • Prevents features with larger scales from dominating the model
  • Improves numerical stability during training
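A minimal sketch of this step, reusing the variable names from the splitting sketch above:

    from sklearn.preprocessing import StandardScaler

    # Fit the scaler on the training features only, then reuse it for the test set.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # mean=0, std=1 per feature
    X_test = scaler.transform(X_test)        # same transformation, no refitting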

Workflow

1. Start: Neural Network Build
   Begin the project to build a basic neural network for spam detection.

2. Import Libraries & Data
   Load TensorFlow, Keras, scikit-learn, numpy, pandas, and the Spambase dataset.

3. Prepare & Split Data
   Split into training and test sets, apply standardization to normalize features.

4. Define Sequential Model
   Create a Sequential model with two hidden layers (64 and 32 neurons) and an output layer.

5. Compile Model
   Configure the model with the Adam optimizer, binary cross-entropy loss, and an accuracy metric.

6. Train Model
   Train for 10 epochs with a batch size of 64, using validation data to monitor performance.

7. Evaluate Performance
   Test the model's accuracy and loss on the held-out test dataset.

8. End Activity
   Complete the neural network development with performance insights.

Model Architecture

[Architecture diagram: Input Layer (57 units) → Hidden Layer 1 (64 units) → Hidden Layer 2 (32 units) → Output Layer (1 unit)]
Model Structure

Implemented a Sequential neural network with three layers: two hidden layers with ReLU activation and an output layer with Sigmoid activation for binary classification.

Layer Configuration:
  • Input Layer: 57 features (implicitly defined)
  • Hidden Layer 1: 64 neurons with ReLU activation
  • Hidden Layer 2: 32 neurons with ReLU activation
  • Output Layer: 1 neuron with Sigmoid activation
Total Parameters:

5,825 trainable parameters, counting weights and biases: (57 × 64 + 64) + (64 × 32 + 32) + (32 × 1 + 1) = 3,712 + 2,080 + 33
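A minimal sketch of this architecture in the Keras Sequential API (the explicit Input layer is an illustrative choice; the report defines the input shape implicitly):

    import tensorflow as tf
    from tensorflow import keras

    # 57 input features -> 64 ReLU -> 32 ReLU -> 1 sigmoid (spam probability)
    model = keras.Sequential([
        keras.layers.Input(shape=(57,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    model.summary()  # should report 5,825 trainable parameters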

Model Compilation

Configured the model with appropriate loss function, optimizer, and metrics for binary classification.

Compilation Settings:
  • Optimizer: Adam (adaptive learning rate optimizer)
  • Loss Function: Binary Cross-Entropy (appropriate for binary classification)
  • Metrics: Accuracy (percentage of correctly classified examples)
Why Adam Optimizer?

Adam combines the benefits of two other extensions of stochastic gradient descent: AdaGrad and RMSProp, making it well-suited for a wide range of problems with noisy data.
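A sketch of the corresponding compile call under the settings listed above, continuing from the model sketch:

    model.compile(
        optimizer="adam",              # adaptive learning-rate optimizer
        loss="binary_crossentropy",    # matches the sigmoid output for binary labels
        metrics=["accuracy"],
    )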

ReLU Activation (Hidden Layers)

ReLU (Rectified Linear Unit) returns x for positive values and 0 for negative values.

Benefits: Prevents vanishing gradient problem, computationally efficient, produces sparse activations.

Sigmoid Activation (Output Layer)

Sigmoid squashes input values to range between 0 and 1, ideal for binary classification.

Benefits: Smooth, differentiable function that outputs probabilities for binary classification.
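A quick numeric illustration of the two activations (plain NumPy, for intuition only; not part of the model code):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)        # x for positive inputs, 0 otherwise

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))  # squashes any input into (0, 1)

    print(relu(np.array([-2.0, 0.5])))     # [0.  0.5]
    print(sigmoid(np.array([-2.0, 2.0])))  # approx. [0.119 0.881]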

Training Process

Training Configuration

The model was trained using the following parameters to optimize performance while balancing computational efficiency.

  • Epochs: 10
  • Batch size: 64
  • Validation split: 10%

[Chart: training accuracy and validation accuracy by epoch]
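A sketch of the training call with these settings, continuing the variable names from the earlier sketches (the verbose flag is an illustrative choice):

    history = model.fit(
        X_train, y_train,
        epochs=10,
        batch_size=64,
        validation_split=0.1,  # 10% of the training data held out for validation
        verbose=1,
    )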

Evaluation Insights

Performance Metrics

The model was evaluated on the 20% test set that was held out during training to assess its generalization capability.

  • Test accuracy: 0.923
  • Test loss: 0.218
Performance Analysis:
  • High accuracy (92.3%) indicates strong spam detection capability
  • Low loss value (0.218) shows confident predictions
  • Close training and validation performance suggests good generalization
  • Model effectively distinguishes between spam and non-spam emails
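A sketch of the evaluation step, continuing from the earlier sketches (the figures above are from the original run; a re-run would vary slightly):

    test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
    print(f"Test loss: {test_loss:.3f}, test accuracy: {test_accuracy:.3f}")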

Architectural Rationale

The architectural choices were deliberately made to optimize performance for this binary classification task.

ReLU for Hidden Layers:
  • Prevents vanishing gradient problem during backpropagation
  • Introduces non-linearity to capture complex patterns
  • Computationally efficient compared to tanh or sigmoid
  • Produces sparse activations, making the model more robust
Sigmoid for Output Layer:
  • Squashes output to [0,1] range, ideal for binary classification
  • Directly interpretable as probability of being spam
  • Works well with binary cross-entropy loss function
Layer Sizing (64→32→1):
  • Progressive narrowing captures hierarchical feature abstractions
  • Sufficient capacity to learn email patterns without overfitting
  • Balanced computational efficiency and model expressiveness

Conclusion

Successfully built a TensorFlow neural network with a 64-32-1 architecture for spam detection, achieving 92.3% accuracy on the held-out Spambase test set. The model effectively leverages 57 email features to distinguish between spam and legitimate emails with high confidence.

Business Implications

Benefits

  • Improved User Experience: Effective filtering reduces exposure to spam emails
  • Increased Engagement: Clean inboxes lead to higher user satisfaction and platform trust
  • Resource Efficiency: Automated filtering reduces need for manual moderation
  • Scalability: Model can handle large volumes of emails efficiently

Future Enhancements

  • Regularization: Add dropout layers to further prevent overfitting (see the sketch after this list)
  • Hyperparameter Tuning: Optimize batch size, learning rate, and epochs
  • Additional Layers: Experiment with deeper architectures for complex patterns
  • Feature Engineering: Create additional features to improve discrimination
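A hedged sketch of how dropout could be added to the existing architecture; the dropout rates are illustrative placeholders, not tuned values from this project:

    from tensorflow import keras

    model_with_dropout = keras.Sequential([
        keras.layers.Input(shape=(57,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.3),  # illustrative rate; zeroes 30% of activations during training
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(1, activation="sigmoid"),
    ])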

Final Assessment

This basic neural network demonstrates the power of even relatively simple deep learning architectures for classification tasks. With just two hidden layers, the model achieves high accuracy on email spam detection, providing a solid foundation for more sophisticated enhancements in future iterations.