Comparing Optimisers for Neural Networks

Overview

Problem Statement: Predict Titanic survivors using a neural network, comparing Adam and RMSprop optimisers to determine the most effective approach.

Approach: Preprocessed Titanic data, built a simple neural network, trained with both optimisers, and evaluated performance to assess their effectiveness.

Optimiser Comparison Overview

  • Adam (Adaptive Moment Estimation): maintains adaptive learning rates for each parameter.
  • RMSprop (Root Mean Square Propagation): adapts the learning rate based on a moving average of squared gradients.

Data Preprocessing

  1. Loading & Cleaning: loaded 891 records; dropped 'PassengerId', 'Name', 'Ticket', 'Cabin'; filled missing 'Age' and 'Embarked' values with the mode.
  2. Encoding & Splitting: one-hot encoded 'Sex' and 'Embarked'; split into 80% train (with 10% used for validation) and 20% test, with stratification.
  3. Standardisation: scaled features with StandardScaler after the split to prevent data leakage.

Data Processing Pipeline

Raw Data (891 passenger records)

  • ✅ Survived (target)
  • ✅ Pclass
  • ❌ Name (dropped)
  • ✅ Sex (to encode)
  • ✅ Age (missing values)
  • ✅ SibSp, Parch
  • ❌ Ticket (dropped)
  • ✅ Fare
  • ❌ Cabin (dropped)
  • ✅ Embarked (to encode)

Processed Data (clean, encoded features)

  • ✅ Survived (target)
  • ✅ Pclass
  • ✅ Sex_male, Sex_female
  • ✅ Age (filled)
  • ✅ SibSp, Parch
  • ✅ Fare
  • ✅ Embarked_C, Embarked_Q, Embarked_S

Data Splits: Training (70%), Validation (10%), Test (20%)
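
A minimal sketch of this pipeline, assuming the standard Kaggle train.csv column names; the filename, random_state, and exact fill logic follow the description above, but the original notebook code may differ.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the 891-record Titanic training set (filename assumed).
df = pd.read_csv("train.csv")

# Drop identifier-like columns and fill missing values with the mode.
df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])
df["Age"] = df["Age"].fillna(df["Age"].mode()[0])
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# One-hot encode the categorical features.
df = pd.get_dummies(df, columns=["Sex", "Embarked"])

X = df.drop(columns=["Survived"]).astype("float32")
y = df["Survived"]

# Stratified 80/20 train/test split; Keras later carves 10% of the
# training portion off as a validation split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit the scaler on the training set only to prevent data leakage.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```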

Workflow

  1. Start: initialise the project to compare optimiser effectiveness for Titanic survival prediction.
  2. Import Libraries & Data: load TensorFlow, Keras, scikit-learn, pandas, and the Titanic dataset.
  3. Preprocess Data: clean the data, handle missing values, encode categorical features, then split and scale.
  4. Define Model Function: create a neural network with a 64-32-1 architecture, using ReLU in the hidden layers and Sigmoid at the output.
  5. Train with Adam: compile and train the model using the Adam optimiser with binary cross-entropy loss.
  6. Train with RMSprop: compile and train an identical model architecture using the RMSprop optimiser.
  7. Evaluate Models: compare loss, accuracy, and training behaviour between the two optimisation approaches.
  8. End Activity: draw conclusions about optimiser performance and make recommendations.

Model Building

Network architecture: input layer (7 features) → hidden layer 1 (64 units) → hidden layer 2 (32 units) → output layer (1 unit).
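
A sketch of the model-definition step, using the Keras Sequential API; the function name build_model is an assumption, and the input size is taken from the training data rather than hard-coded.

```python
import tensorflow as tf

def build_model(n_features):
    """Return a fresh 64-32-1 network so each optimiser trains from scratch."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
```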

Adam Optimiser

Adaptive Moment Estimation combines the advantages of AdaGrad and RMSProp by keeping moving averages of both past gradients and past squared gradients.

  • Learning Rate: 0.001 (default)
  • Beta 1: 0.9 (momentum decay)
  • Beta 2: 0.999 (scaling decay)
  • Epsilon: 1e-07 (numerical stability)
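
These values match the Keras defaults; a sketch of creating the optimiser with them written out explicitly, purely for illustration:

```python
adam = tf.keras.optimizers.Adam(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07)
```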

RMSprop Optimiser

Root Mean Square Propagation maintains a moving average of the squared gradient for each weight and divides the gradient by the square root of this average.

  • Learning Rate: 0.001 (default)
  • Rho: 0.9 (decay rate)
  • Momentum: 0.0 (default, no momentum)
  • Epsilon: 1e-07 (numerical stability)
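
The corresponding RMSprop instantiation, again with the default values spelled out only for illustration:

```python
rmsprop = tf.keras.optimizers.RMSprop(
    learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-07)
```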

Model Configuration

  • Architecture: 7 features → 64 units (ReLU) → 32 units (ReLU) → 1 unit (Sigmoid)
  • Loss Function: Binary Cross-Entropy
  • Metrics: Accuracy
  • Batch Size: 32
  • Epochs: 10
  • Validation: 10% of training data
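
A sketch of compiling and training one model per optimiser under this configuration, reusing the hypothetical build_model and the preprocessed arrays from the earlier sketches; the loop structure is illustrative, not the original notebook code.

```python
models, histories = {}, {}
for name, optimiser in [("Adam", tf.keras.optimizers.Adam()),
                        ("RMSprop", tf.keras.optimizers.RMSprop())]:
    model = build_model(X_train.shape[1])
    model.compile(optimizer=optimiser,
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    # 10 epochs, batch size 32, 10% of the training data held out for validation.
    histories[name] = model.fit(X_train, y_train, epochs=10, batch_size=32,
                                validation_split=0.1, verbose=0)
    models[name] = model
```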

Training & Evaluation

Adam Training

Trained for 10 epochs with a batch size of 32. A validation split of 10% was used to monitor performance and guard against overfitting.

  • Accuracy: 0.85
  • Loss: 0.41

RMSprop Training

Identical training settings to Adam. Validation loss and accuracy were monitored over 10 epochs for comparison.

  • Accuracy: 0.82
  • Loss: 0.45
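
A minimal evaluation sketch for obtaining the test-set figures above, assuming the models dict from the training loop sketched earlier:

```python
for name, model in models.items():
    loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
    print(f"{name}: test loss {loss:.2f}, test accuracy {accuracy:.2f}")
```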

Training Progress

[Charts: loss over epochs and accuracy over epochs (0-10), comparing Adam and RMSprop.]
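
A sketch for reproducing curves like these from the Keras History objects collected above; matplotlib and the inclusion of both training and validation curves are assumptions.

```python
import matplotlib.pyplot as plt

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
for name, history in histories.items():
    ax_loss.plot(history.history["loss"], label=f"{name} (train)")
    ax_loss.plot(history.history["val_loss"], label=f"{name} (val)")
    ax_acc.plot(history.history["accuracy"], label=f"{name} (train)")
    ax_acc.plot(history.history["val_accuracy"], label=f"{name} (val)")
ax_loss.set(title="Loss Over Epochs", xlabel="Epochs", ylabel="Loss")
ax_acc.set(title="Accuracy Over Epochs", xlabel="Epochs", ylabel="Accuracy")
ax_loss.legend()
ax_acc.legend()
plt.show()
```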

Performance Analysis

Loss Trends

Adam converged faster initially, showing a steeper loss reduction in the first few epochs. However, it showed signs of slight overfitting in later epochs with validation loss increasing whilst training loss continued to decrease.

RMSprop demonstrated more stable validation loss throughout training, suggesting better generalisation potential despite slightly higher overall loss values.

Accuracy Comparison

Both optimisers reached test-set accuracy between 0.80 and 0.85. Adam achieved slightly higher peak accuracy (0.85 vs 0.82), but RMSprop showed more consistent performance across the validation and test sets.

Adam's faster convergence makes it suitable for quicker training, whilst RMSprop's stability suggests better performance for production deployment.

Metric                  Adam     RMSprop   Notes
Final Test Accuracy     0.85     0.82      Adam slightly higher
Final Test Loss         0.41     0.45      Lower is better
Convergence Speed       Fast     Medium    Adam reached target accuracy earlier
Validation Stability    Medium   High      RMSprop showed less overfitting
Training Time           1.2x     1.0x      Relative computation time

Conclusion

Adam and RMSprop performed similarly in predicting Titanic survivors, both reaching accuracy between 0.80 and 0.85. Adam converged faster and achieved slightly higher peak accuracy (0.85), but showed signs of overfitting in later epochs. RMSprop offered better validation stability, suggesting superior generalisation potential despite slightly lower overall accuracy (0.82).

Optimiser Recommendations

When to Choose Adam

Select Adam when training speed is a priority and when implementing regularisation techniques like early stopping to prevent overfitting.

When to Choose RMSprop

Prefer RMSprop for production models where generalisation and stability are more important than raw performance metrics.

Key Takeaways

For the Titanic dataset, the differences between optimisers were modest. This suggests that for small-to-medium datasets with binary classification tasks, both Adam and RMSprop are viable choices. The final selection should consider specific project requirements regarding training speed, model stability, and deployment context.

Future work could explore combining the advantages of both optimisers with techniques like learning rate scheduling or hybrid approaches in more complex neural network architectures.
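
As one illustration of the learning-rate scheduling mentioned above (not something evaluated in this comparison), Keras optimisers accept a schedule object in place of a fixed learning rate; the decay values below are arbitrary examples.

```python
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001, decay_steps=100, decay_rate=0.9)

model = build_model(X_train.shape[1])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
              loss="binary_crossentropy", metrics=["accuracy"])
```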