Sentiment Analysis & Text Classification

Comprehensive RNN Model Comparison on SST-5 Dataset

Dataset Overview: SST-5

Stanford Sentiment Treebank (SST-5)

  • Training samples: 8,544
  • Validation samples: 1,101
  • Test samples: 2,210
  • Sentiment classes: 5
  • Vocabulary size: 30,000
  • Max sequence length: 200

Project Process Flowchart

1. Environment Setup: Install the required packages (datasets, tensorflow, scikit-learn, pandas, numpy). All libraries installed successfully with no compatibility issues.
2. Dataset Loading: Load the SST-5 dataset from Hugging Face. Successfully retrieved 8,544 training, 1,101 validation, and 2,210 test samples across 5 sentiment classes.
3. Data Preprocessing: Create DataFrames, split train/validation 80:20, tokenise text with a 30,000-word vocabulary, mark OOV words, and pad sequences to length 200.
4. Model Architecture: Design four RNN architectures (Vanilla RNN, LSTM, GRU, and Bidirectional LSTM) with embedding dimension 100 and 128 hidden units.
5. Training Process: Train each model for 5 epochs using the Adam optimiser, sparse categorical crossentropy loss, and batch size 64, monitoring accuracy throughout.
6. Evaluation & Analysis: Evaluate each model on the test set, compare accuracies, analyse training patterns, and identify overfitting behaviour.
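
A minimal sketch of the tokenisation and padding in step 3, using the legacy tf.keras.preprocessing API with two toy sentences standing in for the SST-5 text (vocabulary size and sequence length as listed above):

```python
# Tokenise and pad toy sentences; hyperparameters match the report.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["a gorgeous , witty , seductive movie",
         "the film is strictly routine"]

tokenizer = Tokenizer(num_words=30_000, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)          # builds the word -> index mapping
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=200, padding="post")
print(padded.shape)  # (2, 200)
```

With post-padding, zeros fill the positions after the final token, so every sample enters the network as a fixed-length vector of 200 word indices.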

Model Performance Comparison

Bidirectional LSTM: 38.51% (🏆 Best Performance)

Excels by processing sequences in both directions, capturing full context for sentiment analysis, though it overfits, with validation loss rising during training.

Vanilla RNN: 26.43% (2nd Place)

Surprisingly outperforms the more complex LSTM and GRU on the test set. Limited by vanishing gradients, but shows less overfitting.

LSTM: 23.08% (3rd Place, Tied)

Underperformed expectations despite its gating mechanisms; may need more training epochs or regularisation.

GRU: 23.08% (3rd Place, Tied)

Matched the LSTM exactly. A more computationally efficient architecture, but with similar learning limitations in this configuration.

Key Findings & Conclusions

🎯 Performance Hierarchy

Bidirectional LSTM significantly outperformed all other models with 38.51% accuracy, demonstrating the value of bidirectional context in sentiment classification.

📊 Surprising Results

Vanilla RNN unexpectedly achieved 26.43% accuracy, outperforming both LSTM (23.08%) and GRU (23.08%) on the test set despite theoretical limitations.

⚠️ Overfitting Patterns

Bidirectional LSTM showed clear overfitting with training accuracy reaching 85.82% while validation loss increased from 1.36 to 2.07.

🔍 Model Convergence

LSTM and GRU models plateaued early with identical final accuracies, suggesting insufficient training epochs or architectural limitations for this task.

📈 Learning Patterns

All models found the 5-class sentiment classification task difficult: apart from the Bidirectional LSTM, test accuracies only modestly exceeded the 20% random baseline.

🎲 Dataset Complexity

SST-5's fine-grained sentiment classes (very negative to very positive) proved challenging, requiring more sophisticated architectures than binary sentiment tasks.

Business Implications & Strategic Recommendations

💼 Production Deployment

For immediate deployment, Bidirectional LSTM offers the best performance but requires overfitting mitigation through regularisation techniques.

⚡ Resource Optimisation

Vanilla RNN provides surprising value for resource-constrained environments, offering decent performance with minimal computational requirements.

🎯 Model Selection Strategy

Match the model to task complexity: use the Bidirectional LSTM for nuanced, fine-grained sentiment analysis, and simpler models for binary classification tasks.

Actionable Recommendations

  • Implement Regularisation: Add dropout layers (0.2-0.3) to prevent overfitting, especially for Bidirectional LSTM models in production environments.
  • Extended Training: Increase training epochs to 10-15 for LSTM/GRU models to allow proper convergence and improved performance.
  • Architecture Enhancement: Increase embedding dimensions to 300 and experiment with attention mechanisms for better semantic understanding.
  • Data Strategy: Investigate class distribution imbalances in SST-5 dataset and implement appropriate sampling or weighting strategies.
  • Hyperparameter Optimisation: Implement systematic grid search for learning rates, batch sizes, and architectural parameters to maximise performance.
  • Model Ensemble: Consider ensemble approaches combining Bidirectional LSTM with simpler models to balance accuracy and robustness.
  • Pre-trained Embeddings: Incorporate GloVe or Word2Vec embeddings instead of training from scratch to improve initial semantic understanding.
  • Cross-validation: Implement k-fold cross-validation to better assess model generalisation and reduce variance in performance estimates.
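
As an illustration of the first recommendation, a regularised variant of the Bidirectional LSTM might look like the sketch below. The layer sizes mirror the report's configuration, while the dropout placement (after the embedding, and inside the LSTM cell) is an assumption, not the report's code:

```python
# Hypothetical regularised Bidirectional LSTM: dropout in the 0.2-0.3 range
# applied after the embedding and inside the recurrent cell.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=30_000, output_dim=100),
    tf.keras.layers.Dropout(0.3),  # assumption: dropout on embeddings
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2)),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Recurrent dropout regularises the hidden-to-hidden transitions specifically, which targets the memorisation behaviour behind the rising validation loss observed for this model.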

Technical Methodology Summary

Data Preprocessing Pipeline:

  • Tokenisation: Vocabulary size limited to 30,000 most frequent words
  • OOV Handling: Out-of-vocabulary words marked with "<OOV>" token
  • Sequence Padding: All sequences padded/truncated to 200 tokens using post-padding
  • Train/Validation Split: 80:20 ratio, taken from the original training set
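
The OOV handling above can be illustrated with a minimal hand-rolled lookup, a stand-in for the Keras Tokenizer, which reserves index 1 for its oov_token (index 0 is the padding value):

```python
OOV = "<OOV>"
# Toy word index as built from a fitted corpus; index 1 is reserved for
# out-of-vocabulary words, matching the Keras Tokenizer convention.
word_index = {OOV: 1, "the": 2, "movie": 3, "was": 4, "great": 5}

def texts_to_sequences(texts):
    """Map each word to its index, falling back to the OOV index."""
    return [[word_index.get(w, word_index[OOV]) for w in t.split()]
            for t in texts]

seq = texts_to_sequences(["the movie was terrible"])[0]
print(seq)  # "terrible" is unseen, so it maps to the OOV index 1
```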

Model Architecture Details:

  • Embedding Layer: 100-dimensional vectors over the 30,000-word vocabulary
  • RNN Layers: 128 hidden units for all model variants
  • Output Layer: Dense layer with softmax activation for 5-class classification
  • Bidirectional: Processes sequences forward and backward for enhanced context
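
Under these hyperparameters (100-d embeddings, 128 hidden units, softmax over 5 classes), the four variants could be defined with a shared scaffold; the build_model helper below is illustrative rather than the report's exact code:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, EMB, HIDDEN, CLASSES = 30_000, 100, 128, 5

def build_model(cell):
    """Wrap one recurrent layer in the shared embedding/softmax scaffold."""
    return tf.keras.Sequential([
        layers.Embedding(VOCAB, EMB),
        cell,
        layers.Dense(CLASSES, activation="softmax"),
    ])

models = {
    "Vanilla RNN":        build_model(layers.SimpleRNN(HIDDEN)),
    "LSTM":               build_model(layers.LSTM(HIDDEN)),
    "GRU":                build_model(layers.GRU(HIDDEN)),
    "Bidirectional LSTM": build_model(layers.Bidirectional(layers.LSTM(HIDDEN))),
}
```

Only the recurrent layer differs between variants, which is what makes the accuracy comparison a reasonably controlled one.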

Training Configuration:

  • Optimiser: Adam with default learning rate (0.001)
  • Loss Function: Sparse categorical crossentropy
  • Batch Size: 64 samples per batch
  • Epochs: 5 training epochs for all models
  • Metrics: Accuracy monitoring throughout training
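
The listed configuration corresponds to a compile/fit call along these lines. The arrays are random placeholders for the padded SST-5 sequences and labels, the GRU stands in for any of the four variants, and a single epoch is used purely to keep the sketch quick (the report trains for 5):

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 64 padded sequences of length 200 with 5-class labels.
x_train = np.random.randint(0, 30_000, size=(64, 200))
y_train = np.random.randint(0, 5, size=(64,))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(30_000, 100),
    tf.keras.layers.GRU(128),  # any of the four variants slots in here
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(x_train, y_train, batch_size=64, epochs=1, verbose=0)
```

Because the labels are integer class ids rather than one-hot vectors, sparse categorical crossentropy is the appropriate loss here.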