Sentiment Analysis & Text Classification

Comprehensive RNN Model Comparison on SST-5 Dataset

Dataset Overview: SST-5

Stanford Sentiment Treebank (SST-5)

  • Training samples: 8,544
  • Validation samples: 1,101
  • Test samples: 2,210
  • Sentiment classes: 5
  • Vocabulary size: 30,000
  • Max sequence length: 200

Project Process Flowchart

1. Environment Setup: Install the required packages (datasets, tensorflow, scikit-learn, pandas, numpy). All libraries installed successfully with no compatibility issues.
2. Dataset Loading: Load the SST-5 dataset from Hugging Face. Successfully retrieved 8,544 training, 1,101 validation, and 2,210 test samples across 5 sentiment classes.
3. Data Preprocessing: Create DataFrames, split train/validation 80:20, tokenise text with a 30,000-word vocabulary, mark OOV words, and pad sequences to length 200.
4. Model Architecture: Design four RNN architectures (Vanilla RNN, LSTM, GRU, and Bidirectional LSTM) with embedding dimension 100 and 128 hidden units.
5. Training Process: Train each model for 5 epochs using the Adam optimiser, sparse categorical crossentropy loss, and batch size 64, monitoring accuracy throughout.
6. Evaluation & Analysis: Evaluate each model on the test set, compare accuracies, analyse training patterns, and identify overfitting behaviour.
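
A minimal sketch of the tokenisation and padding in step 3, using the legacy tf.keras.preprocessing API with two toy sentences standing in for the SST-5 text (vocabulary size and sequence length as listed above):

```python
# Tokenise and pad toy sentences; hyperparameters match the report.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["a gorgeous , witty , seductive movie",
         "the film is strictly routine"]

tokenizer = Tokenizer(num_words=30_000, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)          # builds the word -> index mapping
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=200, padding="post")
print(padded.shape)  # (2, 200)
```

With post-padding, zeros fill the positions after the final token, so every sample enters the network as a fixed-length vector of 200 word indices.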

Model Performance Comparison

Bidirectional LSTM: 38.51% (🏆 Best Performance)

Excels by processing sequences in both directions, capturing full context for sentiment analysis, though it overfits, with validation loss rising during training.

Vanilla RNN: 26.43% (2nd Place)

Surprisingly outperforms the more complex LSTM and GRU on the test set. Limited by vanishing gradients, but shows less overfitting.

LSTM: 23.08% (3rd Place, Tied)

Underperformed expectations despite its gating mechanisms; may need more training epochs or regularisation.

GRU: 23.08% (3rd Place, Tied)

Matched the LSTM exactly. A more computationally efficient architecture, but with similar learning limitations in this configuration.

Key Findings & Conclusions

🎯 Performance Hierarchy

Bidirectional LSTM significantly outperformed all other models with 38.51% accuracy, demonstrating the value of bidirectional context in sentiment classification.

📊 Surprising Results

Vanilla RNN unexpectedly achieved 26.43% accuracy, outperforming both LSTM (23.08%) and GRU (23.08%) on the test set despite theoretical limitations.

⚠️ Overfitting Patterns

Bidirectional LSTM showed clear overfitting with training accuracy reaching 85.82% while validation loss increased from 1.36 to 2.07.

🔍 Model Convergence

LSTM and GRU models plateaued early with identical final accuracies, suggesting insufficient training epochs or architectural limitations for this task.

📈 Learning Patterns

All models found the 5-class sentiment classification task difficult: apart from the Bidirectional LSTM, test accuracies only modestly exceeded the 20% random baseline.

🎲 Dataset Complexity

SST-5's fine-grained sentiment classes (very negative to very positive) proved challenging, requiring more sophisticated architectures than binary sentiment tasks.

Business Implications & Strategic Recommendations

💼 Production Deployment

For immediate deployment, Bidirectional LSTM offers the best performance but requires overfitting mitigation through regularisation techniques.

⚡ Resource Optimisation

Vanilla RNN provides surprising value for resource-constrained environments, offering decent performance with minimal computational requirements.

🎯 Model Selection Strategy

Match the model to task complexity: use the Bidirectional LSTM for nuanced, fine-grained sentiment analysis, and simpler models for binary classification tasks.

Actionable Recommendations

  • Implement Regularisation: Add dropout layers (0.2-0.3) to prevent overfitting, especially for Bidirectional LSTM models in production environments.
  • Extended Training: Increase training epochs to 10-15 for LSTM/GRU models to allow proper convergence and improved performance.
  • Architecture Enhancement: Increase embedding dimensions to 300 and experiment with attention mechanisms for better semantic understanding.
  • Data Strategy: Investigate class distribution imbalances in SST-5 dataset and implement appropriate sampling or weighting strategies.
  • Hyperparameter Optimisation: Implement systematic grid search for learning rates, batch sizes, and architectural parameters to maximise performance.
  • Model Ensemble: Consider ensemble approaches combining Bidirectional LSTM with simpler models to balance accuracy and robustness.
  • Pre-trained Embeddings: Incorporate GloVe or Word2Vec embeddings instead of training from scratch to improve initial semantic understanding.
  • Cross-validation: Implement k-fold cross-validation to better assess model generalisation and reduce variance in performance estimates.
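
As an illustration of the first recommendation, a regularised variant of the Bidirectional LSTM might look like the sketch below. The layer sizes mirror the report's configuration, while the dropout placement (after the embedding, and inside the LSTM cell) is an assumption, not the report's code:

```python
# Hypothetical regularised Bidirectional LSTM: dropout in the 0.2-0.3 range
# applied after the embedding and inside the recurrent cell.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=30_000, output_dim=100),
    tf.keras.layers.Dropout(0.3),  # assumption: dropout on embeddings
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2)),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Recurrent dropout regularises the hidden-to-hidden transitions specifically, which targets the memorisation behaviour behind the rising validation loss observed for this model.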

Technical Methodology Summary

Data Preprocessing Pipeline:

  • Tokenisation: Vocabulary size limited to 30,000 most frequent words
  • OOV Handling: Out-of-vocabulary words marked with "<OOV>" token
  • Sequence Padding: All sequences padded/truncated to 200 tokens using post-padding
  • Train/Validation Split: 80:20 ratio, taken from the original training set
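
The OOV handling above can be illustrated with a minimal hand-rolled lookup, a stand-in for the Keras Tokenizer, which reserves index 1 for its oov_token (index 0 is the padding value):

```python
OOV = "<OOV>"
# Toy word index as built from a fitted corpus; index 1 is reserved for
# out-of-vocabulary words, matching the Keras Tokenizer convention.
word_index = {OOV: 1, "the": 2, "movie": 3, "was": 4, "great": 5}

def texts_to_sequences(texts):
    """Map each word to its index, falling back to the OOV index."""
    return [[word_index.get(w, word_index[OOV]) for w in t.split()]
            for t in texts]

seq = texts_to_sequences(["the movie was terrible"])[0]
print(seq)  # "terrible" is unseen, so it maps to the OOV index 1
```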

Model Architecture Details:

  • Embedding Layer: 100-dimensional vectors over the 30,000-word vocabulary
  • RNN Layers: 128 hidden units for all model variants
  • Output Layer: Dense layer with softmax activation for 5-class classification
  • Bidirectional: Processes sequences forward and backward for enhanced context
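
Under these hyperparameters (100-d embeddings, 128 hidden units, softmax over 5 classes), the four variants could be defined with a shared scaffold; the build_model helper below is illustrative rather than the report's exact code:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, EMB, HIDDEN, CLASSES = 30_000, 100, 128, 5

def build_model(cell):
    """Wrap one recurrent layer in the shared embedding/softmax scaffold."""
    return tf.keras.Sequential([
        layers.Embedding(VOCAB, EMB),
        cell,
        layers.Dense(CLASSES, activation="softmax"),
    ])

models = {
    "Vanilla RNN":        build_model(layers.SimpleRNN(HIDDEN)),
    "LSTM":               build_model(layers.LSTM(HIDDEN)),
    "GRU":                build_model(layers.GRU(HIDDEN)),
    "Bidirectional LSTM": build_model(layers.Bidirectional(layers.LSTM(HIDDEN))),
}
```

Only the recurrent layer differs between variants, which is what makes the accuracy comparison a reasonably controlled one.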

Training Configuration:

  • Optimiser: Adam with default learning rate (0.001)
  • Loss Function: Sparse categorical crossentropy
  • Batch Size: 64 samples per batch
  • Epochs: 5 training epochs for all models
  • Metrics: Accuracy monitoring throughout training
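
The listed configuration corresponds to a compile/fit call along these lines. The arrays are random placeholders for the padded SST-5 sequences and labels, the GRU stands in for any of the four variants, and a single epoch is used purely to keep the sketch quick (the report trains for 5):

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 64 padded sequences of length 200 with 5-class labels.
x_train = np.random.randint(0, 30_000, size=(64, 200))
y_train = np.random.randint(0, 5, size=(64,))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(30_000, 100),
    tf.keras.layers.GRU(128),  # any of the four variants slots in here
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(x_train, y_train, batch_size=64, epochs=1, verbose=0)
```

Because the labels are integer class ids rather than one-hot vectors, sparse categorical crossentropy is the appropriate loss here.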