Detecting the Anomalous Activity of a Ship's Engine

Overview

Problem Statement: Poorly maintained ship engines increase fuel use, risks, and delays in the supply chain. This project uses a real dataset (19,535 samples) to detect anomalies in engine functionality (e.g., rpm, pressures, temperatures) to reduce downtime, enhance safety, and improve delivery efficiency.

Approach: The process involves EDA, statistical anomaly detection with the IQR rule, ML methods (One-Class SVM, Isolation Forest) applied to scaled features, PCA visualisation, and parameter tuning so that each method flags 1-5% of samples as anomalies. A report summarises the findings for stakeholders.

Dataset Size: 19,535 samples
Target Anomaly Range: 1-5% of total samples
Key Features: 6 engine parameters

Project Stages

1. Data Exploration

Imported data, confirmed no missing/duplicate values, generated statistics (mean, median, 95th percentile), and visualized distributions showing right-skewed features.

2. Statistical Detection

Applied IQR to flag outliers per feature, identified samples with 2+ outliers (2.16%), and noted effectiveness for skewed data.

3. ML Detection

Scaled features, used One-Class SVM (nu=0.02, 2%) and Isolation Forest (contamination=0.05, 5%), visualized with PCA, and tuned for 1-5% anomalies.
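
The data-exploration checks in stage 1 can be sketched roughly as follows. This is a minimal illustration on synthetic right-skewed data, not the project's actual notebook: the column names, the distributions, and the `summarise` helper are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def summarise(df: pd.DataFrame) -> pd.DataFrame:
    """Return mean, median, 95th percentile and skewness per numeric column."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        "p95": numeric.quantile(0.95),
        "skew": numeric.skew(),  # positive values indicate a right-skewed feature
    })


# Synthetic stand-in for the engine data (hypothetical column names and values).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "engine_rpm": rng.lognormal(mean=6.6, sigma=0.3, size=1000),
    "coolant_pressure": rng.gamma(shape=2.0, scale=1.2, size=1000),
})

missing_total = int(df.isna().sum().sum())   # total number of missing cells
duplicate_rows = int(df.duplicated().sum())  # number of fully duplicated rows
stats = summarise(df)

print(f"missing cells: {missing_total}, duplicate rows: {duplicate_rows}")
print(stats.round(2))
```

On the real dataset, `df` would come from the imported file instead of the random generator.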

Figure: Engine RPM Distribution with Outliers

Workflow

Start: Detect Engine Anomalies
Part I: Initial Data Exploration
  Import Libraries and Data
  EDA: Check Missing and Duplicates
  Descriptive Statistics: Mean, Median, 95th Percentile
  Visualize Data: Distribution and Extremes
Part II: Statistical Anomaly Detection
  Detect Outliers with IQR
  Create Binary Outlier Columns
  Identify Samples with 2+ Outliers
  Record Observations
Part III: ML Anomaly Detection
  Scale Features
  One-Class SVM: Detect Anomalies
  Visualize SVM with PCA in 2D
  Isolation Forest: Detect Anomalies
  Visualize IF with PCA in 2D
  Document Approach and Best Method
End: Generate Report & Submit

Statistical Methods

IQR Detection

Flagged outliers per feature (e.g., 2,668 for Engine rpm); samples with 2+ outliers totaled 422 (2.16%), fitting the 1-5% target.

Key Observations

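The IQR rule worked well on the skewed features but, because it checks each feature independently, it may miss subtle anomalies that only show up in combinations of features. Engine rpm and the pressure features showed the most outliers, which makes them the key areas to monitor.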
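
The IQR procedure described above (per-feature flags, then samples with 2+ flagged features) can be sketched as below. The 1.5 x IQR fence is the conventional choice and is assumed here, since the write-up does not state the multiplier. The function names, column names and toy values are hypothetical.

```python
import pandas as pd


def iqr_outlier_flags(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Return one binary column per feature: 1 if the value lies outside
    [Q1 - k*IQR, Q3 + k*IQR] for that feature, else 0."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flags = ((df < lower) | (df > upper)).astype(int)
    return flags.add_suffix("_outlier")


def multi_feature_anomalies(flags: pd.DataFrame, min_flags: int = 2) -> pd.Series:
    """Mark samples that are outliers in at least `min_flags` features."""
    return flags.sum(axis=1) >= min_flags


# Toy data: the last row is extreme in both features, the second-to-last in one.
toy = pd.DataFrame({
    "engine_rpm": [10, 11, 12, 13, 14, 15, 16, 17, 18, 100],
    "coolant_pressure": [1, 2, 3, 4, 5, 6, 7, 8, 50, 60],
})

flags = iqr_outlier_flags(toy)
anomalous = multi_feature_anomalies(flags)

print(flags.sum())                 # outlier count per feature
print(f"{anomalous.mean():.2%}")   # share of samples with 2+ flagged features
```

Applied to the full dataset, the final share is the figure that would be compared against the 1-5% target.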

Feature Outlier Distribution

Engine RPM: 2,668
Coolant Pressure: 1,872
Oil Pressure: 1,314
Temperature: 896

ML Anomaly Detection

One-Class SVM

Tuned to nu=0.02 (2%, 392 outliers); detected anomalies in rpm and pressures; in the PCA projection the flagged points were mostly separated from the normal cluster, with some overlap.

Parameter: nu=0.02
Anomalies: 392
Percentage: 2%
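
A minimal sketch of the scale-then-fit step for One-Class SVM is shown below. It assumes `StandardScaler` for the scaling and scikit-learn's default RBF kernel with `gamma="scale"`; the write-up only states that features were scaled and that `nu=0.02` was used, so those other choices are assumptions. The data is synthetic.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the 6 engine features (not the real dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 6))

# Scale first: the RBF kernel is distance-based, so unscaled features with
# large ranges (such as rpm) would otherwise dominate.
X_scaled = StandardScaler().fit_transform(X)

# nu bounds the fraction of training points treated as margin errors, so it
# acts as an approximate target for the anomaly share.
svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.02)
labels = svm.fit_predict(X_scaled)  # -1 = anomaly, +1 = normal

anomaly_rate = float(np.mean(labels == -1))
print(f"flagged {int((labels == -1).sum())} of {len(labels)} ({anomaly_rate:.2%})")
```

The realised anomaly share is usually close to `nu` but is not guaranteed to match it exactly, which is why the flagged count should be checked after fitting.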

Isolation Forest

Tuned to contamination=0.05 (5%, 977 outliers); captured broader anomalies; PCA indicated dense normal clusters with dispersed outliers.

Parameter: contamination=0.05
Anomalies: 977
Percentage: 5%
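
The Isolation Forest step can be sketched in the same way. Only `contamination=0.05` comes from the write-up; the number of trees is left at the scikit-learn default and `random_state` is fixed purely for reproducibility, both of which are assumptions. The data is again synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the 6 engine features (not the real dataset).
rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 6))

# contamination sets the score threshold so that roughly this share of the
# training samples falls below it and is labelled anomalous.
forest = IsolationForest(contamination=0.05, random_state=0)
labels = forest.fit_predict(X)  # -1 = anomaly, +1 = normal

n_anomalies = int((labels == -1).sum())
print(f"flagged {n_anomalies} of {len(labels)}")

# Arithmetic check on the figure reported for the full dataset:
# 5% of 19,535 samples is 976.75, which rounds to 977.
expected_on_full_data = round(0.05 * 19_535)
print(expected_on_full_data)
```

Tree-based isolation does not rely on distances, so feature scaling is not strictly required here, although applying the same scaled matrix to both models keeps the comparison simple.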

Insights & Visualisations

Figures: One-Class SVM (PCA Visualisation); Isolation Forest (PCA Visualisation)

PCA Visualisation

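The 6 features were reduced to 2D with PCA. In both plots (One-Class SVM with nu=0.02, Isolation Forest with contamination=0.05) outliers are shown in red and normal samples in blue; Isolation Forest flagged more points (977 vs. 392).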
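
The 2D projection can be sketched as follows. The `project_to_2d` helper is hypothetical, the data is synthetic, and the labels are random placeholders standing in for a detector's output. The plotting call is left as a comment so the sketch runs without a display.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def project_to_2d(X: np.ndarray):
    """Standardise the features and project them onto the first two
    principal components. Returns the coordinates and the share of
    variance each component explains."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(X_scaled)
    return coords, pca.explained_variance_ratio_


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))                     # synthetic 6-feature data
labels = np.where(rng.random(500) < 0.05, -1, 1)  # placeholder detector labels

coords, ratio = project_to_2d(X)
colours = np.where(labels == -1, "red", "blue")   # outliers red, normals blue

print(coords.shape, ratio.round(3))

# To draw the scatter plot (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.scatter(coords[:, 0], coords[:, 1], c=colours, s=8)
#   plt.xlabel("PC1"); plt.ylabel("PC2"); plt.show()
```

The explained variance ratio is worth reporting alongside the plot: if two components capture only a small share of the variance, apparent overlap between red and blue points may reflect the projection more than the detector.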

Best Method

Isolation Forest (contamination=0.05) excelled for its speed and broader anomaly capture, ideal for real-time engine monitoring.
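
One way to back up a speed and coverage comparison like this is to time both detectors on the same scaled matrix and measure how much their flagged sets overlap. The sketch below is an illustration on synthetic data: the `timed_fit_predict` helper is hypothetical, and timings on a small random matrix say nothing about the real dataset.

```python
import time

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def timed_fit_predict(model, X):
    """Fit the detector, return its labels and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    labels = model.fit_predict(X)
    return labels, time.perf_counter() - start


rng = np.random.default_rng(3)
X = StandardScaler().fit_transform(rng.normal(size=(1500, 6)))  # synthetic data

svm_labels, svm_seconds = timed_fit_predict(OneClassSVM(nu=0.02), X)
if_labels, if_seconds = timed_fit_predict(
    IsolationForest(contamination=0.05, random_state=0), X
)

# Indices flagged by each detector, and their Jaccard overlap.
svm_set = set(np.flatnonzero(svm_labels == -1).tolist())
if_set = set(np.flatnonzero(if_labels == -1).tolist())
union = svm_set | if_set
jaccard = len(svm_set & if_set) / len(union) if union else 0.0

print(f"SVM: {len(svm_set)} flagged in {svm_seconds:.3f}s")
print(f"IF:  {len(if_set)} flagged in {if_seconds:.3f}s")
print(f"Jaccard overlap: {jaccard:.2f}")
```

Because the two detectors are tuned to different anomaly shares (2% and 5%), a larger flagged set for Isolation Forest is expected by construction; the overlap figure shows how many of the SVM's anomalies it also captures.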

Conclusion

Isolation Forest (contamination=0.05, 5%) was preferred over IQR (2.16%) and One-Class SVM (nu=0.02, 2%) for its efficiency and broader anomaly coverage across engine features (e.g., rpm, pressures). Key insight: high engine rpm and coolant pressure are the readings to monitor most closely for maintenance. The PCA visualisations supported Isolation Forest's suitability for operational use.

Recommendations:

Deploy Isolation Forest for real-time anomaly alerts in ship engine monitoring systems

Prioritize engine rpm and coolant pressure for preventative maintenance checks

Refine detection thresholds with additional operational data to enhance safety and efficiency