Customer Loyalty Analysis - Interactive Visual Guide

Predictive Modelling for Enhanced Customer Engagement

Project Scenario

This data science project supports a retail consumer goods company that wants to understand customer behaviour and strengthen loyalty through predictive modelling. The analysis explores how loyalty relates to three factors: perceived product quality, brand awareness, and the impact of negative publicity.

Complete Analysis Process

1. Data Import & Setup

Import necessary libraries (pandas, matplotlib, seaborn, scikit-learn) and load the customer loyalty dataset from the provided URL.

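import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
data = pd.read_csv(DATA_URL)  # DATA_URL: placeholder for the provided dataset URL (not reproduced here); assumes a CSV file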
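
The dataset URL is not reproduced in this guide, so the following is only a sketch of the loading step. It reads a tiny, made-up CSV from an in-memory buffer in place of the real source. The column names follow the ones used later in the guide; the sample values and the `load_loyalty_data` helper are hypothetical.

```python
import io

import pandas as pd

# Made-up rows standing in for the real dataset (values are illustrative only).
SAMPLE_CSV = """Loyalty,Quality,Brand awareness,Negative publicity
7.1,8.0,5.5,2.0
5.4,6.2,6.0,4.5
8.3,9.1,4.8,1.2
"""


def load_loyalty_data(source):
    """Read the loyalty data from a URL, file path, or file-like object."""
    return pd.read_csv(source)


# pd.read_csv accepts a URL string as well, so the real URL could be passed here.
data = load_loyalty_data(io.StringIO(SAMPLE_CSV))
print(data.shape)
print(data.dtypes)
```

Inspecting the shape and dtypes right after loading is a quick way to confirm that all four columns arrived and were parsed as numbers.
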

2. Data Exploration

Explore data structure, descriptive statistics, and visualise distributions through histograms and scatter plots to understand variable relationships.

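print(data.describe(include="all"))  # summary statistics for every column
plt.hist(data['Quality'], bins=10)  # distribution of Quality
plt.show()
plt.scatter(data['Quality'], data['Loyalty'])  # Quality against Loyalty
plt.show()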
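
As a self-contained illustration of the exploration step, the sketch below draws the histogram and scatter plot side by side. The data is synthetic (generated with an assumed linear relationship between Quality and Loyalty), and the non-interactive `Agg` backend is chosen only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: the slope and noise level are assumptions.
rng = np.random.default_rng(0)
quality = rng.uniform(1, 10, size=200)
data = pd.DataFrame({
    "Quality": quality,
    "Loyalty": 0.7 * quality + rng.normal(0, 1.5, size=200),
})

print(data.describe())

# One figure with two panels: distribution of Quality, then Quality against Loyalty.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
ax_hist.hist(data["Quality"], bins=10)
ax_hist.set_xlabel("Quality")
ax_hist.set_ylabel("Count")
ax_scatter.scatter(data["Quality"], data["Loyalty"], s=10)
ax_scatter.set_xlabel("Quality")
ax_scatter.set_ylabel("Loyalty")
fig.tight_layout()
```

Using separate axes keeps the two plots from being drawn on top of each other, which can happen when `plt.hist` and `plt.scatter` are called back to back on the same figure.
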

3. Feature Selection

Define target variable (Loyalty) and feature variables (Quality, Brand awareness, Negative publicity). Check data suitability for regression modelling.

y = data['Loyalty']
X = data[['Quality', 'Brand awareness', 'Negative publicity']]
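
The guide does not say how data suitability was checked. One possible sketch, under the assumption that "suitable" means numeric columns with no missing values and some variation, is shown below. The `check_regression_suitability` helper and the sample rows are hypothetical.

```python
import pandas as pd


def check_regression_suitability(df, target, features):
    """Report, per column, whether it is numeric, how many values are missing,
    and how many distinct values it has."""
    cols = [target] + list(features)
    return pd.DataFrame(
        {
            "is_numeric": [pd.api.types.is_numeric_dtype(df[c]) for c in cols],
            "n_missing": [int(df[c].isna().sum()) for c in cols],
            "n_unique": [int(df[c].nunique()) for c in cols],
        },
        index=cols,
    )


# Made-up rows with one missing Quality value, for illustration only.
data = pd.DataFrame({
    "Loyalty": [7.1, 5.4, 8.3, 6.0],
    "Quality": [8.0, 6.2, None, 7.0],
    "Brand awareness": [5.5, 6.0, 4.8, 5.1],
    "Negative publicity": [2.0, 4.5, 1.2, 3.3],
})

report = check_regression_suitability(
    data, "Loyalty", ["Quality", "Brand awareness", "Negative publicity"]
)
print(report)
```

Rows with missing values would need to be dropped or imputed before fitting, since scikit-learn's `LinearRegression` does not accept NaN inputs.
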

4. Model Development

Create linear regression models using scikit-learn, split data into training and testing sets, and train the model to predict loyalty.

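# Hold out 20% of the rows for testing; random_state fixes the split (the value 42 is arbitrary)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)  # estimate coefficients on the training set only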
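
To show the split-and-fit step end to end, here is a minimal sketch on synthetic data. The coefficients used to generate the data (0.8, 0.1, -0.4) and the noise level are assumptions chosen for illustration, not estimates from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic features and target; the generating coefficients are assumed.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "Quality": rng.normal(size=n),
    "Brand awareness": rng.normal(size=n),
    "Negative publicity": rng.normal(size=n),
})
y = (
    0.8 * X["Quality"]
    + 0.1 * X["Brand awareness"]
    - 0.4 * X["Negative publicity"]
    + rng.normal(scale=0.5, size=n)
)

# 80/20 split with a fixed seed so the result is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Label each fitted coefficient with its feature name for easier reading.
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients.round(2))
print("Test R²:", round(model.score(X_test, y_test), 3))
```

Reading the coefficients next to the feature names makes it easy to see the sign and relative size of each feature's contribution.
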

5. Model Evaluation

Calculate R², Adjusted R², and Residual Sum of Squares (RSS) to assess model performance and explanatory power.

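from sklearn.metrics import r2_score
y_pred = model.predict(X_test)  # predictions for the held-out rows
n, p = X_test.shape  # n observations, p predictors
r2 = r2_score(y_test, y_pred)
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
rss = ((y_test - y_pred)**2).sum()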
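
The three metrics can also be computed by hand, which makes their relationship explicit. The sketch below uses only NumPy and a small made-up example; the `regression_metrics` helper is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred, n_features):
    """Return R², adjusted R², and the residual sum of squares (RSS)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    rss = float(np.sum((y_true - y_pred) ** 2))         # unexplained variation
    tss = float(np.sum((y_true - y_true.mean()) ** 2))  # total variation
    r2 = 1.0 - rss / tss
    # Penalise R² for the number of predictors relative to the sample size.
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"r2": r2, "adjusted_r2": adjusted_r2, "rss": rss}


# Made-up observations and predictions from a one-feature model.
metrics = regression_metrics(
    [3, 5, 7, 9, 11], [2.5, 5.5, 7.0, 9.5, 10.5], n_features=1
)
print(metrics)
```

In this example RSS is 1.0 against a total sum of squares of 40, giving R² = 0.975. The adjustment lowers it slightly because the sample is small relative to the number of predictors.
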

6. Model Experimentation

Test different variable combinations to optimise model performance and understand individual feature contributions to loyalty prediction.

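# Test different feature combinations (Configuration #1 uses all three features):
X1 = data[['Quality', 'Brand awareness']]  # Configuration #2
X2 = data[['Quality', 'Negative publicity']]  # Configuration #3
X3 = data[['Brand awareness', 'Negative publicity']]  # Configuration #4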
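
Rather than fitting each combination by hand, the comparison can be looped. The sketch below fits the full model and every two-feature subset on the same train/test split and ranks them by test R². The data is synthetic, with assumed generating coefficients, so the numbers it prints are not the guide's results.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data in which Quality dominates (assumed coefficients).
rng = np.random.default_rng(0)
n = 600
data = pd.DataFrame({
    "Quality": rng.normal(size=n),
    "Brand awareness": rng.normal(size=n),
    "Negative publicity": rng.normal(size=n),
})
data["Loyalty"] = (
    0.8 * data["Quality"]
    + 0.1 * data["Brand awareness"]
    - 0.4 * data["Negative publicity"]
    + rng.normal(scale=0.5, size=n)
)

features = ["Quality", "Brand awareness", "Negative publicity"]

# Split once so every configuration is scored on the same held-out rows.
train, test = train_test_split(data, test_size=0.2, random_state=0)

rows = []
for k in (3, 2):  # all three features, then every pair
    for subset in combinations(features, k):
        cols = list(subset)
        model = LinearRegression().fit(train[cols], train["Loyalty"])
        rows.append({
            "features": " + ".join(cols),
            "r2_test": model.score(test[cols], test["Loyalty"]),
        })

results = (
    pd.DataFrame(rows)
    .sort_values("r2_test", ascending=False)
    .reset_index(drop=True)
)
print(results)
```

Scoring every configuration on one shared split keeps the comparison fair, because differences in R² then reflect the features rather than a different sample of rows.
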

7. Correlation Analysis

Perform comprehensive correlation analysis using Pearson coefficients, create correlation heatmaps, and interpret statistical significance.

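import seaborn as sns
from scipy.stats import pearsonr
corr_matrix = data.corr(method='pearson', numeric_only=True)  # pairwise Pearson coefficients for numeric columns
sns.heatmap(corr_matrix, annot=True)  # annotate each cell with its coefficient
pearson_corr, p_value = pearsonr(data['Loyalty'], data['Quality'])  # coefficient and two-sided p-value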
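
The sketch below shows, on synthetic data, that the pairwise entry in the correlation matrix and the coefficient returned by `scipy.stats.pearsonr` are the same number, with `pearsonr` additionally supplying a p-value. The generating slope and noise level are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic pair of variables with a positive linear relationship (assumed).
rng = np.random.default_rng(7)
quality = rng.normal(size=300)
data = pd.DataFrame({
    "Quality": quality,
    "Loyalty": 0.7 * quality + rng.normal(scale=0.7, size=300),
})

# Full matrix of pairwise Pearson coefficients.
corr_matrix = data.corr(method="pearson")

# Single pair, with the p-value for the null hypothesis of no correlation.
r, p_value = pearsonr(data["Loyalty"], data["Quality"])

print(corr_matrix.round(3))
print(f"r = {r:.4f}, p = {p_value:.3e}")
```

The matrix is convenient for an overview or a heatmap, while `pearsonr` is the call to use when the significance of one specific pair matters.
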

Model Configuration Results

Model Configuration | Variables Included | R² | Adjusted R² | RSS | Performance
Configuration #1 | Quality + Brand Awareness + Negative Publicity | 0.628 | 0.625 | 150.40 | Best Overall
Configuration #3 | Quality + Negative Publicity | 0.623 | 0.621 | 152.63 | Very Good
Configuration #2 | Quality + Brand Awareness | 0.551 | 0.548 | 181.71 | Moderate
Configuration #4 | Brand Awareness + Negative Publicity | 0.211 | 0.206 | 319.49 | Poor
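
One way to read these results is to ask how much each two-variable model loses relative to the full model. The sketch below does that arithmetic using the figures reported above; the column names and the `dropped` labels are introduced here for illustration.

```python
import pandas as pd

# Figures copied from the results reported in this guide.
results = pd.DataFrame(
    [
        ("Configuration #1", "none", 0.628, 0.625, 150.40),
        ("Configuration #3", "Brand Awareness", 0.623, 0.621, 152.63),
        ("Configuration #2", "Negative Publicity", 0.551, 0.548, 181.71),
        ("Configuration #4", "Quality", 0.211, 0.206, 319.49),
    ],
    columns=["configuration", "dropped", "r2", "adjusted_r2", "rss"],
).set_index("configuration")

full_r2 = results.at["Configuration #1", "r2"]
full_rss = results.at["Configuration #1", "rss"]

# Loss in explained variance, and extra residual error, versus the full model.
results["r2_drop"] = (full_r2 - results["r2"]).round(3)
results["rss_increase"] = (results["rss"] - full_rss).round(2)

print(results[["dropped", "r2_drop", "rss_increase"]])
```

Dropping Brand Awareness costs 0.005 in R², dropping Negative Publicity costs 0.077, and dropping Quality costs 0.417, which matches the ordering of the Performance labels.
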

Correlation Analysis Results

Variable Pair | Pearson Coefficient | P-Value | Relationship Strength | Statistical Significance
Loyalty ↔ Quality | 0.7126 | 2.591e-265 | Strong Positive | Highly Significant
Loyalty ↔ Negative Publicity | -0.4493 | 8.936e-86 | Moderate Negative | Highly Significant
Quality ↔ Negative Publicity | -0.2288 | 9.141e-22 | Weak Negative | Significant
Brand Awareness ↔ Loyalty | 0.18 | < 0.05 | Weak Positive | Significant
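
For reference, coefficients of this kind follow the standard sample Pearson definition, and a two-sided p-value is conventionally obtained by comparing the t statistic below against a t-distribution with n - 2 degrees of freedom. The symbols (x_i, y_i, n) are generic notation rather than names from the dataset, and the sample size n is not stated in this guide.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
t = r_{xy}\sqrt{\frac{n - 2}{1 - r_{xy}^2}}
```

Because t grows with the square root of n, even a weak correlation can be statistically significant in a large sample, so strength and significance should be read separately.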

Key Findings & Conclusions

  1. Quality is the Primary Loyalty Driver
    Strong positive correlation (r = 0.7126) between product quality and customer loyalty, indicating that customers who perceive higher quality tend to be markedly more loyal.
  2. Negative Publicity Significantly Impacts Loyalty
    Moderate negative correlation (r = -0.4493) shows that more negative publicity is associated with lower customer loyalty, which calls for proactive reputation management.
  3. Brand Awareness Has Limited Direct Impact
    Weak positive correlation (r = 0.18) suggests that brand awareness alone is insufficient to build loyalty; it must be combined with quality improvements.
  4. Quality and Publicity Are Interconnected
    Weak negative correlation (r = -0.2288) between quality and negative publicity suggests that higher quality products generate less negative publicity.
  5. Model Performance Validates Quality Focus
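    The best performance (R² = 0.628) comes from the model with all three variables, but the Quality + Negative Publicity model (R² = 0.623) performs nearly as well: adding Brand Awareness improves R² by only 0.005, confirming that Quality and Negative Publicity carry most of the explanatory power.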

Business Implications & Strategic Recommendations

HIGH PRIORITY

Prioritise Product Quality Enhancement

Invest heavily in product quality improvements, because quality shows the strongest correlation with customer loyalty. Implement rigorous quality control processes and customer feedback systems.

HIGH PRIORITY

Proactive Reputation Management

Develop comprehensive crisis management and proactive communication strategies to minimise negative publicity impact. Monitor social media and review platforms continuously.

MEDIUM PRIORITY

Strategic Brand Awareness Campaigns

While brand awareness has limited direct impact on loyalty, combine awareness campaigns with quality messaging to maximise effectiveness and customer engagement.

ONGOING

Integrated Quality-Publicity Strategy

Leverage the interconnection between quality and publicity by showcasing quality improvements in marketing communications to enhance both factors simultaneously.

STRATEGIC

Customer-Centric Product Development

Use the predictive model to guide product development decisions, focusing resources on quality attributes that most strongly influence customer loyalty and retention.

OPERATIONAL

Continuous Model Monitoring

Regularly update the predictive model with new data to maintain accuracy and identify emerging trends in customer behaviour and loyalty drivers.
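
The guide does not specify a monitoring procedure. One possible sketch is to score the current model on each new batch of data and refit when R² falls noticeably below its baseline. The `needs_refit` helper, the 0.05 tolerance, and the synthetic "drifted" data below are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def needs_refit(model, X_new, y_new, baseline_r2, tolerance=0.05):
    """Return (flag, current R²); flag is True when R² on the new data
    falls more than `tolerance` below the baseline."""
    current_r2 = model.score(X_new, y_new)
    return bool(current_r2 < baseline_r2 - tolerance), current_r2


rng = np.random.default_rng(1)


def simulate(coefs, n=400):
    """Generate synthetic features and a target with the given coefficients."""
    X = rng.normal(size=(n, 3))
    y = X @ np.asarray(coefs) + rng.normal(scale=0.5, size=n)
    return X, y


# Fit on the original period and record the baseline fit quality.
X_old, y_old = simulate([0.8, 0.1, -0.4])
model = LinearRegression().fit(X_old, y_old)
baseline_r2 = model.score(X_old, y_old)

# A later period in which the (assumed) loyalty drivers have shifted.
X_new, y_new = simulate([0.3, 0.1, -0.9])
drifted, current_r2 = needs_refit(model, X_new, y_new, baseline_r2)
print(f"baseline R² = {baseline_r2:.3f}, current R² = {current_r2:.3f}")

if drifted:
    # Refit on the recent data so the coefficients track the new relationship.
    model = LinearRegression().fit(X_new, y_new)
    print(f"refit R² = {model.score(X_new, y_new):.3f}")
```

The tolerance is a judgment call: too tight and the model is refit on ordinary sampling noise, too loose and a real shift in loyalty drivers goes unnoticed.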

Strategic Priority Ranking for Loyalty Enhancement:

1. Product Quality → 2. Reputation Management → 3. Brand Awareness