A data science project for a retail consumer goods company that wants to understand customer behaviour and strengthen loyalty through predictive modelling. The analysis examines how loyalty relates to three factors: perceived product quality, brand awareness, and the impact of negative publicity.
Import necessary libraries (pandas, matplotlib, seaborn, scikit-learn) and load the customer loyalty dataset from the provided URL.
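A minimal sketch of the loading step. The dataset URL is not given here, so the example reads a tiny in-memory CSV with made-up values instead; the helper name `load_loyalty_data` and the exact column headers are assumptions for illustration, not the project's actual code.

```python
import io

import pandas as pd

# Assumed column headers; the real dataset's names may differ.
EXPECTED_COLUMNS = ["Loyalty", "Quality", "Brand awareness", "Negative publicity"]


def load_loyalty_data(source):
    """Read a CSV from a path, URL, or file-like object and check its columns."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Stand-in for the real file: three invented rows, for demonstration only.
sample_csv = io.StringIO(
    "Loyalty,Quality,Brand awareness,Negative publicity\n"
    "7.1,8.0,5.5,2.0\n"
    "4.3,5.2,6.1,6.5\n"
    "6.0,6.9,4.8,3.1\n"
)
df = load_loyalty_data(sample_csv)
print(df.head())
```

With the real data, the dataset URL would be passed to the helper in place of `sample_csv`, since `pd.read_csv` accepts URLs directly.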
Explore data structure, descriptive statistics, and visualise distributions through histograms and scatter plots to understand variable relationships.
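The exploration step might look roughly like the sketch below. It runs on synthetic stand-in data (the coefficients, sample size, and column names are invented for illustration) and uses plain matplotlib; seaborn equivalents such as `histplot` and `scatterplot` would serve the same purpose.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; values and column names are assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Quality": rng.normal(6, 1.5, n),
    "Brand awareness": rng.normal(5, 1.0, n),
    "Negative publicity": rng.normal(3, 1.2, n),
})
df["Loyalty"] = (0.8 * df["Quality"] + 0.1 * df["Brand awareness"]
                 - 0.4 * df["Negative publicity"] + rng.normal(0, 0.8, n))

# Structure, descriptive statistics, and missing-value counts.
df.info()
summary = df.describe()
print(summary.round(2))
missing = df.isna().sum()
print(missing)

# One histogram per column to inspect each distribution.
df.hist(bins=20, figsize=(8, 6))
plt.tight_layout()

# Loyalty plotted against each candidate feature.
features = ["Quality", "Brand awareness", "Negative publicity"]
fig, axes = plt.subplots(1, len(features), figsize=(12, 4), sharey=True)
for ax, col in zip(axes, features):
    ax.scatter(df[col], df["Loyalty"], s=10, alpha=0.6)
    ax.set_xlabel(col)
axes[0].set_ylabel("Loyalty")
fig.tight_layout()
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figures.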
Define target variable (Loyalty) and feature variables (Quality, Brand awareness, Negative publicity). Check data suitability for regression modelling.
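One way to express the target/feature definition and a basic suitability check is sketched below. The helper `check_regression_ready` is hypothetical, the sample values are invented, and the checks shown (numeric type, no missing values, non-constant columns) are only a starting point rather than the project's actual criteria.

```python
import numpy as np
import pandas as pd

TARGET = "Loyalty"
FEATURES = ["Quality", "Brand awareness", "Negative publicity"]


def check_regression_ready(df, target, features):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for col in [target, *features]:
        if col not in df.columns:
            problems.append(f"{col}: column missing")
        elif not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col}: not numeric")
        elif df[col].isna().any():
            problems.append(f"{col}: contains missing values")
        elif df[col].nunique() < 2:
            problems.append(f"{col}: constant column")
    return problems


# Invented sample with one deliberately missing value.
df = pd.DataFrame({
    "Loyalty": [7.1, 4.3, 6.0, 5.2],
    "Quality": [8.0, 5.2, 6.9, np.nan],
    "Brand awareness": [5.5, 6.1, 4.8, 5.0],
    "Negative publicity": [2.0, 6.5, 3.1, 4.4],
})
problems = check_regression_ready(df, TARGET, FEATURES)
print(problems)  # ['Quality: contains missing values']

# Dropping incomplete rows is one possible remedy (an assumption, not the only option).
clean = df.dropna(subset=[TARGET, *FEATURES])
y = clean[TARGET]
X = clean[FEATURES]
```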
Split the data into training and testing sets, then fit scikit-learn linear regression models to predict loyalty.
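A sketch of the split-and-fit step with scikit-learn. The data are synthetic, and the 80/20 split and `random_state=42` are assumptions; the project's actual split ratio is not stated.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

FEATURES = ["Quality", "Brand awareness", "Negative publicity"]

# Synthetic stand-in data; coefficients and noise level are invented.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "Quality": rng.normal(6, 1.5, n),
    "Brand awareness": rng.normal(5, 1.0, n),
    "Negative publicity": rng.normal(3, 1.2, n),
})
df["Loyalty"] = (0.8 * df["Quality"] + 0.1 * df["Brand awareness"]
                 - 0.4 * df["Negative publicity"] + rng.normal(0, 0.8, n))

X = df[FEATURES]
y = df["Loyalty"]

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Intercept:", round(float(model.intercept_), 3))
print("Coefficients:", dict(zip(FEATURES, model.coef_.round(3))))
print("Test R²:", round(model.score(X_test, y_test), 3))
```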
Calculate R², Adjusted R², and Residual Sum of Squares (RSS) to assess model performance and explanatory power.
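The three metrics follow from the standard definitions: RSS is the sum of squared residuals, R² = 1 − RSS/TSS, and Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors. A small sketch, with a hypothetical helper name and invented numbers:

```python
import numpy as np


def regression_metrics(y_true, y_pred, n_features):
    """Compute R², Adjusted R², and RSS from observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    rss = float(np.sum((y_true - y_pred) ** 2))         # residual sum of squares
    tss = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"r2": r2, "adj_r2": adj_r2, "rss": rss}


# Invented values for illustration: five observations, one predictor.
metrics = regression_metrics(
    y_true=[3, 5, 7, 9, 11],
    y_pred=[2.8, 5.1, 7.2, 8.9, 11.0],
    n_features=1,
)
print({k: round(v, 4) for k, v in metrics.items()})
# {'r2': 0.9975, 'adj_r2': 0.9967, 'rss': 0.1}
```

Adjusted R² penalises extra predictors, which is why it is the fairer figure when comparing configurations with different numbers of variables.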
Test different variable combinations to optimise model performance and understand individual feature contributions to loyalty prediction.
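Comparing configurations can be sketched as a loop over feature subsets. The example below uses synthetic data and scores each model on the same data it was fitted to; whether the reported figures were computed in-sample or on a held-out test set is not stated, so that choice is an assumption here.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

TARGET = "Loyalty"
FEATURES = ["Quality", "Brand awareness", "Negative publicity"]

# Synthetic stand-in data; coefficients and noise level are invented.
rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
    "Quality": rng.normal(6, 1.5, n),
    "Brand awareness": rng.normal(5, 1.0, n),
    "Negative publicity": rng.normal(3, 1.2, n),
})
df[TARGET] = (0.8 * df["Quality"] + 0.1 * df["Brand awareness"]
              - 0.4 * df["Negative publicity"] + rng.normal(0, 0.8, n))


def fit_and_score(data, features, target=TARGET):
    """Fit a linear model on the given features and return in-sample metrics."""
    X, y = data[list(features)], data[target]
    model = LinearRegression().fit(X, y)
    rss = float(((y - model.predict(X)) ** 2).sum())
    tss = float(((y - y.mean()) ** 2).sum())
    n_obs, p = X.shape
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n_obs - 1) / (n_obs - p - 1)
    return {"features": " + ".join(features), "r2": r2, "adj_r2": adj_r2, "rss": rss}


# All three features together, then every pair: four configurations in total.
subsets = [s for k in (3, 2) for s in combinations(FEATURES, k)]
results = pd.DataFrame([fit_and_score(df, s) for s in subsets])
results = results.sort_values("adj_r2", ascending=False).reset_index(drop=True)
print(results.round(3))
```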
Perform comprehensive correlation analysis using Pearson coefficients, create correlation heatmaps, and interpret statistical significance.
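A sketch of the correlation analysis on synthetic data: `DataFrame.corr` gives the Pearson matrix, and `scipy.stats.pearsonr` adds a p-value for each pair. The heatmap is drawn with plain matplotlib here; `seaborn.heatmap(corr, annot=True)` would be an equivalent alternative. Column names and data values are assumptions.

```python
from itertools import combinations

import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in data; coefficients and noise level are invented.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "Quality": rng.normal(6, 1.5, n),
    "Brand awareness": rng.normal(5, 1.0, n),
    "Negative publicity": rng.normal(3, 1.2, n),
})
df["Loyalty"] = (0.8 * df["Quality"] + 0.1 * df["Brand awareness"]
                 - 0.4 * df["Negative publicity"] + rng.normal(0, 0.8, n))

# Pearson correlation matrix for all columns.
corr = df.corr(method="pearson")

# Coefficient and two-sided p-value for every pair of variables.
rows = []
for a, b in combinations(df.columns, 2):
    r, p = pearsonr(df[a], df[b])
    rows.append({"a": a, "b": b, "r": r, "p_value": p})
pairs = pd.DataFrame(rows).sort_values(
    "r", key=lambda s: s.abs(), ascending=False
)
print(pairs.to_string(index=False))

# Annotated heatmap of the correlation matrix.
fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax, label="Pearson r")
fig.tight_layout()
```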
| Model Configuration | Variables Included | R² | Adjusted R² | RSS | Performance |
|---|---|---|---|---|---|
| Configuration #1 | Quality + Brand Awareness + Negative Publicity | 0.628 | 0.625 | 150.40 | Best Overall |
| Configuration #3 | Quality + Negative Publicity | 0.623 | 0.621 | 152.63 | Very Good |
| Configuration #2 | Quality + Brand Awareness | 0.551 | 0.548 | 181.71 | Moderate |
| Configuration #4 | Brand Awareness + Negative Publicity | 0.211 | 0.206 | 319.49 | Poor |
| Variable Pair | Pearson Coefficient | P-Value | Relationship Strength | Statistical Significance |
|---|---|---|---|---|
| Loyalty ↔ Quality | 0.7126 | 2.591e-265 | Strong Positive | Highly Significant |
| Loyalty ↔ Negative Publicity | -0.4493 | 8.936e-86 | Moderate Negative | Highly Significant |
| Quality ↔ Negative Publicity | -0.2288 | 9.141e-22 | Weak Negative | Significant |
| Loyalty ↔ Brand Awareness | 0.18 | < 0.05 | Weak Positive | Significant |
Prioritise product quality improvements, since quality shows the strongest correlation with customer loyalty (r = 0.7126). Implement rigorous quality control processes and customer feedback systems.
Develop comprehensive crisis management and proactive communication strategies to minimise negative publicity impact. Monitor social media and review platforms continuously.
Brand awareness on its own is only weakly correlated with loyalty (r = 0.18), so combine awareness campaigns with quality messaging to improve their effectiveness and customer engagement.
Quality and negative publicity are negatively correlated (r = -0.2288), so showcase quality improvements in marketing communications to strengthen perceived quality and counter negative publicity at the same time.
Use the predictive model to guide product development decisions, focusing resources on quality attributes that most strongly influence customer loyalty and retention.
Regularly update the predictive model with new data to maintain accuracy and identify emerging trends in customer behaviour and loyalty drivers.