The Online Retail dataset consists of the following columns:
- InvoiceNo: Unique identifier for each invoice.
- StockCode: Unique identifier for each product.
- Description: A description of the product.
- Quantity: The number of units purchased.
- InvoiceDate: The date and time when the purchase was made.
- UnitPrice: The price per unit of the product.
- CustomerID: Unique identifier for each customer.
- Country: The country where the customer made the purchase.
Exploratory Data Analysis (EDA):
- I started with EDA to understand the structure of the data, check for missing values, and identify outliers and patterns.
- Some rows had missing CustomerID values, which I handled by either removing those rows or imputing values where necessary.
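The missing-value check described above can be sketched as follows. This is a minimal illustration on a tiny made-up sample that only borrows the dataset's column names; dropping the rows (rather than imputing) is one of the two options mentioned, chosen here as an assumption.

```python
import numpy as np
import pandas as pd

# Tiny synthetic sample with the dataset's columns (all values are made up).
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3"],
    "StockCode": ["S1", "S2", "S1", "S3"],
    "Description": ["Mug", "Plate", "Mug", None],
    "Quantity": [6, 2, 1, 12],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-05", "2011-01-05", "2011-02-10", "2011-03-01"]
    ),
    "UnitPrice": [2.5, 4.0, 2.5, 1.25],
    "CustomerID": [17850.0, 17850.0, np.nan, 13047.0],
    "Country": ["United Kingdom"] * 4,
})

# Count missing values per column.
missing_counts = df.isna().sum()

# Drop rows without a CustomerID, since they cannot be tied to a customer.
clean = df.dropna(subset=["CustomerID"]).copy()
clean["CustomerID"] = clean["CustomerID"].astype(int)
```

Here `missing_counts` gives a per-column overview, and `clean` keeps only the rows that can be attributed to a customer.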
Feature Engineering using RFM:
- The main objective of the project was to identify customer segments and suggest marketing strategies. To achieve this, I used the RFM (Recency, Frequency, Monetary) technique, which segments customers based on their purchase behavior:
  - Recency (R): How recently a customer made a purchase.
  - Frequency (F): How often a customer makes a purchase.
  - Monetary (M): How much money a customer spends.
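One common way to derive the three RFM features from transaction rows is sketched below. The sample transactions are made up, and the `TotalPrice` column and the snapshot date (the day after the last invoice) are assumptions for illustration, not necessarily what the project used.

```python
import pandas as pd

# Hypothetical transactions; values are made up for illustration.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "A3", "A4", "A5", "A5"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-10", "2011-03-01", "2011-02-15",
        "2011-01-20", "2011-03-09", "2011-03-09",
    ]),
    "Quantity": [2, 1, 5, 3, 1, 4],
    "UnitPrice": [10.0, 20.0, 2.0, 5.0, 8.0, 1.5],
})
tx["TotalPrice"] = tx["Quantity"] * tx["UnitPrice"]

# Assumption: recency is measured from the day after the last invoice.
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    LastPurchase=("InvoiceDate", "max"),   # most recent purchase date
    Frequency=("InvoiceNo", "nunique"),    # number of distinct invoices
    Monetary=("TotalPrice", "sum"),        # total amount spent
)
# Recency in days: smaller means the customer bought more recently.
rfm["Recency"] = (snapshot - rfm["LastPurchase"]).dt.days
rfm = rfm[["Recency", "Frequency", "Monetary"]]
```

Note that with this definition a small Recency value is "good" (a recent buyer), which matters when interpreting the cluster profiles later.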
Clustering with DBSCAN:
- I initially applied DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to identify clusters. This method produced 57 clusters, which was too many for my goal of 6 clusters.
- Although the Silhouette Score for DBSCAN was 0.99, DBSCAN does not let you set the number of clusters directly, so I decided to switch to a more controlled clustering technique that would give me exactly 6 clusters.
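The DBSCAN step could look roughly like the sketch below. The data is synthetic, and the `eps` and `min_samples` values are illustrative assumptions rather than the project's actual settings. It also shows why the cluster count is an outcome of DBSCAN and not an input.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic RFM-like features: three well-separated groups (made-up data).
rng = np.random.default_rng(0)
centers = np.array([[10, 20, 2000], [120, 5, 300], [300, 1, 50]], dtype=float)
X = np.vstack([c + rng.normal(scale=[3, 1, 20], size=(50, 3)) for c in centers])

# Scale features so no single RFM dimension dominates the distance.
X_scaled = StandardScaler().fit_transform(X)

# eps and min_samples are illustrative assumptions.
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)

# DBSCAN takes no target cluster count; noise points get the label -1.
n_clusters = len(set(db_labels)) - (1 if -1 in db_labels else 0)

# Score only non-noise points, and only when at least two clusters exist.
db_score = None
mask = db_labels != -1
if n_clusters >= 2:
    db_score = silhouette_score(X_scaled[mask], db_labels[mask])
```

Changing `eps` or `min_samples` changes how many clusters come out, which is why hitting an exact target such as 6 is hard to control with this method.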
Clustering with KMeans:
- I used KMeans clustering to fix the number of clusters at 6. With KMeans, the Silhouette Score dropped to 0.48, which I considered acceptable for this type of analysis and indicative of a decent clustering result.
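A minimal sketch of the KMeans step, assuming scaled RFM features as input. The data here is randomly generated, and the `log1p` transform, `n_init`, and `random_state` values are assumptions added for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Made-up RFM-like data for 300 customers.
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.integers(1, 365, size=300),     # Recency in days
    rng.poisson(4, size=300) + 1,       # Frequency (number of invoices)
    rng.gamma(2.0, 250.0, size=300),    # Monetary (total spend)
]).astype(float)

# Assumption: log-transform to reduce skew, then standardize.
X_scaled = StandardScaler().fit_transform(np.log1p(X))

# Unlike DBSCAN, KMeans takes the number of clusters as a parameter.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
km_labels = kmeans.fit_predict(X_scaled)

# Silhouette score lies in [-1, 1]; higher means better-separated clusters.
km_score = silhouette_score(X_scaled, km_labels)
```

The resulting `km_labels` array assigns each customer to one of the 6 clusters and can be attached to the RFM table as a new column.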
Cluster Profiling:
- For each of the 6 clusters, I calculated the average Monetary, Recency, and Frequency values. This helped me understand the characteristics of each cluster, such as whether they consisted of high-spending or frequent buyers, or if they were new or less engaged customers.
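The profiling step amounts to a group-by on the cluster label. The sketch below uses a made-up RFM table with only three clusters to keep it short; the `Cluster` column name and the extra `Customers` count are assumptions.

```python
import pandas as pd

# Hypothetical RFM table with cluster labels already assigned (made-up values).
rfm = pd.DataFrame({
    "Recency": [5, 8, 200, 250, 30, 40],
    "Frequency": [20, 18, 1, 2, 6, 8],
    "Monetary": [5000.0, 4200.0, 80.0, 120.0, 900.0, 1100.0],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# Mean RFM values and customer count per cluster.
profile = rfm.groupby("Cluster").agg(
    Recency=("Recency", "mean"),
    Frequency=("Frequency", "mean"),
    Monetary=("Monetary", "mean"),
    Customers=("Recency", "size"),
)
```

In this toy example, cluster 0 looks like recent, frequent, high-spending customers, while cluster 1 looks like lapsed, low-value ones.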
Marketing Strategy:
- Based on the RFM values and the characteristics of each cluster, I devised personalized marketing strategies to suggest to each group. For example:
  - For clusters with high Monetary and Frequency, I suggested loyalty programs or special offers to reward frequent shoppers.
  - For customers who had not purchased recently but had high Monetary value, I recommended re-engagement campaigns to encourage them to make another purchase.
- I stored these strategies in a DataFrame called marketing_strategy.
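One way to build such a `marketing_strategy` DataFrame is with simple rules over the cluster profile, sketched below. The profile values, the thresholds (the mean across clusters), the helper `suggest_strategy`, and the fallback "General promotions" label are all hypothetical; the actual project may have assigned strategies differently.

```python
import pandas as pd

# Hypothetical cluster profile (mean RFM per cluster); values are made up.
profile = pd.DataFrame(
    {
        "Recency": [6.5, 225.0, 35.0],      # days since last purchase
        "Frequency": [19.0, 1.5, 7.0],
        "Monetary": [4600.0, 3800.0, 400.0],
    },
    index=pd.Index([0, 1, 2], name="Cluster"),
)

# Assumption: "high" means above the mean across clusters.
threshold = profile.mean()

def suggest_strategy(row):
    """Map one cluster's mean RFM values to a strategy label (hypothetical rules)."""
    high_value = row["Monetary"] > threshold["Monetary"]
    frequent = row["Frequency"] > threshold["Frequency"]
    lapsed = row["Recency"] > threshold["Recency"]  # long time since last purchase
    if high_value and frequent:
        return "Loyalty program or special offers"
    if high_value and lapsed:
        return "Re-engagement campaign"
    return "General promotions"  # placeholder for all other clusters

marketing_strategy = pd.DataFrame({
    "Cluster": profile.index,
    "Strategy": profile.apply(suggest_strategy, axis=1).to_numpy(),
})
```

Keeping the rules in one function makes it easy to adjust thresholds or add further strategies as the cluster profiles change.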
Visualization:
- I visualized the number of customers in each cluster using a bar chart, which showed the distribution of customers across the 6 clusters.
- I also created a mean RFM chart for each cluster to visualize how each cluster performed on the Recency, Frequency, and Monetary metrics.
- Finally, I created a pie chart to show the percentage of customers in each cluster, providing a clear overview of the distribution of customers across the segments.
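The three charts described above can be sketched with pandas and matplotlib as follows. The RFM table is made up, only three clusters are shown for brevity, and the layout (one row of three panels) is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical RFM table with cluster labels (made-up values).
rfm = pd.DataFrame({
    "Recency": [5, 8, 200, 250, 30, 40, 12],
    "Frequency": [20, 18, 1, 2, 6, 8, 15],
    "Monetary": [5000.0, 4200.0, 80.0, 120.0, 900.0, 1100.0, 3900.0],
    "Cluster": [0, 0, 1, 1, 2, 2, 0],
})

counts = rfm["Cluster"].value_counts().sort_index()
means = rfm.groupby("Cluster")[["Recency", "Frequency", "Monetary"]].mean()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar chart: number of customers in each cluster.
counts.plot(kind="bar", ax=axes[0], title="Customers per cluster")

# Grouped bar chart: mean Recency, Frequency, and Monetary per cluster.
means.plot(kind="bar", ax=axes[1], title="Mean RFM per cluster")

# Pie chart: share of customers in each cluster.
axes[2].pie(
    counts.to_numpy(),
    labels=[f"Cluster {c}" for c in counts.index],
    autopct="%1.1f%%",
)
axes[2].set_title("Share of customers per cluster")

fig.tight_layout()
```

Calling `fig.savefig(...)` or `plt.show()` afterwards writes or displays the figure. Because Monetary is usually on a much larger scale than the other two metrics, plotting scaled values or separate panels may be easier to read.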
Conclusion:
In this project, I performed customer segmentation using the RFM model and applied the DBSCAN and KMeans clustering algorithms to identify 6 customer segments. Based on the characteristics of each cluster, I developed personalized marketing strategies for each group. The visualizations give a clear picture of the customer distribution and purchasing behavior. The final marketing strategies are stored in a DataFrame and are ready for implementation.
Skills Demonstrated:
- Data Cleaning and Preprocessing: Handling missing values, outliers, and transforming the dataset for clustering.
- Feature Engineering: Using the RFM model to create relevant features (Recency, Frequency, Monetary) for customer segmentation.
- Clustering Algorithms: Applying DBSCAN and KMeans to perform customer segmentation.
- Cluster Profiling: Analyzing and interpreting the characteristics of each cluster based on RFM values.
- Marketing Strategy Development: Creating targeted marketing strategies based on customer segmentation.
- Data Visualization: Using bar charts, pie charts, and other visualizations to communicate cluster characteristics and customer distribution.
This project showcases a strong understanding of customer segmentation, clustering techniques, and marketing strategy formulation, all of which are essential for businesses looking to improve customer engagement and retention.