Customer Segmentation Using K-Means Clustering in Machine Learning with Python

This tutorial walks through customer segmentation with the k-means clustering algorithm, step by step, in Python.

K-means clustering is an unsupervised machine learning algorithm: it groups data points into k clusters without needing labeled examples.

You can check more details about k-means clustering.
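For a rough idea of what the algorithm does internally, here is a simplified NumPy sketch of the two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function name kmeans_sketch, the toy data, and the plain random initialization are assumptions made for illustration; this is not how scikit-learn implements KMeans, which is what the rest of this tutorial uses.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=20, seed=0):
    """Simplified k-means loop, for illustration only (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random (plain random init, not k-means++).
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster ends up empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Toy data (made up): two obvious groups of 2-D points.
toy = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
toy_labels, toy_centroids = kmeans_sketch(toy, k=2)
print(toy_labels)
print(toy_centroids)
```

The cluster numbers themselves carry no meaning; only which points share a number matters.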

Customer segmentation means clustering customers into different groups. One group may represent customers who tend to purchase more in the mall, while another may represent those who do not purchase much. Having these groups helps the mall make better business decisions and build better marketing strategies.

You can download the dataset from here: Mall_Customers.

So let’s implement the code.

Import the necessary Python libraries.

#Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

The first step in any machine learning project is:

Data Collection

1. Load data

#loading the data from csv file to a pandas DataFrame
customer_data=pd.read_csv("C:\\Users\\users drive\\OneDrive\\Desktop\\Mall_Customers.csv")

To check whether the dataset loaded correctly, print the first five rows using the head method.

#printing first five rows of the dataframe
customer_data.head()

OUTPUT:-

Now print the last five rows using the tail method.

#printing last five rows of the dataframe
customer_data.tail()

OUTPUT:-

Remember these checks whenever you are working on machine learning problems that need data exploration.

Next, check the dataset size.

#Finding the number of rows and columns
customer_data.shape

OUTPUT:-

(200, 5)

Now let’s see details about the dataset.

#Getting some information about the dataset
customer_data.info()

OUTPUT:-

Let’s check for missing values in the dataset.

#Checking the missing values
customer_data.isnull().sum()

OUTPUT:-
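If the check above ever reports missing values, they need handling before clustering, because scikit-learn's KMeans does not accept NaN inputs. A minimal sketch, assuming the simplest policy of dropping incomplete rows; the small frame and its column names (income, score) are made up for illustration and are not the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for customer_data, with one missing income value.
sample = pd.DataFrame({
    "income": [15.0, np.nan, 40.0, 78.0],
    "score": [39, 81, 6, 77],
})

# Same check as in the tutorial: count missing values per column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Simplest option: drop rows that lack a value in any clustering feature.
clean = sample.dropna(subset=["income", "score"])
print(clean.shape)
```

Dropping rows is only one option; filling in a typical value such as the column median is another, and which is better depends on how much data is missing.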

In the given dataset, the customer ID is not useful for clustering, because the segmentation depends only on annual income and spending score.

The customers will be grouped based on their annual income and spending score.

Select the annual income and spending score columns.

#Columns 3 and 4 hold annual income and spending score
X=customer_data.iloc[:,[3,4]].values
#In each row of the array, the first value is annual income and the second is spending score
print(X)

OUTPUT:-
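Selecting by position breaks silently if the column order changes, so selecting by name can be safer. The sketch below assumes the file uses the column names "Annual Income (k$)" and "Spending Score (1-100)"; the tutorial does not list the names, so check customer_data.columns before relying on them. The demo frame and its values are made up.

```python
import pandas as pd

# Small made-up frame using the column names the Mall_Customers file is assumed to have.
demo = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [30, 42, 25],
    "Annual Income (k$)": [20, 55, 90],
    "Spending Score (1-100)": [70, 45, 15],
})

feature_columns = ["Annual Income (k$)", "Spending Score (1-100)"]  # assumed names
X_demo = demo[feature_columns].to_numpy()
print(X_demo)
```

With the real data, the same selection would be customer_data[feature_columns].to_numpy(), provided the names match.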

DATA VISUALIZATION

Choosing the number of clusters.

Use the WCSS (within-cluster sum of squares) method to compare different numbers of clusters.

Find the WCSS value for each number of clusters from 1 to 10. The code stores these values in a list named wccs.

wccs=[]
for i in range(1,11):
    kmeans=KMeans(n_clusters=i,init='k-means++',random_state=42)
    kmeans.fit(X)
    #inertia_ is the WCSS of the fitted model
    wccs.append(kmeans.inertia_)
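To make the quantity concrete: the value read from inertia_ is the sum of squared distances from each point to the centroid of its own cluster. The sketch below recomputes it by hand on made-up points and compares it with what KMeans reports. The synthetic data and the explicit n_init=10 are assumptions added for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Made-up 2-D points scattered around three centres (not the mall data).
points = np.vstack([
    rng.normal(loc=(20, 20), scale=2.0, size=(30, 2)),
    rng.normal(loc=(60, 50), scale=2.0, size=(30, 2)),
    rng.normal(loc=(90, 15), scale=2.0, size=(30, 2)),
])

model = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42).fit(points)

# WCSS by hand: squared distance from each point to its own cluster's centroid, summed.
own_centroids = model.cluster_centers_[model.labels_]
manual_wcss = np.sum((points - own_centroids) ** 2)

print(manual_wcss, model.inertia_)
```

The two printed numbers should agree up to floating-point rounding.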

Let’s try the visualization with the elbow graph.

#Plot an elbow graph
sns.set()
plt.plot(range(1,11),wccs)
plt.title("The Elbow Point Graph")
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

OUTPUT:-

It is also called a cutoff point graph. The elbow, or cutoff point, is where adding more clusters stops reducing the WCSS sharply.

Here, the optimum number of clusters is 5.
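Reading an elbow by eye can be subjective, so a second opinion is sometimes useful. The sketch below uses the silhouette score, a different technique that this tutorial does not otherwise use, to compare cluster counts on made-up points with three clear groups. The data, the range of k, and n_init=10 are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Made-up points forming three well-separated groups (not the mall data).
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(40, 2)),
    rng.normal(loc=(10, 10), scale=0.5, size=(40, 2)),
    rng.normal(loc=(20, 0), scale=0.5, size=(40, 2)),
])

scores = {}
for k in range(2, 7):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

# Higher is better: points sit close to their own cluster and far from the others.
best_k = max(scores, key=scores.get)
print(scores)
print("Best k by silhouette:", best_k)
```

On the mall data the same loop would run on X; whether it agrees with the elbow reading is something to check rather than assume.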

#Training the k means clustering model
kmeans=KMeans(n_clusters=5,init='k-means++',random_state=0)

#Return a label for each data point based on their cluster
Y=kmeans.fit_predict(X)
print(Y)

OUTPUT:-
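The label array is easier to use once it is summarized. A minimal sketch on made-up (income, score) pairs that counts the customers in each cluster and assigns a new, unseen customer to the nearest centroid with predict. The data and the new customer are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up (annual income, spending score) pairs forming two groups.
points = np.array([[15, 80], [17, 85], [16, 78],
                   [80, 15], [85, 10], [82, 12], [88, 18]], dtype=float)

model = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = model.fit_predict(points)

# Number of customers in each cluster.
sizes = np.bincount(labels, minlength=2)
print(sizes)

# predict assigns an unseen customer to the cluster with the nearest centroid.
new_customer = np.array([[84.0, 14.0]])
new_label = model.predict(new_customer)[0]
print(new_label)
```

With the tutorial's variables, np.bincount(Y) gives the size of each of the five clusters.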

Visualizing all the clusters.

Plot all five clusters (labeled 0, 1, 2, 3 and 4) and their centroids.

plt.figure(figsize=(8,8))
plt.scatter(X[Y==0,0],X[Y==0,1],s=50,c='violet',label='Cluster 1')
plt.scatter(X[Y==1,0],X[Y==1,1],s=50,c='green',label='Cluster 2')
plt.scatter(X[Y==2,0],X[Y==2,1],s=50,c='red',label='Cluster 3')
plt.scatter(X[Y==3,0],X[Y==3,1],s=50,c='black',label='Cluster 4')
plt.scatter(X[Y==4,0],X[Y==4,1],s=50,c='orange',label='Cluster 5')

#Plot the centroids 
plt.scatter(kmeans.cluster_centers_[:,0],kmeans.cluster_centers_[:,1],s=100,c='cyan',label='centroids')

plt.title('Customer Groups')
plt.xlabel('Annual Income')
plt.ylabel('Spending Score')
plt.legend() #show the cluster and centroid labels
plt.show()

OUTPUT:-

The graph above shows 5 different clusters, each with its own centroid. You can treat these as 5 different customer groups: for example, one group may be customers who buy frequently, and another may be customers who visit only once.
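One rough way to put names on the groups is to compare each centroid with the overall average income and spending score. The helper describe_centroids below is hypothetical, and splitting at the mean is an assumption made for illustration; the numbers are made up.

```python
import numpy as np

def describe_centroids(centroids, data):
    """Tag each centroid as high/low income and spending relative to the data mean."""
    mean_income, mean_score = data.mean(axis=0)
    descriptions = []
    for income, score in centroids:
        income_tag = "high income" if income >= mean_income else "low income"
        score_tag = "high spending" if score >= mean_score else "low spending"
        descriptions.append(f"{income_tag}, {score_tag}")
    return descriptions

# Made-up (annual income, spending score) data and two made-up centroids.
data = np.array([[20, 80], [25, 20], [85, 85], [90, 15]], dtype=float)
centroids = np.array([[20, 80], [90, 15]], dtype=float)
segment_names = describe_centroids(centroids, data)
print(segment_names)
```

With the tutorial's variables, the call would be describe_centroids(kmeans.cluster_centers_, X). A centroid close to the average on both axes will get a tag that overstates the difference, so treat the names as a starting point.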

This is how malls can improve their marketing, for example by offering discounts to a specific group of customers.

In the example above, some groups contain customers who make a purchase and leave.

Customer segmentation is often used together with market basket analysis, a separate technique that looks at which items are bought together.
