Customer Segmentation using K means Clustering in Machine Learning with Python
This tutorial discusses customer segmentation using the k-means clustering algorithm, with a step-by-step guide in Python.
K-means clustering is an unsupervised machine learning algorithm.
You can check more details about k-means clustering.
Customer segmentation means clustering customers into different groups. One group may represent customers who tend to purchase more in the mall, while another group may represent those who don’t purchase much. Having these groups helps the mall make better business decisions and build better marketing strategies.
You can download the dataset from here: Mall_Customers.
So let’s implement the code.
Import the necessary Python libraries.
#Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
The first step in any machine learning project is:
1. Load data
#loading the data from csv file to a pandas DataFrame
customer_data=pd.read_csv("C:\\Users\\users drive\\OneDrive\\Desktop\\Mall_Customers.csv")
To check whether the dataset has loaded, print the first five rows using the head method.
#printing first five rows of the dataframe
customer_data.head()
Now print the last five rows by using the tail method.
#printing last five rows of the dataframe
customer_data.tail()
Remember this point: whenever you are working on a machine learning problem, you need to explore the data first.
Check the dataset size.
#Finding the number of rows and columns
customer_data.shape
Now let’s see details about the dataset.
#Getting some information about the dataset
customer_data.info()
Let’s check the missing values in the dataset.
#Checking the missing values
customer_data.isnull().sum()
In this dataset the customer ID is not useful for clustering, because the segmentation depends on income and spending score.
The customers will be grouped based on their annual income and spending score.
Choose the annual income column and the spending score column.
X=customer_data.iloc[:,[3,4]].values #Columns 3 and 4 represent annual income and spending score
print(X) #In each row the first value represents annual income and the second value represents spending score
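Selecting columns by position works, but it silently picks the wrong data if the column order ever changes. The sketch below shows the same selection done by column name. The column names `Annual Income (k$)` and `Spending Score (1-100)` are an assumption about how the CSV is labelled, and the four rows are made-up values used only so that the snippet runs on its own.

```python
import pandas as pd

# Tiny made-up stand-in for the mall data; column names are assumed, values are illustrative only.
customer_data = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 42, 27],
    "Annual Income (k$)": [15, 40, 78, 120],
    "Spending Score (1-100)": [39, 55, 88, 17],
})

# Select the two features by name instead of by position.
feature_columns = ["Annual Income (k$)", "Spending Score (1-100)"]
X_by_name = customer_data[feature_columns].values

# The positional selection used in the tutorial, for comparison.
X_by_position = customer_data.iloc[:, [3, 4]].values

# Both selections should yield the same array when the column order matches.
print((X_by_name == X_by_position).all())
```

If your file uses different headers, check `customer_data.columns` and adjust `feature_columns` accordingly.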
Choosing the number of clusters.
Use the WCSS (within-cluster sum of squares) method to compare different numbers of clusters.
Find the WCSS values for different numbers of clusters.
#Collect the WCSS for 1 to 10 clusters
wcss=[]
for i in range(1,11):
    kmeans=KMeans(n_clusters=i,init='k-means++',random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
Let’s visualize the result with an elbow graph.
#Plot an elbow graph
sns.set()
plt.plot(range(1,11),wcss)
plt.title("The Elbow point Graph")
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
It is also called a cutoff point graph.
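To make the quantity on the y-axis concrete: scikit-learn's `inertia_` is the sum of squared distances from each point to the centroid of its assigned cluster. The sketch below recomputes that sum by hand and compares it with `inertia_`. It runs on synthetic points (`X_demo`, three random blobs) rather than the mall data, so the numbers are illustrative only, and `n_init=10` is set explicitly as an assumption to keep behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three random blobs (not the mall dataset).
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(loc=(20, 20), scale=5, size=(30, 2)),
    rng.normal(loc=(60, 50), scale=5, size=(30, 2)),
    rng.normal(loc=(100, 80), scale=5, size=(30, 2)),
])

kmeans_demo = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42)
labels = kmeans_demo.fit_predict(X_demo)

# Look up the centroid assigned to each point, then sum the squared distances.
assigned_centroids = kmeans_demo.cluster_centers_[labels]
manual_wcss = ((X_demo - assigned_centroids) ** 2).sum()

print("manual WCSS:", manual_wcss)
print("inertia_   :", kmeans_demo.inertia_)
```

The two printed values should agree up to floating-point rounding, which is why the loop in the tutorial can append `kmeans.inertia_` directly.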
Reading the elbow graph, the optimum number of clusters is 5.
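Reading an elbow by eye can be subjective, so it may help to cross-check the choice with a second measure. The sketch below uses the silhouette score, which is a different technique from the elbow method used in this tutorial, and picks the number of clusters with the highest score. It runs on synthetic points (`X_demo`, five random blobs), not on the mall data, so it only illustrates the procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: five well-separated random blobs (not the mall dataset).
rng = np.random.default_rng(0)
centers = [(20, 20), (20, 80), (55, 50), (90, 20), (90, 80)]
X_demo = np.vstack([rng.normal(loc=c, scale=4, size=(40, 2)) for c in centers])

# Silhouette needs at least 2 clusters, so start the range at 2.
scores = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    labels = model.fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

# The candidate with the highest silhouette score.
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print("best k by silhouette:", best_k)
```

To try this on the real data, replace `X_demo` with the `X` array built earlier. The two methods do not always agree, in which case the final choice is a judgment call.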
#Training the k means clustering model
kmeans=KMeans(n_clusters=5,init='k-means++',random_state=0)
#Return a label for each data point based on its cluster
Y=kmeans.fit_predict(X)
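Before plotting, it can be useful to look at what `fit_predict` returned: one integer label per data point, plus one centroid per cluster in `cluster_centers_`. The sketch below counts the points in each cluster and prints the centroids. It uses synthetic points (`X_demo`) as a stand-in for the mall data, so the sizes and coordinates are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: five random blobs (not the mall dataset).
rng = np.random.default_rng(1)
centers = [(20, 20), (20, 80), (55, 50), (90, 20), (90, 80)]
X_demo = np.vstack([rng.normal(loc=c, scale=4, size=(40, 2)) for c in centers])

kmeans_demo = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
Y_demo = kmeans_demo.fit_predict(X_demo)  # one label in 0..4 per row of X_demo

# Count how many points landed in each cluster.
cluster_sizes = np.bincount(Y_demo, minlength=5)

for cluster_id, (size, centroid) in enumerate(zip(cluster_sizes, kmeans_demo.cluster_centers_)):
    print(f"Cluster {cluster_id}: {size} points, centroid at {np.round(centroid, 1)}")
```

Note that the label numbers are arbitrary: running with a different `random_state` may assign the same group of points a different number.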
Visualizing all the clusters.
Plot all five clusters (labels 0, 1, 2, 3, 4) and their centroids.
plt.figure(figsize=(8,8))
plt.scatter(X[Y==0,0],X[Y==0,1],s=50,c='violet',label='Cluster 1')
plt.scatter(X[Y==1,0],X[Y==1,1],s=50,c='green',label='Cluster 2')
plt.scatter(X[Y==2,0],X[Y==2,1],s=50,c='red',label='Cluster 3')
plt.scatter(X[Y==3,0],X[Y==3,1],s=50,c='black',label='Cluster 4')
plt.scatter(X[Y==4,0],X[Y==4,1],s=50,c='orange',label='Cluster 5')
#Plot the centroids
plt.scatter(kmeans.cluster_centers_[:,0],kmeans.cluster_centers_[:,1],s=100,c='cyan',label='centroids')
plt.title('Customer Groups')
plt.xlabel('Annual Income')
plt.ylabel('Spending Score')
plt.show()
The graph shows 5 different clusters, each with its own centroid. You can read them as 5 different customer groups: for example, one group may be customers who buy things frequently, and another may be customers who visit only once.
This is how malls can improve their marketing, for example by offering discounts to a particular group of customers.
In the above example, there are some groups where the customers buy and leave.
This kind of customer segmentation is often used alongside market basket analysis, which is a separate technique that looks at which products are bought together.
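To put numbers on the groups described above, one option is to attach the cluster labels to a DataFrame and summarize each cluster. The sketch below computes the size, mean income and mean spending score per cluster. It runs on synthetic points, and the column names `Annual Income (k$)` and `Spending Score (1-100)` as well as the `Cluster` column are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in data: five random blobs (not the mall dataset).
rng = np.random.default_rng(7)
centers = [(20, 20), (20, 80), (55, 50), (90, 20), (90, 80)]
points = np.vstack([rng.normal(loc=c, scale=4, size=(40, 2)) for c in centers])

# Assumed column names; adjust them to match your CSV.
feature_columns = ["Annual Income (k$)", "Spending Score (1-100)"]
demo = pd.DataFrame(points, columns=feature_columns)

# Attach the cluster label of each row as a new (hypothetical) column.
model = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
demo["Cluster"] = model.fit_predict(demo[feature_columns].values)

# Summarize each cluster: number of customers, average income, average spending score.
profile = demo.groupby("Cluster").agg(
    customers=("Annual Income (k$)", "size"),
    mean_income=("Annual Income (k$)", "mean"),
    mean_spending=("Spending Score (1-100)", "mean"),
)
print(profile.round(1))
```

A cluster with high mean income and a low mean spending score, for instance, would be a natural candidate for a targeted offer.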