Customer Churn Modeling With Deep Learning | Python

In this post, I am going to predict customer churn from previously collected customer data using the TensorFlow Keras API in Python. For this purpose, we will use an open-source dataset. Before building the model, we need to know what customer churn is, why we want to predict it, and what this prediction is used for.

Customer churn prediction is one of the most popular use cases in business. Basically, churn modeling consists of detecting customers who are likely to cancel their subscription to the services they receive from a company. So why do we want to predict this?

If you think about it for a while, you will realize that a company usually has a huge number of subscribed customers, so it is not practical to keep track of each individual customer and notice when they are about to cancel. To solve this problem, we use the data collected from past customers and, based on this past data only, try to predict whether a given customer will cancel the subscription or not.

I hope you now have an overview of why we are predicting customer churn.

Customer Churn Prediction Using ANN in Python

Now that we have an idea of the problem, it is time to move on to the solution. For this purpose, we are going to create an artificial neural network (ANN) with the help of TensorFlow and the Keras deep learning API.

As I mentioned earlier, I am going to use an open-source dataset, which can be downloaded from the dataset link.

Download the dataset and analyze it; you will find that this is basically a classification task. So let’s move forward and load the dataset into our notebook.

If you are following this blog, then you know that there are two ways to follow this tutorial with me: you can install all the dependencies on your local system, or you can use the Google Colab runtime environment.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LeakyReLU,PReLU,ELU
from keras.layers import Dropout

In the above section of code, I am importing the libraries that I am going to use in my model. Look carefully and you will see that I am importing a library named Keras; it plays the most important role here, because we will use it to build and train our customer churn prediction model.

I hope you have downloaded the dataset. Now let's move forward and load it into our notebook.

data = pd.read_csv('/content/drive/My Drive/Internship/dataaset.csv')
data.head()

OUTPUT:

In the above code, I am loading the dataset into the notebook, and to look at the first rows I am using head(), which returns the top 5 rows by default.

As I have suggested before, you should always look at the dataset from the top as well as from the bottom. To look at it from the bottom you can use tail(), and the output can be seen below.

data.tail(5)

OUTPUT:

Let’s get the information about our dataset.

data.info()

OUTPUT:

You can see in the above snippet of code that I am using info() to get the information about the dataset.

Moving forward to get the statistical description of our dataset.

data.describe()

OUTPUT:

You can see in the above code snippet that I am extracting the statistical description of the numeric columns of the dataset: count, mean, standard deviation, min, max, and the quartile values.

Now let’s plot the labels of our dataset to count the samples in each target class.

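import matplotlib as mpl
mpl.rcParams['figure.figsize'] = 8,6
# unique() keeps order of appearance while value_counts() sorts by frequency,
# so take both the class labels and their counts from one index-sorted Series
counts = data['Exited'].value_counts().sort_index()
plt.bar(counts.index, counts.values, color = ['pink', 'green'])
plt.xticks([0, 1])
plt.xlabel('Target Classes')
plt.ylabel('Count')
plt.title('Count of each Target Class')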

OUTPUT:

The above plot shows the count of each label in our dataset. If you look at the counts, you can say that the given dataset is not balanced. If you move on to more advanced concepts and need to balance your dataset before creating a model, you can use an undersampling algorithm such as NearMiss. Here I am implementing a basic tutorial for churn modeling, so I am not going to do that, but you can explore it further.
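The imbalance can also be checked in numbers and, if needed, reduced before training. The following is a minimal sketch, assuming a toy DataFrame with made-up values and an Exited column. It uses plain random undersampling as a simpler stand-in for NearMiss (which lives in the separate imbalanced-learn package), and it is not part of this tutorial's pipeline. The names toy, class_share, and balanced are hypothetical.

```python
import pandas as pd

# Toy stand-in for the churn data: 8 customers stayed (0), 2 left (1)
toy = pd.DataFrame({
    "Age": [25, 32, 47, 51, 38, 29, 44, 60, 35, 41],
    "Exited": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
})

# Share of each class; a large gap between the shares means imbalance
class_share = toy["Exited"].value_counts(normalize=True).sort_index()
print(class_share)

# Random undersampling: keep as many rows per class as the rarest class has
n_minority = toy["Exited"].value_counts().min()
balanced = toy.groupby("Exited").sample(n=n_minority, random_state=0)
print(balanced["Exited"].value_counts().sort_index())
```

Undersampling throws away rows of the majority class, so on a small dataset it is worth comparing the result against a model trained on the full data.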

X = data.iloc[:, 3:13]
y = data.iloc[:, 13]

In the above block of code, I am storing the input features (the columns at positions 3 to 12) in a separate variable called X and the target variable (the column at position 13) in y.

Moving further to do some more preprocessing with our dataset.

geo=pd.get_dummies(X["Geography"],drop_first=True)
gender=pd.get_dummies(X['Gender'],drop_first=True)

In the above code snippet, I am creating dummy variables for the categorical columns. If you look at the dataset above, you will find that it has categorical columns (Geography and Gender), and we must convert them into their corresponding dummy variables before feeding them to the network. Passing drop_first=True drops the first category of each column, since it is implied when all the other dummy columns are 0. If you have followed my previous tutorial heart_attack_detection, then you already know this, as I have explained dummy variables there.

Moving forward to concatenate the columns.

X=pd.concat([X,geo,gender],axis=1)
X=X.drop(['Geography','Gender'],axis=1)

In the above two lines of code, I am concatenating the new dummy columns to the dataset and then dropping the original Geography and Gender columns, because we no longer need them.
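To see what these two preprocessing steps do to the columns, here is a small sketch on a toy DataFrame. The Score column, the region names, and all values are made up for illustration; only the Geography and Gender column names come from the tutorial.

```python
import pandas as pd

# Toy frame with the two categorical columns used in the tutorial
toy = pd.DataFrame({
    "Score": [619, 608, 502, 699],
    "Geography": ["North", "South", "West", "North"],
    "Gender": ["Female", "Male", "Female", "Male"],
})

# drop_first=True drops the first category (alphabetically) of each column
geo = pd.get_dummies(toy["Geography"], drop_first=True)
gender = pd.get_dummies(toy["Gender"], drop_first=True)

# Append the dummy columns, then remove the original text columns
encoded = pd.concat([toy, geo, gender], axis=1)
encoded = encoded.drop(["Geography", "Gender"], axis=1)
print(list(encoded.columns))
print(encoded)
```

The first row belongs to the dropped category "North", so both of its region columns are 0 (or False, depending on the pandas version).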

Now we will split the dataset into train and test sets.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

As you can see in the above snippet of code, I am splitting the dataset into train and test sets with the help of train_test_split(), which is provided by scikit-learn. With test_size = 0.2, 20% of the rows are held out for testing.
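Because the classes are imbalanced, one option (not used in this tutorial) is to pass the stratify argument of train_test_split() so that the train and test sets keep the same class proportions. The sketch below assumes toy arrays with made-up values; X_toy and y_toy are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 20% of them in the positive class
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 20 + [0] * 80)

# stratify=y_toy keeps the 80/20 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

print(len(y_tr), len(y_te))        # sizes of the two splits
print(y_tr.mean(), y_te.mean())    # share of the positive class in each split
```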

from sklearn.preprocessing import StandardScaler
std = StandardScaler()
X_train = std.fit_transform(X_train)
X_test = std.transform(X_test)

In the above block of code, I am standardizing the features with StandardScaler. I introduced the basic concept in my first blog post, so you should already have an idea about it. Basically, StandardScaler removes the mean of each feature and scales it to unit variance. Note that the scaler is fitted on the training set only and then applied to the test set, so no information from the test set leaks into training.
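The arithmetic behind this step can be written out by hand. The following is a minimal NumPy sketch, assuming two made-up feature columns; it mirrors the fit-on-train, transform-on-test pattern rather than replacing StandardScaler.

```python
import numpy as np

# Made-up values for two features, e.g. a score and an age
train = np.array([[600.0, 30.0], [700.0, 40.0], [800.0, 50.0]])
test = np.array([[650.0, 45.0]])

# "fit": statistics come from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": the same statistics are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1, while the test row is expressed in the same units without having influenced the statistics.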

model =Sequential()
model.add(Dense(6, activation='relu', kernel_initializer='he_uniform',input_dim=11))
model.add(Dense(6, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='sigmoid', kernel_initializer='he_uniform'))

In the above code block, I am using a Sequential model. After the preprocessing, our dataset has 11 input features, so I am setting input_dim to 11 in the first layer. I am using the ReLU activation function in both hidden layers, each of which has 6 units.

I am using the sigmoid activation function in the output layer, which has a single unit, because we need only one output: the predicted probability of churn, a value between 0 and 1 that we will later threshold into class 0 or 1.

It’s time to compile and train our model.
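As a quick sanity check on the architecture, the number of trainable parameters can be worked out by hand. This sketch only does the arithmetic for the three Dense layers defined above; dense_params is a hypothetical helper name.

```python
# (inputs, units) for each Dense layer in the model above
layers = [(11, 6), (6, 6), (6, 1)]

def dense_params(n_inputs, n_units):
    # one weight per input-unit pair plus one bias per unit
    return n_inputs * n_units + n_units

per_layer = [dense_params(i, u) for i, u in layers]
total = sum(per_layer)
print(per_layer, total)
```

These numbers should match the "Param #" column reported by model.summary().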

model.compile(optimizer = 'SGD', loss = 'binary_crossentropy', metrics = ['accuracy'])
history=model.fit(X_train, y_train,validation_split=0.30, batch_size = 10, epochs = 200)

OUTPUT:

You can see in the above code block that I am using the SGD optimizer with the binary cross-entropy loss and training the model for 200 epochs, with 30% of the training data held out for validation. You can verify from the above output that our model is not overfitting.

You can also try this model with the optimizer changed from SGD to Adam and see how the results change.
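The binary_crossentropy loss passed to compile() can be illustrated with a few lines of plain Python. This is a sketch of the standard formula with made-up labels and probabilities, not Keras's own implementation; the clipping constant eps is an assumption added to avoid log(0).

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Same labels, once with confident correct and once with confident wrong outputs
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

The loss is small when the predicted probability is close to the true label and grows quickly when the model is confidently wrong, which is what the optimizer tries to reduce at each epoch.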

print(history.history.keys())
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

OUTPUT:

In the previous block of code, I saved the training results in a variable called history. In the above code, I am plotting the accuracy of my model at each epoch, on both the training and the validation data, using the values stored in history. You can check the accuracy of my model; it looks quite good, but as always you can play with the code and make some changes to make this model more accurate.

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

OUTPUT:

Just as I plotted the model accuracy before, in the above block of code I am plotting the model loss. You can see the plot here, and as we move further you will see actual numerical values for the model's performance.

Moving further, let's predict the output using our trained model.
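Besides plotting, the per-epoch values can be inspected directly. The sketch below assumes a hypothetical dictionary shaped like history.history, filled with made-up numbers; it finds the epoch with the lowest validation loss and the final gap between validation and training loss, which is one simple way to look for overfitting.

```python
# Hypothetical per-epoch values shaped like history.history (made-up numbers)
toy_history = {
    "loss": [0.60, 0.50, 0.45, 0.42, 0.40],
    "val_loss": [0.62, 0.53, 0.49, 0.50, 0.52],
}

val_loss = toy_history["val_loss"]

# Index of the epoch where the validation loss was lowest
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# A validation loss that ends well above the training loss hints at overfitting
final_gap = toy_history["val_loss"][-1] - toy_history["loss"][-1]

print("best epoch:", best_epoch + 1)
print("final gap:", final_gap)
```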

y_pred = model.predict(X_test)
y_pred

OUTPUT:

y_test

OUTPUT:

Above you can see the output predicted by our trained model; each value is a probability between 0 and 1. I have also printed the actual output, so you can compare the actual and predicted values.

Let’s move to extract the confusion matrix of our model.

y_pred = (y_pred > 0.5)
from sklearn.metrics import confusion_matrix
confu = confusion_matrix(y_test, y_pred)
print(confu)

OUTPUT:

[[1501   94]
 [ 189  216]]

In the above block of code, I am first converting the predicted probabilities into class labels with a threshold of 0.5 and then getting the confusion matrix of the trained model. Basically, a confusion matrix describes the performance of a classification model: the rows are the actual classes and the columns are the predicted classes, so the diagonal (1501 and 216) holds the correct predictions, while 94 are false positives and 189 are false negatives. Since we are building a classification model, it is useful here.

We cannot discuss everything here, because our main goal is to introduce you to how we can build artificial neural networks for classification tasks. So I am assigning you a small task: go and read about the confusion matrix until you have a detailed understanding of it. You are going to enjoy the concept.
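As a starting point for that task, the four cells of the matrix printed above can be unpacked and combined by hand. This sketch relies only on scikit-learn's convention that rows are actual classes and columns are predicted classes.

```python
# Confusion matrix printed above: rows = actual class, columns = predicted class
confu = [[1501, 94],
         [189, 216]]

# true negatives, false positives / false negatives, true positives
(tn, fp), (fn, tp) = confu

total = tn + fp + fn + tp   # size of the test set
correct = tn + tp           # predictions on the diagonal

print(total, correct, correct / total)
print("missed churners:", fn, "false alarms:", fp)
```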

from sklearn.metrics import accuracy_score
y_pred = (y_pred > 0.5)
accu=accuracy_score(y_test, y_pred, normalize=False)
print(accu)

OUTPUT:

1717

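In the above code, I am again taking the predictions that are greater than 0.5 as class 1 and calculating the accuracy with the help of accuracy_score(). Because I am passing normalize=False, it returns the number of correctly classified samples rather than a fraction: 1717 out of the 2000 test samples, which is an accuracy of about 85.85%.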
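The thresholding and the counting can be reproduced in plain Python. This sketch uses five made-up sigmoid outputs and labels to show the difference between a count of correct predictions (what normalize=False returns) and the usual accuracy fraction.

```python
# Made-up sigmoid outputs and true labels for five customers
probs = [0.08, 0.61, 0.47, 0.93, 0.50]
actual = [0, 1, 1, 1, 0]

# Same rule as y_pred > 0.5: strictly greater than 0.5 means class 1
predicted = [int(p > 0.5) for p in probs]

# Count of correct predictions versus fraction of correct predictions
n_correct = sum(int(p == a) for p, a in zip(predicted, actual))
fraction = n_correct / len(actual)

print(predicted)
print(n_correct, fraction)
```

Note that a probability of exactly 0.5 falls into class 0 under this rule, because the comparison is strict.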

from sklearn.metrics import f1_score
f1_score(y_test, y_pred)

OUTPUT:

0.6041958041958042

In the above snippet of code, I am calculating the f1_score of the trained model. Basically, the F1 score is the harmonic mean of precision and recall for the positive class.

You must have a question: I have already calculated the accuracy with the help of the confusion matrix and accuracy_score(), so why f1_score()?

Here is the answer. If you remember, I told you above that our dataset is imbalanced, and if you look at the confusion matrix you will find that false positives and false negatives matter here. So overall I can say that when the dataset is imbalanced and false positives and false negatives are both crucial, the F1 score is a better metric than accuracy for judging the model.
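A small calculation makes this point concrete. The sketch below compares the trained model with a hypothetical baseline that predicts "stays" for every customer, using the class sizes read off the confusion matrix above. Defining F1 as 0 when there are no positive predictions and no positives is an assumption of this sketch, and f1_from_counts is a hypothetical helper name.

```python
def f1_from_counts(tp, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN); treated as 0 when the denominator is 0
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Class sizes of the test set, taken from the confusion matrix rows
n_stayed = 1501 + 94
n_left = 189 + 216
n_total = n_stayed + n_left

# Hypothetical baseline: predict class 0 ("stays") for everyone
baseline_accuracy = n_stayed / n_total
baseline_f1 = f1_from_counts(tp=0, fp=0, fn=n_left)

# The trained model, using the counts from the confusion matrix
model_accuracy = (1501 + 216) / n_total
model_f1 = f1_from_counts(tp=216, fp=94, fn=189)

print(baseline_accuracy, baseline_f1)
print(model_accuracy, model_f1)
```

The baseline reaches almost 80% accuracy without finding a single churner, while its F1 score is 0. Accuracy alone hides this, which is why F1 is the more informative number on imbalanced data.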

I hope you get my point.

from sklearn.metrics import precision_score, recall_score
precision_score(y_test, y_pred)

OUTPUT:

0.6967741935483871

recall_score(y_test, y_pred)

OUTPUT:

0.5333333333333333

In the above code block, I am calculating two more scores for my trained model: precision_score() and recall_score(). Precision is the metric to watch when the cost of a false positive is high, and recall is the metric to watch when the cost of a false negative is high. You can see the outputs above, but as I explained, when both matter the overall performance can be summarized with f1_score.

I hope you enjoyed the tutorial. Please put your suggestions and doubts in the comment box; your feedback means a lot to us.
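All three scores follow directly from the counts in the confusion matrix, which is a useful way to check the printed outputs. This sketch uses only the numbers already shown above.

```python
# Counts for the positive (churn) class from the confusion matrix
tp, fp, fn = 216, 94, 189

# Precision: of the customers predicted to churn, how many really did
precision = tp / (tp + fp)

# Recall: of the customers who really churned, how many were found
recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
```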

Thanks for your valuable time.

CONCLUSIONS:

In this tutorial, you have encountered many concepts that may be new to you, and I have tried my best to explain them as far as they can be explained here. You can always explore these concepts further and use them in the models you build in the future, because small concepts add up quickly as things go on, so even reading about small concepts can be very useful.

We have built a classification model using an artificial neural network, and with it we are able to predict customer churn based on the dataset provided. You can also make some modifications and tune the model’s performance according to your needs.

You can try to build another model with a deeper neural network by adding more Dense layers, using the same dataset.

Here is a link to the code zip file.
