Linear Regression Using Keras | Simplified

Hello everyone, in this post we will take a look at what Linear Regression is and how it can be implemented using Keras, the deep learning API of the TensorFlow module. Let us begin by understanding the term Simple Linear Regression, also known as Basic Regression.

What is Basic Regression?

Linear Regression is a supervised machine learning algorithm. It gives us a model that represents the relationship between the dependent variable (y) and the independent variable(s) (x) as a straight line, hence the name Linear Regression.

Here, if a problem has just one independent variable, say ‘x’, it is called simple linear regression. If there is more than one independent variable, like ‘x1, x2, x3, … xn’, we call it multiple linear regression. A regression problem predicts a continuous value as its output, such as a price or a probability.

The mathematical representation for linear regression is given as:

Y = β0 + β1X + ε

where,

β0 is the Y-intercept

β1 is the slope

ε is the random error
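Before moving to Keras, here is a minimal sketch, purely for illustration, of what these coefficients mean. It builds a small synthetic dataset (the values 2.0 for β0 and 3.0 for β1, and the noise level, are arbitrary assumptions) and recovers the intercept and slope with NumPy's polyfit:

import numpy as np

# Synthetic data: y = 2.0 + 3.0 * x + random noise (coefficients chosen arbitrarily)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=x.shape)

# Fit a degree-1 polynomial; np.polyfit returns [slope (β1), intercept (β0)]
beta1, beta0 = np.polyfit(x, y, 1)
print("intercept (β0):", beta0)
print("slope (β1):", beta1)

The fitted values come out close to 2.0 and 3.0, which is exactly what the Keras model below will learn, just via gradient descent instead of a closed-form fit.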

Implementation using Keras

For this example, we will try to predict the salary of an employee based on the number of years of experience. Here, our dependent variable, also called the label, is the salary, and the independent variable, also called the feature, is the years of experience. I have downloaded the dataset from the Kaggle site.

First, we import the required Python libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential

Now we read the dataset and display it as a scatter plot:

dataset = pd.read_csv("Salary_Data.csv")
dataset.head()
# Reshape the feature and label columns into 2-D arrays of shape (n_samples, 1)
X = dataset['YearsExperience'].values.reshape(-1, 1)
Y = dataset['Salary'].values.reshape(-1, 1)
plt.scatter(X, Y)

We get the following output, displaying the first 5 entries of the dataset:

YearsExperience	Salary
0	1.1	        39343
1	1.3	        46205
2	1.5	        37731
3	2.0	        43525
4	2.2	        39891

and the scatter plot of the entire dataset:

Dataset Graph

Now we build a model to learn from these values in the dataset, and set its loss function along with the optimizer:

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, input_shape=[1]))  # one output unit, one input feature
model.compile(loss="mean_squared_error", optimizer=tf.keras.optimizers.SGD(0.1))  # SGD with learning rate 0.1
fit = model.fit(dataset["YearsExperience"], dataset["Salary"], epochs=850)

You can think of this as the simplest possible neural network: a single Dense layer with one neuron and no hidden layers. We pass 1 to the Dense layer because we want a single output (the salary), and input_shape=[1] because we have a single input feature (the years of experience).

We have used MSE (mean squared error) as the loss function and stochastic gradient descent (SGD) as the optimizer. Both of these are very basic and easy to understand.
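Because the layer has exactly one weight and one bias, training this network is really just learning β1 and β0 from the equation above. As a small sketch (assuming the model above has finished training), you can read the learned coefficients back out of the layer with get_weights():

# Dense layers expose their parameters as [kernel, bias]
weights, bias = model.layers[0].get_weights()
print("learned slope (β1):", weights[0][0])
print("learned intercept (β0):", bias[0])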

Now that we have fit our data points with the model, we are ready to predict the labels and display the fitted regression line in a graph:

# Predict a salary for every X in the dataset and flatten to a 1-D column
dataset['predict'] = model.predict(X).flatten()
plt.scatter(X, Y)
plt.plot(X, dataset['predict'], color='r')
plt.show()

Output

We can see that our model has generated a fair line of regression.

Linear Regression Model
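If you want to quantify how good the fit is rather than just eyeballing the plot, one option is to compute the coefficient of determination R² by hand. This is only a sketch using NumPy, and it assumes Y and the dataset['predict'] column from the code above:

# R² = 1 - (sum of squared residuals) / (total sum of squares)
y_true = Y.flatten()
y_pred = dataset['predict'].values
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print("R^2:", 1 - ss_res / ss_tot)

A value close to 1 means the straight line explains most of the variation in salary.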

Now, we can pass new values to our model and check whether it predicts the salary sensibly. For example, let’s pass four arbitrary years-of-experience values and see how the model performs:

# Predict salaries for 4, 7.5, 2.5 and 9 years of experience
print(model.predict(np.array([4, 7.5, 2.5, 9]).reshape(-1, 1)))

The output for this will be a numpy array:

[[ 63327.97]
 [ 96740.31]
 [ 49008.39]
 [111059.89]]

Conclusion

Yay!!

Our model has predicted salaries pretty close to the actual values.

That’s all for this article. Also check out:

Linear Regression using TensorFlow

Thank you.

 
