Linear Regression with Scikit-Learn in Python

Hello everyone,

In this tutorial, we are going to learn how to perform linear regression in Python using scikit-learn, so that we can use it for both prediction and analysis.

What is linear regression?

Linear regression is a supervised machine-learning algorithm that predicts a dependent variable from one or more independent variables. It does this by modelling the relationship between them as a straight line. The model is simple, and it yields an explicit mathematical formula (y = b0 + b1*x when there is a single independent variable) that can be used directly for prediction.
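To make the idea of a "mathematical formula" concrete, here is a minimal sketch of fitting a line y = intercept + slope * x by ordinary least squares with plain NumPy. The toy values and the helper name predict are assumptions chosen for illustration; they are not taken from the dataset used later in this tutorial.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Ordinary least squares for a single predictor:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()


def predict(new_x):
    """Apply the fitted formula y = intercept + slope * x (hypothetical helper)."""
    return intercept + slope * new_x


print(slope, intercept)   # 2.0 1.0
print(predict(10.0))      # 21.0
```

Because the toy data lie exactly on a line, the fit recovers the slope and intercept exactly. With real, noisy data the fitted line only approximates the relationship.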

Explanation of Code

First, we need to install the scikit-learn library using the following command:

pip install -U scikit-learn

Now, let’s just import all the libraries:

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt

Now we need to load the CSV file on which we are going to perform linear regression.

dataset = pd.read_csv('Fitbit.csv')
print(dataset.head())

Output:

   Heart  Calories  Steps  Distance  Age  Gender  Weight  Height     Activity
0     55   2.70432    8.0  0.003666   35       M   179.0     5.6  1.Sedentary
1     54   2.92968   13.0  0.006027   35       M   179.0     5.6  1.Sedentary
2     59   2.70432    9.0  0.004163   35       M   179.0     5.6  1.Sedentary
3     58   2.70432   11.0  0.005095   35       M   179.0     5.6  1.Sedentary
4     58   1.12680    0.0  0.000000   35       M   179.0     5.6  1.Sedentary

print(dataset.shape)

Output:

(21466, 9)

Let’s just take two attributes, Heart and Age, from the above dataset:

# Keep only the two columns we need; copy() makes an independent DataFrame
dataset_sample = dataset[['Heart', 'Age']].copy()

For the next step, let’s separate the data, since linear regression needs one dependent variable and one independent variable.

So we will convert the DataFrame columns into NumPy arrays, using Heart as the independent variable (X) and Age as the dependent variable (Y).

# reshape(-1, 1) turns each 1-D column into a 2-D array with a single column
X = np.array(dataset_sample['Heart']).reshape(-1, 1)
Y = np.array(dataset_sample['Age']).reshape(-1, 1)
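The reshape step matters because scikit-learn estimators expect the feature array to be two-dimensional, with shape (n_samples, n_features). The sketch below shows the difference on a small stand-in DataFrame. The three rows and the names sample, flat, column and also_2d are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical values) with the same two column names
sample = pd.DataFrame({"Heart": [55, 54, 59], "Age": [35, 35, 35]})

flat = np.array(sample["Heart"])   # 1-D array, shape (3,)
column = flat.reshape(-1, 1)       # 2-D array, shape (3, 1): 3 samples, 1 feature

# Selecting with a list of column names keeps the data 2-D, so no reshape is needed
also_2d = sample[["Heart"]].to_numpy()

print(flat.shape, column.shape, also_2d.shape)   # (3,) (3, 1) (3, 1)
```

Either form works as the feature matrix. The -1 in reshape(-1, 1) simply asks NumPy to infer the number of rows.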

Now let’s just divide our data into a training set and a testing set.

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.25)
print(X.shape,X_train.shape,X_test.shape)
Output:

(21466, 1) (16099, 1) (5367, 1)
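Note that train_test_split shuffles the rows randomly, so each run produces a different split and therefore slightly different results. If reproducibility is wanted, the random_state parameter can be fixed. The following is a minimal sketch on made-up data. The arrays X_demo and y_demo and the seed value 42 are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples with one feature each
X_demo = np.arange(20).reshape(-1, 1)
y_demo = np.arange(20)

# Using the same random_state twice gives the same shuffle and the same split
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

same_split = np.array_equal(X_test_a, X_test_b)
print(X_train_a.shape, X_test_a.shape, same_split)   # (15, 1) (5, 1) True
```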

Let’s just start training our model

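# Create the model and fit it on the training data
regressor_model = LinearRegression()
regressor_model.fit(X_train, y_train)
# score() returns the coefficient of determination (R^2), computed here on the full dataset
print(regressor_model.score(X,Y))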

Output:

LinearRegression()

0.05619415873735645
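The value printed by score() is the coefficient of determination (R²). A value this close to 0 suggests that heart rate alone explains only a small share of the variation in age. To see the fitted formula itself, the slope and intercept can be read from the coef_ and intercept_ attributes of the fitted model. The sketch below uses synthetic data rather than the Fitbit file, so the seed, the generated values and the names X_demo, y_demo and model are assumptions for illustration. It also scores the training and test splits separately, which is a common way to check how well a model generalises.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y = 0.1 * x + 30 plus noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(50, 150, size=(200, 1))
y_demo = 0.1 * X_demo[:, 0] + 30 + rng.normal(0, 5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_tr, y_tr)

slope = model.coef_[0]            # one coefficient per feature
intercept = model.intercept_      # constant term of the fitted line
train_r2 = model.score(X_tr, y_tr)
test_r2 = model.score(X_te, y_te)

print(f"y = {intercept:.2f} + {slope:.3f} * x")
print(f"R^2 on training data: {train_r2:.3f}, on test data: {test_r2:.3f}")
```

In this sketch the target is a 1-D array, so coef_ has shape (1,). With a target reshaped to a single column, as in the tutorial code, coef_ has shape (1, 1) and intercept_ has shape (1,).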

Let’s just explore our results by plotting the predictions against the test data:

prediction=regressor_model.predict(X_test)
plt.scatter(X_test,y_test,color='r')
plt.plot(X_test,prediction,color='g')
plt.show()

Output: a scatter plot of the test points (red) with the fitted regression line (green).
