# Linear Regression with Scikit-Learn in Python

Hello everyone,

In this tutorial, we will learn how to perform linear regression in Python using scikit-learn, so that we can use it for both prediction and analysis.

## What is linear regression?

Linear regression is a supervised machine-learning algorithm that predicts a dependent variable from one or more independent variables. It does this by modelling the relationship between them as a straight line (or, with several independent variables, a plane). It is a simple model, and the fitted result is a mathematical formula that can be used directly for prediction.
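To make the "mathematical formula" concrete: with a single independent variable x, the model has the form y = intercept + slope * x. Here is a minimal sketch of that idea. It uses made-up numbers (not the dataset from this tutorial) and NumPy's `np.polyfit` instead of scikit-learn, purely to show what a fitted line looks like; the helper `predict` is a hypothetical name introduced for illustration.

```python
import numpy as np

# Made-up data that lies exactly on the line y = 2x + 1 (illustrative only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit is an ordinary least-squares straight line.
# np.polyfit returns coefficients from the highest degree to the lowest.
slope, intercept = np.polyfit(x, y, deg=1)


def predict(value):
    """Apply the fitted formula y = intercept + slope * x."""
    return intercept + slope * value


print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"prediction for x = 10: {predict(10):.2f}")
```

Once the slope and intercept are known, predicting is just plugging a new x into the formula, which is exactly what scikit-learn does for us later on.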

## Explanation of Code

First, we need to install the scikit-learn library using the following command:

    pip install -U scikit-learn

Now, let’s import all the libraries:

    import numpy as np
    import pandas as pd
    from sklearn import preprocessing
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import train_test_split
    from matplotlib import pyplot as plt

Now we need to load the CSV file on which we are going to perform linear regression.

    dataset = pd.read_csv('Fitbit.csv')
    print(dataset.head())

Output:

       Heart  Calories  Steps  Distance  Age Gender  Weight  Height     Activity
    0     55   2.70432    8.0  0.003666   35      M   179.0     5.6  1.Sedentary
    1     54   2.92968   13.0  0.006027   35      M   179.0     5.6  1.Sedentary
    2     59   2.70432    9.0  0.004163   35      M   179.0     5.6  1.Sedentary
    3     58   2.70432   11.0  0.005095   35      M   179.0     5.6  1.Sedentary
    4     58   1.12680    0.0  0.000000   35      M   179.0     5.6  1.Sedentary

    print(dataset.shape)

Output:

    (21466, 9)

Let’s take just two attributes, Heart and Age, from the above dataset:

    dataset_sample = dataset[['Heart', 'Age']]
    # The column names are already correct; this line only makes them explicit.
    dataset_sample.columns = ['Heart', 'Age']

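Next, we separate the data, since linear regression needs one dependent and one independent variable. Here Heart is the independent variable (X) and Age is the dependent variable (Y).

We convert the DataFrame columns into NumPy arrays and reshape each into a single column, because scikit-learn expects the features as a 2-D array.

    # One row per sample, one column per feature
    X = np.array(dataset_sample['Heart']).reshape(-1, 1)
    Y = np.array(dataset_sample['Age']).reshape(-1, 1)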
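If the `reshape(-1, 1)` step looks mysterious, the short sketch below shows what it does to the array shape. The small DataFrame is a hypothetical stand-in with invented values, not the Fitbit file, and the names `flat`, `column` and `also_2d` are ours.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tutorial's data (values are invented).
df = pd.DataFrame({"Heart": [55, 54, 59, 58], "Age": [35, 35, 35, 35]})

flat = np.array(df["Heart"])   # 1-D array, shape (4,)
column = flat.reshape(-1, 1)   # 2-D array, shape (4, 1); -1 means "infer the row count"

# Selecting with a list of column names gives a 2-D result directly.
also_2d = df[["Heart"]].to_numpy()

print(flat.shape, column.shape, also_2d.shape)
```

Both routes end up with one row per sample and one column per feature, which is the layout scikit-learn estimators expect for `X`.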

Now let’s divide our data into a training set and a testing set. With `test_size=0.25`, a quarter of the rows are held back for testing.

    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25)
    print(X.shape, X_train.shape, X_test.shape)

Output:

    (21466, 1) (16099, 1) (5367, 1)
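One thing worth knowing: `train_test_split` shuffles the rows randomly, so each run produces a different split (and a slightly different score later). If you want repeatable results, you can pass `random_state`. A small sketch on invented data, where the value 42 is an arbitrary choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented data: 100 samples with one feature each.
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# The same random_state gives the same shuffle, so both calls split identically.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.25, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train_a.shape, X_test_a.shape)
print(np.array_equal(X_test_a, X_test_b))
```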

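Let’s start training our model:

    regressor_model = LinearRegression()
    regressor_model.fit(X_train, y_train)
    print(regressor_model.score(X, Y))

Output:

    LinearRegression()
    0.05619415873735645

The `score` method returns the coefficient of determination, R². A value of about 0.056 means that heart rate explains only around 5.6% of the variance in age, so the linear relationship between these two attributes is weak. Note that the score here is computed on the full dataset (`X`, `Y`), not only on the test set.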
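The score printed above is computed on the full dataset, which includes the rows the model was trained on. A common alternative is to score only on the held-out test set, and to look at the fitted slope and intercept, since those are the "formula" the model has learned. The sketch below assumes synthetic data (not the Fitbit file) generated from y = 3x + 5 plus noise, so that we know what the model should recover.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 5 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R^2; using the test set measures performance on unseen rows.
test_r2 = model.score(X_test, y_test)

# The fitted formula is y = intercept_ + coef_[0] * x.
slope = model.coef_[0]
intercept = model.intercept_

print(f"test R^2 = {test_r2:.3f}")
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

Here the target `y` is 1-D, so `coef_` has one entry per feature. With a 2-D target like the reshaped `Y` in this tutorial, `coef_` has shape `(1, 1)` and `intercept_` has shape `(1,)` instead.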

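Let’s explore our results by plotting the test points together with the fitted line:

    prediction = regressor_model.predict(X_test)
    plt.scatter(X_test, y_test, color='r')   # actual test values in red
    plt.plot(X_test, prediction, color='g')  # fitted regression line in green
    plt.show()

Output: a scatter plot of the test data (red points) with the fitted regression line (green).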
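Besides plotting, a fitted model can be used to predict values for inputs it has never seen. A minimal sketch, assuming invented training data that lies exactly on y = 0.5x + 10 (these numbers are not from the Fitbit dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training data lying exactly on y = 0.5x + 10.
X_train = np.array([[50], [60], [70], [80]])
y_train = 0.5 * X_train[:, 0] + 10

model = LinearRegression().fit(X_train, y_train)

# New inputs must also be 2-D: one row per sample, one column per feature.
new_values = np.array([[65], [90]])
predictions = model.predict(new_values)

print(predictions)
```

The same `predict` call works on the model trained in this tutorial; just pass the new heart-rate values as a column, as we did with `X`.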
