Survival Prediction Using Logistic Regression

Logistic regression is used to predict binary outcomes, such as whether something is true or false.

For survival prediction, logistic regression is one of the most suitable models we can apply.

The model tells us who survived and who did not during an accident or natural calamity.

For example, we can take the sinking of the Titanic.
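Under the hood, logistic regression passes a weighted sum of the features through the sigmoid function, so the output is a probability between 0 and 1 that can be thresholded at 0.5. The snippet below is only an illustration with made-up weights and feature values, not numbers learned from any data.

import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

# Hypothetical weights, features, and intercept, just to show the idea:
w = np.array([0.8, -1.2])   # weights for two features
x = np.array([1.0, 0.5])    # feature values for one passenger
b = -0.1                    # intercept

p_survived = sigmoid(np.dot(w, x) + b)
print(p_survived)           # ~0.52, so predicted as survived at a 0.5 threshold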

In this model, we load the dataset into a pandas data frame so the rest of the program can work with it easily.

We will also clean up the null values that are not needed for the prediction, and plot a box plot to look at the spread and outliers in the data (sketched after the heat map below).

So let's try to implement the code.

First of all, check that the pandas library is installed in your IDE; if it is not, install it, and then import it as pd.
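If any of the libraries used below are missing, they can be installed from a terminal with pip (the exact command may vary in your environment):

pip install pandas numpy matplotlib seaborn scikit-learn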

# Libraries used in this walkthrough
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

For further information about these libraries, you can study their documentation, for example the pandas documentation.

Let's read the training data from the train (1) CSV file.

titanic = pd.read_csv("C:\\Users\\usersfolder\\OneDrive\\Desktop\\train (1).csv")

The code below shows the number of rows and columns in the training dataset.

titanic.shape

Output:-

(891, 12)

The code below displays the first few rows of the data frame.

titanic.head()

Now we are going to check for null values.

titanic.info()

The output lists each column with its non-null count and dtype; here the Age, Cabin, and Embarked columns contain missing values.

Next, we are going to draw a correlation heat map of the numeric columns.

fig, ax = plt.subplots(figsize=(16, 8))
# Heat map of pairwise correlations between the numeric columns
# (numeric_only=True is needed on newer pandas releases so text columns are skipped)
sns.heatmap(titanic.corr(numeric_only=True), annot=True, fmt='.2f',
            annot_kws={"size": 10}, linewidths=1, cmap="coolwarm")
plt.show()
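As mentioned at the start, a box plot is a quick way to look at the spread and outliers of a feature. A minimal sketch, assuming we compare Age across the two Survived groups (any numeric column would do):

# Box plot of Age for passengers who died (0) and survived (1)
sns.boxplot(x='Survived', y='Age', data=titanic)
plt.show()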

Next, we map the Sex values to binary: 0 for male and 1 for female.

titanic.Sex = titanic.Sex.map({'male': 0, 'female': 1})

Now we separate the target (y, the Survived column) from the features (X), as shown below.

y = titanic.Survived.copy()
X = titanic.drop(['Survived'], axis = 1)

Now we are going to drop the unnecessary columns.

X.drop(['Cabin','Ticket','Embarked', 'Name','PassengerId'], axis = 1, inplace = True)

Check for remaining null values.

X.info()

Now we are going to locate the rows that still contain null values.

X.isnull().values.any()        # True if any null values remain
X[pd.isnull(X).any(axis=1)]    # show the rows with at least one null (missing Age)
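The rows above still have missing Age values, and scikit-learn's logistic regression cannot handle NaNs. The original walkthrough does not show how they were dealt with; one simple, commonly used option (an assumption on my part, not necessarily what was done originally) is to fill the missing ages with the median age:

# Fill missing Age values with the median age (a simple imputation choice)
X['Age'] = X['Age'].fillna(X['Age'].median())
X.isnull().values.any()   # should now be False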

Let's split the data into a training part and a validation part.

from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)

Now let's import logistic regression from scikit-learn and fit the model on the training data.

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)   # fit on the training data before scoring

To check the accuracy of the model on the training data, you can use the method below.

model.score(X_train,y_train)

Output:-

0.7991573033707865

Now let's check the score on the validation data.

model.score(X_valid,y_valid)

Output:-

0.7821229050279329

Let's compute the validation accuracy explicitly with accuracy_score.

log = LogisticRegression()
log.fit(X_train, y_train)            # fit a fresh model on the training data
pred = log.predict(X_valid)          # predict survival for the validation set
acc = accuracy_score(y_valid, pred)  # compare predictions with the true labels
print(acc)

Output:-

0.7821229050279329

With similar steps, you can also preprocess the test data and generate predictions for it.

The test dataset is available from the same source as the training data.
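A minimal sketch of that step is below. The file name test.csv and the exact preprocessing are assumptions on my part; the idea is simply to apply the same transformations as for the training data and then call predict.

# Assumed file name and location; adjust the path to wherever your test file lives
test = pd.read_csv("test.csv")

passenger_ids = test.PassengerId.copy()             # keep the ids for the final output
test.Sex = test.Sex.map({'male': 0, 'female': 1})   # same encoding as the training data
X_test = test.drop(['Cabin', 'Ticket', 'Embarked', 'Name', 'PassengerId'], axis=1)
X_test = X_test.fillna(X_test.median())             # fill missing numeric values with column medians

model_predictions = model.predict(X_test)           # 0 = did not survive, 1 = survived

# Pair every PassengerId with its predicted label
output = pd.DataFrame({'PassengerId': passenger_ids, 'Survived': model_predictions})
print(output.head())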

The final output of the model is the array of predictions:

model_predictions

The output generated is an array of 0s and 1s, one survival label per passenger in the test set.

So, at last, we have predicted which passengers survived.

I hope this information was useful to you.
