Survival Prediction Using Logistic Regression
Logistic regression is used to predict a binary outcome, such as whether something is true or false.
For survival prediction, where each person either survived or did not, logistic regression is therefore a natural model to apply.
The model predicts, for each person, whether they survived or died in an accident or natural calamity.
For example, we can take the sinking of the Titanic.
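To make the idea of a binary prediction concrete, here is a minimal sketch of what a logistic regression model computes for a single passenger. The weights, bias, and feature values are made-up numbers chosen for illustration; they are not learned from the Titanic data.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_survival(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one passenger.

    The label is 1 (survived) when the probability reaches the threshold,
    otherwise 0 (did not survive).
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)

# Hypothetical weights for two features: [Sex (0 = male, 1 = female), Pclass]
weights = [2.5, -1.0]
bias = 0.5

prob, label = predict_survival([1, 1], weights, bias)
print(round(prob, 3), label)
```

Training a real model amounts to finding the weights and bias that make these probabilities agree with the known outcomes as closely as possible.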
We load the dataset into a pandas data frame to make it easy to work with.
We then deal with the null values that would interfere with prediction, and we plot the data to inspect it before fitting the model.
So let’s try to implement the code.
First of all, check that the pandas library is installed in your IDE's environment; if it is not, install it, and then import it as pd.
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score, cross_validate
from sklearn.preprocessing import normalize, StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
For further information about these libraries, you can refer to their documentation, for example the pandas documentation.
Let’s read the training data from the file train (1).csv.
titanic = pd.read_csv("C:\\Users\\usersfolder\\OneDrive\\Desktop\\train (1).csv")
The code below returns the number of rows and columns in the training dataset.
titanic.shape
Output:-
(891, 12)
The code below displays the first few rows of the data frame.
titanic.head()
Now we are going to check for null values. The info() method lists the non-null count for each column.
titanic.info()
Output:-
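Next, we are going to draw a heat map of the correlations between the numeric columns.
fig, ax = plt.subplots(figsize=(16, 8))
# numeric_only=True skips text columns such as Name, which corr() cannot handle
sns.heatmap(titanic.corr(numeric_only=True), annot=True, fmt='1.2f',
            annot_kws={"size": 10}, linewidths=1, cmap="coolwarm")
plt.show()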
Next, we map the Sex values to binary codes (male to 0, female to 1).
titanic.Sex = titanic.Sex.map({'male':0, 'female':1})
Now we separate the target y (the Survived column) from the features X, as shown below.
y = titanic.Survived.copy()
X = titanic.drop(['Survived'], axis=1)
Now we are going to drop the unnecessary columns.
X.drop(['Cabin','Ticket','Embarked', 'Name','PassengerId'], axis = 1, inplace = True)
Check the null values again.
X.info()
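Now we are going to deal with the null values. The expressions below only check whether any nulls remain and display the rows that contain them; the missing values still have to be filled or dropped before the model is fitted.
X.isnull().values.any()
X[pd.isnull(X).any(axis=1)]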
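Note that the expressions above only inspect X, and LogisticRegression cannot be fitted on data that contains NaN. One way to fill the gaps, sketched here on a small made-up frame rather than the real data, is to replace missing Age values with the column median. The median is an assumption chosen for illustration; the mean, or dropping the rows, are equally possible strategies.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for X; only the column names mirror the tutorial.
X_demo = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": [0, 1, 1, 0],
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Fare": [7.25, 71.28, 7.93, 13.0],
})

# Count missing values per column before filling.
print(X_demo.isnull().sum())

# Assumption: replace missing ages with the median of the known ages.
age_median = X_demo["Age"].median()
X_demo["Age"] = X_demo["Age"].fillna(age_median)

# No nulls should remain after the fill.
print(X_demo.isnull().values.any())
```

Applied to the tutorial's data, the same fillna call would be made on X before the split, so that both the training and validation parts are free of nulls.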
Let’s split the data into two parts: a training set and a validation set.
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)
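The stratify=y argument keeps the proportion of survivors roughly the same in both parts. A minimal sketch on made-up labels (not the Titanic data) shows the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples, 40% of them in the positive class.
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([1] * 40 + [0] * 60)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

# With stratification, both parts keep the 40% positive rate.
print(len(y_tr), y_tr.mean())
print(len(y_va), y_va.mean())
```

Without stratification, a small validation set can end up with noticeably more or fewer survivors than the training set, which makes the validation score harder to interpret.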
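Now let’s import logistic regression from scikit-learn and fit the model on the training data.
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
# fit() must be called before score() or predict() can be used
model.fit(X_train, y_train)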
To check the score of the model on the training data, you can use the function below.
model.score(X_train,y_train)
Output:-
0.7991573033707865
Now let’s check the score on the validation data.
model.score(X_valid,y_valid)
Output:-
0.7821229050279329
Let’s check the accuracy of the predictions on the validation data.
log = LogisticRegression()
log.fit(X_train, y_train)
pred = log.predict(X_valid)
ace = accuracy_score(y_valid, pred)
print(ace)
Output:-
0.7821229050279329
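This is the same value that model.score returned for the validation data, because for a classifier the score is the accuracy: the fraction of predictions that equal the true labels. A small sketch with made-up labels makes the calculation explicit:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions for five passengers.
y_true = np.array([1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 0])

# Accuracy is the share of positions where prediction and truth agree.
manual_accuracy = (y_true == y_pred).mean()

print(manual_accuracy)
print(accuracy_score(y_true, y_pred))
```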
With similar methods, you can also make predictions on the test data.
Test dataset link: test.
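One way to do this, sketched on small made-up frames rather than the real files, is to apply the same preprocessing to the test frame and then call predict. The helper prepare, the median fill values, and the demo frames are hypothetical and only for illustration; the feature list assumes the columns that remain after the drops above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Assumed feature columns left after dropping the unused ones.
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]

def prepare(df, age_fill, fare_fill):
    """Apply the training-time preprocessing to any frame with the same columns."""
    out = df[FEATURES].copy()
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out["Age"] = out["Age"].fillna(age_fill)
    out["Fare"] = out["Fare"].fillna(fare_fill)
    return out

# Made-up training rows (with labels) and test rows (without labels).
train_demo = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2, 3, 1],
    "Sex": ["male", "female", "female", "female", "male", "male", "female", "male"],
    "Age": [22, 38, 26, 35, np.nan, 54, 27, 40],
    "SibSp": [1, 1, 0, 1, 0, 0, 0, 0],
    "Parch": [0, 0, 0, 0, 0, 0, 2, 0],
    "Fare": [7.25, 71.28, 7.93, 53.1, 8.05, 51.86, 11.13, 30.0],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
})
test_demo = pd.DataFrame({
    "Pclass": [3, 1],
    "Sex": ["male", "female"],
    "Age": [np.nan, 47],
    "SibSp": [0, 1],
    "Parch": [0, 0],
    "Fare": [7.83, np.nan],
})

# Fill values come from the training data only, then get reused for the test data.
age_fill = train_demo["Age"].median()
fare_fill = train_demo["Fare"].median()

model_demo = LogisticRegression(max_iter=1000)
model_demo.fit(prepare(train_demo, age_fill, fare_fill), train_demo["Survived"])

# One 0/1 prediction per test row.
model_predictions = model_demo.predict(prepare(test_demo, age_fill, fare_fill))
print(model_predictions)
```

Computing the fill values from the training frame and reusing them for the test frame keeps the two sets on the same footing and avoids using information from the test data during preparation.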
The final output of the model is
model_predictions
The Output generated is:-
So at last we have predicted which passengers survived.
I hope that this information was useful to you.