Rock vs Mine Prediction using Logistic Regression with scikit-learn in Python
In this tutorial, we will discuss rock vs mine prediction using logistic regression with the help of scikit-learn in Python programming.
The scenario behind this project is naval warfare: when two countries are at war, one may send submarines into the other's waters, and the defending country may place mines (explosives) in the ocean. A submarine therefore needs to predict whether an object it detects is a rock or a mine.
To make that prediction we will use logistic regression, which is well suited to binary classification, where the target takes one of two values.
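As a rough illustration of why logistic regression suits a two-class problem, the sketch below computes a weighted sum of the features, passes it through the sigmoid function to get a value between 0 and 1, and thresholds that value to pick a label. The weights, bias, feature values, and the choice to read the probability as "mine" are all invented for this example; scikit-learn learns its own weights during training.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, weights, bias, threshold=0.5):
    # Weighted sum of the features plus a bias term
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    # Assumption for this sketch: the probability is read as P(mine)
    label = "M" if probability >= threshold else "R"
    return label, probability

# Toy weights and features, invented purely for illustration
label, p = classify(features=[0.2, 0.7], weights=[1.5, -2.0], bias=0.3)
print(label, round(p, 3))
```

Here the weighted sum is negative, so the probability falls below 0.5 and the toy classifier answers "R".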
The workflow starts with collecting sonar data. In the experiment, sonar is used to send signals and to receive the signals that bounce back from metal objects and from rocks. Mines are made of metal, so the data collected from metal and from rock gives us labelled examples of both classes.
You can download the data set used in this tutorial: Sonar data.
Now let’s implement the code.
Import the necessary Python libraries.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
The first step is data collection.
1. Load data
# Load the dataset into a pandas DataFrame (the file has no header row)
sonar_data = pd.read_csv("C:\\Users\\usersdrive\\OneDrive\\Desktop\\Sonar data.csv", header=None)
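Because the file is read with header=None, pandas labels the columns with integers starting at 0, which is why the label column is later addressed as column 60. A minimal sketch with a made-up two-row CSV (three numeric columns instead of sixty) shows the effect; the values are invented and only the layout mirrors the sonar file.

```python
import io
import pandas as pd

# Two made-up rows in the same layout as the sonar file: numbers first, label last
toy_csv = io.StringIO("0.02,0.03,0.04,R\n0.05,0.01,0.09,M\n")
toy_data = pd.read_csv(toy_csv, header=None)

# Columns are labelled with integers because no header row was read
print(list(toy_data.columns))
# The label column is selected by its integer label (3 here, 60 in the sonar data)
print(toy_data[3].tolist())
```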
To check that the data set has loaded, print the first five rows by using the head method.
sonar_data.head()
OUTPUT:-
Now print the last five rows by using the tail method.
sonar_data.tail()
OUTPUT:-
Check how many rows and columns are present in the given dataset by using the shape attribute.
sonar_data.shape
OUTPUT:-
(208, 61)
Now let’s check the summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) of each numeric column.
sonar_data.describe()
OUTPUT:-
Check how many rocks and mines are present in the given dataset.
sonar_data[60].value_counts()
OUTPUT:-
M    111
R     97
Name: 60, dtype: int64
There are a total of 111 mines and 97 rocks in the given dataset.
Here M means mine and R means rock.
Next, compute the mean of every column separately for mines and for rocks. The differences between these per-class means are what allow the model to tell a rock from a mine.
sonar_data.groupby(60).mean()
OUTPUT:-
The mean value for mine is 0.3 and for rock is 0.02. The values are quite different, and the model relies on differences like this to predict whether an object is a rock or a mine.
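To make the per-class mean concrete, here is a small sketch on a made-up table with two numeric columns and a label column. The numbers are invented and are not taken from the sonar data; the point is only that groupby followed by mean returns one row per class.

```python
import pandas as pd

# Made-up data: columns 0 and 1 are features, column 2 is the label
toy = pd.DataFrame({
    0: [0.10, 0.30, 0.02, 0.04],
    1: [0.50, 0.70, 0.20, 0.40],
    2: ["M", "M", "R", "R"],
})

# One row per class, one column per feature
class_means = toy.groupby(2).mean()
print(class_means)
```

A feature whose class means sit far apart, such as column 0 in this toy table, carries more information for separating the two classes than one whose means are nearly equal.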
SEPARATE THE DATA AND LABELS
# Separate the features (columns 0-59) from the labels (column 60)
X = sonar_data.drop(columns=60)
Y = sonar_data[60]
Print the data and the labels.
print(X)
print(Y)
OUTPUT:-
TRAINING AND TEST DATA
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.1,stratify=Y,random_state=1)
Print the shapes of the full data, the training data, and the test data.
print(X.shape,X_train.shape,X_test.shape)
OUTPUT:-
(208, 60) (187, 60) (21, 60)
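The stratify=Y argument asks train_test_split to keep roughly the same proportion of mines and rocks in both splits. The sketch below checks this on stand-in labels with the same class counts as the dataset (111 M and 97 R); the feature values are replaced by plain row indices, which is an assumption made only to keep the example self-contained.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Stand-in labels with the same class counts as the sonar data
labels = ["M"] * 111 + ["R"] * 97
# Row indices stand in for the 60 feature columns
indices = list(range(len(labels)))

idx_train, idx_test, y_train, y_test = train_test_split(
    indices, labels, test_size=0.1, stratify=labels, random_state=1
)

# Share of mines in the full data and in the test split
full_share = Counter(labels)["M"] / len(labels)
test_share = Counter(y_test)["M"] / len(y_test)
print(len(y_train), len(y_test))
print(round(full_share, 3), round(test_share, 3))
```

Without stratification, a test set this small could by chance contain a noticeably different mix of rocks and mines than the training set.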
Now, let’s look at the training split of the data.
print(X_train)
print(Y_train)
OUTPUT:-
MODEL TRAINING
model=LogisticRegression()
Train the model.
model.fit(X_train,Y_train)
OUTPUT:-
LogisticRegression()
MODEL EVALUATION
Let’s check the accuracy score of the model.
# Accuracy on training data (accuracy_score expects the true labels first)
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
For train data,
print("Accuracy score : ",training_data_accuracy)
OUTPUT:-
Accuracy score : 0.8342245989304813
The model classifies about 83% of the 187 training samples correctly.
For test data,
# Accuracy on test data (accuracy_score expects the true labels first)
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print("Accuracy score : ",test_data_accuracy)
OUTPUT:-
Accuracy score : 0.7619047619047619
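On the test data the model classifies about 76% of the 21 samples correctly.
The training and test scores are reasonably close, which suggests the logistic regression model is not overfitting badly, although a test set of only 21 samples gives a rough estimate.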
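Accuracy is simply the number of correct predictions divided by the number of samples. With 21 test samples, a score of 0.7619 corresponds to 16 correct predictions, and 0.8342 on 187 training samples corresponds to 156. The sketch below reproduces the test figure with a hand-written accuracy function on made-up labels; the labels are invented so that exactly 16 of 21 match and do not reflect which samples the model actually got right.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels: 21 samples, of which 16 are predicted correctly
y_true = ["M"] * 11 + ["R"] * 10
y_pred = ["M"] * 9 + ["R"] * 2 + ["R"] * 7 + ["M"] * 3

score = accuracy(y_true, y_pred)
print(round(score, 4))
```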
MAKING A PREDICTIVE SYSTEM
Instead of predicting for the entire dataset, we can make a prediction for a single row of data.
This is how the model would be used on a new sonar reading.
# Making a predictive system
input_data=(0.0200,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,0.1609,0.1582,0.2238,0.0645,0.0660,0.2273,0.3100,0.2999,0.5078,0.4797,0.5783,0.5071,0.4328,0.5550,0.6711,0.6415,0.7104,0.8080,0.6791,0.3857,0.1307,0.2604,0.5121,0.7547,0.8537,0.8507,0.6692,0.6097,0.4943,0.2744,0.0510,0.2834,0.2825,0.4256,0.2641,0.1386,0.1051,0.1343,0.0383,0.0324,0.0232,0.0027,0.0065,0.0159,0.0072,0.0167,0.0180,0.0084,0.0090,0.0032)
# Change the input data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)
# Reshape to one row of 60 features, because the model expects a 2D array
# even when we predict for a single sample
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
prediction = model.predict(input_data_reshaped)
print(prediction)
if prediction[0] == 'R':
    print("The object is a rock")
else:
    print("The object is a mine")
OUTPUT:-
['R']
The object is a rock
For another row,
# Checking another input
input_data=(0.0260,0.0363,0.0136,0.0272,0.0214,0.0338,0.0655,0.1400,0.1843,0.2354,0.2720,0.2442,0.1665,0.0336,0.1302,0.1708,0.2177,0.3175,0.3714,0.4552,0.5700,0.7397,0.8062,0.8837,0.9432,1.0000,0.9375,0.7603,0.7123,0.8358,0.7622,0.4567,0.1715,0.1549,0.1641,0.1869,0.2655,0.1713,0.0959,0.0768,0.0847,0.2076,0.2505,0.1862,0.1439,0.1470,0.0991,0.0041,0.0154,0.0116,0.0181,0.0146,0.0129,0.0047,0.0039,0.0061,0.0040,0.0036,0.0061,0.0115)
# Change the input data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)
# Reshape to one row of 60 features, because the model expects a 2D array
# even when we predict for a single sample
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
prediction = model.predict(input_data_reshaped)
print(prediction)
if prediction[0] == 'R':
    print("The object is a rock")
else:
    print("The object is a mine")
OUTPUT:-
['M']
The object is a mine
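Since the same steps are repeated for every input row, they could be wrapped in a small function. The sketch below is one possible way to do that; describe_object is a hypothetical helper name, and the model here is trained on synthetic three-feature data only so that the example runs on its own. With the sonar model from this tutorial you would pass model and a tuple of 60 values instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def describe_object(model, features):
    """Hypothetical helper: predict 'rock' or 'mine' for one sample."""
    # A single sample must be reshaped to one row with n_features columns
    sample = np.asarray(features, dtype=float).reshape(1, -1)
    label = model.predict(sample)[0]
    return "rock" if label == "R" else "mine"

# Synthetic stand-in data: two well-separated clusters, not real sonar readings
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.2, 0.05, (20, 3)),   # cluster labelled R
    rng.normal(0.8, 0.05, (20, 3)),   # cluster labelled M
])
y_demo = np.array(["R"] * 20 + ["M"] * 20)
demo_model = LogisticRegression().fit(X_demo, y_demo)

first = describe_object(demo_model, [0.2, 0.2, 0.2])
second = describe_object(demo_model, [0.8, 0.8, 0.8])
print("The object is a", first)
print("The object is a", second)
```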
This is how the logistic regression model is helpful.