Rock vs Mine Prediction using Logistic Regression with scikit-learn in Python

In this tutorial, we will discuss rock vs mine prediction using logistic regression with the help of scikit-learn in Python programming.

The objective of this project is to predict whether an object detected underwater is a rock or a mine. When two countries are at war, one may send submarines against the other, and the defending country may place explosive mines in the ocean. A submarine therefore needs to tell whether an object ahead of it is a harmless rock or a mine.

To predict whether the object is a rock or a mine, we will use logistic regression, which is well suited to binary classification, where the target takes one of two values.
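As a rough illustration of why logistic regression suits two-class problems, the sketch below applies the sigmoid function to a few scores and thresholds the result at 0.5. The scores and the "positive"/"negative" names are made up for illustration; they are not taken from the sonar data.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scores (weighted sums of features), chosen only for illustration.
scores = np.array([-4.0, 0.0, 4.0])
probabilities = sigmoid(scores)

# Thresholding at 0.5 turns each probability into one of two labels.
labels = np.where(probabilities >= 0.5, "positive", "negative")
print(probabilities)
print(labels)
```

A large negative score gives a probability near 0, a large positive score gives a probability near 1, and a score of 0 sits exactly on the boundary.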

The workflow starts with collecting sonar data. In the experiment, sonar is used to send signals and receive the echoes that bounce back from metal objects and from rocks. Mines are made of metal, so the metal objects stand in for mines. The data obtained from the rock and metal echoes is what we will use.

If you want to know more about Logistic Regression.

You can download the dataset from here: Sonar data.

So let’s try to implement the code.

Import the necessary Python libraries.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

The first step is data collection.

1. Load data

#load the dataset into a pandas DataFrame
sonar_data=pd.read_csv("C:\\Users\\usersdrive\\OneDrive\\Desktop\\Sonar data.csv",header=None)
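The header=None argument matters because the file has no header row. The sketch below shows the difference on two made-up rows laid out like the sonar file (numbers first, label last); the values are placeholders, not real readings.

```python
import io
import pandas as pd

# Two made-up rows in the same layout as the sonar file: numbers first, label last.
csv_text = "0.02,0.03,0.04,R\n0.05,0.06,0.07,M\n"

# Without header=None, pandas treats the first data row as column names.
with_header = pd.read_csv(io.StringIO(csv_text))

# With header=None, every row is data and the columns are numbered 0, 1, 2, ...
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)
print(without_header.shape)
print(list(without_header.columns))
```

This numbering is why the label column is later referred to as column 60.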

To check that the dataset has loaded, print the first five rows by using the head method.

sonar_data.head()

OUTPUT:-

Now print the last five rows by using the tail method, sonar_data.tail().

OUTPUT:-

The shape attribute shows how many rows and columns are present in the dataset.

sonar_data.shape

OUTPUT:-

(208, 61)

Now let’s check the summary statistics of the data.

sonar_data.describe()

OUTPUT:-

Count how many rocks and mines are present in the dataset.

sonar_data[60].value_counts()

OUTPUT:-

M    111
R     97
Name: 60, dtype: int64

There are a total of 111 mines and 97 rocks in the given dataset.

Here M means mine and R means rock.
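To see how balanced the two classes are, the counts can also be expressed as proportions. This is a small sketch that rebuilds a label column from the counts above rather than reading the real file; the name labels is only for illustration.

```python
import pandas as pd

# Rebuild a label column with the class counts reported above (111 mines, 97 rocks).
labels = pd.Series(["M"] * 111 + ["R"] * 97)

# normalize=True returns the share of each class instead of the raw count.
proportions = labels.value_counts(normalize=True)
print(proportions.round(3))
```

On the real data, the same result should come from sonar_data[60].value_counts(normalize=True). The classes are close to balanced, so accuracy is a reasonable first metric.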

Next, compute the mean of every column separately for mines and for rocks. The differences between these class means are what allow a model to tell a rock from a mine.

sonar_data.groupby(60).mean()

OUTPUT:-

In the first column, for example, the mean is about 0.03 for mines and about 0.02 for rocks. The class means differ across the columns, and with the help of these differences we are going to predict whether an object is a rock or a mine.
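One way to see which columns separate the classes most is to rank them by the absolute gap between the two class means. The sketch below does this on a small made-up table, not the sonar data; the name toy and the shift added to column 1 are assumptions used only to make the effect visible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up stand-in: three numeric columns (0, 1, 2) plus a label column named 3.
toy = pd.DataFrame(rng.random((8, 3)))
toy[3] = ["M", "M", "M", "M", "R", "R", "R", "R"]

# Shift column 1 for the "M" rows so that it clearly differs between classes.
toy.loc[toy[3] == "M", 1] += 2.0

# Mean of every numeric column per class, then the absolute gap between classes.
class_means = toy.groupby(3).mean()
gap = (class_means.loc["M"] - class_means.loc["R"]).abs().sort_values(ascending=False)
print(gap)
```

On the real data, the same idea would use sonar_data.groupby(60).mean() in place of the toy table.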

SEPARATE THE DATA AND LABELS

#separating data and labels
X=sonar_data.drop(columns=60)
Y=sonar_data[60]

Print the data and the labels.

print(X)
print(Y)

OUTPUT:-

TRAINING AND TEST DATA

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.1,stratify=Y,random_state=1)

Print the shapes of the full, training, and test data.

print(X.shape,X_train.shape,X_test.shape)

OUTPUT:-

(208, 60) (187, 60) (21, 60)
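The stratify argument asks train_test_split to keep roughly the same class mix in both splits. The sketch below checks this on placeholder features with the same class counts as the sonar data; the variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the same class counts as the sonar data; features are placeholders.
y = np.array(["M"] * 111 + ["R"] * 97)
X = np.arange(len(y)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=1
)

# Share of mines in each split; both should be close to 111/208.
print((y_tr == "M").mean(), (y_te == "M").mean())
```

Without stratify, a test set of only 21 rows could end up noticeably richer in one class purely by chance.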

Now, let’s check the training split of the data.

print(X_train)
print(Y_train)

OUTPUT:-

MODEL TRAINING

model=LogisticRegression()

Train the model.

model.fit(X_train,Y_train)

OUTPUT:-

LogisticRegression()
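After fitting, the model stores one weight per feature plus an intercept, and it records the class labels in sorted order. The sketch below inspects these attributes on synthetic data from make_classification, which stands in for the sonar features; max_iter=1000 is an assumption made here to give the solver room to converge, not a setting from the tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in with 60 features, the same width as the sonar data.
X_demo, y_demo = make_classification(n_samples=200, n_features=60, random_state=0)

demo_model = LogisticRegression(max_iter=1000)  # max_iter is an assumed setting
demo_model.fit(X_demo, y_demo)

print(demo_model.classes_)          # class labels in sorted order
print(demo_model.coef_.shape)       # one row of weights, one weight per feature
print(demo_model.intercept_.shape)  # a single intercept for a two-class problem
```

For the sonar model, classes_ would hold 'M' and 'R', in that order.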

MODEL EVALUATION

Let’s check the accuracy score of the model.

#accuracy on training data
X_train_prediction=model.predict(X_train)
training_data_accuracy=accuracy_score(Y_train,X_train_prediction)

For the training data,

print("Accuracy score : ",training_data_accuracy)

OUTPUT:-

Accuracy score :  0.8342245989304813

The model classifies about 83% of the training samples correctly.

For the test data,

#accuracy on test data
X_test_prediction=model.predict(X_test)
test_data_accuracy=accuracy_score(Y_test,X_test_prediction)
print("Accuracy score : ",test_data_accuracy)

OUTPUT:-

Accuracy score :  0.7619047619047619

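The test accuracy is about 76%, that is, 16 of the 21 test samples are classified correctly. This is somewhat lower than the training accuracy, and with so few test samples it is only a rough estimate.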
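Accuracy alone does not show which kind of mistake the model makes. A confusion matrix breaks the result down by class, as in this small sketch; the true and predicted labels here are made up by hand and are not the tutorial's results.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-made labels for illustration only.
y_true = ["M", "M", "M", "R", "R", "R", "M", "R"]
y_pred = ["M", "M", "R", "R", "R", "M", "M", "R"]

accuracy = accuracy_score(y_true, y_pred)

# Rows are the true classes and columns the predicted classes, in the order M, R.
matrix = confusion_matrix(y_true, y_pred, labels=["M", "R"])

print(accuracy)
print(matrix)
```

For this task the top-right cell, a mine predicted as a rock, is the costly error. On the real model, the same call would take Y_test and X_test_prediction.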

The training and test scores are reasonably close, which suggests that the logistic regression model is not badly overfitting the training data.
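Because a single small test split gives a noisy estimate, cross-validation is a common way to get a steadier one. The sketch below runs 5-fold cross-validation on synthetic data of the same size as the sonar set; the data, the fold count, and max_iter=1000 are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with the same shape as the sonar features (208 rows, 60 columns).
X_demo, y_demo = make_classification(n_samples=208, n_features=60, random_state=1)

# Each of the 5 folds is held out once while the model trains on the other 4.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_demo, y_demo, cv=5)

print(scores)
print(scores.mean())
```

On the real data, X and Y from earlier in the tutorial would replace the synthetic arrays.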

MAKING A PREDICTIVE SYSTEM

Instead of predicting for the entire dataset, we can make a prediction for a single row of data.

This is useful for classifying one new reading at a time.

#Making a predictive system
input_data=(0.0200,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,0.1609,0.1582,0.2238,0.0645,0.0660,0.2273,0.3100,0.2999,0.5078,0.4797,0.5783,0.5071,0.4328,0.5550,0.6711,0.6415,0.7104,0.8080,0.6791,0.3857,0.1307,0.2604,0.5121,0.7547,0.8537,0.8507,0.6692,0.6097,0.4943,0.2744,0.0510,0.2834,0.2825,0.4256,0.2641,0.1386,0.1051,0.1343,0.0383,0.0324,0.0232,0.0027,0.0065,0.0159,0.0072,0.0167,0.0180,0.0084,0.0090,0.0032)

#changing the input data to a numpy array
input_data_as_numpy_array=np.asarray(input_data)

#Reshape the data to one row with 60 columns
#because predict expects a 2D array even when there is only one sample
input_data_reshaped=input_data_as_numpy_array.reshape(1,-1)

prediction=model.predict(input_data_reshaped)
print(prediction)

if(prediction[0]=='R'):
    print("The object is a rock")
else:
    print("The object is a mine")

OUTPUT:-

['R']
The object is a rock

For another row,

#For checking another input
#Making a predictive system
input_data=(0.0260,0.0363,0.0136,0.0272,0.0214,0.0338,0.0655,0.1400,0.1843,0.2354,0.2720,0.2442,0.1665,0.0336,0.1302,0.1708,0.2177,0.3175,0.3714,0.4552,0.5700,0.7397,0.8062,0.8837,0.9432,1.0000,0.9375,0.7603,0.7123,0.8358,0.7622,0.4567,0.1715,0.1549,0.1641,0.1869,0.2655,0.1713,0.0959,0.0768,0.0847,0.2076,0.2505,0.1862,0.1439,0.1470,0.0991,0.0041,0.0154,0.0116,0.0181,0.0146,0.0129,0.0047,0.0039,0.0061,0.0040,0.0036,0.0061,0.0115)

#changing the input data to a numpy array
input_data_as_numpy_array=np.asarray(input_data)

#Reshape the data to one row with 60 columns
#because predict expects a 2D array even when there is only one sample
input_data_reshaped=input_data_as_numpy_array.reshape(1,-1)

prediction=model.predict(input_data_reshaped)
print(prediction)

if(prediction[0]=='R'):
    print("The object is a rock")
else:
    print("The object is a mine")

OUTPUT:-

['M']
The object is a mine
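The two blocks above repeat the same steps, so they could be wrapped in a small function. This is a minimal sketch: classify_reading is a hypothetical helper, not part of scikit-learn, and the tiny two-feature training set exists only so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classify_reading(model, reading):
    """Return "rock" or "mine" for one reading (hypothetical helper)."""
    # Convert to a 2D array with a single row, as predict expects.
    sample = np.asarray(reading, dtype=float).reshape(1, -1)
    label = model.predict(sample)[0]
    return "rock" if label == "R" else "mine"

# Tiny made-up training set with 2 features, only so the sketch is self-contained.
X_demo = np.array([[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]])
y_demo = np.array(["R", "R", "M", "M"])
demo_model = LogisticRegression().fit(X_demo, y_demo)

print(classify_reading(demo_model, (0.05, 0.05)))
print(classify_reading(demo_model, (0.95, 0.95)))
```

With the tutorial's trained model, the call would be classify_reading(model, input_data) with a 60-value tuple.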

This is how the logistic regression model is helpful.

 
