Parkinson’s Disease Detection Using a Support Vector Machine

In this tutorial, we will discuss Parkinson’s disease prediction using a support vector machine (SVM) model in Python.

You can learn more about the support vector machine model here.

Parkinson’s disease is a progressive nervous system disorder that affects movement, leading to shaking, stiffness, and difficulty with walking, balance, and coordination. Symptoms usually begin gradually and get worse over time.

You can download the dataset here: parkin data.

So let’s implement the code.

Import the necessary Python libraries.

#Importing the dependencies
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import svm
from sklearn.metrics import accuracy_score

The first step is data collection.

1. Load the data.

#Load the data
parkin_data=pd.read_csv("C:\\Users\\users drive\\OneDrive\\Desktop\\parkin_data.csv")

To check whether the dataset loaded correctly, print the first five rows using the head method.

#printing the first five rows of the dataframe.
parkin_data.head()

OUTPUT:-

Now print the last five rows using the tail method.

#printing last five rows of the dataframe
parkin_data.tail()

OUTPUT:-

In the status column, there are only two values, 1 or 0: 1 means the person has Parkinson’s, and 0 means the person does not.

Next, check how many rows and columns are present in the dataset.

parkin_data.shape

OUTPUT:-

(195, 24)

Now let’s check the basic information for the data.

#Getting more information about the dataset
parkin_data.info()

OUTPUT:-

Apart from the name column, which is dropped later, all columns hold float or integer values, so the algorithm can work with the numbers directly.
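One quick way to confirm which columns are not numeric is to ask pandas directly. The sketch below uses a tiny made-up frame; the `toy` name, the `pitch_hz` column, and all values are illustrative assumptions, not rows from the dataset. It only mimics the kinds of columns involved: a text identifier, a float feature, and an integer status.

```python
import pandas as pd

# Toy frame with made-up values; it only mimics the column types of the dataset.
toy = pd.DataFrame({
    "name": ["subject_a", "subject_b", "subject_c"],
    "pitch_hz": [119.9, 122.4, 197.1],
    "status": [1, 1, 0],
})

# Columns that are not numeric and must be dropped or encoded before training.
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

Running the same `select_dtypes` call on `parkin_data` should list any column that needs to be removed before the features are passed to the model.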

Check for missing values in each column.

parkin_data.isnull().sum()

OUTPUT:-

No missing values are present in the dataset.

Let’s get some statistical measures of the dataset.

parkin_data.describe()

OUTPUT:-

DISTRIBUTION OF TARGET VALUES

Let’s check how many people in the dataset are affected by Parkinson’s, that is, count the rows with status 0 and status 1.

parkin_data['status'].value_counts()

OUTPUT:-

1    147
0     48
Name: status, dtype: int64

Here, 0 means the person is not affected by Parkinson’s disease.

1 means the person is affected by Parkinson’s disease.
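Because the two classes are unevenly represented, it can help to know what accuracy a trivial model would reach by always predicting the more common class. The short calculation below is a sketch that only reuses the counts printed above; the variable names are illustrative.

```python
# Class counts taken from the value_counts() output above.
counts = {1: 147, 0: 48}

total = sum(counts.values())
# Accuracy of a trivial model that always predicts the most common class.
majority_baseline = max(counts.values()) / total
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any accuracy score reported later is easier to judge when compared against this baseline of roughly 0.754.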

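Group the data by the target variable and compare the mean of each feature.

#numeric_only=True skips the text column 'name', which has no mean
parkin_data.groupby('status').mean(numeric_only=True)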

OUTPUT:-

DATA PREPROCESSING

In the data preprocessing step, separate the status column from the features, because status is the target we want to predict: whether or not the person is affected by Parkinson’s disease.

Separate the features and the target.

X=parkin_data.drop(columns=['name','status'])
Y=parkin_data['status']

Let’s print the X values.

print(X)

OUTPUT:-

Let’s print the Y values.

print(Y)

OUTPUT:-

0      1
1      1
2      1
3      1
4      1
      ..
190    0
191    0
192    0
193    0
194    0
Name: status, Length: 195, dtype: int64

TRAINING AND TEST DATA

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2,random_state=2)

Let’s check the sizes of the full, training, and test datasets.

print(X.shape,X_train.shape,X_test.shape)

OUTPUT:-

(195, 22) (156, 22) (39, 22)
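With only 195 rows and uneven classes, a purely random split can leave the test set with a different share of positive cases than the training set. One option, not used in this tutorial, is the `stratify` argument of `train_test_split`. The sketch below assumes synthetic random data with the same shape and class counts as the dataset; `X_demo`, `Y_demo`, and the split variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape and class counts as the dataset
# (195 rows, 22 features, 147 positives, 48 negatives); values are random.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(195, 22))
Y_demo = np.array([1] * 147 + [0] * 48)

# stratify=Y_demo keeps the 1/0 ratio roughly equal in both splits.
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.2, stratify=Y_demo, random_state=2
)

print(X_tr.shape, X_te.shape)
print("share of positives:", Y_tr.mean().round(3), Y_te.mean().round(3))
```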

DATA STANDARDIZATION

scaler=StandardScaler()
scaler.fit(X_train)

OUTPUT:-

StandardScaler()

Now apply the fitted scaler to both the training and test sets.

#Assign back to X_train so the model is trained on the standardized values
X_train=scaler.transform(X_train)
X_test=scaler.transform(X_test)
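To make the standardization step concrete, the sketch below compares `StandardScaler` with the same calculation done by hand: subtract the training mean and divide by the training standard deviation. The arrays are made-up values for two features on very different scales, and `demo_scaler`, `train`, and `test` are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test values for two features on very different scales.
train = np.array([[100.0, 0.001], [150.0, 0.003], [200.0, 0.005]])
test = np.array([[125.0, 0.002]])

demo_scaler = StandardScaler().fit(train)

# Same computation by hand, using statistics from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)
manual = (test - mean) / std

print(demo_scaler.transform(test))
print(manual)
```

The test row is scaled with the training statistics, which is why the scaler is fitted on the training split only and then reused.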

Let’s print the training values.

print(X_train)

OUTPUT:-

MODEL TRAINING

Let’s use a support vector machine model with a linear kernel.

model=svm.SVC(kernel='linear')

Train the model.

model.fit(X_train,Y_train)

OUTPUT:-

SVC(kernel='linear')
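Scaling and training can also be bundled into a single scikit-learn pipeline, so the scaler and the model are always applied together. This is an alternative to the separate steps in this tutorial, sketched on synthetic data from `make_classification` rather than the Parkinson’s dataset; all variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (195 rows, 22 features), not the Parkinson's dataset.
X_syn, Y_syn = make_classification(n_samples=195, n_features=22, random_state=2)
X_fit, X_hold, Y_fit, Y_hold = train_test_split(
    X_syn, Y_syn, test_size=0.2, random_state=2
)

# The pipeline fits the scaler on the training rows, then trains the SVM on
# the scaled values; predict() and score() reuse the same fitted scaler.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipeline.fit(X_fit, Y_fit)

print("test accuracy on synthetic data:", pipeline.score(X_hold, Y_hold))
```

With a pipeline there is no separate scaled copy of the training data to keep track of, which removes one source of mistakes.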

MODEL EVALUATION

Let’s compute the accuracy score.

#accuracy score on training data
X_train_prediction=model.predict(X_train)
training_data_accuracy=accuracy_score(Y_train,X_train_prediction)

Let’s print the accuracy on the training data.

print('Accuracy score :',training_data_accuracy)

OUTPUT:-

Accuracy score : 0.8717948717948718

In this run the training accuracy is about 87 percent, which suggests the model fits the training data reasonably well.

Let’s check the accuracy score on the test data.

#accuracy score on test data
X_test_prediction=model.predict(X_test)
test_data_accuracy=accuracy_score(Y_test,X_test_prediction)
print('Accuracy score :',test_data_accuracy)

OUTPUT:-

Accuracy score : 0.7948717948717948
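Accuracy alone can hide how a model treats each class when one class is much more common. A confusion matrix and per-class recall give a fuller picture. The sketch below uses hypothetical labels and predictions chosen only to illustrate the metrics; they are not results from this model.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical labels and predictions, chosen only to illustrate the metrics.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]

# Rows are true classes (0 then 1), columns are predicted classes (0 then 1).
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)

print("accuracy:", accuracy_score(y_true, y_pred))
# Recall per class: share of each true class that was predicted correctly.
print("recall for class 1:", recall_score(y_true, y_pred, pos_label=1))
print("recall for class 0:", recall_score(y_true, y_pred, pos_label=0))
```

The same calls can be applied to `Y_test` and `X_test_prediction` to see how many healthy and affected people are classified correctly.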

MAKING A PREDICTIVE SYSTEM

Instead of predicting on the entire dataset, we can make a prediction for a single row of data.

This is useful for checking one person’s measurements.

#Building a predictive system
input_data=(116.67600,137.87100,111.36600,0.00997,0.00009,0.00502,0.00698,0.01505,0.05492,0.51700,0.02924,0.04005,0.03772,0.08771,0.01353,20.64400,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975)

#changing the input data to numpy array
input_data_as_numpy_array=np.asarray(input_data)

#reshape the numpy array
input_data_reshaped=input_data_as_numpy_array.reshape(1,-1)

#standardize the data
std_data=scaler.transform(input_data_reshaped)

prediction=model.predict(std_data)
print(prediction)
if prediction[0]==0:
    print("The person does not have Parkinson's")
else:
    print("The person has Parkinson's")

OUTPUT:-

[1]
The person has Parkinson's

Let’s check another input.

#Another input
input_data=(198.38300,215.20300,193.10400,0.00212,0.00001,0.00113,0.00135,0.00339,0.01263,0.11100,0.00640,0.00825,0.00951,0.01919,0.00119,30.77500,0.465946,0.738703,-7.067931,0.175181,1.512275,0.096320)

#changing the input data to numpy array
input_data_as_numpy_array=np.asarray(input_data)

#reshape the numpy array
input_data_reshaped=input_data_as_numpy_array.reshape(1,-1)

#standardize the data
std_data=scaler.transform(input_data_reshaped)

prediction=model.predict(std_data)
print(prediction)
if prediction[0]==0:
    print("The person does not have Parkinson's")
else:
    print("The person has Parkinson's")

OUTPUT:-

[1]
The person has Parkinson's
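The two prediction snippets above repeat the same steps: convert to an array, reshape to one row, standardize, and predict. These steps could be wrapped in a small helper. The sketch below is one possible version; `predict_status` is a hypothetical name, and the toy scaler and model are trained on made-up two-feature data only to exercise the function.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def predict_status(model, scaler, features):
    """Return the predicted class (0 or 1) for one row of feature values."""
    row = np.asarray(features, dtype=float).reshape(1, -1)  # shape (1, n_features)
    return int(model.predict(scaler.transform(row))[0])


# Tiny made-up training set with two features, only to exercise the helper.
X_toy = np.array([[1.0, 10.0], [2.0, 12.0], [8.0, 30.0], [9.0, 33.0]])
Y_toy = np.array([0, 0, 1, 1])
toy_scaler = StandardScaler().fit(X_toy)
toy_model = SVC(kernel="linear").fit(toy_scaler.transform(X_toy), Y_toy)

label = predict_status(toy_model, toy_scaler, (8.5, 31.0))
print("The person has Parkinson's" if label == 1 else "The person does not have Parkinson's")
```

In the tutorial’s setting, the call would take the fitted `model` and `scaler` together with a tuple of 22 feature values.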

 
