Parkinson’s Disease Detection Using a Support Vector Machine

In this tutorial, we will predict Parkinson’s disease with a support vector machine (SVM), a machine learning model, using Python.

Parkinson’s disease is a progressive nervous system disorder that affects movement, leading to shaking, stiffness, and difficulty with walking, balance, and coordination. Symptoms usually begin gradually and get worse over time.

So let’s implement the code.

Import the necessary Python libraries.

```#Importing the dependencies
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import svm
from sklearn.metrics import accuracy_score```

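The first step is data collection. Load the dataset into a pandas DataFrame named `parkin_data`, which every later step uses.

```#Load the data into a DataFrame named parkin_data```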
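The loading statement itself is not shown in this step. A minimal sketch, assuming the dataset is a CSV file read with `pd.read_csv`; the inline sample, its column names, and its values are hypothetical stand-ins so that the snippet runs on its own:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file: three made-up rows.
sample_csv = io.StringIO(
    "name,feature_a,feature_b,status\n"
    "rec_1,0.12,1.5,1\n"
    "rec_2,0.34,2.5,1\n"
    "rec_3,0.56,3.5,0\n"
)

# With a real file, the path to the CSV would be passed instead of sample_csv.
parkin_data = pd.read_csv(sample_csv)
print(parkin_data.shape)  # (3, 4)
```

With the real dataset, the same call should give the `(195, 24)` shape reported later in the tutorial.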

To check that the dataset loaded correctly, print the first five rows with the `head` method.

```#printing the first five rows of the dataframe
parkin_data.head()```

Now print the last five rows by using the tail method.

```#printing last five rows of the dataframe
parkin_data.tail()```

OUTPUT:-

The `status` column has only two values: 1 means the person has Parkinson’s, and 0 means the person does not.

Next, check how many rows and columns the dataset has.

`parkin_data.shape`

OUTPUT:-

`(195, 24)`

Now let’s check the basic information for the data.

```#Getting more information about the dataset
parkin_data.info()```

OUTPUT:-

Apart from the `name` column, which is dropped before training, the values are floats and integers, so the algorithm can work with them directly.

Check for missing values in each column.

`parkin_data.isnull().sum()`

OUTPUT:-

No missing values are present in the dataset.

Let’s get some statistical measures of the dataset.

`parkin_data.describe()`

DISTRIBUTION OF TARGET VALUES

Let’s check how many people in the dataset are affected by Parkinson’s, that is, count the 0 and 1 values in the `status` column.

`parkin_data['status'].value_counts()`

OUTPUT:-

```1    147
0     48
Name: status, dtype: int64```

Here 0 means the person is not affected by Parkinson’s disease, and 1 means the person is affected. The classes are imbalanced: 147 of the 195 rows are positive.
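Because the classes are imbalanced, it helps to know what a trivial model would score. The sketch below rebuilds the target column from the counts printed above (147 ones, 48 zeros) and computes the class shares; the variable names are illustrative only:

```python
import pandas as pd

# Rebuild the target column from the printed counts: 147 ones, 48 zeros.
status = pd.Series([1] * 147 + [0] * 48, name="status")

# normalize=True returns fractions instead of raw counts.
proportions = status.value_counts(normalize=True)
print(proportions.round(3))

# Accuracy of a trivial model that always predicts the majority class (1).
majority_baseline = proportions.max()
print(f"Majority-class baseline: {majority_baseline:.3f}")  # 0.754
```

Any accuracy reported later is best read against this baseline of roughly 75 percent.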

Group the data by the target variable and take the mean of each numeric feature.

`parkin_data.groupby('status').mean(numeric_only=True)`
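In recent pandas versions, taking the mean of grouped data fails when a text column such as `name` is included, so restricting the calculation to numeric columns is the safer form. A small sketch with made-up values (the column `jitter_percent` and all numbers are hypothetical):

```python
import pandas as pd

# Hypothetical toy data: a text identifier, one numeric feature, and the target.
toy = pd.DataFrame({
    "name": ["rec_1", "rec_2", "rec_3", "rec_4"],
    "jitter_percent": [0.5, 1.5, 0.25, 0.25],
    "status": [1, 1, 0, 0],
})

# numeric_only=True averages only numeric columns and skips the text column.
group_means = toy.groupby("status").mean(numeric_only=True)
print(group_means)
```

Features whose group means differ clearly between status 0 and status 1 are the ones most likely to help the classifier.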

DATA PREPROCESSING

In the data preprocessing step, separate the `status` column from the other columns, because `status` is the target: it records whether the person is affected by Parkinson’s disease.

Separate the features and the target. The `name` column is only an identifier, so it is dropped as well.

```X=parkin_data.drop(columns=['name','status'])
Y=parkin_data['status']```

Let’s print the X values.

`print(X)`

Let’s print the Y values.

`print(Y)`

OUTPUT:-

```0      1
1      1
2      1
3      1
4      1
..
190    0
191    0
192    0
193    0
194    0
Name: status, Length: 195, dtype: int64```

TRAINING AND TEST DATA

`X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2,random_state=2)`

Let’s check the shapes of the full, training, and test sets.

`print(X.shape,X_train.shape,X_test.shape)`

OUTPUT:-

`(195, 22) (156, 22) (39, 22)`
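The shapes follow from `test_size=0.2`: 20 percent of 195 rows is 39, which leaves 156 for training. The sketch below reproduces this on synthetic data of the same shape. The `stratify` argument is an optional extra that the tutorial does not use; it is included here only to show how the class ratio can be kept similar in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the tutorial's data:
# 195 rows, 22 features, and 147 positive / 48 negative labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(195, 22))
Y_demo = np.array([1] * 147 + [0] * 48)

# test_size=0.2 holds out 39 of the 195 rows; random_state fixes the shuffle.
# stratify=Y_demo (not used in the tutorial) preserves the class ratio.
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.2, random_state=2, stratify=Y_demo
)

print(X_tr.shape, X_te.shape)
print("positive share in test split:", round(Y_te.mean(), 3))
```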

DATA STANDARDIZATION

`scaler=StandardScaler()`
`scaler.fit(X_train)`

OUTPUT:-

`StandardScaler()`

Now apply the fitted scaler to both the training and test sets.

```#assign the scaled values back to X_train and X_test
X_train=scaler.transform(X_train)
X_test=scaler.transform(X_test)```
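To make the standardization step concrete, the sketch below checks on synthetic data that `StandardScaler` subtracts the training mean and divides by the training standard deviation, and that the test rows are scaled with those same training statistics. All names and values here are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 8 training rows and 3 test rows with 2 features.
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(8, 2))
test = rng.normal(loc=50.0, scale=10.0, size=(3, 2))

# Fit on the training rows only, then reuse those statistics for the test rows.
scaler_demo = StandardScaler().fit(train)

# Manual equivalent: subtract the training mean, divide by the training std.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)
manual_test = (test - mean) / std

print(np.allclose(scaler_demo.transform(test), manual_test))  # True
```

Fitting the scaler on the training split alone keeps information about the test rows out of the preprocessing.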

Let’s print the scaled training values.

`print(X_train)`

MODEL TRAINING

Let’s use a support vector machine with a linear kernel.

`model=svm.SVC(kernel='linear')`

Train the model.

`model.fit(X_train,Y_train)`

OUTPUT:-

`SVC(kernel='linear')`
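With a linear kernel, the trained model is a hyperplane: a weight vector `w` and an intercept `b`, and the predicted class depends on the sign of `w·x + b`. The sketch below illustrates this on synthetic two-feature data, not on the Parkinson’s dataset; `clf`, `X_demo`, and `Y_demo` are illustrative names.

```python
import numpy as np
from sklearn import svm

# Synthetic, roughly separable stand-in data: two features, two classes.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=-1.0, size=(30, 2)),
    rng.normal(loc=1.0, size=(30, 2)),
])
Y_demo = np.array([0] * 30 + [1] * 30)

clf = svm.SVC(kernel='linear').fit(X_demo, Y_demo)

# For a linear kernel, the fitted model exposes the hyperplane w.x + b.
w = clf.coef_.ravel()
b = clf.intercept_[0]
scores = X_demo @ w + b

# A positive score maps to the second class label, a negative one to the first.
manual_pred = clf.classes_[(scores > 0).astype(int)]

print(np.allclose(scores, clf.decision_function(X_demo)))
print(np.array_equal(manual_pred, clf.predict(X_demo)))
```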

MODEL EVALUATION

Let’s compute the accuracy score.

```#accuracy score on training data
X_train_prediction=model.predict(X_train)
training_data_accuracy=accuracy_score(Y_train,X_train_prediction)```

Let’s print the accuracy for the training data.

`print('Accuracy score :',training_data_accuracy)`

OUTPUT:-

`Accuracy score : 0.8717948717948718`

The training accuracy is close to 87 percent, so the model fits the training data reasonably well.

Let’s check the accuracy score on the test data.

```#accuracy score on test data
X_test_prediction=model.predict(X_test)
test_data_accuracy=accuracy_score(Y_test,X_test_prediction)```
`print('Accuracy score :',test_data_accuracy)`

OUTPUT:-

`Accuracy score : 0.7948717948717948`
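Accuracy is the fraction of predictions that match the true labels. The test score above equals 31/39, so it corresponds to 31 correct predictions out of the 39 test rows. The sketch below reproduces that arithmetic with hypothetical label lists; which rows were actually misclassified is not known from the output.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels for 39 test rows: 31 predictions match, 8 do not.
y_true = [1] * 39
y_pred = [1] * 31 + [0] * 8

acc = accuracy_score(y_true, y_pred)
print(round(acc, 4))  # 0.7949
```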

MAKING A PREDICTIVE SYSTEM

Instead of predicting on the entire dataset, we can make a prediction for a single row of data, which is useful for checking individual cases.

```#Building a predictive system
input_data=(116.67600,137.87100,111.36600,0.00997,0.00009,0.00502,0.00698,0.01505,0.05492,0.51700,0.02924,0.04005,0.03772,0.08771,0.01353,20.64400,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975)

#changing the input data to numpy array
input_data_as_numpy_array=np.asarray(input_data)

#reshape the numpy array to one row, since we predict for a single instance
input_data_reshaped=input_data_as_numpy_array.reshape(1,-1)

#standardize the data with the scaler fitted on the training set
std_data=scaler.transform(input_data_reshaped)

prediction=model.predict(std_data)
print(prediction)
if prediction[0]==0:
    print("The person does not have Parkinsons")
else:
    print("The person has Parkinsons")```

OUTPUT:-

```[1]
The person has Parkinsons```

Let’s check another input.

```#Predicting for another input
input_data=(198.38300,215.20300,193.10400,0.00212,0.00001,0.00113,0.00135,0.00339,0.01263,0.11100,0.00640,0.00825,0.00951,0.01919,0.00119,30.77500,0.465946,0.738703,-7.067931,0.175181,1.512275,0.096320)

#changing the input data to numpy array
input_data_as_numpy_array=np.asarray(input_data)

#reshape the numpy array to one row, since we predict for a single instance
input_data_reshaped=input_data_as_numpy_array.reshape(1,-1)

#standardize the data with the scaler fitted on the training set
std_data=scaler.transform(input_data_reshaped)

prediction=model.predict(std_data)
print(prediction)
if prediction[0]==0:
    print("The person does not have Parkinsons")
else:
    print("The person has Parkinsons")```

OUTPUT:-

```[1]
The person has Parkinsons```
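Since the same steps are repeated for every input, they can be wrapped in a small helper. The sketch below is one possible way to do that; the function name `predict_status` is hypothetical, and the scaler and model are fitted on synthetic data so that the snippet runs on its own.

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler

def predict_status(model, scaler, features):
    """Scale one row of features and return the predicted label (0 or 1)."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    return int(model.predict(scaler.transform(row))[0])

# Synthetic stand-in for the fitted scaler and model from the tutorial.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=-2.0, size=(20, 3)),
    rng.normal(loc=2.0, size=(20, 3)),
])
Y_demo = np.array([0] * 20 + [1] * 20)
scaler_demo = StandardScaler().fit(X_demo)
model_demo = svm.SVC(kernel='linear').fit(scaler_demo.transform(X_demo), Y_demo)

label = predict_status(model_demo, scaler_demo, (2.0, 2.0, 2.0))
print("predicted label:", label)
```

In the tutorial's own code, the call would take the fitted `model` and `scaler` together with a 22-value tuple such as `input_data`.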