Diabetes Prediction using Keras In Python

In this article, we will learn to implement diabetes prediction with a deep neural network in Python, using the Keras deep learning API. For this purpose, we will use an open dataset and build a small deep neural network architecture. You can download the dataset from the link Dataset.

After downloading the dataset, you can analyze it and you will find that every record is labeled with one of two classes, 0 or 1. Now let’s move forward and implement our model in Python using TensorFlow and Keras. I hope you have already installed all the libraries on your local system. If you have not, no problem: you can open Google Colab and practice this tutorial with me.

Diabetes Prediction Using Deep Neural Network

Now let’s move forward to import the required Python libraries in our notebook:

import numpy as np
import pandas as pd
import tensorflow as tf
from keras.layers import Dense,Dropout
from sklearn.model_selection import train_test_split
import matplotlib as mlp
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.preprocessing import StandardScaler

In the above block of code, we import the required libraries in our notebook. The Keras API ships with the TensorFlow deep learning package, and it does most of the work in this diabetes prediction task.

Next, we load our dataset.

data = pd.read_csv("/content/drive/My Drive/Internship/prima-indians-diabetes.csv")
data.head(10)  # show the first 10 rows

In the above block of code, we load the dataset and look at its first 10 rows using head().


You can also look at the last rows of the dataset using tail().

The column names do not look meaningful, right? The file has no header row, so pandas has taken the values of the first record (6, 148, 72 and so on) as column names.

Let’s rename the columns.

# Map the auto-generated column names (values of the first record) to readable ones.
# A single rename call with one dict replaces the nine separate calls.
data = data.rename(columns={
    "6": "preg",
    "148": "gluco",
    "72": "bp",
    "35": "stinmm",
    "0": "insulin",
    "33.6": "mass",
    "0.627": "dpf",
    "50": "age",
    "1": "target",
})


The columns look much better after renaming.
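One thing to be aware of: because the first record was read as the header, it is no longer a data row. A minimal sketch of an alternative, assuming the file really has no header row, is to pass header=None together with explicit names to read_csv. The two-line CSV below is only a stand-in for the real file (its first line is the record visible in the column names above, the second line is there for illustration), and the columns list is simply the set of names chosen in this article.

```python
import io
import pandas as pd

# Stand-in for the CSV file: two records, no header row.
# The first line is the record seen in the column names above; the second is illustrative.
csv_text = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"

columns = ["preg", "gluco", "bp", "stinmm", "insulin", "mass", "dpf", "age", "target"]

# Default behaviour: the first record becomes the header and is lost as data.
df_default = pd.read_csv(io.StringIO(csv_text))

# With header=None and names=..., every line is kept as a data row.
df_named = pd.read_csv(io.StringIO(csv_text), header=None, names=columns)

print(len(df_default), list(df_default.columns)[:3])
print(len(df_named), list(df_named.columns)[:3])
```

If you load the data this way, the rename step is not needed, and the dataset has one more row than in the rest of this tutorial, so the shapes printed later would differ slightly.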

Let’s move to the next step.


We use describe() to look at our dataset more closely and in a meaningful manner: it reports summary statistics for every column. Don’t forget to check the mean and standard deviation values in the output.
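As a small sketch of what describe() returns, here is the call on a synthetic stand-in frame. The frame demo and its random values are assumptions for illustration only, not the real data; on the dataset from this tutorial you would call data.describe().

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the article's DataFrame (random values, not the real data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "gluco": rng.integers(50, 200, size=100),
    "age": rng.integers(21, 80, size=100),
    "target": rng.integers(0, 2, size=100),
})

# describe() returns one row per statistic and one column per numeric feature.
summary = demo.describe()
print(summary.loc[["mean", "std"]])
```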





Next, we look at the correlation matrix of our dataset.
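A correlation matrix can be computed with the corr() method of a DataFrame. The sketch below uses a synthetic frame, demo, whose columns a, b and c are made up so that the effect is easy to see; on the dataset from this tutorial the call would be data.corr().

```python
import numpy as np
import pandas as pd

# Synthetic data: "b" is built from "a", while "c" is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "a": x,
    "b": 2 * x + rng.normal(scale=0.1, size=200),  # strongly related to "a"
    "c": rng.normal(size=200),                     # unrelated to "a"
})

corr = demo.corr()  # pairwise Pearson correlation between columns
print(corr.round(2))
```

Values close to 1 or -1 mean two columns move together, and values close to 0 mean there is little linear relationship between them.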




Next, we plot a histogram of each column of our dataset.

I hope you are following this tutorial with me. Let’s move forward and do some more analysis of our dataset.
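One way to draw a histogram per column is the hist() method of a DataFrame. This is only a sketch on a synthetic frame, demo, with made-up values; on the dataset from this tutorial you would call data.hist(). The matplotlib.use("Agg") line is an assumption so that the sketch also runs outside a notebook, and you should remove it inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the article's DataFrame (random values, not the real data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "gluco": rng.normal(120, 30, size=300),
    "bp": rng.normal(70, 12, size=300),
    "age": rng.integers(21, 80, size=300),
})

# hist() draws one subplot per numeric column and returns the array of axes.
axes = demo.hist(bins=20, figsize=(8, 6))
plt.tight_layout()
```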

# Take labels and heights from the same value_counts() result so they always line up
counts = data['target'].value_counts()
plt.bar(counts.index, counts.values, color=['blue', 'pink'])
plt.xlabel('Target class')
plt.title('Count of each target class')


The above block of code plots the number of samples in each target class, so you can verify how the two classes are balanced.

X = data.iloc[:, :-1]
Y = data.iloc[:,8]


In the above block of code, we split the dataset into input features and target: the first 8 columns act as input features for our model, and the last column is the target class.

X_train_full, X_test, y_train_full, y_test = train_test_split(X, Y, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, random_state=42)

In the above block of code, I first split the dataset into a training set and a test set, and then I split the training set again into a training set and a validation set. Since no test_size is given, train_test_split holds out 25% of the rows each time.
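To see what the two splits do to the row counts, here is a small sketch on placeholder arrays. It assumes the frame has 767 rows (the 768-line file with its first line used as the header); the arrays of zeros and the _demo names are only stand-ins for the real X and Y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the assumed number of rows and the 8 feature columns.
X_demo = np.zeros((767, 8))
y_demo = np.zeros(767)

# First split: 75% train+validation, 25% test (the default test_size).
X_full_demo, X_test_demo, y_full_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=42)

# Second split: the remaining rows are split 75% / 25% again.
X_train_demo, X_valid_demo, y_train_demo, y_valid_demo = train_test_split(
    X_full_demo, y_full_demo, random_state=42)

print(X_train_demo.shape, X_valid_demo.shape, X_test_demo.shape)
```

Under that assumption the training set ends up with 431 rows, the validation set with 144 and the test set with 192.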

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)


In the above block of code, I standardize the features with StandardScaler, which removes the mean of each feature and scales it to unit variance. The scaler is fitted on the training set only and then applied unchanged to the validation and test sets.
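The sketch below shows on a tiny made-up array that StandardScaler is doing (x - mean) / std per column, and that transform() reuses the statistics learned from the training data. The arrays train and new and the name scaler_demo are hypothetical and only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny made-up arrays: 3 training rows and 1 new row, 2 features each.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
new = np.array([[2.0, 30.0]])

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns mean and std from train
new_scaled = scaler_demo.transform(new)          # reuses the training mean and std

# The same computation by hand (StandardScaler uses the population std, ddof=0).
manual = (train - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(train_scaled, manual))
print(new_scaled)
```

Fitting only on the training set keeps information from the validation and test rows out of the preprocessing step.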


Here I set a random seed for the pseudo-random number generators, including the one used by TensorFlow, so that the results are reproducible.
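A minimal sketch of how seeds are commonly set in a TensorFlow 2 notebook is shown below. The seed value 42 and the name SEED are assumptions (42 is borrowed from the random_state used in the split above), and the exact calls used for this tutorial may differ.

```python
import random

import numpy as np
import tensorflow as tf

SEED = 42  # assumed value, any fixed integer works

random.seed(SEED)         # Python's built-in random module
np.random.seed(SEED)      # NumPy's global random generator
tf.random.set_seed(SEED)  # TensorFlow's global seed (weight initialisation, dropout, shuffling)
```

Setting the seeds before building the model means that the initial weights are the same on every run, which makes results easier to compare.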



(431, 8)

You can see the shape of the training sample: 431 rows and 8 features. We are going to define the input of our deep neural network model on the basis of this shape.

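from keras.models import Sequential
model = Sequential()  # the model object was missing from the snippet
model.add(Dense(15, input_dim=8, activation='relu'))  # hidden layer fed by the 8 input features
model.add(Dense(1, activation='sigmoid'))  # output layer: probability of class 1

In the above block of code, you can see that I am using a Sequential model. I am also using dropout layers in my model to reduce overfitting, although they do not appear in the snippet above.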



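You can see the parameters used in my model with model.summary(): there is a total of 392 trainable parameters. Note that the two Dense layers shown above account for fewer parameters than that, so the total presumably includes layers that do not appear in the snippet.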
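If you want to check a parameter count by hand, a fully connected layer has one weight per input-unit pair plus one bias per unit. The helper dense_params below is a hypothetical function written for this sketch (it is not part of Keras), and it is applied only to the two Dense layers shown in the snippet above.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


hidden = dense_params(8, 15)  # Dense(15) fed by 8 input features
output = dense_params(15, 1)  # Dense(1) fed by the 15 hidden units
total = hidden + output

print(hidden, output, total)
```

Dropout layers have no trainable parameters, so they do not change this kind of count.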

Now, let’s move forward to train our model.

model.compile(loss="binary_crossentropy", optimizer="SGD", metrics=['accuracy'])
model_history = model.fit(X_train, y_train, epochs=200, validation_data=(X_valid, y_valid))


You can see in the above code that I compile my model with the binary cross-entropy loss function and the SGD optimizer, and then train it for 200 epochs, evaluating it on the validation set after each epoch.

Now let’s predict some values.
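To get a feeling for what the binary cross-entropy loss measures, here is a small NumPy sketch of the formula. The function below is written for illustration only (the clipping constant eps is an assumption, and Keras has its own implementation), and the probabilities are made up.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) can never occur.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))


# Confident and correct predictions give a small loss ...
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
# ... while confident but wrong predictions give a large one.
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(confident_right, confident_wrong)
```

During training, the optimizer adjusts the weights to make this value smaller on the training set.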

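X_new = X_test[:3]  # assumption: X_new is not defined above, so take a few scaled test rows
y_pred = model.predict(X_new)
print(y_pred)  # one predicted probability of class 1 per row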


The above code block prints the predicted outputs, which you can compare with the actual labels of the same rows.
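Because the last layer uses a sigmoid, each prediction is a probability between 0 and 1, not a class label. A common way to turn it into a label is to threshold it at 0.5, as in the sketch below. The probabilities and labels here are made-up illustrative values, not results from this model, and the _demo names are hypothetical.

```python
import numpy as np

# Illustrative values only: made-up sigmoid outputs and labels, not results from the article.
y_prob_demo = np.array([[0.91], [0.12], [0.55], [0.40]])  # shape (n, 1), like model.predict
y_true_demo = np.array([1, 0, 0, 0])

# Probabilities above 0.5 become class 1, everything else class 0.
y_label_demo = (y_prob_demo > 0.5).astype(int).ravel()

# Fraction of rows where the predicted label matches the actual label.
accuracy_demo = float((y_label_demo == y_true_demo).mean())

print(y_label_demo, accuracy_demo)
```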


I hope you followed the tutorial with me. Don’t forget to put your suggestions in the comment box.

Further modifications can be made to increase the accuracy, such as trying a different optimizer or increasing the number of epochs.

You can play with the code by making some modifications. The code can be found in the link code.

Thanks for your time.
