Image Data Preprocessing Using Keras

In this tutorial, we will learn how to preprocess image data using Keras.

All the code was written in a Jupyter notebook running under the Anaconda distribution.

What is Data Preprocessing?

Image data preprocessing converts raw image data into a form that machine learning algorithms can work with.

Data preprocessing involves the following steps:

  • Load the image
  • Process the image
  • Data augmentation
  • Grayscale conversion

Download and Prepare the Dataset

#The sample dataset is freely accessible. First, download the dataset and keep it in a directory on your machine.

#Download the dataset from the link below

To make the code and its output easier to follow, this tutorial works with a single picture. Readers can then extend the code to a full dataset on their own.

Let's now go into data preprocessing using Keras.

First, install the Keras and TensorFlow modules from the Anaconda prompt, or from whichever environment you use for Python.

For example,

pip install keras
pip install tensorflow

Import the necessary modules in the Jupyter notebook.

import numpy as np
import matplotlib.pyplot as plt
from os import listdir
from os.path import isfile, join

from tensorflow.keras.utils import img_to_array, load_img
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image
from keras.models import Sequential, load_model
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as k

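The above modules provide:

  • load_img and img_to_array: load an image file and convert it to a NumPy array
  • ImageDataGenerator: generates augmented batches of image data
  • Sequential, Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense: build a convolutional neural network
  • load_model: loads a saved model
  • numpy and matplotlib: array handling and plotting
  • listdir, isfile, join: listing the files in a directory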


To follow the code easily, first download one photo to a directory on your local machine.

For example, I downloaded a panda image in JPG format.

#You can download the sample panda image from Google Drive

First, we load the image with the load_img function and print the result.

#To load the image
#To get the path easily, open the file explorer, click on the image, copy the path where the image is located, and pass that path to load_img
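
As a minimal sketch of this step, the snippet below loads a picture with load_img and prints it. So that it runs anywhere, it first writes a placeholder 1020x510 JPEG to a temporary folder. The file name panda.jpg and the temporary folder are assumptions; in practice you would pass the path of your own image instead.

```python
import os
import tempfile

from PIL import Image
from tensorflow.keras.utils import load_img

# Placeholder image so the sketch is self-contained (assumption: replace
# image_path with the path of your own panda picture).
image_path = os.path.join(tempfile.mkdtemp(), "panda.jpg")
Image.new("RGB", (1020, 510), color=(120, 160, 90)).save(image_path)

# load_img returns a PIL image object
pic = load_img(image_path)
print(pic)
```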

Now we can print the image.



<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1020x510 at 0x1ABC18BB940>

You can also use the following function.


->The code below converts the image to an array and prints its shape.

pic_array = img_to_array(pic)
print(pic_array.shape)

(510, 1020, 3)

Next, reshape the array by adding a leading batch dimension, which the data generator expects.

pic_array = pic_array.reshape((1,) + pic_array.shape)
print(pic_array.shape)

(1, 510, 1020, 3)
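
The reshape above can be checked on a stand-in array. This sketch assumes a zero-filled NumPy array in place of the real picture, and shows that np.expand_dims is an equivalent way to add the leading batch dimension.

```python
import numpy as np

# Stand-in for the image array (height, width, channels)
pic_array = np.zeros((510, 1020, 3), dtype="float32")

# Two equivalent ways to add a batch dimension of size 1
batched_reshape = pic_array.reshape((1,) + pic_array.shape)
batched_expand = np.expand_dims(pic_array, axis=0)

print(batched_reshape.shape)
print(batched_expand.shape)
```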

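->Now we can write the code that rotates the image using ImageDataGenerator.

#The original argument list is not shown; rotation_range=20 and fill_mode='nearest' are assumed from the rotation example later in this tutorial
datagen = ImageDataGenerator(rotation_range=20, fill_mode='nearest')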

->The code below controls how many augmented images are generated and saved to disk.

count = 0
for batch in datagen.flow(pic_array, batch_size=1, save_to_dir="C:\\Users\\usersname\\OneDrive\\Desktop\\image\\hello", save_prefix='panda', save_format='png'):
    count += 1
    if count == 10:
        break  # flow() loops forever, so stop after 10 images
print("10 images are generated")
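
Here is a self-contained sketch of the same loop. It assumes a small random array instead of the panda picture, a temporary output folder instead of the Desktop path, and horizontal_flip=True as a stand-in augmentation. All three are illustrative choices, not part of the original code.

```python
import os
import tempfile

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random stand-in for one image, already shaped (1, height, width, channels)
pic_array = np.random.randint(0, 256, size=(1, 64, 64, 3)).astype("float32")

# Temporary output folder (assumption: use your own folder in practice)
output_dir = tempfile.mkdtemp()

datagen = ImageDataGenerator(horizontal_flip=True)

count = 0
for batch in datagen.flow(pic_array, batch_size=1, save_to_dir=output_dir,
                          save_prefix="panda", save_format="png"):
    count += 1
    if count == 10:
        break  # flow() yields batches indefinitely, so stop explicitly

saved_files = os.listdir(output_dir)
print(count, "batches generated;", len(saved_files), "files saved")
```

Without the break, the loop would never end, because flow() keeps cycling over the input array.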



->To flip the images, we use the following code.

#flipping of the images
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)


The panda now appears in a flipped position.

<keras.preprocessing.image.ImageDataGenerator object at 0x000001ABC194BF40>
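
To see what the two flips do to the pixel grid, here is a small NumPy sketch. The 2x3 array is a made-up stand-in for an image. Note that ImageDataGenerator applies each enabled flip at random per image, whereas this sketch applies them deterministically.

```python
import numpy as np

# Tiny stand-in "image": 2 rows, 3 columns, 1 channel
image = np.arange(6).reshape(2, 3, 1)

horizontal = image[:, ::-1, :]  # mirror left-right (reverse the columns)
vertical = image[::-1, :, :]    # mirror top-bottom (reverse the rows)

print(horizontal[..., 0])
print(vertical[..., 0])
```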

->Moving on to the next part of data augmentation, we now write the code for grayscale conversion.

plt.imshow(gray_image, cmap = 'gray')
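
The line above expects a gray_image variable. One way to produce it, sketched here under assumptions, is to load the picture with load_img(color_mode="grayscale") and convert it to an array. The placeholder JPEG and its temporary path are illustrative; the original tutorial may have created gray_image differently.

```python
import os
import tempfile

from PIL import Image
from tensorflow.keras.utils import img_to_array, load_img

# Placeholder colour image (assumption: use your own picture's path instead)
image_path = os.path.join(tempfile.mkdtemp(), "panda.jpg")
Image.new("RGB", (1020, 510), color=(120, 160, 90)).save(image_path)

# Load directly as a single-channel (grayscale) image
gray_pic = load_img(image_path, color_mode="grayscale")

# img_to_array gives (height, width, 1); drop the channel axis for plotting
gray_image = img_to_array(gray_pic)[:, :, 0]
print(gray_image.shape)
```

The resulting gray_image is a 2D array, which is the shape plt.imshow expects together with cmap='gray'.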

->To rotate the image, use the following code.

datagen = ImageDataGenerator(rotation_range=20, fill_mode='nearest')

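With fill_mode='nearest', the empty corners created by the rotation are filled with the nearest pixel values. This is also the default setting; a mode such as fill_mode='constant' fills those corners with a single flat value instead.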
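
The effect of the fill mode is easiest to see on a uniform image. ImageDataGenerator relies on SciPy for its rotations, so this sketch calls scipy.ndimage.rotate directly rather than going through Keras. The all-ones image and the 20 degree angle are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

# Uniform stand-in image: every pixel is 1.0
image = np.ones((32, 32, 3), dtype="float32")

# Rotate by 20 degrees in the height/width plane, keeping the original size
nearest = ndimage.rotate(image, 20, axes=(1, 0), reshape=False,
                         order=1, mode="nearest")
constant = ndimage.rotate(image, 20, axes=(1, 0), reshape=False,
                          order=1, mode="constant", cval=0.0)

print("nearest fill, corner pixel:", nearest[0, 0, 0])
print("constant fill, corner pixel:", constant[0, 0, 0])
```

With the nearest mode the corners keep the value of the closest image pixel, while the constant mode leaves them at the fill value.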

The above code handles only one image, but in practice we need to load a large amount of data, select images from it, and run them through the same code.

For example, for the dataset mentioned at the start of the page,

  1. we need to resize, normalize (rescale), and rotate the images

Here we used only one panda image, so we did not split the data into training and test sets. With a large number of images, we need to specify training, test, and validation data.

For example, the following code sets up the training and test data.
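
A minimal sketch of such a setup uses flow_from_directory, which reads images from one sub-folder per class. The folder layout, the class names class_a and class_b, the image counts, and the 32x32 target size are all hypothetical placeholders generated on the fly so that the sketch runs on its own.

```python
import os
import tempfile

import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Build a tiny placeholder dataset: <root>/<split>/<class>/<image>.png
root = tempfile.mkdtemp()
for split, n_images in (("train", 4), ("validation", 2)):
    for class_name in ("class_a", "class_b"):
        class_dir = os.path.join(root, split, class_name)
        os.makedirs(class_dir)
        for i in range(n_images):
            pixels = np.random.randint(0, 256, size=(40, 60, 3), dtype=np.uint8)
            Image.fromarray(pixels).save(os.path.join(class_dir, f"img_{i}.png"))

# Training data: rescale to [0, 1] and augment with random rotations
train_datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=20)
# Validation data: rescale only, no augmentation
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

# target_size resizes every image; class labels come from the folder names
train_generator = train_datagen.flow_from_directory(
    os.path.join(root, "train"), target_size=(32, 32),
    batch_size=4, class_mode="binary")
validation_generator = val_datagen.flow_from_directory(
    os.path.join(root, "validation"), target_size=(32, 32),
    batch_size=4, class_mode="binary")

# Fetch one batch of resized, rescaled validation images
images, labels = next(validation_generator)
print(images.shape, labels.shape)
```

Each call to flow_from_directory prints a line of the form "Found N images belonging to K classes."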


The above code sets up the validation data, which is kept in a separate validation folder. The output is:


Found 90 images belonging to 2 classes.
To plot the accuracy and the loss of the model, we need to write the following code
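
One possible version of this step is sketched below. It assumes a tiny hypothetical CNN trained for three epochs on random placeholder data, so the curves it draws carry no meaning as results; the point is only how the History object returned by fit() is read and plotted.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Random placeholder data: 16 images of 16x16x3 with binary labels
x = np.random.rand(16, 16, 16, 3).astype("float32")
y = np.random.randint(0, 2, size=(16,)).astype("float32")

model = Sequential([
    Input(shape=(16, 16, 3)),
    Conv2D(4, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# fit() returns a History object whose .history dict holds per-epoch metrics
history = model.fit(x, y, validation_split=0.25, epochs=3, verbose=0)

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
ax_acc.plot(history.history["accuracy"], label="train")
ax_acc.plot(history.history["val_accuracy"], label="validation")
ax_acc.set_title("Accuracy")
ax_acc.set_xlabel("Epoch")
ax_acc.legend()
ax_loss.plot(history.history["loss"], label="train")
ax_loss.plot(history.history["val_loss"], label="validation")
ax_loss.set_title("Loss")
ax_loss.set_xlabel("Epoch")
ax_loss.legend()
```

In a Jupyter notebook the figure is displayed below the cell; with a real dataset you would pass the training and validation generators to fit() instead of the random arrays.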


The output shows the resulting graphs.

With a large amount of data, you can plot graphs of the accuracy and the loss.

These are the methods of image data preprocessing.
