Image Classification Using a Convolutional Neural Network (CNN) in Python

In this article, we explore image classification using the Horses or Humans dataset. Our goal is to build a binary classifier with a CNN in Python that correctly categorizes each image as a horse or a human.

The dataset consists of 500 images of horses and 527 images of humans, for a total of 1027 images to train on. It is available at https://www.kaggle.com/sanikamal/horses-or-humans-dataset.

I assume you are already familiar with Convolutional Neural Networks. Let's start with a short introduction.

 

Introduction

  •  The major advantage of a CNN is that it extracts features from images. The computer perceives an image as a grid of pixels whose values range from 0 to 255. To get useful results, we take an input image and apply a filter to it; the filter extracts certain features that are essential for training. Both the image and the filter are numeric arrays, 2D or 3D depending on the input image.
  •  Images can be either grayscale (2D arrays) or RGB (Red, Green, Blue) color images (3D arrays).
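
To make the idea of a filter concrete, here is a minimal NumPy sketch. It is not how Keras implements Conv2D internally; the tiny 4x4 image, the vertical-edge kernel, and the helper name convolve2d_valid are all made up for illustration.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 4x4 grayscale image (2D): dark left half, bright right half.
image = np.array([[0, 0, 255, 255]] * 4, dtype=float)

# Hypothetical 3x3 filter that responds to vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (2, 2)
print(feature_map)        # every entry is 765.0 because each window contains the edge

# An RGB image of the size used later in this article would be a 3D array.
rgb_image = np.zeros((300, 300, 3))
print(rgb_image.ndim)     # 3
```

A 3x3 filter applied without padding shrinks each spatial dimension by 2, which is why the 4x4 input becomes a 2x2 feature map.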

Data Preparation

Importing the data is the first step. Two zip files are extracted, one for training and one for validation, so that the model is trained and validated on different image batches. Below is our Python code:

import os
import zipfile

import tensorflow as tf

# Extract the training archive into the directory the training generator reads from.
with zipfile.ZipFile('/tmp/humans-horses.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp/horse-or-human')

# Extract the validation archive into the directory the validation generator reads from.
with zipfile.ZipFile('/tmp/validation-humans-horses.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp/validation-horse-or-human')

Building the Model

Now we will use Keras to build the model. The Python program for doing this is given below:

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

Explanation: Let's take a closer look at the layers:

  • Conv2D: The convolutional layers, where feature extraction takes place. The first one also receives the 300x300 RGB input.
  • MaxPooling2D: Keeps the maximum value in each 2x2 window, which halves the height and width of the feature map.
  • Flatten: Transforms the feature maps into a vector that is fed into the fully connected layer.
  • Sigmoid activation: Used in the output layer for binary classification.
  • ReLU activation: Introduces non-linearity.

The model summary lists each layer with its output shape and parameter count:

model.summary()

Output

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 298, 298, 16)      448       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 149, 149, 16)      0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 147, 147, 32)      4640      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 73, 73, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 71, 71, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 35, 35, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 33, 33, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 14, 14, 64)        36928     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               1606144   
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513       
=================================================================
Total params: 1,704,097
Trainable params: 1,704,097
Non-trainable params: 0
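
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below is plain arithmetic, not a Keras API; the helper names are made up for illustration, and it assumes 3x3 kernels with stride 1 and no padding, followed by 2x2 pooling, as in the model above.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping max pooling (odd sizes round down)."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """kernel*kernel*in_channels weights plus one bias, for each filter."""
    return (kernel * kernel * in_channels + 1) * filters

size, channels = 300, 3   # input_shape=(300, 300, 3)
total = 0
for filters in [16, 32, 64, 64, 64]:
    total += conv_params(channels, filters)
    size = pool_output_size(conv_output_size(size))
    channels = filters

flat = size * size * channels   # length of the Flatten output
total += (flat + 1) * 512       # Dense(512): weights plus biases
total += (512 + 1) * 1          # Dense(1): weights plus bias

print(size, flat, total)  # 7 3136 1704097
```

Most of the parameters sit in the first dense layer (1,606,144 of 1,704,097), not in the convolutional layers.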

Compiling The Model

from tensorflow.keras.optimizers import RMSprop

# learning_rate replaces the deprecated lr argument.
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['accuracy'])

The loss function is 'binary_crossentropy', which suits binary classification. The RMSprop optimizer is used, and the learning rate (0.001 here) controls the size of each weight update as training converges toward a minimum of the loss.
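
To see what this loss measures, here is a small standalone sketch of binary cross-entropy in plain Python. It is meant as an illustration of the formula, not as the Keras implementation; the labels, the predicted probabilities, and the clipping constant eps are made-up values.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(t*log(p) + (1-t)*log(1-p)) over all examples."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels (1 = human, 0 = horse) and sigmoid outputs.
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(confident_right)  # small loss
print(confident_wrong)  # much larger loss
```

Confident wrong predictions are penalized far more heavily than confident right ones are rewarded, which is what pushes the sigmoid output toward the correct label.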

Using The Image Generator and Image Augmentation

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1/255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   shear_range=0.2,
                                   horizontal_flip=True)

validation_datagen = ImageDataGenerator(rescale=1/255,
                                        rotation_range=40,
                                        width_shift_range=0.2,
                                        shear_range=0.2,
                                        horizontal_flip=True)

train_generator = train_datagen.flow_from_directory(
    '/tmp/horse-or-human/',
    target_size=(300, 300),
    batch_size=128,
    class_mode='binary'
)
validation_generator = validation_datagen.flow_from_directory(
    '/tmp/validation-horse-or-human/',
    target_size=(300, 300),
    batch_size=32,
    class_mode='binary'
)

Output

Found 1027 images belonging to 2 classes.
Found 256 images belonging to 2 classes.

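Explanation: The training generator and the validation generator feed images to the model in batches, which is why batch_size is specified. Rescaling by 1/255 normalizes the pixel values to the range 0 to 1. The rotation_range, width_shift_range, shear_range, and horizontal_flip arguments apply random transformations so that the model sees varied versions of each image; this is image augmentation. class_mode is 'binary' because this is a binary classification problem. Note that augmentation is normally applied only to the training data; applying the same random transformations to the validation images, as done here, makes the validation metrics noisier.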
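
Two of these operations are simple enough to reproduce by hand. The NumPy sketch below mimics rescaling and a horizontal flip on a made-up 2x3 grayscale image; it only illustrates the idea and does not call ImageDataGenerator.

```python
import numpy as np

# Hypothetical 2x3 grayscale image with raw pixel values in [0, 255].
pixels = np.array([[0, 128, 255],
                   [64, 32, 16]], dtype=np.float32)

rescaled = pixels * (1 / 255)  # same effect as rescale=1/255: values now lie in [0, 1]
flipped = rescaled[:, ::-1]    # a horizontal flip mirrors the columns

print(rescaled.min(), rescaled.max())
print(flipped)
```

The flipped image contains exactly the same pixel values in mirrored order, so the label stays valid while the model sees a new variation.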

Training The Model

history = model.fit(train_generator,
                    steps_per_epoch=8,
                    epochs=15,
                    verbose=1,
                    validation_data=validation_generator,
                    validation_steps=8)

Explanation: We now fit the model on the training set. epochs is the number of passes over the training data, and steps_per_epoch is the number of batches drawn from the generator in each epoch.
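
The step counts relate to the dataset and batch sizes by simple arithmetic. The sketch below uses the image counts and batch sizes from this article; the variable names are only illustrative.

```python
import math

train_images, train_batch = 1027, 128
val_images, val_batch = 256, 32

# Number of batches needed to see every image once.
steps_to_cover_train = math.ceil(train_images / train_batch)
steps_to_cover_val = math.ceil(val_images / val_batch)

# Images drawn by 8 full-size training batches.
images_in_8_steps = 8 * train_batch

print(steps_to_cover_train)  # 9
print(steps_to_cover_val)    # 8
print(images_in_8_steps)     # 1024
```

With 8 steps of 128 images, each epoch draws 1024 of the 1027 training images, while 8 validation steps of 32 cover all 256 validation images.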

Output

Epoch 1/15
8/8 [==============================] - 7s 933ms/step - loss: 0.8665 - accuracy: 0.5061 - val_loss: 0.6717 - val_accuracy: 0.5078
Epoch 2/15
8/8 [==============================] - 7s 851ms/step - loss: 0.5905 - accuracy: 0.7175 - val_loss: 5.5931 - val_accuracy: 0.5000
Epoch 3/15
8/8 [==============================] - 7s 847ms/step - loss: 1.2182 - accuracy: 0.7942 - val_loss: 0.4081 - val_accuracy: 0.8711
Epoch 4/15
8/8 [==============================] - 7s 858ms/step - loss: 0.2603 - accuracy: 0.8954 - val_loss: 0.7934 - val_accuracy: 0.8047
Epoch 5/15
8/8 [==============================] - 8s 953ms/step - loss: 0.1816 - accuracy: 0.9377 - val_loss: 1.5857 - val_accuracy: 0.7891
Epoch 6/15
8/8 [==============================] - 7s 859ms/step - loss: 0.1765 - accuracy: 0.9288 - val_loss: 0.4917 - val_accuracy: 0.8867
Epoch 7/15
8/8 [==============================] - 7s 849ms/step - loss: 0.1663 - accuracy: 0.9333 - val_loss: 0.5318 - val_accuracy: 0.8633
Epoch 8/15
8/8 [==============================] - 7s 851ms/step - loss: 0.3250 - accuracy: 0.8888 - val_loss: 0.9239 - val_accuracy: 0.8438
Epoch 9/15
8/8 [==============================] - 7s 857ms/step - loss: 0.0661 - accuracy: 0.9800 - val_loss: 1.0600 - val_accuracy: 0.8555
Epoch 10/15
8/8 [==============================] - 7s 858ms/step - loss: 0.1797 - accuracy: 0.9377 - val_loss: 13.9662 - val_accuracy: 0.5000
Epoch 11/15
8/8 [==============================] - 7s 853ms/step - loss: 2.0378 - accuracy: 0.8676 - val_loss: 1.3739 - val_accuracy: 0.7852
Epoch 12/15
8/8 [==============================] - 8s 959ms/step - loss: 0.0597 - accuracy: 0.9822 - val_loss: 1.5067 - val_accuracy: 0.8008
Epoch 13/15
8/8 [==============================] - 7s 860ms/step - loss: 0.0518 - accuracy: 0.9789 - val_loss: 2.4157 - val_accuracy: 0.7773
Epoch 14/15
8/8 [==============================] - 7s 858ms/step - loss: 0.0391 - accuracy: 0.9889 - val_loss: 1.2087 - val_accuracy: 0.8398
Epoch 15/15
8/8 [==============================] - 7s 901ms/step - loss: 0.0088 - accuracy: 0.9990 - val_loss: 1.6605 - val_accuracy: 0.8398

We get a validation accuracy of around 84%, which indicates how well the model handles new images. Let's see how the model responds to an image we feed it.
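
Because the validation accuracy moves around a lot from epoch to epoch, it can help to look at the whole log and not only the last line. The sketch below simply re-enters the numbers printed in the training output above; the variable names are illustrative.

```python
# val_accuracy per epoch and the final training accuracy, copied from the log above.
val_accuracy = [0.5078, 0.5000, 0.8711, 0.8047, 0.7891, 0.8867, 0.8633, 0.8438,
                0.8555, 0.5000, 0.7852, 0.8008, 0.7773, 0.8398, 0.8398]
final_train_accuracy = 0.9990

# Epoch numbers start at 1, list indices at 0.
best_epoch = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i]) + 1
gap = final_train_accuracy - val_accuracy[-1]

print(f"best validation accuracy: {val_accuracy[best_epoch - 1]:.4f} at epoch {best_epoch}")
print(f"train/validation gap after the last epoch: {gap:.4f}")
```

In this run the best validation accuracy (0.8867) occurs at epoch 6, and the final gap of about 0.16 between training and validation accuracy is commonly read as a sign of overfitting.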

We will use the image of this horse given below.

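import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image

# Upload one or more images through the Colab file picker.
uploaded = files.upload()

for fn in uploaded.keys():
    # Load the image at the size the model expects.
    path = '/content/' + fn
    img = image.load_img(path, target_size=(300, 300))
    x = image.img_to_array(img)
    x = x / 255.0                  # same rescaling the generators applied during training
    x = np.expand_dims(x, axis=0)  # add the batch dimension

    images = np.vstack([x])
    classes = model.predict(images, batch_size=10)
    print(classes[0])
    # Scores above 0.5 are treated as the human class.
    if classes[0][0] > 0.5:
        print(fn + " IS A HUMAN")
    else:
        print(fn + " IS A HORSE")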

Output

white-3010129_1920.jpg(image/jpeg) - 319543 bytes, last modified: 9/23/2020 - 100% done
Saving white-3010129_1920.jpg to white-3010129_1920 (3).jpg
[1.]
white-3010129_1920.jpg IS A HORSE

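Our model categorizes the image as a horse. Note, however, that the printed score [1.] is above the 0.5 threshold, which in the code above corresponds to the human label, so the score and the label shown in this log do not match.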
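
The mapping from a sigmoid score to a label can be written as a small helper. This is only a sketch: label_from_score is a hypothetical function, and the class_indices dictionary assumes the dataset subfolders are named horses and humans, so that horses gets index 0 and humans gets index 1. In practice you would check train_generator.class_indices to confirm the mapping.

```python
def label_from_score(score, class_indices, threshold=0.5):
    """Return the class name for a sigmoid score, where the score refers to class index 1."""
    index_to_name = {index: name for name, index in class_indices.items()}
    return index_to_name[1 if score > threshold else 0]

# Assumed mapping; verify it against train_generator.class_indices.
class_indices = {"horses": 0, "humans": 1}

print(label_from_score(0.97, class_indices))  # humans
print(label_from_score(0.03, class_indices))  # horses
```

Looking the name up from the mapping avoids hard-coding which side of the threshold means which class.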

This marks the end of the article.

Thanks for reading!

 
