Train an Object Detection Model with Keras

In this article, we will learn to train an object detection model in Keras using Mask RCNN. Object detection is a difficult but important computer vision task. I will guide you through it step by step, and you will find that it is actually easy once you grasp the logic. Before you start coding, there are a few important points to keep in mind, otherwise you may end up frustrated. Without any further delay, let’s get started.

 

CPU or GPU?

First and foremost, decide whether to run this deep learning task on your local machine or on a GPU from a service like Google Colab or Kaggle. Even for a simple object detection task, a single epoch can take an hour on a CPU, let alone 5-10 epochs. So, my advice would be to go with a GPU: the processing will be faster and you will get results sooner. Google Colab and Kaggle both provide Jupyter notebooks with GPU and TPU hardware accelerators.

This tutorial is done in Google Colab. It is free to use; all you need to do is go to its website and register using your Gmail id. All your notebooks will be saved to your Google Drive. Performing this tutorial on your local machine might take somewhere between one hour and one and a half hours depending on your system, but on Colab it takes hardly 3-4 minutes. To check that you are using the GPU and not the CPU on Colab, run this Python snippet-

import tensorflow as tf
tf.test.gpu_device_name()
'/device:GPU:0'

If you don’t get this output, then go to Edit —> Notebook settings —> Hardware accelerator and select GPU.

 

Version

The Mask RCNN library is compatible with TensorFlow 1.3.0+ and Keras 2.0.8+, but it is not compatible with TensorFlow 2.x. This is one more reason to use an online notebook, since you might have TensorFlow 2.x on your local machine, and downgrading it might break your other programs. The versions used in this tutorial are-
TensorFlow 1.15.0
Keras 2.2.4
To install these specific versions on your notebook, follow the code snippet.

!pip install tensorflow==1.15
!pip install keras==2.2.4

Then restart your runtime. To verify-

import tensorflow as tf
import keras
print(tf.__version__)
print(keras.__version__)
1.15.0
2.2.4
Using TensorFlow backend.

 

Library

We will use Mask RCNN in this tutorial. Mask RCNN is an extension of Faster RCNN that predicts object masks along with bounding boxes. Since we are performing an object detection task, we will extract only the bounding boxes. We will use the Mask RCNN implementation by Matterport, one of the best third-party implementations, which has been widely used in various projects. Installation involves cloning the GitHub repository and running its setup file. For a successful installation, follow these steps-

  1. Clone the Mask RCNN GitHub repository:
    !git clone https://github.com/matterport/Mask_RCNN.git

     

  2. Install Mask RCNN Library:
    !pip install -r 'Mask_RCNN/requirements.txt'
    !cd Mask_RCNN ; python setup.py install

     

  3. Check if the library was properly installed:
    !pip show mask-rcnn

    You should get the following output-

    Name: mask-rcnn
    Version: 2.1
    Summary: Mask R-CNN for object detection and instance segmentation
    Home-page: https://github.com/matterport/Mask_RCNN
    Author: Matterport
    Author-email: [email protected]
    License: MIT
    Location: /usr/local/lib/python3.6/dist-packages/mask_rcnn-2.1-py3.6.egg
    Requires: 
    Required-by:

     

 

Dataset

The dataset used is the Kangaroo dataset by Huynh Ngoc Anh (experiencor). It consists of 183 kangaroo photographs along with XML annotation files containing the bounding boxes for each kangaroo in each photograph. Mask RCNN predicts both bounding boxes and masks for the detected objects, but the kangaroo dataset does not provide masks, so we will predict only the bounding boxes, as an object detection task, and ignore the masks.

To download the dataset-

!git clone https://github.com/experiencor/kangaroo.git

Restart the runtime, then click on the files symbol on the left pane of the notebook and you will see the directory where our Mask RCNN library and Kangaroo dataset are stored. In the directory “kangaroo”, there are two subdirectories, “annots/” and “images/”.

“images/” contains all the kangaroo photographs in JPEG format and “annots/” contains the corresponding XML annotation files. The filenames in both directories use a 5-digit numbering scheme. You will also notice that some images and their corresponding annotation files are missing.
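
If you want to verify this yourself, a quick sanity check like the one below (a minimal sketch, assuming the dataset was cloned into “kangaroo/” as above) lists the ids present in one directory but missing from the other.

from os import listdir

# minimal sanity check: find images without annotations and vice versa
image_ids = {f[:-4] for f in listdir('kangaroo/images') if f.endswith('.jpg')}
annot_ids = {f[:-4] for f in listdir('kangaroo/annots') if f.endswith('.xml')}
print('Images without annotations:', sorted(image_ids - annot_ids))
print('Annotations without images:', sorted(annot_ids - image_ids))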

Next, to parse the annotation files, let’s first have a look at what an XML file looks like. Download the first annotation file (annots/00001.xml) and open it. You will see this.

<?xml version="1.0"?>
<annotation>
    <folder>Kangaroo</folder>
    <filename>00001.jpg</filename>
    <path>......</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>450</width>
        <height>319</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>kangaroo</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>233</xmin>
            <ymin>89</ymin>
            <xmax>386</xmax>
            <ymax>262</ymax>
        </bndbox>
    </object>
    <object>
        <name>kangaroo</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>134</xmin>
            <ymin>105</ymin>
            <xmax>341</xmax>
            <ymax>253</ymax>
        </bndbox>
    </object>
</annotation>

The “size” element gives the shape of the image and each “object” element gives a bounding box “<bndbox>” for a kangaroo in the image. In this XML file, there are two object elements and hence two bounding boxes, so the corresponding JPEG image must contain two kangaroos.

We will need the size and bounding box information. We will use XPath queries to extract the data from each file, using the ElementTree API provided by Python to parse the XML. For this purpose, we will define an extract_bnd_boxes() function.

from xml.etree import ElementTree

# Extract bounding boxes from an XML annotation file
def extract_bnd_boxes(filename):
  # load and parse the XML file
  xml_file = ElementTree.parse(filename)
  # get the root element of the document
  root = xml_file.getroot()
  # extract the bounding boxes
  bnd_boxes = list()
  for box in root.findall('.//bndbox'):
    xmin = int(box.find('xmin').text)
    ymin = int(box.find('ymin').text)
    xmax = int(box.find('xmax').text)
    ymax = int(box.find('ymax').text)
    bnd_boxes.append([xmin, ymin, xmax, ymax])
  # extract the image dimensions (outside the loop: they apply to the whole image)
  width = int(root.find('.//size/width').text)
  height = int(root.find('.//size/height').text)
  return bnd_boxes, width, height

To test our function, let’s parse the first annotation file.

bnd_boxes, w, h = extract_bnd_boxes('kangaroo/annots/00001.xml')
print(bnd_boxes, w, h)
[[233, 89, 386, 262], [134, 105, 341, 253]] 450 319

 

Working with the Dataset

So, we have downloaded the dataset and know how to read the annotation files. The next task is to build a dataset object. For this, mrcnn.utils defines a Dataset class which is the base class for all datasets. We will extend this class to make our own KangarooDataset class and add functions according to our requirements. Under the KangarooDataset class, we will define a load_dataset() function to load our dataset, and override the load_mask() and image_reference() functions from mrcnn.utils.Dataset for loading the masks and the image path respectively.

class KangarooDataset(Dataset):
  # to load the dataset and define class and images
  def load_dataset(self, dataset_dir, is_train=True):
    pass

  # to load the masks of the images
  def load_mask(self, image_id):
    pass

  # load the path to the image
  def image_reference(self, image_id):
    pass

 

load_dataset() – This function will be used to define the classes and images. Classes here means the output classes – class 0 for the background and class 1 for kangaroo. At this point, I want you to inspect mrcnn.utils.Dataset. You will see some variables (object attributes) under __init__() and two functions, add_class() and add_image().

The add_class() function defines a class and takes ‘source’ (name of the dataset), ‘class_id’ (0 is reserved for the background by default, 1 for the kangaroo class), and ‘class_name’ (‘kangaroo’) as parameters. The add_image() function is used to define images and takes ‘source’ (name of the dataset), ‘image_id’ (the filename like 00001, 00002, etc. without extension), ‘path’ (path of the image), and **kwargs (we will pass the annotation file path through this) as parameters. This function appends one dictionary per image to the object’s image_info attribute.
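
For reference, each entry of the image_info list is a plain dictionary, roughly of this shape (a hypothetical example; the ‘annotation’ key is the one we pass through **kwargs, and you can verify the exact contents by printing an entry):

# approximate shape of one image_info entry after add_image()
{'id': '00001',
 'source': 'dataset',
 'path': 'kangaroo/images/00001.jpg',
 'annotation': 'kangaroo/annots/00001.xml'}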

Also, we want to split our dataset into training (roughly 80%) and testing (roughly 20%) data. We have 164 images; after skipping one problematic image (00090), 131 will go into training and 32 into testing.

def load_dataset(self, dataset_dir, is_train=True):
  # define class
  self.add_class("dataset", 1, "kangaroo")
  # data location
  image_dir = dataset_dir + '/images/'
  annotation_dir = dataset_dir + '/annots/'
  # find images
  for filename in listdir(image_dir):
    # get image id
    image_id = filename[:-4]
    # image 00090.jpg had some problem so we will skip it
    if image_id in ['00090']:
      continue
    # the training set gets images with ids below 150
    if is_train and int(image_id) >= 150:
      continue
    # the test set gets images with ids from 150 onward
    if not is_train and int(image_id) < 150:
      continue
    img_path = image_dir + filename
    ann_path = annotation_dir + image_id + '.xml'
    # add to dataset
    self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

 

load_mask() – Take a look at the built-in load_mask() function which we will override. This function returns the masks for the given ‘image_id’. Since our dataset has only bounding boxes, we will use the bounding boxes as masks.

First, we will retrieve the annotation path stored in the ‘image_info’ entry corresponding to the given ‘image_id’. Using this annotation path, we will extract the bounding boxes and the dimensions of the image. A mask is a two-dimensional array with the same dimensions as the image, filled with zeros except for the region of interest, i.e. where the object to be detected is. To build the masks, we will create a NumPy array of zeros with the image dimensions and one channel per bounding box, and then set the region between each box’s xmin, xmax, ymin, and ymax to one.

def load_mask(self, image_id):
  # get the image_info for the given image_id
  img_info = self.image_info[image_id]
  # get path of the annotation file of the given image_id
  path = img_info['annotation']
  # get the bounding boxes and image dims
  bnd_boxes, w, h = self.extract_bnd_boxes(path)
  # create an array for masks
  masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
  # create masks
  class_ids = list()
  for i in range(len(bnd_boxes)):
    box = bnd_boxes[i]
    ymin, ymax = box[1], box[3]
    xmin, xmax = box[0], box[2]
    masks[ymin:ymax, xmin:xmax, i] = 1
    class_ids.append(self.class_names.index('kangaroo'))
  return masks, asarray(class_ids, dtype='int32')

 

image_reference() – Take a look at the built-in image_reference() function. It simply returns the path of the given ‘image_id’, stored under the key ‘path’ in that image’s dictionary in the ‘image_info’ list.

def image_reference(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # return the value of the key 'path' -> image path
    return img_info['path']

 

Putting all these functions together under the KangarooDataset class, we get –

from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from matplotlib import pyplot
from mrcnn.visualize import display_instances
from mrcnn.utils import extract_bboxes

 

class KangarooDataset(Dataset):
  # load the dataset
  def load_dataset(self, dataset_dir, is_train=True):
    # define class
    self.add_class("dataset", 1, "kangaroo")
    # data location
    image_dir = dataset_dir + '/images/'
    annotation_dir = dataset_dir + '/annots/'
    # find images
    for filename in listdir(image_dir):
      # get image id
      image_id = filename[:-4]
      # image 00090.jpg had some problem so we will skip it
      if image_id in ['00090']:
        continue
      # the training set gets images with ids below 150
      if is_train and int(image_id) >= 150:
        continue
      # the test set gets images with ids from 150 onward
      if not is_train and int(image_id) < 150:
        continue
      img_path = image_dir + filename
      ann_path = annotation_dir + image_id + '.xml'
      # add to dataset
      self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
 
  #Extract bounding boxes from XML file
  def extract_bnd_boxes(self,filename):
    # load and parse the XML file
    xml_file = ElementTree.parse(filename)
    # get the root element of the document
    root = xml_file.getroot()
    # extract the bounding boxes
    bnd_boxes = list()
    for box in root.findall('.//bndbox'):
      xmin = int(box.find('xmin').text)
      ymin = int(box.find('ymin').text)
      xmax = int(box.find('xmax').text)
      ymax = int(box.find('ymax').text)
      bnd_boxes.append([xmin, ymin, xmax, ymax])
    # extract the image dimensions (they apply to the whole image)
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return bnd_boxes, width, height
 
  # load masks for the given image_id
  def load_mask(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # get path of the annotation file of the given image_id
    path = img_info['annotation']
    # get the bounding boxes and image dims
    bnd_boxes, w, h = self.extract_bnd_boxes(path)
    # create an array for masks
    masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(bnd_boxes)):
      box = bnd_boxes[i]
      ymin, ymax = box[1], box[3]
      xmin, xmax = box[0], box[2]
      masks[ymin:ymax, xmin:xmax, i] = 1
      class_ids.append(self.class_names.index('kangaroo'))
    return masks, asarray(class_ids, dtype='int32')
 
  # load the image path
  def image_reference(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # return the value of the key 'path' -> image path
    return img_info['path']

 

Now, create two instances of the class to load the training and testing images.

# training dataset
train_dataset = KangarooDataset()
# load the dataset
train_dataset.load_dataset('kangaroo', is_train=True)
train_dataset.prepare()
print('No. of training images: %d' % len(train_dataset.image_ids))

# test dataset
test_dataset = KangarooDataset()
# load the dataset
test_dataset.load_dataset('kangaroo', is_train=False)
test_dataset.prepare()
print('No. of test images: %d' % len(test_dataset.image_ids))
No. of training images: 131
No. of test images: 32

Read the built-in prepare() function to see how the dataset class is prepared for use.
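
One detail worth knowing: prepare() places the background class ‘BG’ at index 0 of class_names, which is why load_mask() can look up the kangaroo class id with self.class_names.index('kangaroo'). A quick check-

print(train_dataset.class_names)   # ['BG', 'kangaroo']
print(train_dataset.class_ids)     # [0 1]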

We see that our dataset is defined properly and that the training and testing splits are created correctly. The next step is to test whether the images and their corresponding masks load correctly.

Let’s first check the shape of an image and the corresponding mask.

# load an image and see the image shape and mask shape
image_id = 0
image = train_dataset.load_image(image_id)
mask, class_ids = train_dataset.load_mask(image_id)
print(image.shape,'\n', mask.shape)
(320, 450, 3) 
(320, 450, 1)

See, the dimensions of the image and the mask are the same. The only difference is that the 3 in (320, 450, 3) for the image represents the 3 color channels, whereas the 1 in (320, 450, 1) for the mask represents 1 channel for 1 mask. Had there been two kangaroos and thus two masks for the image, the mask shape would have been (320, 450, 2).
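
To see this for yourself, a small sketch like the one below scans the training set and prints the images whose masks have more than one channel, i.e. more than one kangaroo-

# find training images containing more than one kangaroo
for i in train_dataset.image_ids:
  mask, class_ids = train_dataset.load_mask(i)
  if mask.shape[2] > 1:
    print(train_dataset.image_reference(i), mask.shape)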

Now, plot the masks over the first 4 images.

import matplotlib.pyplot as plt
#plot the first 4 images
plt.figure(figsize = (12,12))
for i in range(4):
  plt.subplot(2,2,i+1)
  img = train_dataset.load_image(i)
  plt.imshow(img)
  mask, class_ids = train_dataset.load_mask(i)
  for j in range(mask.shape[2]):
    plt.imshow(mask[:,:,j], cmap = 'gray', alpha = 0.3)
plt.show()    

All the images and masks load perfectly. The built-in mrcnn.visualize.display_instances() plots an image with its masks, bounding boxes, and class labels; the bounding boxes are extracted from the masks by the extract_bboxes() function. So let’s make use of these functions and plot an image with its bounding box and mask.

# image id
image_id = 10
# load image
image = train_dataset.load_image(image_id)
# load masks and the class ids for the given image_id
mask, class_ids = train_dataset.load_mask(image_id)
# extract bounding boxes from the masks 
box = extract_bboxes(mask)
print(box.shape,mask.shape,class_ids.shape)
# display the image with masks and bounding boxes
display_instances(image, box, mask, class_ids, train_dataset.class_names)

 

# CHECKPOINT 1

Here is the complete code for everything we have done so far.

from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from matplotlib import pyplot
from mrcnn.visualize import display_instances
from mrcnn.utils import extract_bboxes

class KangarooDataset(Dataset):
  # load the dataset
  def load_dataset(self, dataset_dir, is_train=True):
    # define class
    self.add_class("dataset", 1, "kangaroo")
    # data location
    image_dir = dataset_dir + '/images/'
    annotation_dir = dataset_dir + '/annots/'
    # find images
    for filename in listdir(image_dir):
      # get image id
      image_id = filename[:-4]
      # image 00090.jpg had some problem so we will skip it
      if image_id in ['00090']:
        continue
      # the training set gets images with ids below 150
      if is_train and int(image_id) >= 150:
        continue
      # the test set gets images with ids from 150 onward
      if not is_train and int(image_id) < 150:
        continue
      img_path = image_dir + filename
      ann_path = annotation_dir + image_id + '.xml'
      # add to dataset
      self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
 
  #Extract bounding boxes from XML file
  def extract_bnd_boxes(self,filename):
    # load and parse the XML file
    xml_file = ElementTree.parse(filename)
    # get the root element of the document
    root = xml_file.getroot()
    # extract the bounding boxes
    bnd_boxes = list()
    for box in root.findall('.//bndbox'):
      xmin = int(box.find('xmin').text)
      ymin = int(box.find('ymin').text)
      xmax = int(box.find('xmax').text)
      ymax = int(box.find('ymax').text)
      bnd_boxes.append([xmin, ymin, xmax, ymax])
    # extract the image dimensions (they apply to the whole image)
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return bnd_boxes, width, height
 
  # load masks for the given image_id
  def load_mask(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # get path of the annotation file of the given image_id
    path = img_info['annotation']
    # get the bounding boxes and image dims
    bnd_boxes, w, h = self.extract_bnd_boxes(path)
    # create an array for masks
    masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(bnd_boxes)):
      box = bnd_boxes[i]
      ymin, ymax = box[1], box[3]
      xmin, xmax = box[0], box[2]
      masks[ymin:ymax, xmin:xmax, i] = 1
      class_ids.append(self.class_names.index('kangaroo'))
    return masks, asarray(class_ids, dtype='int32')
 
  # load the image path
  def image_reference(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # return the value of the key 'path' -> image path
    return img_info['path']


# training dataset
train_dataset = KangarooDataset()
train_dataset.load_dataset('kangaroo', is_train=True)
train_dataset.prepare()
print('No. of training images: %d' % len(train_dataset.image_ids))
# test dataset
test_dataset = KangarooDataset()
test_dataset.load_dataset('kangaroo', is_train=False)
test_dataset.prepare()
print('No. of test images: %d' % len(test_dataset.image_ids))


# image id
image_id = 10
# load image
image = train_dataset.load_image(image_id)
# load masks and the class ids for the given image_id
mask, class_ids = train_dataset.load_mask(image_id)
# extract bounding boxes from the masks 
box = extract_bboxes(mask)
print(box.shape,mask.shape,class_ids.shape)
# display the image with masks and bounding boxes
display_instances(image, box, mask, class_ids, train_dataset.class_names)

 

Train the Model using Kangaroo Dataset

We will use transfer learning to train the Mask RCNN model. For this purpose, we will use the weights of a model pre-trained on the MS COCO dataset, downloaded into the working directory under the name “mask_rcnn_coco.h5”. First, we get the root directory, then create a model directory to store the logs and trained models from each epoch, and finally build a local path to the pre-trained COCO weights and download them if they are not already present.

import os
ROOT_DIR = os.getcwd()
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# Local path to trained weights file
from mrcnn import utils
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

 

The next task is to define a configuration object for the model. Take a look at mrcnn/config.py. We will extend the Config class to make our own configuration class and set a few attributes: NAME to give the configuration a recognizable name, NUM_CLASSES to define the number of classification classes including the background, and STEPS_PER_EPOCH to define the number of training steps per epoch.

from mrcnn.config import Config
from mrcnn.model import MaskRCNN

# define a configuration for the model
class KangarooConfig(Config):
  # Give the configuration a recognizable name
  NAME = "kangaroo_cfg"
  # Number of classes (background + kangaroo)
  NUM_CLASSES = 1 + 1
  # Number of training steps per epoch
  STEPS_PER_EPOCH = 131
 
# define a config object
config = KangarooConfig()
config.display()
Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     2
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 2
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
....

 

Next, we will define the object detection model, storing configuration files and checkpoints in the “MODEL_DIR” created previously. Then we load the pre-trained weights from “mask_rcnn_coco.h5” using the load_weights() function, excluding the output layers via the ‘exclude’ argument since we will be defining our own output layers. Finally, we train the model on the training dataset with the default learning rate. Here, we train only the output layers of the model, also called the “heads”.

# define the model
model = MaskRCNN(mode='training', model_dir=MODEL_DIR, config=config)

# load pre-trained weights of mscoco and exclude the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])

# train the model(only the output layers)
model.train(train_dataset, test_dataset, learning_rate=config.LEARNING_RATE, epochs=1, layers='heads')
Epoch 1/1
131/131 [==============================] - 149s 1s/step - loss: 1.1330 - rpn_class_loss: 0.0068 - rpn_bbox_loss: 0.2397 - mrcnn_class_loss: 0.0340 - mrcnn_bbox_loss: 0.4222 - mrcnn_mask_loss: 0.4303 - val_loss: 0.8485 - val_rpn_class_loss: 0.0099 - val_rpn_bbox_loss: 0.2603 - val_mrcnn_class_loss: 0.0246 - val_mrcnn_bbox_loss: 0.2911 - val_mrcnn_mask_loss: 0.2626

 

Evaluate the Model

The metric used for measuring the accuracy of object detection models is called Average Precision (AP). If we plot precision against recall, the area under the curve gives the AP. Since precision and recall always fall between 0 and 1, so does AP. The mean of the AP over the entire dataset is called the mean Average Precision, or mAP. In object detection models, the predictions are bounding boxes, and the goodness of the model is based on how well the predicted boxes overlap with the ground truth. This overlap is calculated by dividing the area of the intersection of the two boxes by the area of their union, and is known as Intersection over Union, or IoU.
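
As a concrete illustration, here is a minimal IoU computation for two [xmin, ymin, xmax, ymax] boxes (a standalone sketch, not part of the mrcnn library), applied to the two ground-truth boxes from 00001.xml-

def iou(box_a, box_b):
  # boxes are [xmin, ymin, xmax, ymax]
  xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
  xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
  # intersection area is zero when the boxes do not overlap
  inter = max(0, xb - xa) * max(0, yb - ya)
  area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
  area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
  return inter / float(area_a + area_b - inter)

print(iou([233, 89, 386, 262], [134, 105, 341, 253]))  # ~0.39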

We will use mrcnn.utils.compute_ap to compute the Average Precision with the default IoU threshold (0.5): a predicted box counts as correct if its IoU with a ground-truth box is at least 0.5, and as wrong otherwise. We can then take the mean of all the APs to get the mean Average Precision.

For this, we again extend the Config class, this time for prediction, and set the NAME, NUM_CLASSES, GPU_COUNT, and IMAGES_PER_GPU attributes to the required values. The last two are required regardless of whether you are using a CPU or a GPU.

class PredictionConfig(Config):
  # name of the configuration
  NAME = "kangaroo_cfg"
  # number of classes (background + kangaroo)
  NUM_CLASSES = 1 + 1
  # GPU configuration
  GPU_COUNT = 1
  IMAGES_PER_GPU = 1

Next, create a PredictionConfig object and define the model with the mode changed from “training” to “inference”. Then find the path of the last trained model and load its weights.

# create PredictionConfig object to make prediction
cfg = PredictionConfig()
# define the model with mode set to inference
model = MaskRCNN(mode='inference', model_dir=MODEL_DIR, config=cfg)

# get the model path
model_path = model.find_last()

# load trained weights
assert model_path != "", "Provide path to trained weights"
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)
Loading weights from  /content/logs/kangaroo_cfg20200519T0448/mask_rcnn_kangaroo_cfg_0001.h5
Re-starting from epoch 1

 

Now, to evaluate the model, we will define an evaluate_model() function that takes the dataset, model, and configuration as parameters. To collect the APs over the entire dataset, we create an empty list. Then, for each image_id in the given dataset, we load the image with its bounding boxes and masks (the ground truth) using the load_image_gt() function from mrcnn.model, convert the pixel values to floats using the mold_image() function from mrcnn.model, add a new axis at the zeroth position using NumPy’s expand_dims(), and finally predict the output.

from numpy import expand_dims
from numpy import mean
from mrcnn.utils import compute_ap
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image

# calculate mAP for the model
def evaluate_model(dataset, model, cfg):
  APs = list()
  for image_id in dataset.image_ids:
    # load image, bounding boxes and masks for the given image id
    image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
    # convert pixel values of the image
    molded_image = mold_image(image, cfg)
    # expand the shape of the image
    new_img = expand_dims(molded_image, 0)
    # make prediction
    predict = model.detect(new_img, verbose=0)
    # extract the results for the first (and only) image
    pos0 = predict[0]
    # compute AP
    AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, pos0["rois"], pos0["class_ids"], pos0["scores"], pos0['masks'])
    # store all the AP in the list
    APs.append(AP)
  # compute the mean AP across the dataset
  mAP = mean(APs)
  return mAP

# evaluate on training dataset
training_mAP = evaluate_model(train_dataset, model, cfg)
print("Train mAP: %.3f" % training_mAP)
# evaluate model on test dataset
testing_mAP = evaluate_model(test_dataset, model, cfg)
print("Test mAP: %.3f" % testing_mAP)
Train mAP: 0.884
Test mAP: 0.922

 

As you can see, the computed mAP is 0.884 for the training dataset and 0.922 for the test dataset. To improve the mAP, increase the number of epochs.
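
For example, a common follow-up (a sketch, to be run on the training-mode model from the previous section before switching to inference) is to fine-tune all layers for a few more epochs at a reduced learning rate once the heads have converged-

# hypothetical follow-up: fine-tune all layers at a lower learning rate.
# note that in this library 'epochs' is the total epoch index to train up to,
# so epochs=5 trains 4 more epochs after the single 'heads' epoch above
model.train(train_dataset, test_dataset, learning_rate=config.LEARNING_RATE / 10, epochs=5, layers='all')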

We got decent mAP values and everything is working fine. So, finally, we can do the task for which our model was created – detect kangaroos in new photos.

 

# CHECKPOINT 2

Here is the complete code for everything we have done so far.

from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from matplotlib import pyplot
from mrcnn.visualize import display_instances
from mrcnn.utils import extract_bboxes
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from numpy import expand_dims
from numpy import mean
from mrcnn.utils import compute_ap
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image

class KangarooDataset(Dataset):
  # load the dataset
  def load_dataset(self, dataset_dir, is_train=True):
    # define class
    self.add_class("dataset", 1, "kangaroo")
    # data location
    image_dir = dataset_dir + '/images/'
    annotation_dir = dataset_dir + '/annots/'
    # find images
    for filename in listdir(image_dir):
      # get image id
      image_id = filename[:-4]
      # image 00090.jpg had some problem so we will skip it
      if image_id in ['00090']:
        continue
      # the training set gets images with ids below 150
      if is_train and int(image_id) >= 150:
        continue
      # the test set gets images with ids from 150 onward
      if not is_train and int(image_id) < 150:
        continue
      img_path = image_dir + filename
      ann_path = annotation_dir + image_id + '.xml'
      # add to dataset
      self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
 
  #Extract bounding boxes from XML file
  def extract_bnd_boxes(self,filename):
    # load and parse the XML file
    xml_file = ElementTree.parse(filename)
    # get the root element of the document
    root = xml_file.getroot()
    # extract the bounding boxes
    bnd_boxes = list()
    for box in root.findall('.//bndbox'):
      xmin = int(box.find('xmin').text)
      ymin = int(box.find('ymin').text)
      xmax = int(box.find('xmax').text)
      ymax = int(box.find('ymax').text)
      bnd_boxes.append([xmin, ymin, xmax, ymax])
    # extract the image dimensions (they apply to the whole image)
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return bnd_boxes, width, height
 
  # load masks for the given image_id
  def load_mask(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # get path of the annotation file of the given image_id
    path = img_info['annotation']
    # get the bounding boxes and image dims
    bnd_boxes, w, h = self.extract_bnd_boxes(path)
    # create an array for masks
    masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(bnd_boxes)):
      box = bnd_boxes[i]
      ymin, ymax = box[1], box[3]
      xmin, xmax = box[0], box[2]
      masks[ymin:ymax, xmin:xmax, i] = 1
      class_ids.append(self.class_names.index('kangaroo'))
    return masks, asarray(class_ids, dtype='int32')
 
  # load the image path
  def image_reference(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # return the value of the key 'path' -> image path
    return img_info['path']
 
# training dataset
train_dataset = KangarooDataset()
train_dataset.load_dataset('kangaroo', is_train=True)
train_dataset.prepare()
print('No. of training images: %d' % len(train_dataset.image_ids))

# test dataset
test_dataset = KangarooDataset()
test_dataset.load_dataset('kangaroo', is_train=False)
test_dataset.prepare()
print('No. of test images: %d' % len(test_dataset.image_ids))


# image id
image_id = 10
# load image
image = train_dataset.load_image(image_id)
# load masks and the class ids for the given image_id
mask, class_ids = train_dataset.load_mask(image_id)
# extract bounding boxes from the masks 
box = extract_bboxes(mask)
print(box.shape,mask.shape,class_ids.shape)
# display the image with masks and bounding boxes
display_instances(image, box, mask, class_ids, train_dataset.class_names)

import os
ROOT_DIR = os.getcwd()
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# Local path to trained weights file
from mrcnn import utils
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# define a configuration for the model
class KangarooConfig(Config):
  # Give the configuration a recognizable name
  NAME = "kangaroo_cfg"
  # Number of classes (background + kangaroo)
  NUM_CLASSES = 1 + 1
  # Number of training steps per epoch
  STEPS_PER_EPOCH = 131
 
# prepare config
config = KangarooConfig()
config.display()

# define the model
model = MaskRCNN(mode='training', model_dir=MODEL_DIR, config=config)
# load pre-trained weights of mscoco and exclude the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])
# train the model(only the output layers)
model.train(train_dataset, test_dataset, learning_rate=config.LEARNING_RATE, epochs=1, layers='heads')

class PredictionConfig(Config):
  # name of the configuration
  NAME = "kangaroo_cfg"
  # number of classes (background + kangaroo)
  NUM_CLASSES = 1 + 1
  # GPU configuration
  GPU_COUNT = 1
  IMAGES_PER_GPU = 1

# create PredictionConfig object to make prediction
cfg = PredictionConfig()
# define the model with mode set to inference
model = MaskRCNN(mode='inference', model_dir=MODEL_DIR, config=cfg)
# get the model path
model_path = model.find_last()
# load trained weights
assert model_path != "", "Provide path to trained weights"
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)

# calculate mAP for the model
def evaluate_model(dataset, model, cfg):
  APs = list()
  for image_id in dataset.image_ids:
    # load image, bounding boxes and masks for the given image id
    image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
    # convert pixel values of the image
    molded_image = mold_image(image, cfg)
    # expand the shape of the image
    new_img = expand_dims(molded_image, 0)
    # make prediction
    predict = model.detect(new_img, verbose=0)
    # extract the results for the first (and only) image
    pos0 = predict[0]
    # compute AP
    AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, pos0["rois"], pos0["class_ids"], pos0["scores"], pos0['masks'])
    # store all the AP in the list
    APs.append(AP)
  # compute the mean AP across the dataset
  mAP = mean(APs)
  return mAP

# evaluate on training dataset
training_mAP = evaluate_model(train_dataset, model, cfg)
print("Train mAP: %.3f" % training_mAP)
# evaluate model on test dataset
testing_mAP = evaluate_model(test_dataset, model, cfg)
print("Test mAP: %.3f" % testing_mAP)

 

Detect Kangaroos in Any Picture

This object detection model is made to detect kangaroos. So, download any two kangaroo images of your choice from Google to your local machine and name them “kangaroo1.jpg” and “kangaroo2.jpg”. On the left panel of your Colab notebook, click on the file symbol and you will see the “upload” option; use it to upload both pictures. Hover your mouse over the first image, click on the three dots that appear, and select “copy path”. Paste the path into the variable img1, and do the same for the second image.

import cv2
from matplotlib.patches import Rectangle

img1 = cv2.imread(r'/content/kangaroo1.jpg')
img2 = cv2.imread(r'/content/kangaroo2.jpg')
images = [img1,img2]

pyplot.figure(figsize = (20,20))
for i in range(len(images)):
  img = images[i]   
  molded_image = mold_image(img,cfg)
  new_img = expand_dims(molded_image,0)
  predict = model.detect(new_img, verbose = 0)

  pyplot.subplot(1,2,i+1)
  pyplot.imshow(img)
  pyplot.title('Predicted')
  ax = pyplot.gca()
  
  for box in predict[0]['rois']:
      y1,x1,y2,x2 = box
      width, height = x2 - x1, y2 - y1
      rectangle = Rectangle((x1, y1), width, height, fill=False, color='red')
      ax.add_patch(rectangle)
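
One caveat: cv2.imread() returns images in BGR channel order, while Matplotlib (and the mrcnn model, which was trained on RGB images) expects RGB, so the plotted colors may look off. A small fix, sketched below, is to convert right after reading-

# convert from OpenCV's BGR order to RGB before predicting and plotting
img1 = cv2.cvtColor(cv2.imread(r'/content/kangaroo1.jpg'), cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(cv2.imread(r'/content/kangaroo2.jpg'), cv2.COLOR_BGR2RGB)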

    

 

There we go. The model detects all the kangaroos correctly, but in the second image it falsely detects the man as a kangaroo. The detections can be made more accurate by enlarging the dataset, fine-tuning the model, training the complete model instead of just the output layers, and increasing the number of epochs.

 

# CHECKPOINT 3

Finally, here is the complete code for object detection using the Kangaroo dataset.

from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from matplotlib import pyplot
from mrcnn.visualize import display_instances
from mrcnn.utils import extract_bboxes
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from numpy import expand_dims
from numpy import mean
from mrcnn.utils import compute_ap
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image
import cv2
from matplotlib.patches import Rectangle

class KangarooDataset(Dataset):
  # load the dataset
  def load_dataset(self, dataset_dir, is_train=True):
    # define class
    self.add_class("dataset", 1, "kangaroo")
    # data location
    image_dir = dataset_dir + '/images/'
    annotation_dir = dataset_dir + '/annots/'
    # find images
    for filename in listdir(image_dir):
      # get image id
      image_id = filename[:-4]
      # image 00090.jpg had some problem so we will skip it
      if image_id in ['00090']:
        continue
      # the training set gets images with ids below 150
      if is_train and int(image_id) >= 150:
        continue
      # the test set gets images with ids from 150 onward
      if not is_train and int(image_id) < 150:
        continue
      img_path = image_dir + filename
      ann_path = annotation_dir + image_id + '.xml'
      # add to dataset
      self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
 
  #Extract bounding boxes from XML file
  def extract_bnd_boxes(self,filename):
    # load and parse the XML file
    xml_file = ElementTree.parse(filename)
    # get the root element of the document
    root = xml_file.getroot()
    # extract the bounding boxes
    bnd_boxes = list()
    for box in root.findall('.//bndbox'):
      xmin = int(box.find('xmin').text)
      ymin = int(box.find('ymin').text)
      xmax = int(box.find('xmax').text)
      ymax = int(box.find('ymax').text)
      bnd_boxes.append([xmin, ymin, xmax, ymax])
    # extract the image dimensions (they apply to the whole image)
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return bnd_boxes, width, height
 
  # load masks for the given image_id
  def load_mask(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # get path of the annotation file of the given image_id
    path = img_info['annotation']
    # get the bounding boxes and image dims
    bnd_boxes, w, h = self.extract_bnd_boxes(path)
    # create an array for masks
    masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(bnd_boxes)):
      box = bnd_boxes[i]
      ymin, ymax = box[1], box[3]
      xmin, xmax = box[0], box[2]
      masks[ymin:ymax, xmin:xmax, i] = 1
      class_ids.append(self.class_names.index('kangaroo'))
    return masks, asarray(class_ids, dtype='int32')
 
  # load the image path
  def image_reference(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # return the value of the key 'path' -> image path
    return img_info['path']
 
# training dataset
train_dataset = KangarooDataset()
train_dataset.load_dataset('kangaroo', is_train=True)
train_dataset.prepare()
print('No. of training images: %d' % len(train_dataset.image_ids))

# test dataset
test_dataset = KangarooDataset()
test_dataset.load_dataset('kangaroo', is_train=False)
test_dataset.prepare()
print('No. of test images: %d' % len(test_dataset.image_ids))


# image id
image_id = 10
# load image
image = train_dataset.load_image(image_id)
# load masks and the class ids for the given image_id
mask, class_ids = train_dataset.load_mask(image_id)
# extract bounding boxes from the masks 
box = extract_bboxes(mask)
print(box.shape,mask.shape,class_ids.shape)
# display the image with masks and bounding boxes
display_instances(image, box, mask, class_ids, train_dataset.class_names)

import os
ROOT_DIR = os.getcwd()
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# Local path to trained weights file
from mrcnn import utils
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# define a configuration for the model
class KangarooConfig(Config):
  # Give the configuration a recognizable name
  NAME = "kangaroo_cfg"
  # Number of classes (background + kangaroo)
  NUM_CLASSES = 1 + 1
  # Number of training steps per epoch
  STEPS_PER_EPOCH = 131
 
# prepare config
config = KangarooConfig()
config.display()

# define the model
model = MaskRCNN(mode='training', model_dir=MODEL_DIR, config=config)
# load pre-trained weights of mscoco and exclude the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])
# train the model(only the output layers)
model.train(train_dataset, test_dataset, learning_rate=config.LEARNING_RATE, epochs=1, layers='heads')

class PredictionConfig(Config):
  # name of the configuration
  NAME = "kangaroo_cfg"
  # number of classes (background + kangaroo)
  NUM_CLASSES = 1 + 1
  # GPU configuration
  GPU_COUNT = 1
  IMAGES_PER_GPU = 1

# create PredictionConfig object to make prediction
cfg = PredictionConfig()
# define the model with mode set to inference
model = MaskRCNN(mode='inference', model_dir=MODEL_DIR, config=cfg)
# get the model path
model_path = model.find_last()
# load trained weights
assert model_path != "", "Provide path to trained weights"
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)

# calculate mAP for the model
def evaluate_model(dataset, model, cfg):
  APs = list()
  for image_id in dataset.image_ids:
    # load image, bounding boxes and masks for the given image id
    image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
    # convert pixel values of the image
    molded_image = mold_image(image, cfg)
    # expand the shape of the image
    new_img = expand_dims(molded_image, 0)
    # make prediction
    predict = model.detect(new_img, verbose=0)
    # extract the results for the first (and only) image
    pos0 = predict[0]
    # compute AP
    AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, pos0["rois"], pos0["class_ids"], pos0["scores"], pos0['masks'])
    # store all the AP in the list
    APs.append(AP)
  # compute the mean AP across the dataset
  mAP = mean(APs)
  return mAP

# evaluate on training dataset
training_mAP = evaluate_model(train_dataset, model, cfg)
print("Train mAP: %.3f" % training_mAP)
# evaluate model on test dataset
testing_mAP = evaluate_model(test_dataset, model, cfg)
print("Test mAP: %.3f" % testing_mAP)

img1 = cv2.imread(r'/content/kangaroo1.jpg')
img2 = cv2.imread(r'/content/kangaroo2.jpg')
images = [img1,img2]

pyplot.figure(figsize = (20,20))
for i in range(len(images)):
  img = images[i]
  molded_image = mold_image(img,cfg)
  new_img = expand_dims(molded_image,0)
  predict = model.detect(new_img, verbose = 0)

  pyplot.subplot(1,2,i+1)
  pyplot.imshow(img)
  pyplot.title('Predicted')
  ax = pyplot.gca()
  
  for box in predict[0]['rois']:
      y1,x1,y2,x2 = box
      width, height = x2 - x1, y2 - y1
      rectangle = Rectangle((x1, y1), width, height, fill=False, color='red')
      ax.add_patch(rectangle)

    

 

 

Congratulations! You have come a long way. Want to add your thoughts? Need any further help? Leave a comment below and I will get back to you ASAP 🙂

 
