Train an Object Detection Model with Keras
In this article, we will learn to train an object detection model in Keras using Mask RCNN. Object detection is a difficult but important computer vision task. I will guide you through it step by step, and you will find that it is actually easy once you grasp the logic. Before we begin coding, there are a few important points to keep in mind, otherwise you will end up frustrated. Without any further delay, let's get started.
CPU or GPU?
First and foremost, decide whether to do this deep learning task on your local machine or on a GPU from a service like Google Colab or Kaggle. Even for a simple object detection task, a single epoch can take around an hour on a CPU, let alone 5-10 epochs. So my advice is to go with a GPU: the processing will be faster and you will get results sooner. Google Colab and Kaggle both provide Jupyter notebooks with both GPU and TPU hardware accelerators.
This tutorial is done in Google Colab. It is free to use; all you need is to go to its website and register with your Gmail id. All your notebooks are saved to your Google Drive. Performing this tutorial on your local machine might take somewhere between one hour and one and a half hours depending on your system, but on Colab it takes hardly 3-4 minutes. To check that you are using the GPU and not the CPU on Colab, run this Python snippet-
import tensorflow as tf
tf.test.gpu_device_name()
'/device:GPU:0'
If you don’t get this output, then go to Edit —> Notebook settings —> Hardware accelerator and select GPU.
Version
The Mask RCNN library is compatible with TensorFlow versions above 1.3.0 and Keras versions above 2.0.8, but it is not compatible with TensorFlow 2.x. This is one more reason to go with online notebooks: you might have TensorFlow 2.x on your local machine, and downgrading it might cause problems with your other programs. The versions used in this tutorial are-
Tensorflow 1.15.0
Keras 2.2.4
To install these specific versions in your notebook, run the following commands.
!pip install tensorflow==1.15
!pip install keras==2.2.4
Then restart your runtime. To verify-
import tensorflow as tf
import keras
print(tf.__version__)
print(keras.__version__)
Library
We will use Mask RCNN in this tutorial. Mask RCNN is an extension of Faster RCNN: it predicts object masks along with bounding boxes. Since we are performing an object detection task, we will only use the bounding boxes. We will use the Mask RCNN implementation by Matterport, one of the best third-party implementations, which has been widely used in various projects. Setting it up involves cloning the GitHub repository and installing the setup file. For a successful installation, follow these steps-
- Clone the Mask RCNN GitHub repository:
!git clone https://github.com/matterport/Mask_RCNN.git
- Install Mask RCNN Library:
!pip install -r 'Mask_RCNN/requirements.txt'
!cd Mask_RCNN ; python setup.py install
- Check that the library was properly installed:
!pip show mask-rcnn
You should get the following output-
Name: mask-rcnn
Version: 2.1
Summary: Mask R-CNN for object detection and instance segmentation
Home-page: https://github.com/matterport/Mask_RCNN
Author: Matterport
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.6/dist-packages/mask_rcnn-2.1-py3.6.egg
Requires:
Required-by:
Dataset
The dataset used is the kangaroo dataset by Huynh Ngoc Anh (experiencor). It consists of 183 kangaroo photographs along with XML annotation files containing the bounding boxes for each kangaroo in each photograph. Mask RCNN predicts both bounding boxes and masks for the detected objects, but the kangaroo dataset does not provide masks, so we will predict only the bounding boxes, as an object detection task, and ignore the masks.
To download the dataset-
!git clone https://github.com/experiencor/kangaroo.git
Restart the runtime, then click on the files symbol on the left pane of the notebook and you will see the directory where our Mask RCNN library and the kangaroo dataset are stored. In the directory “kangaroo”, there are subdirectories “annots/” and “images/”.
“images/” contains all the kangaroo photographs in JPEG format and “annots/” contains the corresponding XML annotation files. The filenames in both directories use a 5-digit numbering system. You will also notice that some images and their corresponding annotation files are missing.
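If you prefer to confirm this from code rather than the file browser, a quick listing works (a minimal check, assuming the repository was cloned into the working directory as “kangaroo/”)-

from os import listdir

# list a few filenames from each directory to confirm the 5-digit
# numbering scheme and to spot gaps in the numbering
images = sorted(listdir('kangaroo/images'))
annots = sorted(listdir('kangaroo/annots'))
print(len(images), images[:5])
print(len(annots), annots[:5])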
Next, to parse the annotation files, let's first have a look at what an XML file contains. Download the first annotation file (annots/00001.xml) and open it. You will see this.
<?xml version="1.0"?>
<annotation>
    <folder>Kangaroo</folder>
    <filename>00001.jpg</filename>
    <path>......</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>450</width>
        <height>319</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>kangaroo</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>233</xmin>
            <ymin>89</ymin>
            <xmax>386</xmax>
            <ymax>262</ymax>
        </bndbox>
    </object>
    <object>
        <name>kangaroo</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>134</xmin>
            <ymin>105</ymin>
            <xmax>341</xmax>
            <ymax>253</ymax>
        </bndbox>
    </object>
</annotation>
The “size” element gives the shape of the image and each “object” element gives the bounding box “<bndbox>” for one kangaroo in the image. In this XML file, there are two object elements and hence two bounding boxes, so it follows that the corresponding JPEG image must contain two kangaroos.
We will need the size and bounding box information. We will use XPath queries to extract the data from each file; the ElementTree API provided by Python can parse the XML files. For this purpose, we will define an extract_bnd_boxes() function.
# extract bounding boxes from an XML annotation file
from xml.etree import ElementTree

def extract_bnd_boxes(filename):
    # load and parse the XML file
    xml_file = ElementTree.parse(filename)
    # get the root element
    root = xml_file.getroot()
    # extract the bounding boxes
    bnd_boxes = list()
    for box in root.findall('.//bndbox'):
        xmin = int(box.find('xmin').text)
        ymin = int(box.find('ymin').text)
        xmax = int(box.find('xmax').text)
        ymax = int(box.find('ymax').text)
        bnd_boxes.append([xmin, ymin, xmax, ymax])
    # extract the image dimensions
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return bnd_boxes, width, height
To test our function, let’s parse the first annotation file.
bnd_boxes, w, h = extract_bnd_boxes('kangaroo/annots/00001.xml')
print(bnd_boxes, w, h)
[[233, 89, 386, 262], [134, 105, 341, 253]] 450 319
Working with the Dataset
So, we have downloaded the dataset and know how to read the annotation files. The next task is to build a dataset object. For this, mrcnn.utils defines a Dataset class, which is the base class for all datasets. We will extend this class to make our own KangarooDataset class and add functions according to our requirements. Under the KangarooDataset class, we will define a load_dataset() function to load our dataset, and override the load_mask() and image_reference() functions from mrcnn.utils.Dataset to load the masks and the image path, respectively.
class KangarooDataset(Dataset):
    # load the dataset and define classes and images
    def load_dataset(self, dataset_dir, is_train=True):
        pass

    # load the masks of the images
    def load_mask(self, image_id):
        pass

    # load the path to the image
    def image_reference(self, image_id):
        pass
load_dataset() – This function defines the classes and the images. Classes here means the output classes – class 0 for the background and class 1 for kangaroo. At this point, I want you to inspect mrcnn.utils.Dataset. You will see some variables (object attributes) under __init__() and two functions, add_class() and add_image().
The add_class() function defines a class and takes ‘source’ (name of the dataset), ‘class_id’ (0 is reserved for the background by default; 1 for the kangaroo class), and ‘class_name’ (‘kangaroo’) as parameters. The add_image() function defines an image and takes ‘source’ (name of the dataset), ‘image_id’ (the filename, like 00001, without extension), ‘path’ (path of the image), and **kwargs (we will pass the annotation file path here) as parameters. It appends a dictionary for each image to the object’s image_info attribute, as the sketch below shows.
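Here is a minimal, standalone sketch of what these two calls record on the object (the attribute names follow the Matterport implementation; the printed contents are illustrative)-

from mrcnn.utils import Dataset

# a bare Dataset, just to observe what add_class()/add_image() store
ds = Dataset()
ds.add_class("dataset", 1, "kangaroo")
ds.add_image('dataset', image_id='00001',
             path='kangaroo/images/00001.jpg',
             annotation='kangaroo/annots/00001.xml')
# class_info starts with the background class, then ours
print(ds.class_info)
# image_info holds one dict per image, including our 'annotation' kwarg
print(ds.image_info)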
Also, we want to split our dataset into training (80%) and testing (20%) data. The dataset contains 164 images; after skipping one problematic file (00090), 131 images go into training and 32 into testing.
def load_dataset(self, dataset_dir, is_train=True):
    # define one class
    self.add_class("dataset", 1, "kangaroo")
    # data locations
    image_dir = dataset_dir + '/images/'
    annotation_dir = dataset_dir + '/annots/'
    # find all images
    for filename in listdir(image_dir):
        # get image id (filename without extension)
        image_id = filename[:-4]
        # image 00090.jpg has a problem, so we skip it
        if image_id in ['00090']:
            continue
        # for the training set, take images numbered below 150
        if is_train and int(image_id) >= 150:
            continue
        # for the test set, take images numbered 150 and above
        if not is_train and int(image_id) < 150:
            continue
        img_path = image_dir + filename
        ann_path = annotation_dir + image_id + '.xml'
        # add to dataset
        self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
load_mask() – Take a look at the built-in load_mask() function, which we will override. This function returns the masks for the given ‘image_id’. But since our dataset only has bounding boxes, we will use the bounding boxes as masks.
First, we retrieve the annotation path stored in the ‘image_info’ list entry corresponding to the given ‘image_id’. Using this path, we extract the bounding boxes and the dimensions of the image. Masks are two-dimensional arrays with the same dimensions as the image, filled with zeros except for the region of interest, where the object to be detected is. To build the masks, we create a NumPy array of zeros with the image dimensions and one channel per bounding box, then set the region between each box's xmin, xmax, ymin, and ymax to one.
def load_mask(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # get the path of the annotation file for the given image_id
    path = img_info['annotation']
    # get the bounding boxes and image dims
    bnd_boxes, w, h = self.extract_bnd_boxes(path)
    # create an array for the masks, one channel per box
    masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(bnd_boxes)):
        box = bnd_boxes[i]
        ymin, ymax = box[1], box[3]
        xmin, xmax = box[0], box[2]
        masks[ymin:ymax, xmin:xmax, i] = 1
        class_ids.append(self.class_names.index('kangaroo'))
    return masks, asarray(class_ids, dtype='int32')
image_reference() – Take a look at the built-in image_reference() function. It simply returns the path of the given ‘image_id’, which is stored under the key ‘path’ in the dictionary for that image_id in the ‘image_info’ list.
def image_reference(self, image_id):
    # get the image_info for the given image_id
    img_info = self.image_info[image_id]
    # return the value of the key 'path' -> image path
    return img_info['path']
Putting all these functions together under the KangarooDataset class, we get –
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from matplotlib import pyplot
from mrcnn.visualize import display_instances
from mrcnn.utils import extract_bboxes
class KangarooDataset(Dataset):
    # load the dataset
    def load_dataset(self, dataset_dir, is_train=True):
        # define one class
        self.add_class("dataset", 1, "kangaroo")
        # data locations
        image_dir = dataset_dir + '/images/'
        annotation_dir = dataset_dir + '/annots/'
        # find all images
        for filename in listdir(image_dir):
            # get image id
            image_id = filename[:-4]
            # image 00090.jpg has a problem, so we skip it
            if image_id in ['00090']:
                continue
            # for the training set, take images numbered below 150
            if is_train and int(image_id) >= 150:
                continue
            # for the test set, take images numbered 150 and above
            if not is_train and int(image_id) < 150:
                continue
            img_path = image_dir + filename
            ann_path = annotation_dir + image_id + '.xml'
            # add to dataset
            self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

    # extract bounding boxes from an XML annotation file
    def extract_bnd_boxes(self, filename):
        # load and parse the XML file
        xml_file = ElementTree.parse(filename)
        # get the root element
        root = xml_file.getroot()
        # extract the bounding boxes
        bnd_boxes = list()
        for box in root.findall('.//bndbox'):
            xmin = int(box.find('xmin').text)
            ymin = int(box.find('ymin').text)
            xmax = int(box.find('xmax').text)
            ymax = int(box.find('ymax').text)
            bnd_boxes.append([xmin, ymin, xmax, ymax])
        # extract the image dimensions
        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)
        return bnd_boxes, width, height

    # load masks for the given image_id
    def load_mask(self, image_id):
        # get the image_info for the given image_id
        img_info = self.image_info[image_id]
        # get the path of the annotation file
        path = img_info['annotation']
        # get the bounding boxes and image dims
        bnd_boxes, w, h = self.extract_bnd_boxes(path)
        # create an array for the masks, one channel per box
        masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
        # create masks
        class_ids = list()
        for i in range(len(bnd_boxes)):
            box = bnd_boxes[i]
            ymin, ymax = box[1], box[3]
            xmin, xmax = box[0], box[2]
            masks[ymin:ymax, xmin:xmax, i] = 1
            class_ids.append(self.class_names.index('kangaroo'))
        return masks, asarray(class_ids, dtype='int32')

    # load the image path
    def image_reference(self, image_id):
        # get the image_info for the given image_id
        img_info = self.image_info[image_id]
        # return the value of the key 'path' -> image path
        return img_info['path']
Now, create two instances/objects of the class to get the training and testing images.
# training dataset
train_dataset = KangarooDataset()
# load the dataset
train_dataset.load_dataset('kangaroo', is_train=True)
train_dataset.prepare()
print('No. of training images: %d' % len(train_dataset.image_ids))

# test dataset
test_dataset = KangarooDataset()
# load the dataset
test_dataset.load_dataset('kangaroo', is_train=False)
test_dataset.prepare()
print('No. of test images: %d' % len(test_dataset.image_ids))
No. of training images: 131
No. of test images: 32
Read the built-in prepare() function to see how the dataset class is prepared for use; a quick inspection of its results follows.
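As a sanity check (not part of the original walkthrough), you can print a few of the attributes prepare() sets up; the attribute names follow the Matterport implementation-

# inspect what prepare() has set up on the dataset object
print(train_dataset.class_names)            # ['BG', 'kangaroo']
print(train_dataset.num_classes)            # 2
print(train_dataset.image_info[0]['path'])  # path of the first image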
We see that our dataset is defined properly and we are able to get the training and testing dataset correctly. The next step is to test whether the images and their corresponding masks are loading correctly or not.
Let’s first check the shape of an image and the corresponding mask.
# load an image and inspect the image shape and mask shape
image_id = 0
image = train_dataset.load_image(image_id)
mask, class_ids = train_dataset.load_mask(image_id)
print(image.shape, '\n', mask.shape)
(320, 450, 3)
(320, 450, 1)
See, the dimensions of the image and the mask are the same. The only difference is that the 3 in (320, 450, 3) in the image shape represents the 3 color channels, whereas the 1 in (320, 450, 1) in the mask shape represents 1 channel for 1 mask. Had there been two kangaroo objects, and thus two masks, in the image, the mask shape would have been (320, 450, 2).
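If you want to see a multi-channel mask for yourself, a small scan of the training set (an optional check, not in the original walkthrough) finds the first image with more than one kangaroo-

# find the first training image whose mask has more than one channel,
# i.e. an image containing more than one kangaroo
for i in train_dataset.image_ids:
    m, ids = train_dataset.load_mask(i)
    if m.shape[2] > 1:
        print('image_id:', i, 'mask shape:', m.shape)
        break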
Now, plot the masks over the first 4 images.
import matplotlib.pyplot as plt

# plot the first 4 images with their masks overlaid
plt.figure(figsize=(12, 12))
for i in range(4):
    plt.subplot(2, 2, i+1)
    img = train_dataset.load_image(i)
    plt.imshow(img)
    mask, class_ids = train_dataset.load_mask(i)
    for j in range(mask.shape[2]):
        plt.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
plt.show()
All the images and masks load perfectly. The built-in mrcnn.visualize.display_instances() plots an image with its masks, bounding boxes, and class labels; the bounding boxes are extracted from the masks using the extract_bboxes() function. So let's make use of these functions and plot an image with its bounding box derived from the mask.
# image id
image_id = 10
# load the image
image = train_dataset.load_image(image_id)
# load the masks and class ids for the given image_id
mask, class_ids = train_dataset.load_mask(image_id)
# extract bounding boxes from the masks
box = extract_bboxes(mask)
print(box.shape, mask.shape, class_ids.shape)
# display the image with masks and bounding boxes
display_instances(image, box, mask, class_ids, train_dataset.class_names)
# CHECKPOINT 1
Here is the complete code of whatever we have done till now.
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from matplotlib import pyplot
from mrcnn.visualize import display_instances
from mrcnn.utils import extract_bboxes

class KangarooDataset(Dataset):
    # load the dataset
    def load_dataset(self, dataset_dir, is_train=True):
        # define one class
        self.add_class("dataset", 1, "kangaroo")
        # data locations
        image_dir = dataset_dir + '/images/'
        annotation_dir = dataset_dir + '/annots/'
        # find all images
        for filename in listdir(image_dir):
            # get image id
            image_id = filename[:-4]
            # image 00090.jpg has a problem, so we skip it
            if image_id in ['00090']:
                continue
            # for the training set, take images numbered below 150
            if is_train and int(image_id) >= 150:
                continue
            # for the test set, take images numbered 150 and above
            if not is_train and int(image_id) < 150:
                continue
            img_path = image_dir + filename
            ann_path = annotation_dir + image_id + '.xml'
            # add to dataset
            self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

    # extract bounding boxes from an XML annotation file
    def extract_bnd_boxes(self, filename):
        # load and parse the XML file
        xml_file = ElementTree.parse(filename)
        # get the root element
        root = xml_file.getroot()
        # extract the bounding boxes
        bnd_boxes = list()
        for box in root.findall('.//bndbox'):
            xmin = int(box.find('xmin').text)
            ymin = int(box.find('ymin').text)
            xmax = int(box.find('xmax').text)
            ymax = int(box.find('ymax').text)
            bnd_boxes.append([xmin, ymin, xmax, ymax])
        # extract the image dimensions
        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)
        return bnd_boxes, width, height

    # load masks for the given image_id
    def load_mask(self, image_id):
        # get the image_info for the given image_id
        img_info = self.image_info[image_id]
        # get the path of the annotation file
        path = img_info['annotation']
        # get the bounding boxes and image dims
        bnd_boxes, w, h = self.extract_bnd_boxes(path)
        # create an array for the masks, one channel per box
        masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
        # create masks
        class_ids = list()
        for i in range(len(bnd_boxes)):
            box = bnd_boxes[i]
            ymin, ymax = box[1], box[3]
            xmin, xmax = box[0], box[2]
            masks[ymin:ymax, xmin:xmax, i] = 1
            class_ids.append(self.class_names.index('kangaroo'))
        return masks, asarray(class_ids, dtype='int32')

    # load the image path
    def image_reference(self, image_id):
        # get the image_info for the given image_id
        img_info = self.image_info[image_id]
        # return the value of the key 'path' -> image path
        return img_info['path']

# training dataset
train_dataset = KangarooDataset()
train_dataset.load_dataset('kangaroo', is_train=True)
train_dataset.prepare()
print('No. of training images: %d' % len(train_dataset.image_ids))

# test dataset
test_dataset = KangarooDataset()
test_dataset.load_dataset('kangaroo', is_train=False)
test_dataset.prepare()
print('No. of test images: %d' % len(test_dataset.image_ids))

# image id
image_id = 10
# load the image
image = train_dataset.load_image(image_id)
# load the masks and class ids for the given image_id
mask, class_ids = train_dataset.load_mask(image_id)
# extract bounding boxes from the masks
box = extract_bboxes(mask)
print(box.shape, mask.shape, class_ids.shape)
# display the image with masks and bounding boxes
display_instances(image, box, mask, class_ids, train_dataset.class_names)
Train the Model on the Kangaroo Dataset
We will use transfer learning to train the Mask RCNN model: we start from weights pre-trained on the MS COCO dataset, downloaded into the working directory under the name “mask_rcnn_coco.h5”. First we get the root directory, then create a model directory to store the logs produced during the epochs along with the trained model, and then build the local path to the pre-trained COCO weights.
import os
from mrcnn import utils

ROOT_DIR = os.getcwd()
# directory to save logs and the trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# local path to the pre-trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# download the COCO weights if they are not already present
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
The next task is to define a configuration object for the model. Take a look at mrcnn/config.py. We will extend the Config class to make our own configuration class and set some of its attributes: NAME defines the name of the configuration, NUM_CLASSES defines the number of classification classes (including the background), and STEPS_PER_EPOCH defines the number of training steps per epoch.
from mrcnn.config import Config
from mrcnn.model import MaskRCNN

# define a configuration for the model
class KangarooConfig(Config):
    # give the configuration a recognizable name
    NAME = "kangaroo_cfg"
    # number of classes (background + kangaroo)
    NUM_CLASSES = 1 + 1
    # number of training steps per epoch
    STEPS_PER_EPOCH = 131

# define a config object
config = KangarooConfig()
config.display()
Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     2
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 2
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
....
Next, we define the object detection model and store its configuration files and checkpoints in the “MODEL_DIR” created previously. Then we load the pre-trained weights from “mask_rcnn_coco.h5” using the load_weights() function, excluding the output layers via the ‘exclude’ argument, since we will define our own output layers. And then, finally, we train the model on the training dataset with the default learning rate. Here we train only the output layers of the model, also called the “heads”.
# define the model
model = MaskRCNN(mode='training', model_dir=MODEL_DIR, config=config)
# load the pre-trained MS COCO weights, excluding the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
# train the model (only the output layers)
model.train(train_dataset, test_dataset, learning_rate=config.LEARNING_RATE, epochs=1, layers='heads')
Epoch 1/1
131/131 [==============================] - 149s 1s/step - loss: 1.1330 - rpn_class_loss: 0.0068 - rpn_bbox_loss: 0.2397 - mrcnn_class_loss: 0.0340 - mrcnn_bbox_loss: 0.4222 - mrcnn_mask_loss: 0.4303 - val_loss: 0.8485 - val_rpn_class_loss: 0.0099 - val_rpn_bbox_loss: 0.2603 - val_mrcnn_class_loss: 0.0246 - val_mrcnn_bbox_loss: 0.2911 - val_mrcnn_mask_loss: 0.2626
Evaluate the Model
The metric used to measure the accuracy of object detection models is called Average Precision (AP). If we plot precision against recall, the area under the curve gives the AP. Since precision and recall always fall between 0 and 1, so does AP. The mean of the AP over the entire dataset is called mean Average Precision, or mAP. In object detection models, the predictions are bounding boxes, and the goodness of the model is based on how well the predicted boxes overlap with the ground truth. This overlap is calculated by dividing the area of intersection of the two boxes by the area of their union, and is known as Intersection over Union, or IoU.
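To make IoU concrete, here is a small standalone function for boxes in [xmin, ymin, xmax, ymax] form (a sketch for intuition only; the Mask RCNN library computes IoU internally)-

def iou(box_a, box_b):
    # boxes are [xmin, ymin, xmax, ymax]
    # corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # union = area of A + area of B - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

# the two ground-truth boxes from 00001.xml overlap quite a bit
print(iou([233, 89, 386, 262], [134, 105, 341, 253]))  # about 0.39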
We will use mrcnn.utils.compute_ap to compute the Average Precision with the default IoU threshold (0.5): a predicted box counts as correct if its IoU with the ground truth is above 0.5, and wrong otherwise. We can then calculate the mean of all the APs to get the mean Average Precision.
For this, we again extend the Config class, this time for prediction, and again set some attributes to the required values: NAME, NUM_CLASSES, GPU_COUNT, and IMAGES_PER_GPU. The last two are required regardless of whether you are using a CPU or a GPU.
class PredictionConfig(Config):
    # name of the configuration
    NAME = "kangaroo_cfg"
    # number of classes (background + kangaroo)
    NUM_CLASSES = 1 + 1
    # GPU configuration
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
Next, create a PredictionConfig object, define the model with the mode changed from “training” to “inference”, find the model path, and load the trained weights.
# create a PredictionConfig object to make predictions
cfg = PredictionConfig()
# define the model with mode set to inference
model = MaskRCNN(mode='inference', model_dir=MODEL_DIR, config=cfg)
# get the path of the last trained model
model_path = model.find_last()
# load the trained weights
assert model_path != "", "Provide path to trained weights"
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)
Loading weights from  /content/logs/kangaroo_cfg20200519T0448/mask_rcnn_kangaroo_cfg_0001.h5
Re-starting from epoch 1
Now, to evaluate the model, we define an evaluate_model() function that takes the dataset, the model, and the configuration as parameters. To collect the APs over the entire dataset, we start with an empty list. Then, for each image_id in the given dataset, we load the image with its ground-truth bounding boxes and masks using the load_image_gt() function from mrcnn.model, convert the pixel values to floats using the mold_image() function from mrcnn.model, add a new axis at the zeroth position using np.expand_dims(), and finally run the prediction.
from numpy import expand_dims
from numpy import mean
from mrcnn.utils import compute_ap
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image

# calculate the mAP of the model on a dataset
def evaluate_model(dataset, model, cfg):
    APs = list()
    for image_id in dataset.image_ids:
        # load the image, bounding boxes and masks for the given image id
        image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
        # convert the pixel values of the image
        molded_image = mold_image(image, cfg)
        # expand the shape of the image to make it a batch of one
        new_img = expand_dims(molded_image, 0)
        # make a prediction
        predict = model.detect(new_img, verbose=0)
        # extract the results for the first image in the batch
        pos0 = predict[0]
        # compute the AP
        AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask,
                                 pos0["rois"], pos0["class_ids"], pos0["scores"], pos0['masks'])
        # store the AP in the list
        APs.append(AP)
    # compute the mean AP across the dataset
    mAP = mean(APs)
    return mAP

# evaluate on the training dataset
training_mAP = evaluate_model(train_dataset, model, cfg)
print("Train mAP: %.3f" % training_mAP)
# evaluate on the test dataset
testing_mAP = evaluate_model(test_dataset, model, cfg)
print("Test mAP: %.3f" % testing_mAP)
Train mAP: 0.884
Test mAP: 0.922
As you can see, the computed mAP is 0.884 for the training dataset and 0.922 for the test dataset. To achieve an improved mAP, increase the number of epochs.
We got decent mAP values and everything is working fine. So, finally, we can do the task for which our model was created – detecting kangaroos in new photos.
# CHECKPOINT 2
Here is the complete code of whatever we have done till now.
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from numpy import expand_dims
from numpy import mean
from mrcnn.utils import Dataset
from mrcnn.utils import extract_bboxes
from mrcnn.utils import compute_ap
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image
from mrcnn.visualize import display_instances
from matplotlib import pyplot
from mrcnn import utils
import os

class KangarooDataset(Dataset):
    # load the dataset
    def load_dataset(self, dataset_dir, is_train=True):
        # define one class
        self.add_class("dataset", 1, "kangaroo")
        # data locations
        image_dir = dataset_dir + '/images/'
        annotation_dir = dataset_dir + '/annots/'
        # find all images
        for filename in listdir(image_dir):
            # get image id
            image_id = filename[:-4]
            # image 00090.jpg has a problem, so we skip it
            if image_id in ['00090']:
                continue
            # for the training set, take images numbered below 150
            if is_train and int(image_id) >= 150:
                continue
            # for the test set, take images numbered 150 and above
            if not is_train and int(image_id) < 150:
                continue
            img_path = image_dir + filename
            ann_path = annotation_dir + image_id + '.xml'
            # add to dataset
            self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

    # extract bounding boxes from an XML annotation file
    def extract_bnd_boxes(self, filename):
        # load and parse the XML file
        xml_file = ElementTree.parse(filename)
        # get the root element
        root = xml_file.getroot()
        # extract the bounding boxes
        bnd_boxes = list()
        for box in root.findall('.//bndbox'):
            xmin = int(box.find('xmin').text)
            ymin = int(box.find('ymin').text)
            xmax = int(box.find('xmax').text)
            ymax = int(box.find('ymax').text)
            bnd_boxes.append([xmin, ymin, xmax, ymax])
        # extract the image dimensions
        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)
        return bnd_boxes, width, height

    # load masks for the given image_id
    def load_mask(self, image_id):
        # get the image_info for the given image_id
        img_info = self.image_info[image_id]
        # get the path of the annotation file
        path = img_info['annotation']
        # get the bounding boxes and image dims
        bnd_boxes, w, h = self.extract_bnd_boxes(path)
        # create an array for the masks, one channel per box
        masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
        # create masks
        class_ids = list()
        for i in range(len(bnd_boxes)):
            box = bnd_boxes[i]
            ymin, ymax = box[1], box[3]
            xmin, xmax = box[0], box[2]
            masks[ymin:ymax, xmin:xmax, i] = 1
            class_ids.append(self.class_names.index('kangaroo'))
        return masks, asarray(class_ids, dtype='int32')

    # load the image path
    def image_reference(self, image_id):
        # get the image_info for the given image_id
        img_info = self.image_info[image_id]
        # return the value of the key 'path' -> image path
        return img_info['path']

# training dataset
train_dataset = KangarooDataset()
train_dataset.load_dataset('kangaroo', is_train=True)
train_dataset.prepare()
print('No. of training images: %d' % len(train_dataset.image_ids))

# test dataset
test_dataset = KangarooDataset()
test_dataset.load_dataset('kangaroo', is_train=False)
test_dataset.prepare()
print('No. of test images: %d' % len(test_dataset.image_ids))

# image id
image_id = 10
# load the image
image = train_dataset.load_image(image_id)
# load the masks and class ids for the given image_id
mask, class_ids = train_dataset.load_mask(image_id)
# extract bounding boxes from the masks
box = extract_bboxes(mask)
print(box.shape, mask.shape, class_ids.shape)
# display the image with masks and bounding boxes
display_instances(image, box, mask, class_ids, train_dataset.class_names)

ROOT_DIR = os.getcwd()
# directory to save logs and the trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# local path to the pre-trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# define a configuration for the model
class KangarooConfig(Config):
    # give the configuration a recognizable name
    NAME = "kangaroo_cfg"
    # number of classes (background + kangaroo)
    NUM_CLASSES = 1 + 1
    # number of training steps per epoch
    STEPS_PER_EPOCH = 131

# prepare the config
config = KangarooConfig()
config.display()

# define the model
model = MaskRCNN(mode='training', model_dir=MODEL_DIR, config=config)
# load the pre-trained MS COCO weights, excluding the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
# train the model (only the output layers)
model.train(train_dataset, test_dataset, learning_rate=config.LEARNING_RATE, epochs=1, layers='heads')

class PredictionConfig(Config):
    # name of the configuration
    NAME = "kangaroo_cfg"
    # number of classes (background + kangaroo)
    NUM_CLASSES = 1 + 1
    # GPU configuration
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

# create a PredictionConfig object to make predictions
cfg = PredictionConfig()
# define the model with mode set to inference
model = MaskRCNN(mode='inference', model_dir=MODEL_DIR, config=cfg)
# get the path of the last trained model
model_path = model.find_last()
# load the trained weights
assert model_path != "", "Provide path to trained weights"
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)

# calculate the mAP of the model on a dataset
def evaluate_model(dataset, model, cfg):
    APs = list()
    for image_id in dataset.image_ids:
        # load the image, bounding boxes and masks for the given image id
        image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
        # convert the pixel values of the image
        molded_image = mold_image(image, cfg)
        # expand the shape of the image to make it a batch of one
        new_img = expand_dims(molded_image, 0)
        # make a prediction
        predict = model.detect(new_img, verbose=0)
        # extract the results for the first image in the batch
        pos0 = predict[0]
        # compute the AP
        AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask,
                                 pos0["rois"], pos0["class_ids"], pos0["scores"], pos0['masks'])
        # store the AP in the list
        APs.append(AP)
    # compute the mean AP across the dataset
    mAP = mean(APs)
    return mAP

# evaluate on the training dataset
training_mAP = evaluate_model(train_dataset, model, cfg)
print("Train mAP: %.3f" % training_mAP)
# evaluate on the test dataset
testing_mAP = evaluate_model(test_dataset, model, cfg)
print("Test mAP: %.3f" % testing_mAP)
Detect Kangaroos in Any Picture
This object detection model was made to detect kangaroos. So, download any two kangaroo images of your choice onto your local machine and name them “kangaroo1.jpg” and “kangaroo2.jpg”. On the left panel of your Colab notebook, click on the file symbol; you will see an “upload” option. Use it to upload both pictures. Hover your mouse over the first image, click on the three dots that appear, and select “copy path”. Paste the path into the variable img1. Do the same for the second image.
import cv2
from matplotlib.patches import Rectangle

# OpenCV loads images as BGR; convert to RGB for the model and for plotting
img1 = cv2.cvtColor(cv2.imread(r'/content/kangaroo1.jpg'), cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(cv2.imread(r'/content/kangaroo2.jpg'), cv2.COLOR_BGR2RGB)
images = [img1, img2]
pyplot.figure(figsize=(20, 20))
for i in range(len(images)):
    img = images[i]
    # mold the image and add a batch axis
    molded_image = mold_image(img, cfg)
    new_img = expand_dims(molded_image, 0)
    # detect kangaroos
    predict = model.detect(new_img, verbose=0)
    pyplot.subplot(1, 2, i+1)
    pyplot.imshow(img)
    pyplot.title('Predicted')
    ax = pyplot.gca()
    # draw a red box for each detected object
    for box in predict[0]['rois']:
        y1, x1, y2, x2 = box
        width, height = x2 - x1, y2 - y1
        rectangle = Rectangle((x1, y1), width, height, fill=False, color='red')
        ax.add_patch(rectangle)
There we go. The model detects all the kangaroos correctly, but in the second image it falsely detects the man as a kangaroo. The detections can be made more accurate by enlarging the dataset, fine-tuning the model, training the complete model instead of just the output layers, and increasing the number of epochs, as sketched below.
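For example, here is a hedged sketch of a longer fine-tuning run. It assumes you re-create a training-mode model (the model variable was switched to inference mode for evaluation) and reload the weights saved earlier; 'all' is one of the layer selectors accepted by the Matterport train() API, and the learning rate and epoch count are illustrative-

# re-create a training-mode model and fine-tune all layers
# for a few more epochs at a reduced learning rate
model = MaskRCNN(mode='training', model_dir=MODEL_DIR, config=config)
model.load_weights(model_path, by_name=True)
model.train(train_dataset, test_dataset,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=5, layers='all')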
# CHECKPOINT 3
Finally, here is the complete code for object detection using the Kangaroo dataset.
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from numpy import expand_dims
from numpy import mean
from mrcnn.utils import Dataset
from mrcnn.utils import extract_bboxes
from mrcnn.utils import compute_ap
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image
from mrcnn.visualize import display_instances
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from mrcnn import utils
import os
import cv2

class KangarooDataset(Dataset):
    # load the dataset
    def load_dataset(self, dataset_dir, is_train=True):
        # define one class
        self.add_class("dataset", 1, "kangaroo")
        # data locations
        image_dir = dataset_dir + '/images/'
        annotation_dir = dataset_dir + '/annots/'
        # find all images
        for filename in listdir(image_dir):
            # get image id
            image_id = filename[:-4]
            # image 00090.jpg has a problem, so we skip it
            if image_id in ['00090']:
                continue
            # for the training set, take images numbered below 150
            if is_train and int(image_id) >= 150:
                continue
            # for the test set, take images numbered 150 and above
            if not is_train and int(image_id) < 150:
                continue
            img_path = image_dir + filename
            ann_path = annotation_dir + image_id + '.xml'
            # add to dataset
            self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

    # extract bounding boxes from an XML annotation file
    def extract_bnd_boxes(self, filename):
        # load and parse the XML file
        xml_file = ElementTree.parse(filename)
        # get the root element
        root = xml_file.getroot()
        # extract the bounding boxes
        bnd_boxes = list()
        for box in root.findall('.//bndbox'):
            xmin = int(box.find('xmin').text)
            ymin = int(box.find('ymin').text)
            xmax = int(box.find('xmax').text)
            ymax = int(box.find('ymax').text)
            bnd_boxes.append([xmin, ymin, xmax, ymax])
        # extract the image dimensions
        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)
        return bnd_boxes, width, height

    # load masks for the given image_id
    def load_mask(self, image_id):
        # get the image_info for the given image_id
        img_info = self.image_info[image_id]
        # get the path of the annotation file
        path = img_info['annotation']
        # get the bounding boxes and image dims
        bnd_boxes, w, h = self.extract_bnd_boxes(path)
        # create an array for the masks, one channel per box
        masks = zeros([h, w, len(bnd_boxes)], dtype='uint8')
        # create masks
        class_ids = list()
        for i in range(len(bnd_boxes)):
            box = bnd_boxes[i]
            ymin, ymax = box[1], box[3]
            xmin, xmax = box[0], box[2]
            masks[ymin:ymax, xmin:xmax, i] = 1
            class_ids.append(self.class_names.index('kangaroo'))
        return masks, asarray(class_ids, dtype='int32')

    # load the image path
    def image_reference(self, image_id):
        # get the image_info for the given image_id
        img_info = self.image_info[image_id]
        # return the value of the key 'path' -> image path
        return img_info['path']

# training dataset
train_dataset = KangarooDataset()
train_dataset.load_dataset('kangaroo', is_train=True)
train_dataset.prepare()
print('No. of training images: %d' % len(train_dataset.image_ids))

# test dataset
test_dataset = KangarooDataset()
test_dataset.load_dataset('kangaroo', is_train=False)
test_dataset.prepare()
print('No. of test images: %d' % len(test_dataset.image_ids))

# image id
image_id = 10
# load the image
image = train_dataset.load_image(image_id)
# load the masks and class ids for the given image_id
mask, class_ids = train_dataset.load_mask(image_id)
# extract bounding boxes from the masks
box = extract_bboxes(mask)
print(box.shape, mask.shape, class_ids.shape)
# display the image with masks and bounding boxes
display_instances(image, box, mask, class_ids, train_dataset.class_names)

ROOT_DIR = os.getcwd()
# directory to save logs and the trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# local path to the pre-trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# define a configuration for the model
class KangarooConfig(Config):
    # give the configuration a recognizable name
    NAME = "kangaroo_cfg"
    # number of classes (background + kangaroo)
    NUM_CLASSES = 1 + 1
    # number of training steps per epoch
    STEPS_PER_EPOCH = 131

# prepare the config
config = KangarooConfig()
config.display()

# define the model
model = MaskRCNN(mode='training', model_dir=MODEL_DIR, config=config)
# load the pre-trained MS COCO weights, excluding the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
# train the model (only the output layers)
model.train(train_dataset, test_dataset, learning_rate=config.LEARNING_RATE, epochs=1, layers='heads')

class PredictionConfig(Config):
    # name of the configuration
    NAME = "kangaroo_cfg"
    # number of classes (background + kangaroo)
    NUM_CLASSES = 1 + 1
    # GPU configuration
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

# create a PredictionConfig object to make predictions
cfg = PredictionConfig()
# define the model with mode set to inference
model = MaskRCNN(mode='inference', model_dir=MODEL_DIR, config=cfg)
# get the path of the last trained model
model_path = model.find_last()
# load the trained weights
assert model_path != "", "Provide path to trained weights"
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)

# calculate the mAP of the model on a dataset
def evaluate_model(dataset, model, cfg):
    APs = list()
    for image_id in dataset.image_ids:
        # load the image, bounding boxes and masks for the given image id
        image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
        # convert the pixel values of the image
        molded_image = mold_image(image, cfg)
        # expand the shape of the image to make it a batch of one
        new_img = expand_dims(molded_image, 0)
        # make a prediction
        predict = model.detect(new_img, verbose=0)
        # extract the results for the first image in the batch
        pos0 = predict[0]
        # compute the AP
        AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask,
                                 pos0["rois"], pos0["class_ids"], pos0["scores"], pos0['masks'])
        # store the AP in the list
        APs.append(AP)
    # compute the mean AP across the dataset
    mAP = mean(APs)
    return mAP

# evaluate on the training dataset
training_mAP = evaluate_model(train_dataset, model, cfg)
print("Train mAP: %.3f" % training_mAP)
# evaluate on the test dataset
testing_mAP = evaluate_model(test_dataset, model, cfg)
print("Test mAP: %.3f" % testing_mAP)

# OpenCV loads images as BGR; convert to RGB for the model and for plotting
img1 = cv2.cvtColor(cv2.imread(r'/content/kangaroo1.jpg'), cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(cv2.imread(r'/content/kangaroo2.jpg'), cv2.COLOR_BGR2RGB)
images = [img1, img2]
pyplot.figure(figsize=(20, 20))
for i in range(len(images)):
    img = images[i]
    # mold the image and add a batch axis
    molded_image = mold_image(img, cfg)
    new_img = expand_dims(molded_image, 0)
    # detect kangaroos
    predict = model.detect(new_img, verbose=0)
    pyplot.subplot(1, 2, i+1)
    pyplot.imshow(img)
    pyplot.title('Predicted')
    ax = pyplot.gca()
    # draw a red box for each detected object
    for box in predict[0]['rois']:
        y1, x1, y2, x2 = box
        width, height = x2 - x1, y2 - y1
        rectangle = Rectangle((x1, y1), width, height, fill=False, color='red')
        ax.add_patch(rectangle)
Congratulations! You have come a long way. Want to add your thoughts? Need any further help? Leave a comment below and I will get back to you ASAP 🙂
For further reading:
- Text Generation with Keras and Tensorflow using LSTM and tokenization
- Pneumonia X-Ray detection, Keras | Python