Instance Segmentation with Custom Datasets in Python
Instance segmentation detects the objects within an input image and isolates them from the background, and it also goes a step further: it can detect each individual object within a cluster of similar objects and draw the boundary of each one. Thus, it can not only differentiate between groups of different species but also count the individuals within each group. For example, given an image of a herd of goats, semantic segmentation can only tell us that there are many goats; it cannot differentiate one goat from another. Instance segmentation, however, can tell us that there are 3 different goats standing together. This is simply what instance segmentation does.
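To make this concrete in code: the Mask R-CNN model used later in this article returns one boolean mask per detected instance, so counting the masks counts the individuals. A minimal sketch, assuming a trained model and an image loaded as shown in the Inference and Testing sections below:

# Sketch: counting individual instances in a detection result.
# Assumes `model` and `image` are set up as in the later sections.
results = model.detect([image], verbose=0)
r = results[0]                         # results for the single input image
num_instances = r['masks'].shape[-1]   # one boolean mask per detected instance
print("Found {} individual instances".format(num_instances))
for i, score in enumerate(r['scores']):
    print("Instance {} detected with confidence {:.2f}".format(i, score))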
This article is split into 5 steps for ease of reading:
- Installations
- Dataset
- Training
- Inference
- Testing
In this article, you will learn what instance segmentation is, implement it in Python with the help of an example test case, and learn how to perform instance segmentation on a custom dataset. We will start by building the dataset and its corresponding directory/folder structure, then train the model, and follow up with inference and testing.
Instance segmentation is the latest of the deep learning techniques adopted after image recognition, object detection, and semantic segmentation, so information and custom training methods are still scarce in the open-source community. I believe this tutorial will help you understand the concept better and take your understanding to the next level. The next level of deep learning after instance segmentation is panoptic segmentation, which is a combination of semantic and instance segmentation.
There are two things to be done before diving into the code:
- Creating datasets
- Organizing them into the proper directory structure.
Happy Reading!!!
ZIP FILE STRUCTURE
Structure of the Zip file for the dataset to be custom trained:
- Train directory – contains the JPG images used for training, together with the annotations of each image.
- Validation directory – contains the JPG images used for validation, together with the annotations of each image.
These annotations for both the training and validation images can be built using tools such as LabelImg or the VGG Image Annotator. The structure of the dataset therefore has to be clearly defined and drafted first; a sketch of building it programmatically is shown below.
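As a rough sketch (not from the original article), the snippet below builds that structure with shutil.copy; the source folder name all_images and the 80/20 split are assumptions, and the via_region_data.json annotation file name follows the VGG Image Annotator export convention used by the Matterport samples:

import os
import random
from shutil import copy

random.seed(42)
src = 'all_images'                                 # assumed folder of annotated JPGs
images = sorted(f for f in os.listdir(src) if f.endswith('.jpg'))
random.shuffle(images)
split = int(0.8 * len(images))                     # assumed 80/20 train/val split

for subset, subset_images in [('train', images[:split]), ('val', images[split:])]:
    os.makedirs(os.path.join('dataset', subset), exist_ok=True)
    for name in subset_images:
        copy(os.path.join(src, name), os.path.join('dataset', subset, name))
    # Each subset also needs its annotation file (e.g. via_region_data.json)
    # exported from the VGG Image Annotator for exactly these images.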
INSTALLATIONS
Essential installations of the Python libraries/packages needed for the project:
%cd
!git clone --quiet https://github.com/matterport/Mask_RCNN.git
%cd ~/Mask_RCNN
!pip install -q PyDrive
!pip install -r requirements.txt
!python setup.py install
/root
REQUIRED LIBRARIES
The required libraries are imported in this section. Some of the important libraries in this project are NumPy, shutil, and TensorFlow.
import os
from zipfile import ZipFile
from shutil import copy
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import cv2
import sys
import random
import math
import re
import time
import numpy as np
import tensorflow as tf
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import skimage
import glob
from mrcnn import utils
from mrcnn import visualize
from mrcnn.visualize import display_images
import mrcnn.model as modellib
from mrcnn.model import log
import dog  # custom training script for the dog dataset (see Training section)
DOWNLOADING THE DATASET
Update the fileId variable with the Google Drive file ID of your image ZIP dataset, authenticate with Google, and download the archive:
%cd ~/Mask_RCNN
fileId = '1p11kagop07-LyNyTIQ5_bDHx6I2TSDN9'
os.makedirs('dataset')
os.chdir('dataset')
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
fileName = fileId + '.zip'
downloaded = drive.CreateFile({'id': fileId})
downloaded.GetContentFile(fileName)
EXTRACTING DATASET
The ZIP file downloaded above is now extracted and removed, leaving the train and val directories ready for training:
ds = ZipFile(fileName)
ds.extractall()
os.remove(fileName)
print('Extracted zip file ' + fileName)
TRAINING THE MODEL
The dataset extracted in the previous section is trained in this section using the dog.py training script, sketched below.
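The dog.py script itself is not listed in this article. It is assumed to follow the balloon.py sample that ships with the Matterport Mask_RCNN repository; a minimal sketch of the configuration class it would define, consistent with the values printed in the training log below:

from mrcnn.config import Config

class DogConfig(Config):
    """Configuration for training on the dog dataset.
    A sketch modeled on the balloon.py sample config."""
    NAME = "dog"                    # appears in the log directory names
    IMAGES_PER_GPU = 2              # batch of 2 images per GPU
    NUM_CLASSES = 1 + 1             # background + dog
    STEPS_PER_EPOCH = 100           # training steps per epoch
    DETECTION_MIN_CONFIDENCE = 0.9  # skip detections below 90% confidence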
%cd ~/Mask_RCNN
!python dog.py train --dataset=dataset/ --weights=coco
/root/Mask_RCNN
Using TensorFlow backend.
Weights: coco
Dataset: dataset/
Logs: /logs

Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     2
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.9
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 2
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024 3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           dog
NUM_CLASSES                    2
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                100
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001

Downloading pretrained model to /mask_rcnn_coco.h5 ...
... done downloading pretrained model!
Loading weights /mask_rcnn_coco.h5
2018-09-12 11:40:58.009140: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-12 11:40:58.009590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-09-12 11:40:58.009645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-12 11:40:58.387963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-12 11:40:58.388035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2018-09-12 11:40:58.388057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2018-09-12 11:40:58.388354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10759 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
Training network heads

Starting at epoch 0. LR=0.001

Checkpoint Path: /logs/dog20180912T1141/mask_rcnn_dog_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
In model:  rpn_model
    rpn_conv_shared    (Conv2D)
    rpn_class_raw      (Conv2D)
    rpn_bbox_pred      (Conv2D)
mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py:2087: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
  UserWarning('Using a generator with `use_multiprocessing=True`'
Epoch 1/5
2018-09-12 11:41:45.132967: W tensorflow/core/framework/allocator.cc:108] Allocation of 80281600 exceeds 10% of system memory.
  1/100 [..............................] - ETA: 1:07:59 - loss: 2.8056 - rpn_class_loss: 0.0096 - rpn_bbox_loss: 0.0135 - mrcnn_class_loss: 1.2340 - mrcnn_bbox_loss: 0.4672 - mrcnn_mask_loss: 1.0813
2018-09-12 11:41:57.981260: W tensorflow/core/framework/allocator.cc:108] Allocation of 80281600 exceeds 10% of system memory.
  2/100 [..............................] - ETA: 36:14 - loss: 2.6954 - rpn_class_loss: 0.0049 - rpn_bbox_loss: 0.0258 - mrcnn_class_loss: 0.9097 - mrcnn_bbox_loss: 0.7131 - mrcnn_mask_loss: 1.0420
2018-09-12 11:42:01.132180: W tensorflow/core/framework/allocator.cc:108] Allocation of 80281600 exceeds 10% of system memory.
  3/100 [..............................] - ETA: 25:32 - loss: 2.4394 - rpn_class_loss: 0.0037 - rpn_bbox_loss: 0.0210 - mrcnn_class_loss: 0.6421 - mrcnn_bbox_loss: 0.6986 - mrcnn_mask_loss: 1.0741
2018-09-12 11:42:04.002321: W tensorflow/core/framework/allocator.cc:108] Allocation of 80281600 exceeds 10% of system memory.
  4/100 [>.............................] - ETA: 20:06 - loss: 2.2060 - rpn_class_loss: 0.0041 - rpn_bbox_loss: 0.0196 - mrcnn_class_loss: 0.4968 - mrcnn_bbox_loss: 0.6090 - mrcnn_mask_loss: 1.0765
2018-09-12 11:42:06.979433: W tensorflow/core/framework/allocator.cc:108] Allocation of 80281600 exceeds 10% of system memory.
 99/100 [============================>.] - ETA: 3s - loss: 0.4736 - rpn_class_loss: 0.0014 - rpn_bbox_loss: 0.0285 - mrcnn_class_loss: 0.0256 - mrcnn_bbox_loss: 0.2176 - mrcnn_mask_loss: 0.2005
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py:2348: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
  UserWarning('Using a generator with `use_multiprocessing=True`'
100/100 [==============================] - 424s 4s/step - loss: 0.4712 - rpn_class_loss: 0.0014 - rpn_bbox_loss: 0.0284 - mrcnn_class_loss: 0.0254 - mrcnn_bbox_loss: 0.2163 - mrcnn_mask_loss: 0.1996 - val_loss: 0.3452 - val_rpn_class_loss: 0.0013 - val_rpn_bbox_loss: 0.0684 - val_mrcnn_class_loss: 0.0016 - val_mrcnn_bbox_loss: 0.1338 - val_mrcnn_mask_loss: 0.1403
Epoch 2/5
100/100 [==============================] - 373s 4s/step - loss: 0.2107 - rpn_class_loss: 0.0011 - rpn_bbox_loss: 0.0244 - mrcnn_class_loss: 0.0035 - mrcnn_bbox_loss: 0.0764 - mrcnn_mask_loss: 0.1053 - val_loss: 0.2822 - val_rpn_class_loss: 6.7858e-04 - val_rpn_bbox_loss: 0.0805 - val_mrcnn_class_loss: 0.0036 - val_mrcnn_bbox_loss: 0.0786 - val_mrcnn_mask_loss: 0.1188
Epoch 3/5
100/100 [==============================] - 375s 4s/step - loss: 0.1767 - rpn_class_loss: 7.3554e-04 - rpn_bbox_loss: 0.0270 - mrcnn_class_loss: 0.0034 - mrcnn_bbox_loss: 0.0509 - mrcnn_mask_loss: 0.0947 - val_loss: 0.2633 - val_rpn_class_loss: 5.3679e-04 - val_rpn_bbox_loss: 0.0980 - val_mrcnn_class_loss: 0.0036 - val_mrcnn_bbox_loss: 0.0500 - val_mrcnn_mask_loss: 0.1112
Epoch 4/5
100/100 [==============================] - 374s 4s/step - loss: 0.1554 - rpn_class_loss: 7.4969e-04 - rpn_bbox_loss: 0.0280 - mrcnn_class_loss: 0.0030 - mrcnn_bbox_loss: 0.0340 - mrcnn_mask_loss: 0.0897 - val_loss: 0.2709 - val_rpn_class_loss: 4.5353e-04 - val_rpn_bbox_loss: 0.1080 - val_mrcnn_class_loss: 0.0036 - val_mrcnn_bbox_loss: 0.0448 - val_mrcnn_mask_loss: 0.1140
Epoch 5/5
100/100 [==============================] - 374s 4s/step - loss: 0.1321 - rpn_class_loss: 7.3339e-04 - rpn_bbox_loss: 0.0230 - mrcnn_class_loss: 0.0028 - mrcnn_bbox_loss: 0.0213 - mrcnn_mask_loss: 0.0843 - val_loss: 0.2466 - val_rpn_class_loss: 4.1589e-04 - val_rpn_bbox_loss: 0.1037 - val_mrcnn_class_loss: 0.0017 - val_mrcnn_bbox_loss: 0.0272 - val_mrcnn_mask_loss: 0.1136
RUN INFERENCE ON TEST DATASET
ROOT_DIR = os.getcwd()
sys.path.append(ROOT_DIR)
custom_WEIGHTS_PATH = sorted(glob.glob("/logs/*/mask_rcnn_*.h5"))[-1]
%matplotlib inline
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
config = dog.DogConfig()
custom_DIR = os.path.join(ROOT_DIR, "dataset")
Run detection on one image at a time rather than pushing all the images through at once, by overriding the batch-size settings of the training configuration:
class InferenceConfig(config.__class__):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()
DEVICE = "/gpu:0" # /cpu:0 or /gpu:0
TEST_MODE = "inference"
get_ax() returns a Matplotlib Axes array for the visualizations in the notebook and provides a central point to control graph sizes. Adjust the size attribute to control how big the rendered images are.
def get_ax(rows=1, cols=1, size=16):
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax
Load the validation dataset from the directory structure created for training the model.
dataset = dog.DogDataset()
dataset.load_dog(custom_DIR, "val")
prepare() must be called before the dataset is used; it prepares the dataset's class and image lookup tables.
dataset.prepare()
print("Images: {}\nClasses: {}".format(len(dataset.image_ids), dataset.class_names))
Create the model in inference mode with the MaskRCNN class (R-CNN stands for Region-based Convolutional Neural Network).
with tf.device(DEVICE):
    model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
Load the trained weights with load_weights():
print("Loading weights ", custom_WEIGHTS_PATH) model.load_weights(custom_WEIGHTS_PATH, by_name=True)
Newly trained weights kept changing the visualizations, so the visualize module alone is reloaded here instead of restarting the whole notebook:
from importlib import reload
reload(visualize)
Using TensorFlow backend.
Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.9
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024 3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           dog
NUM_CLASSES                    2
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                100
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001

Images: 4
Classes: ['BG', 'dog']
Loading weights /logs/dog20180912T1141/mask_rcnn_dog_0005.h5
Re-starting from epoch 5
<module 'mrcnn.visualize' from '/root/Mask_RCNN/mrcnn/visualize.py'>
TESTING
Test the model trained earlier by looping over the validation images and loading each image along with its ground-truth class IDs, bounding boxes, and masks:
for image_id in dataset.image_ids:
    image, image_meta, gt_class_id, gt_bbox, gt_mask = \
        modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
    info = dataset.image_info[image_id]
    print("image ID: {}.{} ({}) {}".format(info["source"], info["id"], image_id,
                                           dataset.image_reference(image_id)))
Still inside the loop, run object detection on the current image with detect() and store the output in the results variable (note the indentation: this runs once per validation image):
    results = model.detect([image], verbose=1)
Display the results with display_instances() and log the ground-truth values, again inside the loop:
    ax = get_ax(1)
    r = results[0]
    visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                                dataset.class_names, r['scores'], ax=ax,
                                title="Predictions")
    log("gt_class_id", gt_class_id)
    log("gt_bbox", gt_bbox)
    log("gt_mask", gt_mask)
image ID: dog.dog_002.jpg (0) /root/Mask_RCNN/dataset/val/dog_002.jpg
Processing 1 images
image          shape: (1024, 1024, 3)     min: 0.00000     max: 255.00000   uint8
molded_images  shape: (1, 1024, 1024, 3)  min: -123.70000  max: 151.10000   float64
image_metas    shape: (1, 14)             min: 0.00000     max: 1024.00000  int64
anchors        shape: (1, 261888, 4)      min: -0.35390    max: 1.29134     float32
gt_class_id    shape: (1,)                min: 1.00000     max: 1.00000     int32
gt_bbox        shape: (1, 4)              min: 187.00000   max: 683.00000   int32
gt_mask        shape: (1024, 1024, 1)     min: 0.00000     max: 1.00000     bool
image ID: dog.dog_016.jpg (1) /root/Mask_RCNN/dataset/val/dog_016.jpg
Processing 1 images
image          shape: (1024, 1024, 3)     min: 0.00000     max: 255.00000   uint8
molded_images  shape: (1, 1024, 1024, 3)  min: -123.70000  max: 150.10000   float64
image_metas    shape: (1, 14)             min: 0.00000     max: 1024.00000  int64
anchors        shape: (1, 261888, 4)      min: -0.35390    max: 1.29134     float32
gt_class_id    shape: (1,)                min: 1.00000     max: 1.00000     int32
gt_bbox        shape: (1, 4)              min: 141.00000   max: 821.00000   int32
gt_mask        shape: (1024, 1024, 1)     min: 0.00000     max: 1.00000     bool
image ID: dog.dog_020.jpg (2) /root/Mask_RCNN/dataset/val/dog_020.jpg
Processing 1 images
image          shape: (1024, 1024, 3)     min: 0.00000     max: 255.00000   uint8
molded_images  shape: (1, 1024, 1024, 3)  min: -123.70000  max: 151.10000   float64
image_metas    shape: (1, 14)             min: 0.00000     max: 1024.00000  int64
anchors        shape: (1, 261888, 4)      min: -0.35390    max: 1.29134     float32
gt_class_id    shape: (1,)                min: 1.00000     max: 1.00000     int32
gt_bbox        shape: (1, 4)              min: 349.00000   max: 616.00000   int32
gt_mask        shape: (1024, 1024, 1)     min: 0.00000     max: 1.00000     bool
image ID: dog.dog_034.jpg (3) /root/Mask_RCNN/dataset/val/dog_034.jpg
Processing 1 images
image          shape: (1024, 1024, 3)     min: 0.00000     max: 255.00000   uint8
molded_images  shape: (1, 1024, 1024, 3)  min: -123.70000  max: 151.10000   float64
image_metas    shape: (1, 14)             min: 0.00000     max: 1024.00000  int64
anchors        shape: (1, 261888, 4)      min: -0.35390    max: 1.29134     float32
gt_class_id    shape: (1,)                min: 1.00000     max: 1.00000     int32
gt_bbox        shape: (1, 4)              min: 221.00000   max: 751.00000   int32
gt_mask        shape: (1024, 1024, 1)     min: 0.00000     max: 1.00000     bool
image ID: dog.dog_046.jpg (4) /root/Mask_RCNN/dataset/val/dog_046.jpg
Processing 1 images
image          shape: (1024, 1024, 3)     min: 0.00000     max: 255.00000   uint8
molded_images  shape: (1, 1024, 1024, 3)  min: -123.70000  max: 151.10000   float64
image_metas    shape: (1, 14)             min: 0.00000     max: 1024.00000  int64
anchors        shape: (1, 261888, 4)      min: -0.35390    max: 1.29134     float32
gt_class_id    shape: (1,)                min: 1.00000     max: 1.00000     int32
gt_bbox        shape: (1, 4)              min: 283.00000   max: 850.00000   int32
gt_mask        shape: (1024, 1024, 1)     min: 0.00000     max: 1.00000     bool
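The walkthrough above stops at visual inspection. If you also want a quantitative score, the mrcnn.utils module provides compute_ap(); here is a short sketch (not part of the original walkthrough) that averages the per-image Average Precision over the validation set:

from mrcnn import utils

APs = []
for image_id in dataset.image_ids:
    image, image_meta, gt_class_id, gt_bbox, gt_mask = \
        modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
    r = model.detect([image], verbose=0)[0]
    # Average Precision at IoU threshold 0.5 for this image
    AP, precisions, recalls, overlaps = utils.compute_ap(
        gt_bbox, gt_class_id, gt_mask,
        r['rois'], r['class_ids'], r['scores'], r['masks'])
    APs.append(AP)
print("mAP @ IoU=0.5: {:.3f}".format(np.mean(APs)))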
FINAL THOUGHTS
In this article, we discussed instance segmentation with the help of examples, along with how to train it on a custom dataset. The first step is segregating the images into training and validation sets and creating the annotations with software such as LabelImg or the VGG Image Annotator; this is followed by training and testing of the model. Some applications of instance segmentation are automatic traffic control, biometrics, inspection of electronic components and chips, etc. It is thus a very efficient and rapidly developing technique. I hope this article helps in understanding the basics and customization of instance segmentation.
Check out my other blogs for further articles.
Also, to learn more about TensorFlow and Keras, refer to these blogs.
Thank you!!!