Introduction to FaceNet – Facial Recognition System in Python
In this tutorial, we will learn about FaceNet and how to implement it in Python.
FaceNet is a neural network that learns to encode a face image into a vector embedding. These encodings are then used to build a face recognition system. Face recognition is built on top of a face verification system. In face verification, you are given an image of person X and asked whether the person in the image really is X (a one-to-one check). In face recognition, you are given an image and have to tell whose image it is from a set of known people (a one-to-many check). To implement this model, let's start by importing the required Python libraries.
from keras.models import Sequential
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.merge import Concatenate
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer
from keras import backend as K
K.set_image_data_format('channels_first')  # the model expects channels-first (3, 96, 96) input
import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
from fr_utils import *
from inception_blocks_v2 import *
%matplotlib inline
%load_ext autoreload
%autoreload 2
Now, to build the face recognition model, we will first build a naive face verification step. We will compute the encoding vector of each of the two images and then decide their similarity from the distance between the encodings.
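As a toy illustration of this idea (the numbers below are made up, not real FaceNet encodings), comparing two encodings boils down to a distance computation:

import numpy as np

# Toy 4-D "encodings" – real FaceNet encodings are 128-D.
enc_img1 = np.array([0.10, 0.25, 0.40, 0.05])  # person X, photo 1
enc_img2 = np.array([0.12, 0.24, 0.38, 0.06])  # person X, photo 2
enc_img3 = np.array([0.90, 0.10, 0.55, 0.70])  # a different person

print(np.linalg.norm(enc_img1 - enc_img2))  # small distance -> same person
print(np.linalg.norm(enc_img1 - enc_img3))  # large distance -> different person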
The FaceNet model takes a lot of resources to train, so we will use a pre-trained model here. Let's first build the network (the pre-trained weights are loaded a bit later).
FRmodel = faceRecoModel(input_shape=(3, 96, 96))
print("Total Params:", FRmodel.count_params())
Output:
Total Params: 3743280
The key points of the network are:
- This network uses 96×96 RGB images as its input.
- It outputs a vector embedding of shape (m, 128), i.e. one 128-dimensional encoding per input image (see the quick check below).
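As a quick sanity check (using the img_to_encoding helper from fr_utils, which runs the model on a single image, and the images/A.png sample used later), the encoding of one image should have shape (1, 128):

enc = img_to_encoding("images/A.png", FRmodel)
print(enc.shape)  # expected: (1, 128) – one 128-D encoding for one image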
Now we will use the triplet loss function to decide whether two encodings are similar or different. If the encodings are close, the images show the same person; otherwise, they show different people.
The triplet loss function uses three different images: an anchor image A (the reference picture), a positive image P (an image of the same person as the anchor), and a negative image N (an image of a different person). We then compute the squared encoding distances pos_dist = ||f(A) - f(P)||^2 and neg_dist = ||f(A) - f(N)||^2.

For the encodings to be correct, we want pos_dist < neg_dist, i.e. pos_dist - neg_dist < 0. We also add a margin alpha so that the network must push the negative clearly further away than the positive, rather than settling for a tiny gap:

pos_dist - neg_dist + alpha < 0

For example (toy numbers), if pos_dist = 0.5, neg_dist = 0.6 and alpha = 0.2, then pos_dist - neg_dist + alpha = 0.1 > 0, so this triplet would still incur a loss even though the positive is already closer. The loss is max(pos_dist - neg_dist + alpha, 0), summed over the training triplets, and we have to minimise it. Let's write the function.
def triplet_loss(y_true, y_pred, alpha = 0.2):
    anc, pos, neg = y_pred[0], y_pred[1], y_pred[2]
    # Squared L2 distances between anchor/positive and anchor/negative encodings
    pos_dist = tf.reduce_sum(tf.square(anc - pos), axis = -1)
    neg_dist = tf.reduce_sum(tf.square(anc - neg), axis = -1)
    # Hinge: only triplets that violate the margin contribute to the loss
    basic_loss = pos_dist - neg_dist + alpha
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    return loss
with tf.Session() as test:
    tf.set_random_seed(1)
    y_true = (None, None, None)
    y_pred = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed = 12),
              tf.random_normal([3, 128], mean=1, stddev=1, seed = 12),
              tf.random_normal([3, 128], mean=3, stddev=4, seed = 12))
    loss = triplet_loss(y_true, y_pred)
    print("loss = " + str(loss.eval()))
Output:
loss = 528.143
FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
load_weights_from_FaceNet(FRmodel)
Now we will apply this model in our face recognition system. For that, we will build a database that maps each person's name to the 128-dimensional encoding of their face, computed with img_to_encoding. The following code does this task.
database = {}
database["A"] = img_to_encoding("images/A.png", FRmodel)
database["B"] = img_to_encoding("images/B.jpg", FRmodel)
database["C"] = img_to_encoding("images/C.jpg", FRmodel)
database["D"] = img_to_encoding("images/D.jpg", FRmodel)
database["E"] = img_to_encoding("images/E.jpg", FRmodel)
database["F"] = img_to_encoding("images/F.jpg", FRmodel)
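With this database in place, face verification becomes a single one-to-one comparison. Here is a minimal sketch of such a verify function (the function name is our choice, not part of fr_utils; the 0.7 distance threshold is reused from the recognition code below):

def verify(image_path, identity, database, model):
    # 1:1 check – is the person in image_path really `identity`?
    encoding = img_to_encoding(image_path, model)
    dist = np.linalg.norm(encoding - database[identity])
    if dist < 0.7:
        print("It is " + str(identity))
        match = True
    else:
        print("It is not " + str(identity))
        match = False
    return dist, match

For example, verify("images/camera_0.jpg", "C", database, FRmodel) would check whether the person at the door really is C.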
The face recognition model can identify a person even if they look a little different from the stored image, or if they don't have their ID card: instead of checking one claimed identity, it searches the whole database for the closest match. Let's write a function using the above-defined pieces to recognise faces.
def who_is_it(image_path, database, model):
    # Encode the query image
    encoding = img_to_encoding(image_path, model)
    # Find the database entry with the smallest distance to the query
    min_dist = 100
    identity = None
    for (name, db_enc) in database.items():
        dist = np.linalg.norm(encoding - db_enc)
        if dist < min_dist:
            min_dist = dist
            identity = name
    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print("It is " + str(identity) + " -> " + str(min_dist))
    return min_dist, identity
Suppose C is at the front door and the camera takes the image ("images/camera_0.jpg"). Let's predict using the above model.
who_is_it("images/camera_0.jpg", database, model)
Voila! It has recognised the face correctly. To sum up, the main idea of the FaceNet paper is to encode each face as a point in an embedding space and compare faces by the distance between their encodings: if the distance is small, the images show the same person; otherwise, they show different people.