Introduction to FaceNet – Facial Recognition System in Python

In this tutorial, we will learn about FaceNet and how to implement it using Python.

FaceNet trains a neural network that encodes a face image into an embedding vector. These encodings are then used for face recognition, which is built on top of face verification. In face verification, you are given an image along with a claimed identity X and must decide whether the image really shows X. In face recognition, you are given only an image and must say which person, out of a set of known people, it shows. To implement this model, let’s start by importing the Python libraries we need.

from keras.models import Sequential
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.merge import Concatenate
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer
from keras import backend as K
import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
from fr_utils import *
from inception_blocks_v2 import *

%matplotlib inline
%load_ext autoreload
%autoreload 2

To build a face recognition model, we will first build a simple face verification model: we compute the encoding vector of each of the two images and then decide how similar the encodings are.
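
As a rough illustration of the verification step, here is a minimal sketch in plain Python. Short lists stand in for the 128-dimensional encodings, the names l2_distance and verify are hypothetical, and the 0.7 threshold is the one this tutorial uses later in who_is_it; a real system would tune it on validation data.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(enc_a, enc_b, threshold=0.7):
    """Return (distance, is_same_person) for two encodings."""
    dist = l2_distance(enc_a, enc_b)
    return dist, dist < threshold

# Toy 2-D "encodings" (illustrative values only)
dist, same = verify([0.0, 0.0], [0.3, 0.4])
print(dist, same)
```

The decision is just a distance compared with a threshold, which is why the quality of the encodings matters far more than the comparison logic.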

Training FaceNet from scratch takes a lot of data and compute, so we will use weights from an already trained model. Let’s start by building the network.

FRmodel = faceRecoModel(input_shape=(3, 96, 96))
model = FRmodel  # alias: the code below refers to the same network by both names
print("Total Params:", FRmodel.count_params())


Total Params: 3743280

The key points of the network are:

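  • The network takes 96×96 RGB images as input, in channels-first layout: a batch of m images has shape (m, 3, 96, 96).
  • It outputs a matrix of shape (m, 128) that holds one 128-dimensional embedding per input image.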

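FaceNet is trained with the triplet loss function, which pushes the encodings of two images of the same person close together and the encodings of images of different people far apart. If two encodings are close, the images show the same person; otherwise they show different people.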

The triplet loss uses three images: an anchor image (the reference picture), a positive image (another picture of the same person as the anchor), and a negative image (a picture of a different person). We compute two encoding distances: pos_dist, between the anchor and the positive image, and neg_dist, between the anchor and the negative image.

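For the encodings to be correct we need pos_dist < neg_dist,

i.e. pos_dist - neg_dist < 0. We add a margin alpha so that the negative has to be farther from the anchor than the positive by at least alpha; without it, mapping every image to the same encoding would already make the loss zero. Summed over all triplets, the loss is:

loss = sum( max(pos_dist - neg_dist + alpha, 0) )

We have to minimise this function. Let’s write it.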

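def triplet_loss(y_true, y_pred, alpha = 0.2):
    # y_true is unused; Keras requires it in the loss signature.
    # y_pred holds the anchor, positive and negative encodings.
    anc, pos, nega = y_pred[0], y_pred[1], y_pred[2]
    # Squared L2 distance from the anchor to the positive / negative encodings
    pos_dist = tf.reduce_sum(tf.square(anc - pos), axis = -1)
    neg_dist = tf.reduce_sum(tf.square(anc - nega), axis = -1)
    basic_loss = pos_dist-neg_dist +alpha
    # Clip each triplet's loss at zero, then sum over the batch
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    return loss

# Quick check on random encodings (TensorFlow 1.x session API)
with tf.Session() as test:
    y_true = (None, None, None)
    y_pre = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed = 12),
              tf.random_normal([3, 128], mean=1, stddev=1, seed = 12),
              tf.random_normal([3, 128], mean=3, stddev=4, seed = 12))
    loss = triplet_loss(y_true, y_pre)
    print("loss = " + str(loss.eval()))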


loss = 528.143
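
The random test above is hard to check by hand, so here is a small plain-Python sketch of the same computation for a single triplet, with numbers simple enough to verify on paper. The names squared_distance and triplet_loss_single and the toy 2-D encodings are assumptions made for illustration; they are not part of the tutorial's code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length encodings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss_single(anchor, positive, negative, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) triplet."""
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    return max(pos_dist - neg_dist + alpha, 0.0)

anchor = [0.0, 0.0]

# Easy triplet: the negative is much farther away than the positive,
# so pos_dist - neg_dist + alpha is negative and the loss is clipped to 0.
easy = triplet_loss_single(anchor, [0.1, 0.0], [1.0, 0.0])

# Hard triplet: the negative is only slightly farther than the positive
# (0.25 vs 0.36), so the margin of 0.2 is violated and the loss is positive.
hard = triplet_loss_single(anchor, [0.5, 0.0], [0.6, 0.0])

print(easy, hard)
```

Only triplets that violate the margin contribute to the loss, which is why training tends to benefit from triplets in which the positive and negative are hard to tell apart.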

model.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])

Now we will apply this model to face recognition. For that, we store one encoding vector for each person the system should recognise. The following code builds this database with the img_to_encoding helper.

database = {}
database["A"] = img_to_encoding("images/A.png", FRmodel)
database["B"] = img_to_encoding("images/B.jpg", FRmodel)
database["C"] = img_to_encoding("images/C.jpg", FRmodel)
database["D"] = img_to_encoding("images/D.jpg", FRmodel)
database["E"] = img_to_encoding("images/E.jpg", FRmodel)
database["F"] = img_to_encoding("images/F.jpg", FRmodel)
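
With more than a handful of people, the same database can be built in a loop. This is a minimal sketch, assuming a hypothetical helper build_database and an encode callable that wraps the tutorial's img_to_encoding; the stand-in encoder below exists only so the snippet runs without the model or the image files.

```python
def build_database(image_paths, encode):
    """Map each person's name to the encoding of their reference image."""
    return {name: encode(path) for name, path in image_paths.items()}

image_paths = {
    "A": "images/A.png",
    "B": "images/B.jpg",
}

# Stand-in encoder for demonstration only. In the tutorial you would pass
# something like: lambda path: img_to_encoding(path, FRmodel)
def fake_encode(path):
    return [float(len(path))]

database_demo = build_database(image_paths, fake_encode)
print(sorted(database_demo))
```

Passing the encoder in as an argument keeps the database-building logic independent of the model, which also makes it easy to test.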

A face recognition system can identify a person even if they look somewhat different from their stored picture, and without asking for an ID card. Let’s write a function that uses the database above to recognise faces.

def who_is_it(image_path, database, model):
    # Encode the query image
    encoding = img_to_encoding(image_path,model)
    min_dist = 100
    identity = None
    # Find the database entry closest to the query encoding
    for (name, db_enc) in database.items():
        dist = np.linalg.norm(encoding-db_enc)

        if dist<min_dist:
            min_dist = dist
            identity = name

    # Distances above 0.7 are treated as "no match"
    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print ("It is " + str(identity) + " -> " + str(min_dist))
    return min_dist, identity

C is at the front door, and the camera takes a picture of them (“images/camera_0.jpg”). Let’s identify the person using the function above.

who_is_it("images/camera_0.jpg", database, model)


It is C -> 0.659393
(0.65939289, 'C')
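
To try the lookup logic without the model or any image files, the nearest-neighbour search can be exercised on toy data. The sketch below is an assumption-laden stand-in for who_is_it: the names l2_distance and identify and the 2-D encodings are hypothetical, and unlike who_is_it it returns None as the name when nobody in the database is within the threshold.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(encoding, database, threshold=0.7):
    """Return (min_dist, name); name is None when nobody is close enough."""
    best_name, min_dist = None, float("inf")
    for name, db_enc in database.items():
        dist = l2_distance(encoding, db_enc)
        if dist < min_dist:
            min_dist, best_name = dist, name
    if min_dist > threshold:
        return min_dist, None
    return min_dist, best_name

# Toy database with 2-D "encodings" (illustrative values only)
toy_db = {"A": [0.0, 0.0], "B": [1.0, 1.0]}

near_b = identify([0.9, 1.0], toy_db)    # close to B's encoding
stranger = identify([5.0, 5.0], toy_db)  # far from everyone

print(near_b, stranger)
```

Returning None for an unknown face lets the caller tell a confident match from the nearest but still distant entry.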


Voila! It has recognised the face correctly. The main idea of the FaceNet paper is to map each face to an encoding and measure the distance between encodings in that space. If the distance is small, the images show the same person; otherwise they show different people.
