Sign Language Recognition Using TensorFlow in Python

In this article, we are going to build a multiclass classifier to categorize hand gestures, using a convolutional neural network (CNN) built with TensorFlow's Keras API in Python.

We will be using the Sign Language MNIST dataset from https://www.kaggle.com/datamunge/sign-language-mnist.

The dataset contains hand gesture images for the letters of the English alphabet. It has 24 classes: the letters J and Z are excluded because signing them requires motion, which a still image cannot capture. The images are 28x28 grayscale with pixel values between 0 and 255.
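
As a quick illustration of the labelling convention (assuming, as the dataset page describes, that labels are 0-25 indices into A-Z), a tiny helper can map a label back to its letter; label_to_letter is our own name for this sketch, not part of the dataset:

import string

# Labels index into A-Z; 9 (J) and 25 (Z) never occur in the data
# because those signs require motion. (Helper name is our own.)
def label_to_letter(label):
    return string.ascii_uppercase[int(label)]

print(label_to_letter(0), label_to_letter(8), label_to_letter(10))  # A I K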

DATA PREPARATION

Importing Libraries

import csv
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt  # used below to visualise images and results
from os import getcwd

Now we will extract the data and check the shape of our labels and images.

def get_data(filename):
    # Parse the CSV: the first column is the label, the remaining
    # 784 columns are the pixel values of a 28x28 image.
    with open(filename) as training_file:
        csv_reader = csv.reader(training_file, delimiter=',')
        first_line = True
        temp_labels = []
        temp_images = []

        for row in csv_reader:
            if first_line:
                # Skip the header row.
                first_line = False
            else:
                temp_labels.append(row[0])
                image_data = row[1:785]
                # Split the flat list of 784 pixels into 28 rows of 28.
                image_array = np.array_split(image_data, 28)
                temp_images.append(image_array)

        images = np.array(temp_images).astype('float')
        labels = np.array(temp_labels).astype('float')
    return images, labels

path_sign_mnist_train = f"{getcwd()}/../tmp2/sign_mnist_train.csv"
path_sign_mnist_test = f"{getcwd()}/../tmp2/sign_mnist_test.csv"
training_images, training_labels = get_data(path_sign_mnist_train)
testing_images, testing_labels = get_data(path_sign_mnist_test)


print(training_images.shape)
print(training_labels.shape)
print(testing_images.shape)
print(testing_labels.shape)

OUTPUT

(27455, 28, 28)
(27455,)
(7172, 28, 28)
(7172,)
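
As an aside, the same CSVs can be loaded more concisely with pandas. This is just an equivalent sketch (assuming pandas is installed and that the header names the label column 'label', as this dataset does); get_data above produces the same arrays:

import pandas as pd

def get_data_pandas(filename):
    # First column holds the label; the remaining 784 columns hold pixels.
    df = pd.read_csv(filename)
    labels = df['label'].to_numpy(dtype='float')
    images = df.drop(columns=['label']).to_numpy(dtype='float').reshape(-1, 28, 28)
    return images, labels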

Image Generator And Augmentation

In this step, we use ImageDataGenerator to augment the training set with random rotations, shifts, shears, zooms, and horizontal flips, and to rescale pixel values from 0-255 down to 0-1. The validation generator only rescales, since augmentation belongs on the training data alone. First, we add a channel dimension so the images match the (28, 28, 1) shape the generators and the model expect.

training_images = np.expand_dims(training_images, axis=-1)
testing_images = np.expand_dims(testing_images, axis=-1)

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale = 1./255)
    
print(training_images.shape)
print(testing_images.shape)

OUTPUT

(27455, 28, 28, 1)
(7172, 28, 28, 1)
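
Before training, it is worth sanity-checking the generator by drawing a single augmented batch and confirming that the pixel values really are rescaled into the 0-1 range (an optional check):

# Draw one augmented, rescaled batch from the training generator.
batch_images, batch_labels = next(train_datagen.flow(training_images, training_labels, batch_size=32))
print(batch_images.shape)                      # (32, 28, 28, 1)
print(batch_images.min(), batch_images.max())  # both values lie in [0, 1]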

 

Observing The Training Images

x_train = training_images.reshape(-1, 28, 28, 1)  # already (N, 28, 28, 1); the reshape just guarantees it
x_test = testing_images.reshape(-1, 28, 28, 1)

Now let's have a look at a few of the training images.

f, ax = plt.subplots(2, 5)
f.set_size_inches(12, 10)
k = 0
for i in range(2):
    for j in range(5):
        # Drop the channel axis so imshow gets a plain 28x28 array.
        ax[i, j].imshow(x_train[k].reshape(28, 28), cmap="gray")
        k += 1
plt.tight_layout()

OUTPUT

(A 2x5 grid of sample grayscale hand gesture images from the training set.)

BUILDING THE MODEL

We are going to use convolutional layers. The input shape is (28, 28, 1), since the images are grayscale and have a single channel. Max pooling takes the maximum pixel value in each window, reducing the spatial size of the feature maps, and Flatten converts the final feature maps into a single vector.

That vector is passed to the fully connected layers. The output layer has 26 neurons, one for each letter of the English alphabet, so we use ‘softmax’ activation to turn the scores into class probabilities.
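
For a concrete sense of what softmax does, here is a tiny standalone example: it exponentiates the raw scores and normalizes them so they sum to 1 (the numbers below are made up for illustration):

scores = np.array([2.0, 1.0, 0.1])
probs = np.exp(scores) / np.sum(np.exp(scores))
print(probs, probs.sum())  # approx. [0.659 0.242 0.099], summing to 1.0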

 

model = tf.keras.models.Sequential([
    
    tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(26, activation='softmax') 
])
model.summary()

OUTPUT

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               204928    
_________________________________________________________________
dense_1 (Dense)              (None, 26)                3354      
=================================================================
Total params: 245,850
Trainable params: 245,850
Non-trainable params: 0
_________________________________________________________________
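
As a quick check on the summary, the parameter counts can be reproduced by hand: a Conv2D layer has kernel_height x kernel_width x input_channels x filters weights plus one bias per filter, and a Dense layer has inputs x units weights plus one bias per unit:

conv1  = 3*3*1*64 + 64      # 640
conv2  = 3*3*64*64 + 64     # 36928
dense1 = 1600*128 + 128     # 204928 (1600 = 5*5*64 flattened features)
dense2 = 128*26 + 26        # 3354
print(conv1 + conv2 + dense1 + dense2)  # 245850, matching Total params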

 

Compiling And Training The Model

model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# fit_generator is deprecated in recent TensorFlow 2.x releases;
# model.fit accepts generators directly.
history = model.fit(train_datagen.flow(training_images, training_labels, batch_size=32),
                    steps_per_epoch=len(training_images) // 32,
                    epochs=15,
                    validation_data=validation_datagen.flow(testing_images, testing_labels, batch_size=32),
                    validation_steps=len(testing_images) // 32)

# The generators rescale pixels to [0, 1], so rescale here too before evaluating.
model.evaluate(testing_images / 255.0, testing_labels)

Note that the loss function is ‘sparse_categorical_crossentropy’, the right choice for multiclass classification with integer labels. We then fit the model on the augmented training batches, validating against the rescaled test set after each epoch.
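
For contrast, ‘sparse_categorical_crossentropy’ expects integer labels like ours; if the labels were one-hot encoded instead, ‘categorical_crossentropy’ would be the matching loss. A minimal sketch of that equivalent setup (not needed here):

# One-hot alternative, shown only for contrast with the sparse loss above.
one_hot_labels = tf.keras.utils.to_categorical(training_labels, num_classes=26)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])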

OUTPUT

Epoch 1/15
858/857 [==============================] - 74s 86ms/step - loss: 2.8322 - accuracy: 0.1489 - val_loss: 1.8764 - val_accuracy: 0.4380
Epoch 2/15
858/857 [==============================] - 73s 85ms/step - loss: 2.1461 - accuracy: 0.3344 - val_loss: 1.3909 - val_accuracy: 0.5569
Epoch 3/15
858/857 [==============================] - 76s 89ms/step - loss: 1.7422 - accuracy: 0.4503 - val_loss: 1.0818 - val_accuracy: 0.6247
Epoch 4/15
858/857 [==============================] - 72s 84ms/step - loss: 1.4632 - accuracy: 0.5297 - val_loss: 0.8895 - val_accuracy: 0.6980
Epoch 5/15
858/857 [==============================] - 73s 85ms/step - loss: 1.2390 - accuracy: 0.5966 - val_loss: 0.7915 - val_accuracy: 0.7310
Epoch 6/15
858/857 [==============================] - 74s 86ms/step - loss: 1.0986 - accuracy: 0.6432 - val_loss: 0.7142 - val_accuracy: 0.7568
Epoch 7/15
858/857 [==============================] - 74s 86ms/step - loss: 0.9707 - accuracy: 0.6825 - val_loss: 0.5822 - val_accuracy: 0.7943
Epoch 8/15
858/857 [==============================] - 72s 83ms/step - loss: 0.8814 - accuracy: 0.7093 - val_loss: 0.4680 - val_accuracy: 0.8391
Epoch 9/15
858/857 [==============================] - 73s 85ms/step - loss: 0.8113 - accuracy: 0.7322 - val_loss: 0.4106 - val_accuracy: 0.8448
Epoch 10/15
858/857 [==============================] - 72s 84ms/step - loss: 0.7423 - accuracy: 0.7602 - val_loss: 0.3527 - val_accuracy: 0.8643
Epoch 11/15
858/857 [==============================] - 74s 86ms/step - loss: 0.7019 - accuracy: 0.7712 - val_loss: 0.3366 - val_accuracy: 0.8787
Epoch 12/15
858/857 [==============================] - 71s 82ms/step - loss: 0.6674 - accuracy: 0.7821 - val_loss: 0.2687 - val_accuracy: 0.8922
Epoch 13/15
858/857 [==============================] - 73s 85ms/step - loss: 0.6297 - accuracy: 0.7941 - val_loss: 0.3121 - val_accuracy: 0.8823
Epoch 14/15
858/857 [==============================] - 69s 81ms/step - loss: 0.5916 - accuracy: 0.8029 - val_loss: 0.5081 - val_accuracy: 0.8289
Epoch 15/15
858/857 [==============================] - 73s 85ms/step - loss: 0.5574 - accuracy: 0.8163 - val_loss: 0.2541 - val_accuracy: 0.9046

We reach a validation accuracy of around 90% by the final epoch, which is a good result for such a simple architecture.
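
As a final sanity check, we can classify a single test image. As with evaluate, predict does not go through the generator, so we rescale manually:

# Predict the class of the first test image (note the manual rescaling).
probs = model.predict(testing_images[:1] / 255.0)
predicted_label = np.argmax(probs, axis=-1)[0]
print(predicted_label, testing_labels[0])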

Plotting And Visualising The Results

%matplotlib inline
import matplotlib.pyplot as plt
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'r', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

OUTPUT

(Two line plots: training vs. validation accuracy, and training vs. validation loss, across the 15 epochs.)

We can see that the validation accuracy keeps improving towards the end of the epochs, so our classifier gives good results on the validation set.

To learn about binary classifiers in addition to multiclass classifiers, you can also check out:

Image Classification Using Convolution Neural Network (CNN) in Python

Thanks for reading!
