Training a Captcha solver using TensorFlow CNN architecture

INTRODUCTION

In this tutorial, we will learn how to design a captcha solver using a TensorFlow CNN architecture with the help of Python. We will create our own CNN architecture for the captcha solver and save its weights so we can use them in the next tutorial.

IMPORT THE PYTHON LIBRARIES

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import cv2 
from sklearn.datasets import fetch_openml

As usual, we have used the familiar data science libraries: NumPy and pandas for data handling, Matplotlib for visualisation, and Keras for building our deep learning model. cv2 (OpenCV) is commonly used for image processing. Note that fetch_openml is not a separate library but a scikit-learn function that downloads datasets from OpenML, an extensive collection of open datasets that is popular with beginners.

FETCH THE DATA

# as_frame=False returns NumPy arrays instead of a DataFrame, which the reshape loop below expects
mnist=fetch_openml('mnist_784',version=1,as_frame=False)
X,y=mnist['data'],mnist['target']

We have imported the MNIST dataset, which contains 70,000 grayscale images of handwritten digits (0-9). Note that the data comes as flattened 784-element pixel vectors rather than as 28x28 images.
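As a quick sanity check, we can confirm the shapes (a minimal sketch; the printed values assume the full 70,000-sample mnist_784 dataset):

print(X.shape, y.shape)
# (70000, 784) (70000,)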

X_new=list()
for vector in X:
    # Reshape the flattened 784-pixel vector back into a 28x28 image;
    # cast to uint8 because cv2.threshold does not accept float64 arrays
    img=vector.reshape(28,28).astype(np.uint8)
    # Inverse binary threshold: pixels above 150 become 0, the rest become 255
    _,img=cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)
    X_new.append(img)
X_new=np.array(X_new)

For visualising the images, we first reshaped each 784-element vector into a 28*28 matrix, applied an inverse binary threshold with cv2, and stored the result in a new array.

plt.imshow(X_new[1000],cmap='binary')

Here we used plt.imshow for visualising our reshaped image. Note that the cmap parameter indicates the colour map, which we have set to 'binary'. You can also use other colour maps such as 'gray' or 'viridis'.
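If you also want to confirm which digit you are looking at, you can add the corresponding label as a title (a small sketch; index 1000 is simply the sample chosen above):

plt.imshow(X_new[1000],cmap='binary')
plt.title('Label: '+str(y[1000]))
plt.show()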

PREPARING THE DATASET

y=y.astype(np.uint8)
X_train,X_test,y_train,y_test=X_new[:60000],X_new[60000:],y[:60000],y[60000:]

We have converted the labels to uint8, since fetch_openml returns them as strings and our Keras model expects integer class labels. We then created a train/test split: the first 60,000 samples for training and the remaining 10,000 for testing.
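A quick look at the resulting shapes confirms the split (a minimal sketch; the numbers assume the full 70,000-sample dataset):

print(X_train.shape, X_test.shape)
# (60000, 28, 28) (10000, 28, 28)
print(y_train.shape, y_test.shape)
# (60000,) (10000,)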

X_train=X_train/255.
X_test=X_test/255.

Here we divided the dataset by the maximum pixel value of 255. Pixel values range from 0 to 255, so the raw inputs span a wide range; dividing by 255 scales them into [0, 1], which keeps the input values small and helps the network train more stably.
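A quick check (a minimal sketch) confirms that the pixel values now lie in the [0, 1] range:

print(X_train.min(), X_train.max())
# 0.0 1.0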

BUILDING THE ARCHITECTURE

model=keras.models.Sequential()
model.add(keras.layers.Conv2D(64,7,activation='relu',padding='same',input_shape=[28,28,1]))
model.add(keras.layers.MaxPooling2D(2))
model.add(keras.layers.Conv2D(128,3,activation='relu',padding='same'))
model.add(keras.layers.Conv2D(128,3,activation='relu',padding='same'))
model.add(keras.layers.MaxPooling2D(2))
model.add(keras.layers.Conv2D(256,3,activation='relu',padding='same'))
model.add(keras.layers.Conv2D(256,3,activation='relu',padding='same'))
model.add(keras.layers.MaxPooling2D(2))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128,activation='relu'))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(64,activation='relu'))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(10,activation='softmax'))

Now comes the exciting part. Here we have built our own Keras deep learning architecture using the sequential method. Let us go through the layers one by one.

The first layer is a convolution layer that takes a 28*28*1 input and uses the ReLU activation function. It has 64 filters with a 7*7 kernel; because padding is set to 'same', the output keeps the 28*28 spatial size (the 7 in the code is the kernel size, not an output dimension).

The second layer, as you can see, is a max pooling layer. Pooling layers downsample feature maps by summarising the features within patches of the map. The two standard pooling methods are average pooling and max pooling, which summarise the average presence of a feature and the most activated presence of a feature, respectively.
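To make the difference concrete, here is a tiny sketch on a toy 4*4 input (not part of our model) showing how a 2*2 pooling window halves each spatial dimension:

x=np.arange(16,dtype='float32').reshape(1,4,4,1)
print(keras.layers.MaxPooling2D(2)(x).numpy().squeeze())
# [[ 5.  7.]
#  [13. 15.]]
print(keras.layers.AveragePooling2D(2)(x).numpy().squeeze())
# [[ 2.5  4.5]
#  [10.5 12.5]]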

Looking at the architecture, you can see I used pairs of convolution layers followed by a pooling layer. This is a standard pattern that is widely used when designing CNN architectures. Flatten converts the multidimensional feature maps into a single vector for the fully connected layers that follow. Dense layers then map these flattened features to the class scores.
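To see how the feature maps shrink at each stage and how many parameters each layer holds, you can print a summary of the model we just built:

# Spatial size goes 28x28 -> 14x14 -> 7x7 -> 3x3 across the three pooling stages,
# and Flatten turns the final 3x3x256 feature maps into a 2304-element vector.
model.summary()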

We have also used Dropout layers. Large neural networks trained on relatively small datasets can overfit the training data. Dropout is a regularisation method that approximates training many neural networks with different architectures in parallel: during training, a random fraction of the layer's outputs are ignored, or "dropped out."
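As a small illustration (a toy sketch, not part of the model), Dropout only takes effect while training; at inference time it passes values through unchanged, and during training the kept values are scaled up to compensate for the dropped ones:

layer=keras.layers.Dropout(0.5)
x=np.ones((1,8),dtype='float32')
print(layer(x,training=False).numpy())  # unchanged at inference time
print(layer(x,training=True).numpy())   # roughly half the values zeroed, the rest scaled to 2.0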

Finally, we used a Dense layer with 10 outputs and a softmax activation, since this is a multiclass classification problem with the digits 0-9 as separate classes. Now, let's compile and train our model.

TRAINING THE MODEL

model.compile(loss='sparse_categorical_crossentropy',optimizer=keras.optimizers.Adam(learning_rate=0.001),metrics=['accuracy'])

The loss of our model is sparse_categorical_crossentropy, which works directly with integer class labels like ours. We used the Adam optimiser with a learning rate of 0.001 and accuracy as our evaluation metric.

# expand_dims adds a channel axis so the input has shape (60000, 28, 28, 1)
history=model.fit(np.expand_dims(X_train,axis=-1),y_train,epochs=10)
Epoch 1/10 1875/1875 [==============================] - 11s 5ms/step - loss: 0.9978 - accuracy: 0.6467 
Epoch 2/10 1875/1875 [==============================] - 10s 5ms/step - loss: 0.1446 - accuracy: 0.9650 
Epoch 3/10 1875/1875 [==============================] - 10s 5ms/step - loss: 0.1068 - accuracy: 0.9733 
Epoch 4/10 1875/1875 [==============================] - 10s 5ms/step - loss: 0.0856 - accuracy: 0.9787 
Epoch 5/10 1875/1875 [==============================] - 10s 5ms/step - loss: 0.0744 - accuracy: 0.9827 
Epoch 6/10 1875/1875 [==============================] - 10s 5ms/step - loss: 0.0615 - accuracy: 0.9858 
Epoch 7/10 1875/1875 [==============================] - 10s 6ms/step - loss: 0.0576 - accuracy: 0.9860 
Epoch 8/10 1875/1875 [==============================] - 10s 6ms/step - loss: 0.0506 - accuracy: 0.9879 
Epoch 9/10 1875/1875 [==============================] - 10s 5ms/step - loss: 0.0565 - accuracy: 0.9875 
Epoch 10/10 1875/1875 [==============================] - 10s 5ms/step - loss: 0.0432 - accuracy: 0.9897

We trained our model for ten epochs and got an accuracy of over 0.98 for our training dataset. That’s pretty great, right! Now let’s evaluate our model on the test set.

model.evaluate(np.expand_dims(X_test,axis=-1),y_test)
313/313 [==============================] - 1s 3ms/step - loss: 0.0606 - accuracy: 0.9878
[0.06062845513224602, 0.9878000020980835]
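Before saving, it is worth sanity-checking a few individual predictions (a minimal sketch; the first five test images are an arbitrary choice). The softmax output is a probability distribution over the 10 classes, so we take the argmax to get the predicted digit:

probs=model.predict(np.expand_dims(X_test[:5],axis=-1))
print(np.argmax(probs,axis=1))  # predicted digits for the first five test images
print(y_test[:5])               # true labels for comparison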

We evaluated the model on our test set, and the accuracy holds up well on unseen data. Now let's save our model weights for future use.
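Here is a minimal sketch for saving (the filename captcha_cnn.h5 is just an example; you can use model.save_weights instead if you only want the weights):

model.save('captcha_cnn.h5')  # saves architecture and weights in one file
# Later, in the next tutorial: model=keras.models.load_model('captcha_cnn.h5')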

CONCLUSION

In this tutorial, we looked at the first part of training our captcha solver. We built a custom deep learning model using the Keras Sequential API and trained it on the MNIST dataset. We also saved the weights of our model. In the next tutorial, we will use the trained weights in our captcha solver. Things will get pretty impressive in the following tutorial, so stay tuned!!
