# Training a Captcha solver using TensorFlow CNN architecture

## INTRODUCTION

In this tutorial, we will learn how to design a captcha solver using a TensorFlow CNN architecture in Python. We will create our own CNN architecture for the captcha solver and save its weights so that we can use them in our next tutorial.

## IMPORT THE PYTHON LIBRARIES

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import cv2
from sklearn.datasets import fetch_openml
```

As usual, we use the popular data science libraries NumPy and pandas, Matplotlib for visualisations, and Keras for building our deep learning models. cv2 (OpenCV) is commonly used for image processing. Note that we also import fetch_openml, a scikit-learn function that downloads datasets from OpenML, a large collection of open datasets that is handy for beginners.

## FETCH THE DATA

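```python
# as_frame=False returns NumPy arrays; recent scikit-learn versions
# would otherwise return a pandas DataFrame for this dataset
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist['data'], mnist['target']
```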

We have imported the MNIST dataset, which contains 70,000 images of handwritten digits. Instead of image files, we get each image directly as a flat array of 784 pixel values (28*28).

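```python
X_new = []
for vector in X:
    # Reshape the flat 784-value vector into a 28x28 image; cast to uint8 for OpenCV
    img = vector.reshape(28, 28).astype(np.uint8)
    # Inverse binary threshold: pixels above 150 become 0, the rest become 255
    _, img = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)
    X_new.append(img)
X_new = np.array(X_new)
```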

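To prepare the images, we first reshaped each flat vector into a 28*28 matrix with NumPy. We then applied an inverse binary threshold using cv2, so pixels brighter than 150 become 0 and all other pixels become 255, and stored the results in a new array.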
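To make the thresholding step concrete, here is a minimal NumPy sketch of what an inverse binary threshold computes. The helper `binary_inv_threshold` is a hypothetical name used only for illustration and is not part of OpenCV; it is meant to mirror the behaviour of `cv2.THRESH_BINARY_INV` on a toy array.

```python
import numpy as np

def binary_inv_threshold(img, thresh=150, maxval=255):
    """Pixels above `thresh` become 0; all other pixels become `maxval`."""
    return np.where(img > thresh, 0, maxval).astype(np.uint8)

# A toy 2x3 "image" standing in for a 28x28 digit.
toy = np.array([[0, 150, 151],
                [255, 30, 200]], dtype=np.uint8)
result = binary_inv_threshold(toy)
print(result)
```

Note that a pixel exactly equal to the threshold (150 here) is not considered "above" it, so it is mapped to 255.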

```python
plt.imshow(X_new[1000], cmap='binary')
```

Here we used plt.imshow for visualising our thresholded image. Note that the cmap parameter selects the colour map, which we have set to binary (low values are drawn white and high values black). You can also use other colour maps such as gray or viridis.

## PREPARING THE DATASET

```python
y = y.astype(np.uint8)
X_train, X_test, y_train, y_test = X_new[:60000], X_new[60000:], y[:60000], y[60000:]
```

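We have converted our labels to the uint8 type because fetch_openml returns them as strings, while the loss function we use later expects integer class indices. We have also created a train test split: the first 60,000 images are used for training and the remaining 10,000 for testing.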
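As a small illustration of the label conversion and the slicing-based split, the sketch below uses a toy array of string labels. The values and the split point of 3 are made up for the example; the tutorial itself uses 70,000 labels and a split point of 60000.

```python
import numpy as np

# Toy stand-in for the string labels returned by the dataset loader.
labels = np.array(['5', '0', '4', '1', '9'])
labels_int = labels.astype(np.uint8)  # parse each string into a small integer

# Same slicing pattern as the tutorial, with a split point of 3 instead of 60000.
split = 3
train_labels, test_labels = labels_int[:split], labels_int[split:]
print(labels_int.dtype, train_labels, test_labels)
```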

```python
X_train = X_train / 255.
X_test = X_test / 255.
```

Here we divided the whole dataset by the maximum pixel value of 255. Raw pixel values range from 0 to 255, which is a wide range for a neural network to work with. After dividing by 255, every value lies between 0 and 1, which generally makes training more stable.
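The effect of this scaling can be checked on a handful of values. The pixel values below are arbitrary examples, not taken from the dataset.

```python
import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
scaled = pixels / 255.0  # true division promotes uint8 to float64

print(scaled.min(), scaled.max(), scaled.dtype)
```

The smallest and largest possible pixel values map to 0.0 and 1.0, and everything else falls in between.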

## BUILDING THE ARCHITECTURE

```python
model = keras.models.Sequential()
model.add(keras.layers.Conv2D(64, 7, activation='relu', padding='same', input_shape=[28, 28, 1]))
model.add(keras.layers.MaxPooling2D(2))
model.add(keras.layers.Conv2D(128, 3, activation='relu', padding='same'))
model.add(keras.layers.Conv2D(128, 3, activation='relu', padding='same'))
model.add(keras.layers.MaxPooling2D(2))
model.add(keras.layers.Conv2D(256, 3, activation='relu', padding='same'))
model.add(keras.layers.Conv2D(256, 3, activation='relu', padding='same'))
model.add(keras.layers.MaxPooling2D(2))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(64, activation='relu'))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(10, activation='softmax'))
```

Now comes the exciting part. Here we have built our own Keras deep learning architecture using the sequential method. Let us go through the layers one by one.

The first layer is a convolution layer that takes an input of dimension 28*28*1 and has a ReLU activation function. It has 64 filters, each with a 7*7 kernel. Because we set padding to same, the convolution keeps the spatial size at 28*28, and only the number of channels changes to 64.

The second layer, as you can see, is a Max Pooling layer. Pooling layers provide an approach to downsampling feature maps by summarising features in patches of the feature map. Two standard pooling methods are average pooling and max pooling, which summarise the average presence of a feature and its most activated presence, respectively. Here, MaxPooling2D(2) keeps the maximum of each 2*2 patch, which halves the width and height of the feature map.
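To see the difference between the two pooling methods, here is a minimal NumPy sketch that applies 2*2 max pooling and average pooling to a toy 4*4 feature map. The values are invented for the example, and the reshape trick only works when the map divides evenly into patches.

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 9, 5],
                        [2, 2, 7, 3]], dtype=float)

# Split the 4x4 map into non-overlapping 2x2 patches.
# Axes become: (row block, row inside patch, column block, column inside patch).
patches = feature_map.reshape(2, 2, 2, 2)

max_pooled = patches.max(axis=(1, 3))   # most activated value per patch
avg_pooled = patches.mean(axis=(1, 3))  # average value per patch

print(max_pooled)
print(avg_pooled)
```

Both results are 2*2, so each pooling step halves the width and height while keeping one summary value per patch.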

Looking at the architecture, you can see I used blocks of convolution layers, each followed by one pooling layer. This is a standard pattern that is widely used when creating CNN architectures. Flatten converts the multidimensional feature maps into a single dimension so that they can be passed to the Dense layers. Dense layers are ordinary fully connected layers, which you will probably know already.
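It can help to trace how the spatial size changes from layer to layer. The sketch below is plain Python arithmetic, not Keras code; `same_conv` and `max_pool` are hypothetical helpers that assume stride-1 convolutions with same padding and 2*2 pooling without padding, as in the model above.

```python
def same_conv(size):
    """A stride-1 convolution with padding='same' keeps the spatial size."""
    return size

def max_pool(size, pool=2):
    """Pooling with stride equal to the pool size and no padding floors the division."""
    return size // pool

size = 28
trace = [size]
# Layer order mirrors the tutorial: conv, pool, conv, conv, pool, conv, conv, pool.
for layer in ["conv", "pool", "conv", "conv", "pool", "conv", "conv", "pool"]:
    size = same_conv(size) if layer == "conv" else max_pool(size)
    trace.append(size)

flattened = size * size * 256  # the last convolution has 256 filters

print(trace)      # [28, 28, 14, 14, 14, 7, 7, 7, 3]
print(flattened)  # 2304
```

Under these assumptions, Flatten hands a vector of 2304 values to the first Dense layer. You can compare this with the output of `model.summary()`.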

We have also used Dropout layers with a rate of 0.5 after the first two Dense layers. Large neural nets trained on relatively small datasets can overfit the training data.

Dropout is a regularisation method that approximates training many neural networks with different architectures in parallel. During training, some number of layer outputs are randomly ignored or “dropped out.”
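The idea can be sketched in a few lines of NumPy. This is a simplified illustration of "inverted" dropout, in which the surviving outputs are scaled up so that the expected activation stays the same; the `dropout` function here is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero out roughly a fraction `rate` of the units and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones(10_000)
dropped = dropout(activations, rate=0.5, rng=rng)

print((dropped == 0).mean())  # fraction of dropped units, close to 0.5
print(dropped.mean())         # stays close to 1.0 because of the rescaling
```

Dropout is only active during training; at prediction time all units are kept.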

Finally, we used a Dense layer with an output size of 10 and a softmax activation, as our task is a multiclass classification with the digits 0-9 as separate classes. Now, let’s compile and train our model.
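The softmax activation turns the 10 raw outputs into probabilities that sum to 1, and the predicted digit is the index with the highest probability. Here is a minimal NumPy sketch; the `logits` values are invented for illustration.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# One made-up raw score per digit class 0-9.
logits = np.array([1.0, 2.0, 0.5, 0.1, 4.0, 0.0, 0.3, 1.5, 0.2, 0.7])
probs = softmax(logits)
predicted_digit = int(np.argmax(probs))

print(probs.round(3))
print(predicted_digit)
```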

## TRAINING THE MODEL

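```python
model.compile(
    loss='sparse_categorical_crossentropy',
    # `learning_rate` replaces the deprecated `lr` argument
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    # metrics is passed as a list
    metrics=['accuracy'],
)
```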

The loss of our model is sparse_categorical_crossentropy, which works directly with integer labels such as ours. We used the Adam optimiser with a learning rate of 0.001 and accuracy as our evaluation metric.
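For intuition, sparse categorical cross-entropy is the average negative log-probability that the model assigns to the correct class. The NumPy sketch below shows the core computation on made-up predictions for a 3-class problem; it leaves out details such as the clipping that Keras applies to avoid taking the log of zero.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean negative log-probability assigned to the correct integer class."""
    picked = y_prob[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())

y_true = np.array([0, 2])              # integer labels, no one-hot encoding
y_prob = np.array([[0.9, 0.05, 0.05],  # confident and correct
                   [0.1, 0.1, 0.8]])   # fairly confident and correct
loss = sparse_categorical_crossentropy(y_true, y_prob)
print(loss)
```

The more probability the model puts on the correct class, the closer the loss gets to 0.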

```python
history = model.fit(np.expand_dims(X_train, axis=-1), y_train, epochs=10)
```

```
Epoch 1/10
1875/1875 [==============================] - 11s 5ms/step - loss: 0.9978 - accuracy: 0.6467
Epoch 2/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.1446 - accuracy: 0.9650
Epoch 3/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.1068 - accuracy: 0.9733
Epoch 4/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0856 - accuracy: 0.9787
Epoch 5/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0744 - accuracy: 0.9827
Epoch 6/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0615 - accuracy: 0.9858
Epoch 7/10
1875/1875 [==============================] - 10s 6ms/step - loss: 0.0576 - accuracy: 0.9860
Epoch 8/10
1875/1875 [==============================] - 10s 6ms/step - loss: 0.0506 - accuracy: 0.9879
Epoch 9/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0565 - accuracy: 0.9875
Epoch 10/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0432 - accuracy: 0.9897
```

We trained our model for ten epochs and got an accuracy of over 0.98 on our training dataset. That’s pretty great, right? Now let’s evaluate our model on the test set.
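Two details of the training call are worth a quick check. The 1875 in the log is the number of batches per epoch, which follows from the 60,000 training images and the Keras default batch size of 32 (we did not pass batch_size). And np.expand_dims adds the trailing channel axis that the first Conv2D layer expects. The sketch below verifies both with a small dummy array instead of the real data.

```python
import math
import numpy as np

# Keras uses a batch size of 32 when `batch_size` is not passed to fit().
steps_per_epoch = math.ceil(60000 / 32)

# expand_dims appends the channel axis: (N, 28, 28) -> (N, 28, 28, 1).
batch = np.zeros((5, 28, 28))
batch_with_channel = np.expand_dims(batch, axis=-1)

print(steps_per_epoch, batch_with_channel.shape)
```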

```python
model.evaluate(np.expand_dims(X_test, axis=-1), y_test)
```

We evaluated the model on our test set; model.evaluate returns the test loss followed by the test accuracy. Now let’s save our model weights for future use.
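One way to save the weights is the Keras `save_weights` method, with `load_weights` to restore them later. The sketch below is self-contained, so it uses a tiny stand-in model rather than the trained one; in the tutorial you would call `model.save_weights(...)` on the trained `model` instead. The file name `captcha_solver.weights.h5` is a hypothetical choice, and the temporary directory is only used to keep the example tidy.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Tiny stand-in model; in the tutorial you would use the trained `model` instead.
demo_model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(3, activation="softmax"),
])

# Hypothetical file name; recent Keras versions expect the ".weights.h5" suffix.
weights_path = os.path.join(tempfile.mkdtemp(), "captcha_solver.weights.h5")
demo_model.save_weights(weights_path)

# Rebuild the same architecture and load the saved weights into it.
restored = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(3, activation="softmax"),
])
restored.load_weights(weights_path)

weights_match = all(
    np.array_equal(a, b)
    for a, b in zip(demo_model.get_weights(), restored.get_weights())
)
print(weights_match)
```

Because only the weights are stored, the architecture has to be rebuilt in code before the weights can be loaded.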

## CONCLUSION

In this tutorial, we looked at the first part of training our captcha solver. We built a custom deep learning model using the Keras sequential method and trained it on the MNIST dataset. We also saved the weights of our model. In the next tutorial, we will look at using our trained model weights in our captcha solver. Things will get pretty impressive in the following tutorial, so stay tuned!
