Hyperparameter Tuning using TensorFlow in Python

Welcome, everyone. In this tutorial, we will learn how to create and run hyperparameter tuning experiments using TensorFlow and Keras Tuner in Python. We will also learn how to create a custom Keras tuner.

keras-tuner is an open-source package that automates hyperparameter tuning for Keras models. Hyperparameters are the variables that govern the training process and the topology of the model (for example, the learning rate, the number of hidden layers, or the dropout rate). They stay constant during training and directly impact the performance of the model.

To install Keras Tuner, run this line in your command prompt: pip install keras-tuner

Importing Libraries

Let’s import all the necessary libraries required.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import kerastuner  # in newer releases, the package is imported as keras_tuner
%matplotlib inline

Loading The Dataset

Here we are using a very common dataset, the Fashion-MNIST dataset. It consists of grayscale images of different clothing articles. The images are 28*28 pixels in dimension and fall into 10 classes. The dataset ships with tf.keras.datasets, so there is no need to download it from an external URL.

The code below will download the data.

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

Now, let us see what our images look like. The code below displays the image of one article:

plt.imshow(x_train[1], cmap='binary')
plt.xlabel(y_train[1])
plt.show()

Output:

[Image: a 28*28 grayscale Fashion-MNIST sample showing a T-shirt]

Above we can see that the image is of a T-shirt. You can also look at other clothing items by changing the index passed to x_train and y_train, as shown in the sketch below.
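
The labels themselves are just integers from 0 to 9. As a small illustration (the class-name order below follows the Fashion-MNIST documentation), you can map a numeric label to a readable name like this:

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

index = 1  # change this to inspect a different example
plt.imshow(x_train[index], cmap='binary')
plt.xlabel(class_names[y_train[index]])
plt.show()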

After this, let us now check the dimensions of the data.

x_train.shape

Output:

(60000, 28, 28)

Here, we find that the training set contains 60000 images of dimension 28*28.
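
The test split can be checked in the same way; Fashion-MNIST ships with 10000 test images of the same size:

x_test.shape  # (10000, 28, 28)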

Creating The Model

Here, we will use the Keras sequential API to build a simple neural network. First, we add a Flatten layer that takes the 28*28 input image, followed by a Lambda layer that normalizes the pixel values into the range [0, 1]. Next, using a hyperparameter for the number of hidden layers and a for loop, we add Dense layers with 'relu' activation and Dropout layers, with the number of units and the dropout rate also given by hyperparameters. Finally, for the output layer we add a Dense layer with 10 units and a 'softmax' activation, since we are classifying the images into 10 classes.

While compiling the model, we use sparse_categorical_crossentropy as the loss because the labels are plain integers rather than one-hot encoded vectors. We use the Adam optimizer with a learning rate that is another hyperparameter, and accuracy as the metric.

Now, let's talk about the hyperparameters. While creating the model, if the hyperparameter object (i.e. hp) is not None, the tuner chooses the hyperparameter values automatically from the given ranges; otherwise, fixed default values are used.

def create_model(hp):
    if hp:
        dropout_rate = hp.Float('dropout_rate', min_value=0.1, max_value=0.5)
        num_units = hp.Choice('num_units', values=[8, 16, 32])
        learning_rate = hp.Float('learning_rate', min_value=0.0001, max_value=0.1)
        num_hidden_layers = hp.Choice('num_hidden_layers', values=[1, 2, 3])
    else:
        dropout_rate = 0.1
        num_units = 8
        learning_rate = 0.01
        num_hidden_layers = 1
    
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    model.add(tf.keras.layers.Lambda(lambda x: x/255.))  # scale pixel values into [0, 1]
    
    for _ in range(0, num_hidden_layers):
        model.add(tf.keras.layers.Dense(num_units, activation='relu'))
        model.add(tf.keras.layers.Dropout(dropout_rate))
    
    model.add(tf.keras.layers.Dense(10, activation='softmax'))

    model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        metrics=['accuracy']
    )
    
    return model

So now, let us look at our model:

create_model(None).summary()

Output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
lambda (Lambda)              (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 8)                 6280      
_________________________________________________________________
dropout (Dropout)            (None, 8)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                90        
=================================================================
Total params: 6,370
Trainable params: 6,370
Non-trainable params: 0
_________________________________________________________________

Creating The Tuner

To create the custom tuner, we subclass BayesianOptimization from keras-tuner. By overriding run_trial we also tune the batch size, sampling it between 32 and 128 with a step size of 32.

class CustomTuner(kerastuner.tuners.BayesianOptimization):
    def run_trial(self, trial, *args, **kwargs):
        # Add the batch size to the trial's hyperparameters, then run the trial as usual.
        kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 128, step=32)
        return super(CustomTuner, self).run_trial(trial, *args, **kwargs)

Now we create an instance of the CustomTuner class. The create_model function is passed as the model-building function, and the objective is set to val_accuracy because that is what we want to maximize. The maximum number of trials is set to 20, and we point the tuner at a directory named logs for its output. Finally, we add a project name for the logs and set overwrite to True so that rerunning the tuner starts fresh.

tuner = CustomTuner(
    create_model,
    objective='val_accuracy',
    max_trials=20,
    directory='logs',
    project_name='fashion_mnist',
    overwrite=True,
)

We can print the search space summary, i.e. the hyperparameters from which the tuner will select values.

tuner.search_space_summary()

Output:

Search space summary
|-Default search space size: 4

dropout_rate (Float)
|-default: 0.1
|-max_value: 0.5
|-min_value: 0.1
|-sampling: None
|-step: None

num_units (Choice)
|-default: 8
|-ordered: True
|-values: [8, 16, 32]

learning_rate (Float)
|-default: 0.0001
|-max_value: 0.1
|-min_value: 0.0001
|-sampling: None
|-step: None

num_hidden_layers (Choice)
|-default: 1
|-ordered: True
|-values: [1, 2, 3]

Here you can see the summary of the hyperparameters we defined earlier while creating the model, along with their ranges and default values.

Running The Tuner

Using the search function, we now run the tuner; each trial trains the model with its sampled hyperparameters (including the batch size) and evaluates it on the validation data.

tuner.search(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=5, verbose=False,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=2)
    ]
)

Then, we look at the best trial. You can also view the best three (or more) trials by passing a different number as the argument.

tuner.results_summary(1)

Output:

Results summary
|-Results in logs\fashion_mnist
|-Showing 1 best trials
|-Objective(name='val_accuracy', direction='max')

Trial summary
|-Trial ID: 768614a634fe37c99c5a5ba2d0662c42
|-Score: 0.838100016117096
|-Best step: 0

Hyperparameters
|-batch_size: 32
|-dropout_rate: 0.1
|-learning_rate: 0.0001
|-num_hidden_layers: 3
|-num_units: 32

Next, we retrieve the best model found by the tuner and look at its summary:

model = tuner.get_best_models(num_models=1)[0]
model.summary()

Output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
lambda (Lambda)              (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 32)                25120     
_________________________________________________________________
dropout (Dropout)            (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                1056      
_________________________________________________________________
dropout_1 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056      
_________________________________________________________________
dropout_2 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                330       
=================================================================
Total params: 27,562
Trainable params: 27,562
Non-trainable params: 0
__________________________________________________
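
If you only need the winning hyperparameter values rather than the trained model, the tuner also exposes get_best_hyperparameters; a minimal sketch (the printed dictionary will vary from run to run):

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)  # e.g. {'batch_size': 32, 'dropout_rate': 0.1, ...}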

Getting Results

Now, we continue training the best model with the best hyperparameters, i.e. batch_size = 32. In the callbacks, we use EarlyStopping so that training stops once the validation accuracy has stopped improving, which saves time.

h = model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=10, verbose=2,
    batch_size=32,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3)]
)

Output:

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 - 6s - loss: 0.3305 - accuracy: 0.8816 - val_loss: 0.3611 - val_accuracy: 0.8728
Epoch 2/10
60000/60000 - 6s - loss: 0.3284 - accuracy: 0.8820 - val_loss: 0.3585 - val_accuracy: 0.8736
Epoch 3/10
60000/60000 - 7s - loss: 0.3267 - accuracy: 0.8831 - val_loss: 0.3557 - val_accuracy: 0.8740
Epoch 4/10
60000/60000 - 6s - loss: 0.3231 - accuracy: 0.8841 - val_loss: 0.3567 - val_accuracy: 0.8736
Epoch 5/10
60000/60000 - 6s - loss: 0.3259 - accuracy: 0.8835 - val_loss: 0.3526 - val_accuracy: 0.8746
Epoch 6/10
60000/60000 - 7s - loss: 0.3210 - accuracy: 0.8844 - val_loss: 0.3564 - val_accuracy: 0.8740
Epoch 7/10
60000/60000 - 7s - loss: 0.3223 - accuracy: 0.8849 - val_loss: 0.3562 - val_accuracy: 0.8756
Epoch 8/10
60000/60000 - 7s - loss: 0.3177 - accuracy: 0.8871 - val_loss: 0.3554 - val_accuracy: 0.8745
Epoch 9/10
60000/60000 - 7s - loss: 0.3153 - accuracy: 0.8862 - val_loss: 0.3589 - val_accuracy: 0.8750
Epoch 10/10
60000/60000 - 8s - loss: 0.3164 - accuracy: 0.8871 - val_loss: 0.3534 - val_accuracy: 0.8746

Finally, we evaluate the model on the test set:

model.evaluate(x_test, y_test)

Output:

 - 1s 57us/sample - loss: 0.2177 - accuracy: 0.8746
[0.35336311395168307, 0.8746]

So, in the end, we get a validation accuracy of about 87%.
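
Since fit returned a history object (h above), we can also visualize how the accuracy evolved over the epochs; a small sketch using the already-imported matplotlib (the exact curves will depend on your run):

plt.plot(h.history['accuracy'], label='train accuracy')
plt.plot(h.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()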

In this article, we learned about hyperparameters, how to tune them using TensorFlow and keras-tuner, and a little bit more about neural networks.

Thank you for reading and enjoy learning.
