Hyperparameter Tuning using TensorFlow in Python
Welcome, everyone! In this tutorial, we will learn to create and run hyperparameter tuning experiments using TensorFlow and Keras Tuner in Python. We will also learn to create a custom Keras tuner.
keras-tuner is an open-source package that automates hyperparameter tuning for Keras models. Hyperparameters are the variables that govern the training process and the topology of the model. They stay constant during training and directly affect the performance of the model.
To install keras-tuner, run the following command in your command prompt:
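pip install keras-tuner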
Importing Libraries
Let's import all the necessary libraries.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import kerastuner
%matplotlib inline
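Note: in recent releases the package has been renamed, so it is imported as keras_tuner rather than kerastuner. If the import above fails, the following alias keeps the rest of the code in this tutorial unchanged:

import keras_tuner as kerastuner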
Loading The Dataset
Here we are using a very common dataset, the Fashion-MNIST dataset. It consists of grayscale images of different clothing articles. The images are 28×28 pixels in dimension and belong to 10 different classes. The dataset ships with tf.keras, so there is no need to download it from any external URL.
The code below will download the data.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
Now, let us see what our images look like. The code below will display the image of one article.
plt.imshow(x_train[1], cmap='binary')
plt.xlabel(y_train[1])
plt.show()
Output:
Above, we can see that the image is of a T-shirt. You can also look at other clothing items and their labels by changing the index used in x_train and y_train.
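The labels themselves are just integers from 0 to 9. If you want human-readable names, the standard Fashion-MNIST class names can be mapped to those integers; a small illustrative snippet (the class_names list is not part of the dataset object itself):

# Standard Fashion-MNIST class names; the index corresponds to the numeric label
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(class_names[y_train[1]])  # prints the class name of the image shown above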
After this, let us now check the dimensions of the data.
x_train.shape
Output:
(60000, 28, 28)
Here, we find that there are 60000 images, each of dimension 28×28.
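You can run a similar quick check on the labels; there are 60000 integer labels covering the 10 classes (a small sketch, assuming the arrays loaded above):

print(y_train.shape)       # (60000,)
print(np.unique(y_train))  # [0 1 2 3 4 5 6 7 8 9] -> 10 classes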
Creating The Model
Here, we will use the Keras sequential API and a simple neural network. First, we add a Flatten layer to take the 28×28 input image, then a Lambda layer to scale the pixel values so that they stay in the range [0, 1]. Next, using a hyperparameter for the number of hidden layers and a for loop, we add Dense and Dropout layers, with 'relu' as the activation function and a default dropout rate of 0.1. Finally, for the output layer, we add another Dense layer with 10 units and 'softmax' activation, since we are classifying the images into 10 classes.
Lastly, while compiling the model, we use sparse_categorical_crossentropy as the loss because the labels are integer class indices rather than one-hot encoded vectors. We use the Adam optimizer with the learning rate as another hyperparameter, and accuracy as the metric.
Now, let's talk about the hyperparameters. While creating the model, if the hyperparameter object (i.e. hp) is not None, the tuner chooses the different hyperparameter values automatically from the given ranges; otherwise, the fixed defaults below are used.
def create_model(hp):
    if hp:
        dropout_rate = hp.Float('dropout_rate', min_value=0.1, max_value=0.5)
        num_units = hp.Choice('num_units', values=[8, 16, 32])
        learning_rate = hp.Float('learning_rate', min_value=0.0001, max_value=0.1)
        num_hidden_layers = hp.Choice('num_hidden_layers', values=[1, 2, 3])
    else:
        dropout_rate = 0.1
        num_units = 8
        learning_rate = 0.01
        num_hidden_layers = 1

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    model.add(tf.keras.layers.Lambda(lambda x: x / 255.))

    for _ in range(0, num_hidden_layers):
        model.add(tf.keras.layers.Dense(num_units, activation='relu'))
        model.add(tf.keras.layers.Dropout(dropout_rate))

    model.add(tf.keras.layers.Dense(10, activation='softmax'))

    model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        metrics=['accuracy']
    )
    return model
So now, let us look at the model built with the default hyperparameter values:
create_model(None).summary()
Output:
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= flatten (Flatten) (None, 784) 0 _________________________________________________________________ lambda (Lambda) (None, 784) 0 _________________________________________________________________ dense (Dense) (None, 8) 6280 _________________________________________________________________ dropout (Dropout) (None, 8) 0 _________________________________________________________________ dense_1 (Dense) (None, 10) 90 ================================================================= Total params: 6,370 Trainable params: 6,370 Non-trainable params: 0 _________________________________________________________________
Creating The Tuner
To create the custom tuner, we subclass BayesianOptimization from keras-tuner and override run_trial so that the batch size is also tuned: it is sampled between 32 and 128 with a step of 32, i.e. from the values 32, 64, 96 and 128.
class CustomTuner(kerastuner.tuners.BayesianOptimization):
    def run_trial(self, trial, *args, **kwargs):
        kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 128, step=32)
        super(CustomTuner, self).run_trial(trial, *args, **kwargs)
Now, we create an instance of the CustomTuner class. The create_model function is passed in, and the objective is set to val_accuracy because we want to maximize it. The maximum number of trials is set to 20, and we set a directory for logs (named logs) along with a project name. Finally, overwrite is set to True because we may want to run the search again from scratch.
tuner = CustomTuner(
    create_model,
    objective='val_accuracy',
    max_trials=20,
    directory='logs',
    project_name='fashion_mnist',
    overwrite=True,
)
We will now look at search_space_summary, i.e. the hyperparameters from which the tuner will select values.
tuner.search_space_summary()
Output:
Search space summary
|-Default search space size: 4
dropout_rate (Float)
|-default: 0.1
|-max_value: 0.5
|-min_value: 0.1
|-sampling: None
|-step: None
num_units (Choice)
|-default: 8
|-ordered: True
|-values: [8, 16, 32]
learning_rate (Float)
|-default: 0.0001
|-max_value: 0.1
|-min_value: 0.0001
|-sampling: None
|-step: None
num_hidden_layers (Choice)
|-default: 1
|-ordered: True
|-values: [1, 2, 3]
Here you can see the summary of the hyperparameters we defined earlier while creating the model.
Running The Tuner
Using the search function, we now run the tuner; it trains models with different hyperparameter combinations (including the batch size) and scores them on the validation dataset.
tuner.search(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=5,
    verbose=False,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=2)
    ]
)
Then, we look at the best trial of all. You can also get, say, the best 3 trials by passing that number as the parameter.
tuner.results_summary(1)
Output:
Results summary
|-Results in logs\fashion_mnist
|-Showing 1 best trials
|-Objective(name='val_accuracy', direction='max')
Trial summary
|-Trial ID: 768614a634fe37c99c5a5ba2d0662c42
|-Score: 0.838100016117096
|-Best step: 0
Hyperparameters
|-batch_size: 32
|-dropout_rate: 0.1
|-learning_rate: 0.0001
|-num_hidden_layers: 3
|-num_units: 32
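If you only need the winning hyperparameter values rather than a built model, keras-tuner also exposes get_best_hyperparameters; a minimal sketch, assuming the tuner defined above:

# Retrieve the best hyperparameter values found by the search
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hps.get('learning_rate'), best_hps.get('num_units'))

To retrieve the best model itself, we use get_best_models and print its summary: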
model = tuner.get_best_models(num_models=1)[0]
model.summary()
Output:
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= flatten (Flatten) (None, 784) 0 _________________________________________________________________ lambda (Lambda) (None, 784) 0 _________________________________________________________________ dense (Dense) (None, 32) 25120 _________________________________________________________________ dropout (Dropout) (None, 32) 0 _________________________________________________________________ dense_1 (Dense) (None, 32) 1056 _________________________________________________________________ dropout_1 (Dropout) (None, 32) 0 _________________________________________________________________ dense_2 (Dense) (None, 32) 1056 _________________________________________________________________ dropout_2 (Dropout) (None, 32) 0 _________________________________________________________________ dense_3 (Dense) (None, 10) 330 ================================================================= Total params: 27,562 Trainable params: 27,562 Non-trainable params: 0 __________________________________________________
Getting Results
Now, we will train the model with the best hyperparameters and check its accuracy, using the best batch_size = 32. In the callbacks, we use EarlyStopping so that training stops once the validation accuracy stops improving.
h = model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=10,
    verbose=2,
    batch_size=32,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3)
    ]
)
Output:
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 - 6s - loss: 0.3305 - accuracy: 0.8816 - val_loss: 0.3611 - val_accuracy: 0.8728
Epoch 2/10
60000/60000 - 6s - loss: 0.3284 - accuracy: 0.8820 - val_loss: 0.3585 - val_accuracy: 0.8736
Epoch 3/10
60000/60000 - 7s - loss: 0.3267 - accuracy: 0.8831 - val_loss: 0.3557 - val_accuracy: 0.8740
Epoch 4/10
60000/60000 - 6s - loss: 0.3231 - accuracy: 0.8841 - val_loss: 0.3567 - val_accuracy: 0.8736
Epoch 5/10
60000/60000 - 6s - loss: 0.3259 - accuracy: 0.8835 - val_loss: 0.3526 - val_accuracy: 0.8746
Epoch 6/10
60000/60000 - 7s - loss: 0.3210 - accuracy: 0.8844 - val_loss: 0.3564 - val_accuracy: 0.8740
Epoch 7/10
60000/60000 - 7s - loss: 0.3223 - accuracy: 0.8849 - val_loss: 0.3562 - val_accuracy: 0.8756
Epoch 8/10
60000/60000 - 7s - loss: 0.3177 - accuracy: 0.8871 - val_loss: 0.3554 - val_accuracy: 0.8745
Epoch 9/10
60000/60000 - 7s - loss: 0.3153 - accuracy: 0.8862 - val_loss: 0.3589 - val_accuracy: 0.8750
Epoch 10/10
60000/60000 - 8s - loss: 0.3164 - accuracy: 0.8871 - val_loss: 0.3534 - val_accuracy: 0.8746
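Since fit() returns a History object (stored in h above), we can also visualize how accuracy evolved over the epochs. A minimal sketch using the matplotlib import from earlier:

# Plot training vs validation accuracy from the History returned by fit()
plt.plot(h.history['accuracy'], label='training accuracy')
plt.plot(h.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()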
Finally, we evaluate the accuracy of the model on the test set:
model.evaluate(x_test, y_test)
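model.evaluate returns the loss followed by the metrics specified at compile time, so here it gives a [loss, accuracy] pair; given the training log above, the test accuracy should land close to the final validation accuracy (roughly 0.87). For example:

# Unpack the evaluation results into loss and accuracy
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f'Test accuracy: {test_acc:.4f}')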