Normalization Layer in Keras

Keras provides normalization layers that rescale each feature so that all features contribute on a comparable scale, which also helps reduce internal covariate shift. Normalization is a technique for improving the speed, performance, and stability of neural networks. When preparing a model, we normalize the inputs of a layer by adjusting and scaling the activations, which makes training more stable. Batch normalization is one of the most commonly used of these layers in Keras. It normalizes the output of the previous layer by subtracting the batch mean and dividing by the batch standard deviation.

The Batch Normalization layer accepts several optional arguments, such as momentum (discussed later in this article), that control how it behaves.
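As a rough illustration, the layer can be constructed with its most commonly used arguments written out explicitly. The values shown are the defaults as documented for recent Keras releases; treat them as an assumption and check the documentation for the version you have installed.

```python
from keras.layers import BatchNormalization

# Defaults written out explicitly (assumed from the Keras documentation).
layer = BatchNormalization(
    axis=-1,        # axis that holds the features/channels to normalize
    momentum=0.99,  # momentum for the moving mean and moving variance
    epsilon=0.001,  # small constant added to the variance to avoid division by zero
    center=True,    # if True, add the learned offset beta
    scale=True,     # if True, multiply by the learned scale gamma
)
```

The layer also accepts initializers, regularizers, and constraints for beta and gamma, which are omitted here for brevity.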


Applying normalization to the input features speeds up the convergence of training, and with it the learning process as a whole. So how does it actually work?

Suppose you train a neural network only on images of black cats. The model is then unlikely to perform well on images of cats of other colors. The reason is a shift in the input distribution, known as covariate shift: a change in the distribution of the input variables. Batch normalization helps reduce the same kind of shift inside the network, where the distribution of each layer's inputs changes as the preceding layers are updated during training. The Batch Normalization layer transforms its inputs so that they are standardized, which means that they have a mean of zero and a standard deviation of one.
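The standardization step can be sketched in a few lines of NumPy. This is a minimal illustration and not the Keras implementation; the function name batch_standardize and the sample data are made up for the example, and epsilon is the small constant that guards against division by zero.

```python
import numpy as np

def batch_standardize(x, epsilon=1e-3):
    """Standardize each feature (column) using the statistics of the batch."""
    mean = x.mean(axis=0)  # per-feature batch mean
    var = x.var(axis=0)    # per-feature batch variance
    return (x - mean) / np.sqrt(var + epsilon)

# Hypothetical batch: 64 samples, 4 features, far from zero mean / unit variance.
rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

normalized = batch_standardize(batch)
print(normalized.mean(axis=0))  # each entry is close to 0
print(normalized.std(axis=0))   # each entry is close to 1
```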

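The Batch Normalization layer takes ‘momentum’ as an argument. During training, the layer normalizes the output of the previous layer by subtracting the batch mean and dividing by the batch’s standard deviation. Momentum controls how quickly the moving averages of the mean and variance, which the layer uses at inference time, follow the batch statistics. The Keras default is 0.99; with a momentum of 0.0, the moving averages are simply replaced by the statistics of the most recent batch.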

bn = BatchNormalization(momentum=0.0)
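To see what momentum does, here is a small plain-Python sketch of the moving-average update rule given in the Keras documentation (moving value times momentum, plus batch value times one minus momentum). The helper names and the batch means are hypothetical and only serve the illustration.

```python
def update_moving_average(moving_value, batch_value, momentum):
    """One update step: keep `momentum` of the old value, take the rest from the batch."""
    return moving_value * momentum + batch_value * (1.0 - momentum)

def track(batch_means, momentum, initial=0.0):
    """Apply the update once per batch and return the final moving mean."""
    moving_mean = initial
    for batch_mean in batch_means:
        moving_mean = update_moving_average(moving_mean, batch_mean, momentum)
    return moving_mean

batch_means = [2.0, 4.0, 6.0]  # made-up per-batch means

no_memory = track(batch_means, momentum=0.0)      # only the last batch counts
slow_average = track(batch_means, momentum=0.99)  # moves slowly away from the initial value
print(no_memory, slow_average)
```

With a momentum of 0.0 the moving mean always equals the mean of the latest batch, while a value close to 1 gives a smoother estimate that needs many batches to settle.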

Now we’ll look at how we can actually use it in our models.

The first step is to instantiate a Sequential model. We can then add layers such as Dense, Conv2D, and so on. A Batch Normalization layer is typically added between two layers, for example after a Dense or Conv2D layer and its activation function, and before the next layer.

It can be imported in our code like this:

from keras.layers import BatchNormalization

Here is an example that shows how we can use it in our models:

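from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import BatchNormalization

# Instantiate the model before adding layers to it
model = Sequential()
model.add(Conv2D(32, (3,3), activation='relu'))
# Normalize the activations of the preceding convolution
model.add(BatchNormalization())
model.add(Conv2D(32, (3,3), activation='relu'))
model.add(BatchNormalization())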
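A common variation, which follows the original paper, applies batch normalization before the activation function instead of after it. The sketch below shows that ordering on a small fully connected model. The layer sizes and the input shape of 20 features are arbitrary choices for illustration, and use_bias=False is an optional tweak that relies on beta taking over the role of the bias.

```python
from keras import Input
from keras.models import Sequential
from keras.layers import Activation, BatchNormalization, Dense

model = Sequential([
    Input(shape=(20,)),              # hypothetical input: 20 features
    Dense(64, use_bias=False),       # bias is redundant: beta provides the offset
    BatchNormalization(),            # normalize the pre-activation values
    Activation("relu"),              # apply the nonlinearity afterwards
    Dense(1, activation="sigmoid"),  # output layer
])

model.summary()
```

Both orderings are used in practice, so it is worth trying each on your own problem.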

Batch normalization adds two trainable parameters for each normalized feature: the normalized output is multiplied by gamma (a learned scale), and then beta (a learned shift) is added. They let the network recover whatever mean and standard deviation work best for the next layer.
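Written as formulas, following the notation of the original paper, the transform for a mini-batch with mean \mu_B and variance \sigma_B^2 looks like this, where \epsilon is a small constant for numerical stability:

```latex
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},
\qquad
y_i = \gamma \, \hat{x}_i + \beta
```

With \gamma = 1 and \beta = 0, which are the default initial values in Keras, the output is simply the standardized input.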

It is also very helpful if we want to use a higher learning rate in our model. With a high learning rate, training can become unstable and run into problems such as exploding or vanishing gradients. With batch normalization, small changes to the parameters of one layer are less likely to be amplified as they pass through the following layers. This makes it easier to use a relatively high learning rate.

The Batch Normalization layer in Keras also has some regularization effect, so the amount of dropout can usually be reduced.


Resources for deeper understanding and references.

“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” is the research paper by Sergey Ioffe and Christian Szegedy that introduced this technique. You can read it to get a deeper understanding of batch normalization and the mathematics behind it: Batch Normalization Research Paper

The official Keras documentation for this layer is also worth a look: Batch Normalization in Keras

