Different Regularization Techniques in Deep Learning

In this tutorial, we will learn about different regularization techniques in Deep Learning, using Python. I am going to discuss the basics of regularization and how to apply it.

Regularization is mainly used when building models with Convolutional Neural Networks and Recurrent Neural Networks, for computer vision and other Machine Learning problems.

The different types of Regularization techniques are:

  1. L1 and L2 Regularization.
  2. Dropout Regularization.
  3. Early Stopping.
  4. Data Augmentation.


Regularization modifies the learning algorithm so that the model does not overfit the training data. This modification helps the Machine Learning model generalize better to the validation or test set.

Import Libraries:

We start by importing the basic Python libraries for building Deep Learning models. NumPy and Pandas are used along with Scikit-learn, which provides ready-made models and utilities.

Apart from these, Matplotlib and Seaborn are used for plotting.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import scipy.io
import seaborn as sns

L1 and L2 Regularization

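We are going to use a formula that adds a regularization penalty to the cost of the model. lambd (lambda) is the hyperparameter that controls how strong the penalty is.

L1 and L2 regularization are mainly used for reducing overfitting on the training set caused by high variance. L1 penalizes the sum of the absolute values of the weights, while L2 penalizes the sum of their squares.

We compute the regularized cost and minimize it with backpropagation across the layers of the neural network. This pushes the weights towards smaller values, which tends to give a model that generalizes better.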
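Written out, the regularized cost for a three-layer network can be sketched as below. The notation is an assumption for illustration: m is the number of training examples, W^{[l]} is the weight matrix of layer l, the first added term is the L1 penalty, and the second is the L2 penalty. The 1/(2m) scaling mirrors the code in this tutorial; other sources use different constants.

```latex
J_{\text{regularized}} = J_{\text{cross-entropy}}
  + \frac{\lambda}{2m} \sum_{l=1}^{3} \sum_{i,j} \left| W^{[l]}_{ij} \right|
  + \frac{\lambda}{2m} \sum_{l=1}^{3} \sum_{i,j} \left( W^{[l]}_{ij} \right)^{2}
```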

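def compute_cost_with_regularization(A3, Y, parameters, lambd):
    # compute_cost is assumed to be defined elsewhere and to return the
    # unregularized cross-entropy cost of the predictions A3 against labels Y
    m = Y.shape[1]  # number of training examples
    W1 = parameters["W1"]  # W are the weights
    W2 = parameters["W2"]
    W3 = parameters["W3"]
    cross_entropy_cost = compute_cost(A3, Y)
    # L1 penalty: sum of the absolute values of the weights
    L1_regularization_cost = (1/m) * (lambd/2) * (np.sum(np.abs(W1)) + np.sum(np.abs(W2)) + np.sum(np.abs(W3)))
    # L2 penalty: sum of the squared weights
    L2_regularization_cost = (1/m) * (lambd/2) * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
    cost = cross_entropy_cost + L2_regularization_cost + L1_regularization_cost
    return cost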
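To see what the penalty terms look like on concrete numbers, here is a minimal, self-contained sketch. The helper names (l1_penalty, l2_penalty, l2_gradient) and the toy weights are made up for illustration and are not part of the tutorial's code. The last helper shows the extra term that an L2 penalty of this form adds to each weight gradient during backpropagation.

```python
import numpy as np

def l1_penalty(weights, lambd, m):
    """L1 penalty: (lambd / (2 * m)) * sum of |W| over all weight matrices."""
    return (lambd / (2 * m)) * sum(np.sum(np.abs(W)) for W in weights)

def l2_penalty(weights, lambd, m):
    """L2 penalty: (lambd / (2 * m)) * sum of W**2 over all weight matrices."""
    return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)

def l2_gradient(W, lambd, m):
    """Derivative of the L2 penalty with respect to W, added to dW in backprop."""
    return (lambd / m) * W

# Toy weights for a two-layer network (hypothetical values)
weights = [np.array([[1.0, -2.0], [0.5, 0.0]]), np.array([[3.0, -1.0]])]

l1 = l1_penalty(weights, lambd=0.1, m=10)
l2 = l2_penalty(weights, lambd=0.1, m=10)
grad = l2_gradient(weights[1], lambd=0.1, m=10)
print("L1 penalty:", l1)
print("L2 penalty:", l2)
print("L2 gradient term for the last layer:", grad)
```

Larger values of lambd make both penalties, and the gradient term that shrinks the weights, proportionally larger.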

Dropout Regularization

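Dropout randomly switches off units during training, so that each training step works with a smaller, thinned-out version of the neural network.

This regularization technique is widely used in computer vision problems. Because the dropped units take no part in the forward pass or in backpropagation for that step, the network cannot rely too heavily on any single unit.

keep_prob is the probability of keeping each unit active. It acts as a threshold: a unit is kept when the random number drawn for it falls below keep_prob. This value needs to be kept smaller for layers containing a lot of parameters, since those layers are more prone to overfitting.

We are going to use both ReLU and sigmoid activation functions for different layers.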

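def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def dropout(X, parameters, keep_prob=0.5):
    # Forward pass with inverted dropout applied to the two hidden layers
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    D1 = np.random.rand(A1.shape[0], A1.shape[1]) < keep_prob  # keep each unit with probability keep_prob
    A1 = A1 * D1                                      # switch off the dropped units
    A1 = A1 / keep_prob                               # rescale so the expected activation is unchanged
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    D2 = np.random.rand(A2.shape[0], A2.shape[1]) < keep_prob
    A2 = A2 * D2
    A2 = A2 / keep_prob
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)  # output layer, no dropout; sigmoid keeps A3 in (0, 1)
    return A3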
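The division by keep_prob is what makes this "inverted" dropout: it keeps the average activation roughly the same as without dropout, so nothing needs to be rescaled at prediction time. Below is a small sketch that checks this on an array of ones. The helper name inverted_dropout and the choice of keep_prob=0.8 are assumptions for illustration only.

```python
import numpy as np

def inverted_dropout(A, keep_prob, rng):
    """Apply inverted dropout to activations A.

    Returns the dropped-and-rescaled activations and the boolean keep mask.
    """
    mask = rng.random(A.shape) < keep_prob   # True where the unit is kept
    return (A * mask) / keep_prob, mask

rng = np.random.default_rng(0)
A = np.ones((1000, 100))                     # toy activations, all equal to 1
A_drop, mask = inverted_dropout(A, keep_prob=0.8, rng=rng)

print("fraction of units kept:", mask.mean())
print("mean activation before:", A.mean())
print("mean activation after: ", A_drop.mean())
```

The fraction kept should come out close to keep_prob, and the mean activation after dropout should stay close to the mean before it.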

Early Stopping and Data Augmentation

L1 and L2 Regularization along with Dropout are the more commonly used techniques of regularization.

We are now going to discuss Data Augmentation. In data augmentation, we use different views of the same image to reduce overfitting. The technique produces more training data for computer vision problems by changing the position of the image or zooming in and out of it, which helps prevent overfitting.
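As a rough illustration of producing different views of the same image, here is a minimal NumPy-only sketch. The function name augment and the max_shift parameter are hypothetical, and the sketch covers only a random horizontal flip and a small random shift; zooming is left out to keep it short.

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Return a randomly flipped and shifted copy of an (H, W, C) image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                # horizontal flip
    # Pad the borders, then crop a window of the original size at a random
    # offset; this shifts the image by up to max_shift pixels in each direction.
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(out, pad, mode="edge")
    dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
    h, w = image.shape[:2]
    return padded[dy:dy + h, dx:dx + w, :]

rng = np.random.default_rng(0)
image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)  # toy image

# Four different views of the same image, stacked into one batch
batch = np.stack([augment(image, rng) for _ in range(4)])
print(batch.shape)
```

Each call returns an image of the same size as the input, so the augmented views can be fed to the model in place of, or alongside, the original.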

Early Stopping is another technique, in which we stop training as soon as the model starts performing worse on the dev (validation) set.
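The stopping rule can be sketched without any framework. In the hypothetical helper below, val_losses stands for the dev-set loss recorded after each epoch, and patience (an assumption, not something from the text above) is how many epochs without improvement are tolerated; patience=1 corresponds to stopping immediately.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch dev losses.

    Training stops at the first epoch where the loss has failed to improve
    on the best value seen so far for `patience` epochs in a row.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every epoch
    return len(val_losses) - 1, best_epoch

val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]  # toy dev-set losses
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)  # prints "4 2"
```

In practice the weights from best_epoch are the ones to keep, since later epochs have already started to overfit.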

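from tensorflow import keras

def build_model(window_size):
    # Small LSTM model over input windows of length window_size
    model = keras.Sequential()
    model.add(keras.Input(shape=(1, window_size)))
    model.add(keras.layers.LSTM(5))
    return model

def compile_with_early_stopping(model):
    optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)
    model.compile(loss='mean_squared_error', optimizer=optimizer)
    # Stop once val_loss has not improved for 5 epochs (an illustrative choice);
    # pass this to model.fit(..., validation_data=..., callbacks=[early_stop])
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
    return early_stop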

