Implement the Transformer Encoder from Scratch using TensorFlow and Keras

In this tutorial, we will implement a Transformer encoder from scratch using the Python libraries TensorFlow and Keras.

Let’s start by understanding what a Transformer encoder is.

What is a Transformer Encoder?

A transformer is a deep-learning model built around a self-attention mechanism, which weighs the significance of each part of the input data differently. It is used in fields such as natural language processing (NLP) and computer vision (CV).
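To make the idea of self-attention concrete, here is a minimal NumPy sketch of scaled dot-product self-attention for a single sequence. The function names, the random projection matrices, and the sizes (5 tokens, 8 features, 4-dimensional queries, keys, and values) are illustrative assumptions and are not part of the implementation that follows.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one sequence.

    x: (seq_len, d_model); w_q, w_k: (d_model, d_k); w_v: (d_model, d_v)
    Returns the attended values (seq_len, d_v) and the weights (seq_len, seq_len).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity of every token pair
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 tokens with 8 features each (made-up sizes)
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape, weights.shape)  # (5, 4) (5, 5)
```

Each row of weights says how strongly one token attends to every other token, which is the "weighing" described above.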

A transformer has an encoder-decoder architecture. The encoder extracts features from an input sentence, and the decoder uses these features to produce an output sentence. The encoder is a stack of multiple identical encoder blocks.

Each encoder block consists of two main sub-layers: a multi-head self-attention layer and a fully connected feed-forward network. The output of each sub-layer is added to its input (a residual connection) and then layer-normalized.

Implementing the Transformer Encoder

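Let’s start by installing TensorFlow. Keras ships with TensorFlow as tensorflow.keras, and NumPy is installed as a dependency, so no separate installs are needed:

pip install tensorflow

Now let’s import the necessary libraries. Note that the PositionEmbeddingFixedWeights layer used later in the encoder is not provided by any of these packages (nor by the positional-encodings package), so it has to be defined in your own code:

from tensorflow.keras.layers import LayerNormalization, Layer, Dense, ReLU, Dropout, MultiHeadAttention
from numpy import random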
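The encoder class later in this tutorial calls PositionEmbeddingFixedWeights(sequence_length, vocab_size, model), but the tutorial never defines that layer. The following is a minimal sketch of what such a layer could look like, not the original implementation. It assumes a token embedding plus the fixed sine/cosine positional table from the original Transformer paper. The helper name sinusoidal_table, the base 10000, and the choice to learn the token embedding while keeping only the positional table fixed are all assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Embedding, Layer


def sinusoidal_table(num_positions, dim, n=10000):
    """Return a (num_positions, dim) matrix of fixed sine/cosine encodings."""
    positions = np.arange(num_positions)[:, np.newaxis]  # (P, 1)
    pair_index = np.arange(dim)[np.newaxis, :] // 2      # (1, D): 0, 0, 1, 1, ...
    angles = positions / np.power(n, 2 * pair_index / dim)
    table = np.zeros((num_positions, dim), dtype="float32")
    table[:, 0::2] = np.sin(angles[:, 0::2])  # even feature indices use sine
    table[:, 1::2] = np.cos(angles[:, 1::2])  # odd feature indices use cosine
    return table


class PositionEmbeddingFixedWeights(Layer):
    """Sketch: learned token embedding plus a fixed positional table."""

    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.word_embedding = Embedding(vocab_size, output_dim)
        self.position_table = tf.constant(sinusoidal_table(sequence_length, output_dim))

    def call(self, inputs):
        # inputs: integer token IDs of shape (batch, seq_len)
        seq_len = inputs.shape[-1]
        return self.word_embedding(inputs) + self.position_table[:seq_len]
```

The layer maps integer token IDs of shape (batch, seq_len) to floating-point vectors of shape (batch, seq_len, output_dim), which is the input format the encoder layers expect.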

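Now let’s write the add-and-normalization layer, which adds a sub-layer’s output to its input (a residual connection) and applies layer normalization to the sum:

class AddNormalizationLayer(Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.layer_normalization = LayerNormalization()  # normalizes over the last (feature) axis

    def call(self, x, sublayer_x):
        # Residual connection: the sub-layer input and output must have the same shape
        residual_sum = x + sublayer_x

        return self.layer_normalization(residual_sum)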
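To see what this layer computes, here is a NumPy sketch of the same add-and-normalize step. It is an illustration rather than the Keras implementation: the learnable scale and offset of layer normalization are left out (they start at 1 and 0), and the function name, the epsilon value, and the tensor sizes are assumptions.

```python
import numpy as np


def add_and_norm(x, sublayer_x, epsilon=1e-3):
    """Residual sum followed by layer normalization over the last axis."""
    summed = x + sublayer_x
    mean = summed.mean(axis=-1, keepdims=True)
    variance = summed.var(axis=-1, keepdims=True)
    # epsilon avoids division by zero when the variance is tiny
    return (summed - mean) / np.sqrt(variance + epsilon)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 16))           # (batch, seq_len, features), made-up sizes
sublayer_x = rng.normal(size=(2, 5, 16))  # stand-in for a sub-layer's output
normalized = add_and_norm(x, sublayer_x)
print(normalized.shape)  # (2, 5, 16)
```

After this step, every token vector has a mean of approximately 0 and a standard deviation of approximately 1 across its features.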

Now let’s implement the feed-forward layer class:

class FeedForwardLayer(Layer):
    def __init__(self, fforward, model, **kwargs):
        super().__init__(**kwargs)
        self.fully_connected1 = Dense(fforward)  # expands to the inner dimension
        self.fully_connected2 = Dense(model)  # projects back to the model dimension
        self.activation = ReLU()

    def call(self, x):
        x_f1 = self.fully_connected1(x)

        return self.fully_connected2(self.activation(x_f1))
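The feed-forward network is applied to each position of the sequence separately, using the same weights everywhere. The NumPy sketch below illustrates this with random weights: running the network on a whole batch gives the same result for a token as running it on that token alone. The function name and the sizes are assumptions chosen for illustration.

```python
import numpy as np


def feed_forward(x, w1, b1, w2, b2):
    """Two dense layers with a ReLU in between, applied along the last axis."""
    hidden = np.maximum(0.0, x @ w1 + b1)  # ReLU activation
    return hidden @ w2 + b2


rng = np.random.default_rng(0)
d_model, d_ff = 8, 32  # made-up sizes
w1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

x = rng.normal(size=(2, 5, d_model))  # (batch, seq_len, d_model)
y = feed_forward(x, w1, b1, w2, b2)
single = feed_forward(x[0, 3], w1, b1, w2, b2)  # one token on its own

print(y.shape)                      # (2, 5, 8)
print(np.allclose(y[0, 3], single)) # True
```

Because positions do not interact here, all mixing of information between tokens happens in the attention sub-layer.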

Now let’s create the encoder layer class and initialize all the sub-layers it consists of:

class EncoderLayer(Layer):
    def __init__(self, h, k, v, model, fforward, rate, **kwargs):
        super().__init__(**kwargs)
        # Keras MultiHeadAttention: h heads, per-head key/query size k and value size v.
        # By default its output is projected back to the query's last dimension (model).
        self.multihead_attention = MultiHeadAttention(num_heads=h, key_dim=k, value_dim=v)
        self.dropout1 = Dropout(rate)
        self.add_normalization1 = AddNormalizationLayer()
        self.feed_forward = FeedForwardLayer(fforward, model)
        self.dropout2 = Dropout(rate)
        self.add_normalization2 = AddNormalizationLayer()

    def call(self, x, padding_mask=None, training=None):
        # Self-attention: queries, keys, and values all come from x
        multihead_attention_output = self.multihead_attention(
            query=x, value=x, key=x, attention_mask=padding_mask
        )
        multihead_attention_output = self.dropout1(multihead_attention_output, training=training)

        addnormalization_output = self.add_normalization1(x, multihead_attention_output)
        feedforward_output = self.feed_forward(addnormalization_output)
        feedforward_output = self.dropout2(feedforward_output, training=training)

        return self.add_normalization2(addnormalization_output, feedforward_output)
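The encoder layer accepts a padding mask, but the tutorial does not show how to build one. Below is one possible sketch in NumPy. It assumes that token ID 0 is reserved for padding (the tutorial does not say so) and follows the convention of the Keras MultiHeadAttention layer, whose attention_mask has shape (batch, query_length, key_length) and marks the positions that may be attended to. The function name make_padding_mask is hypothetical.

```python
import numpy as np


def make_padding_mask(token_ids, pad_id=0):
    """Return a boolean mask of shape (batch, seq_len, seq_len).

    mask[b, i, j] is True when query position i may attend to key position j,
    that is, when token j of sequence b is not padding.
    """
    is_token = token_ids != pad_id  # (batch, seq_len)
    seq_len = token_ids.shape[1]
    return np.repeat(is_token[:, np.newaxis, :], seq_len, axis=1)


# Two sequences padded with 0 to a common length of 5 (made-up token IDs)
token_ids = np.array([[7, 3, 9, 0, 0],
                      [4, 8, 0, 0, 0]])
mask = make_padding_mask(token_ids)
print(mask.shape)            # (2, 5, 5)
print(mask[0, 0].tolist())   # [True, True, True, False, False]
```

Passing such a mask as the padding mask keeps the attention weights on padded positions at zero, so padding does not influence the real tokens.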

Now let’s implement the Transformer encoder class. It converts the input tokens into position-aware embeddings and then passes them through a stack of n encoder layers:

class TEncoder(Layer):
    def __init__(self, vocab_size, sequence_length, h, k, v, model, fforward, n, rate, **kwargs):
        super().__init__(**kwargs)
        # Custom layer (not part of Keras): token embedding plus positional encoding
        self.pos_encoding = PositionEmbeddingFixedWeights(sequence_length, vocab_size, model)
        self.dropout = Dropout(rate)
        self.encoderlayer = [EncoderLayer(h, k, v, model, fforward, rate) for _ in range(n)]

    def call(self, input_sentence, padding_mask=None, training=None):
        encoding_output = self.pos_encoding(input_sentence)
        x = self.dropout(encoding_output, training=training)

        # Pass the result through each encoder layer in turn
        for layer in self.encoderlayer:
            x = layer(x, padding_mask=padding_mask, training=training)

        return x

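Now let’s run the encoder on a batch of random token IDs and check the output. The result should be a tensor of shape (batch_size, input_sequence_length, model):

vocab_size = 26
input_sequence_length = 5  # tokens per input sequence
h = 6  # number of attention heads
k = 54  # per-head size of queries and keys
v = 54  # per-head size of values
fforward = 1458  # width of the inner feed-forward layer
model = 216  # size of each token representation
n = 6  # number of stacked encoder layers

batch_size = 54
dropout_rate = 0.1

# Token IDs must be integers in [0, vocab_size), so draw them with randint
input_sequence = random.randint(0, vocab_size, size=(batch_size, input_sequence_length))

transformer_encoder = TEncoder(vocab_size, input_sequence_length, h, k, v, model, fforward, n, dropout_rate)
# No padding mask is used here; training=True enables dropout
print(transformer_encoder(input_sequence, padding_mask=None, training=True))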


