Understanding LSTM with a quick implementation in Keras

In this blog, we will focus primarily on implementing LSTMs in Keras, along with an understanding of a few basic concepts. Long Short Term Memory, abbreviated as LSTM, was first introduced by Hochreiter and Schmidhuber in 1997. It is a type of recurrent neural network (RNN), a network whose connections form feedback loops so that each computation depends on previous steps. In general, vanilla RNNs are theoretically appealing but offer limited practical advantages. This is where LSTM comes to the rescue, a notion we will develop throughout this blog.

What is a Recurrent Neural Network?

In a feedforward neural network, inputs map directly to their respective outputs, which is why these networks are used extensively in pattern recognition. However, a feedforward network has no memory, which limits the tasks it can perform. In an RNN, by contrast, signals also loop back through the network, and this recurrence acts as a memory: the network can generate one or more output vectors from one or more input vectors, depending on both the weights and the prior hidden states. This distinction is clearly visible in the figure below, and a toy sketch of the recurrence follows it.

 

Fig 1.1 Recurrent Neural Network and Feedforward Neural Network
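To make the difference concrete, here is a minimal NumPy sketch of the recurrence an RNN computes (illustrative names and random weights, not code from any library or from later in this post): the hidden state h is updated at every timestep from the current input and the previous hidden state.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3
W_x = rng.normal(size=(n_hidden, n_in))      # input-to-hidden weights
W_h = rng.normal(size=(n_hidden, n_hidden))  # hidden-to-hidden (recurrent) weights
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)                       # prior hidden state, i.e. the "memory"
for x_t in rng.normal(size=(5, n_in)):       # a sequence of 5 input vectors
    h = np.tanh(W_x @ x_t + W_h @ h + b)     # h_t = tanh(W_x·x_t + W_h·h_(t-1) + b)
print(h)                                     # depends on the entire input sequence

A feedforward layer, by comparison, would compute tanh(W_x @ x_t + b) for each x_t independently, with no term carrying information between timesteps.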

Thanks to its recurrent nature, an RNN has the major advantage of memory, which is useful in many domains. However, vanilla recurrent neural networks are rarely practical because of the vanishing and exploding gradient problems. Learning is the utmost priority of any neural network, and vanishing and exploding gradients hinder exactly that: as backpropagated gradients flow through many timesteps, they either accumulate and explode or diminish and vanish, leaving the weights with no meaningful adjustment and the network with little learning.

In layman's terms, RNNs are good at remembering things over a short span, but over a longer period the information gets distorted, and the network rarely distinguishes between crucial and trivial information. Using ReLU (Rectified Linear Unit) activations can mitigate the vanishing gradient to some extent, but LSTMs are the most promising solution, as described in the next section.
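The scale of the problem is easy to see with a toy calculation (an illustration added here, not part of the original post): backpropagating through T timesteps multiplies the gradient by roughly the same recurrent factor at every step, so a factor slightly below 1 vanishes and a factor slightly above 1 explodes.

# Toy illustration of vanishing/exploding gradients over T timesteps
T = 50
for factor in (0.5, 1.5):
    print(f"factor={factor}: gradient scaled by ~{factor ** T:.3e} after {T} steps")
# factor=0.5 -> ~8.9e-16 (vanishes), factor=1.5 -> ~6.4e+08 (explodes)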

What is Long Short Term Memory?

LSTM is a modified version of the recurrent neural network that makes it easier to retain past data in memory, and it largely overcomes the gradient problems of plain RNNs. Every LSTM module has three gates, Forget, Input, and Output, all trained by backpropagation; a minimal sketch of a single LSTM step follows the figure below.

The legend is as below:

i_t     ->  Input gate
f_t     ->  Forget gate
o_t     ->  Output gate
h_(t-1) ->  Hidden state at timestamp (t-1)
c_(t-1) ->  Cell state (memory) at timestamp (t-1)

The following figure shows a complete view of how an LSTM and its gates work.

Fig 1.2 Long Short Term Memory
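Putting the legend and the figure together, a single LSTM step can be written out as follows (a minimal NumPy sketch with assumed placeholder weight dictionaries W, U, and b; it is illustrative only and is not reused later in this post).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])    # forget gate: what to drop from memory
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])    # input gate: what new information to store
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # output gate: what to expose as h_t
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat                          # update cell state (memory)
    h_t = o_t * np.tanh(c_t)                                  # new hidden state
    return h_t, c_t

Because the cell state c_t is updated by element-wise gating rather than by repeated matrix multiplication, gradients can flow across many timesteps, which is what lets an LSTM hold on to long-range information.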

 

Implementation of RNN

RNNs and LSTMs have many applications, such as speech recognition, language modeling, sentiment analysis, text prediction, and autonomous driving systems. LSTMs are also widely used for more complex tasks, but those topics are advanced for beginners. Hence, we will use the MNIST dataset to understand the implementation of an RNN and, later, an LSTM, treating each 28x28 image as a sequence of 28 rows.

First, start by importing Keras and the other required libraries:

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, SimpleRNN
from keras.optimizers import Adam

Import the MNIST data and normalize it

# Pixel values range from 0 to 255, so scale them to [0, 1]
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
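Keras recurrent layers expect 3D input of shape (samples, timesteps, features), and MNIST already fits that mold once each image is read row by row. A quick shape check (a sanity-check snippet added here, not from the original post) confirms it:

print(X_train.shape)  # (60000, 28, 28): 60000 images, 28 timesteps (rows) of 28 features (pixels)
print(y_train.shape)  # (60000,): integer labels 0-9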

Create RNN Model

model = Sequential()
# 32 recurrent units; input_shape=(28, 28) means 28 timesteps of 28 features each
model.add(SimpleRNN(units=32, input_shape=X_train.shape[1:], activation="relu"))
# 10 output classes, one per digit
model.add(Dense(10, activation='softmax'))
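To verify the layer shapes and parameter counts, you can call model.summary(); the counts in the comments below follow from how Keras parameterizes these layers (the exact formatting of the printed table may differ by Keras version):

model.summary()
# SimpleRNN: 28*32 (input weights) + 32*32 (recurrent weights) + 32 (biases) = 1952 params
# Dense:     32*10 (weights) + 10 (biases)                                   = 330 params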

Compile and Fit the model

# sparse_categorical_crossentropy works directly with integer labels (no one-hot encoding needed)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, validation_data=(X_test, y_test))

Output

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
60000/60000 [==============================] - 10s 168us/step - loss: 0.9045 - accuracy: 0.6897 - val_loss: 0.5755 - val_accuracy: 0.8159
Epoch 2/3
60000/60000 [==============================] - 10s 167us/step - loss: 0.5067 - accuracy: 0.8468 - val_loss: 0.4159 - val_accuracy: 0.8792
Epoch 3/3
60000/60000 [==============================] - 10s 168us/step - loss: 0.4050 - accuracy: 0.8799 - val_loss: 0.3306 - val_accuracy: 0.9034
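Once training finishes, you can evaluate the trained RNN on the held-out test set (a short follow-up sketch, not part of the original post):

test_loss, test_acc = model.evaluate(X_test, y_test)
print('RNN test accuracy:', test_acc)  # should be close to the val_accuracy reported above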

Implementation of LSTM

Importing Libraries

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.optimizers import Adam

Import the MNIST data and normalize it

(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train/255.0, X_test/255.0

Create LSTM Model

model = Sequential()
# return_sequences=True passes the full sequence of hidden states to the next LSTM layer
model.add(LSTM(128, input_shape=X_train.shape[1:], activation='relu', return_sequences=True))
model.add(Dropout(0.2))

# The second LSTM returns only its final hidden state
model.add(LSTM(128, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))

# 10 output classes, one per digit
model.add(Dense(10, activation='softmax'))

Compile and Fit the model

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, validation_data=(X_test, y_test))

Output

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
60000/60000 [==============================] - 155s 3ms/step - loss: 0.7145 - accuracy: 0.7681 - val_loss: 0.1817 - val_accuracy: 0.9473
Epoch 2/3
60000/60000 [==============================] - 152s 3ms/step - loss: 0.1867 - accuracy: 0.9500 - val_loss: 0.1001 - val_accuracy: 0.9693
Epoch 3/3
60000/60000 [==============================] - 151s 3ms/step - loss: 0.1211 - accuracy: 0.9665 - val_loss: 0.1227 - val_accuracy: 0.9652
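As with the RNN, you can sanity-check the trained LSTM on individual test images (an illustrative snippet, not part of the original post):

probs = model.predict(X_test[:1])  # shape (1, 10): class probabilities for the first test image
print('Predicted digit:', probs.argmax(), '| true label:', y_test[0])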

Conclusion

It is evident from the implementations above that LSTMs outperform vanilla RNNs not only in theory but also in practice. On the MNIST dataset, the RNN reached 90.34% validation accuracy while the LSTM reached 96.52% at the end of 3 epochs. Hope you enjoyed reading!
