Understanding LSTM with quick implementation in Keras
In this blog, we will focus primarily on LSTM implementation in Keras, along with an understanding of a few basic concepts. Long Short Term Memory, abbreviated as LSTM, was first introduced by Hochreiter and Schmidhuber in 1997. It is a type of recurrent neural network (RNN), a network whose computations depend on feedback from earlier steps. In general, vanilla RNNs are theoretically appealing but offer limited practical advantages. This is where LSTM comes to the rescue, a notion we will develop throughout this blog.
What is a Recurrent Neural Network?
In a feedforward neural network, there is a straightforward mapping from inputs to their respective outputs, something that is used extensively in pattern recognition. But such a network has no memory, which limits the tasks it can perform. In an RNN, by contrast, signals also flow back through recurrent connections, giving the network a memory: it can generate one or more output vectors from one or more input vectors, depending on both the weights and the prior hidden states. This distinction is clearly visible in the figure below.
Fig 1.1 Recurrent Neural Network and Feedforward Neural Network
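To make the idea of a hidden state concrete, here is a minimal NumPy sketch (the sizes and random weights are illustrative only, not from any real network) of the recurrence a simple RNN cell applies at every timestep:

import numpy as np

input_dim, hidden_dim = 4, 3

# Random values stand in for learned parameters
W_x = np.random.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden weights
W_h = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden weights (the feedback loop)
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                  # initial hidden state (the "memory")
sequence = np.random.randn(5, input_dim)  # a toy sequence of 5 timesteps

for x_t in sequence:
    # The new hidden state depends on the current input AND the previous hidden state
    h = np.tanh(W_x @ x_t + W_h @ h + b)

print(h)  # final hidden state summarising the whole sequence

A feedforward network, in contrast, would process each x_t in isolation, with no h carried from one step to the next.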
Thanks to its recurrent nature, an RNN has the major advantage of memory, which is useful in many domains. However, vanilla recurrent neural networks are rarely practical because of the vanishing and exploding gradient problems. Learning is the utmost priority of any neural network, and vanishing or exploding gradients hinder exactly that: as backpropagation gradients are multiplied through time, they either blow up or shrink towards zero, leaving the weights with no meaningful adjustment and the network with nothing learned.
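A quick numerical sketch of why this happens: backpropagation through time multiplies many recurrent factors together, so a factor slightly below one shrinks the gradient exponentially, while a factor slightly above one blows it up (the numbers below are purely illustrative):

factor_small, factor_large, timesteps = 0.9, 1.1, 100
print(factor_small ** timesteps)  # ~2.7e-05: the gradient effectively vanishes
print(factor_large ** timesteps)  # ~1.4e+04: the gradient explodes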
In layman's terms, RNNs are good at remembering things over a short span, but over a longer period the information gets distorted, with hardly any distinction kept between crucial and trivial details. ReLU (Rectified Linear Unit) activations can mitigate the vanishing-gradient problem, but LSTMs are the most promising solution, as described in the next section.
What is Long Short Term Memory?
LSTM is a modified version of the recurrent neural network that makes it easier to retain past data in memory, thereby overcoming the vanishing-gradient problem of plain RNNs. Every LSTM module has three gates, Forget, Input, and Output, whose parameters are trained by backpropagation.
The legend is as below:
i_t -> Input gate
f_t -> Forget gate
o_t -> Output gate
h_(t-1) -> Hidden state at timestamp (t-1)
c_(t-1) -> Cell state (memory) at timestamp (t-1)
The following figure shows a complete view of how LSTM and its gates work.
Fig 1.2 Long Short Term Memory
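To connect the legend and the figure to actual computation, here is a minimal NumPy sketch of a single LSTM step (the weight names, sizes, and random values are illustrative assumptions, not taken from the figure):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_dim, input_dim = 3, 4
# One weight matrix and bias per gate, acting on [h_(t-1), x_t] concatenated
W_f, W_i, W_o, W_c = (np.random.randn(hidden_dim, hidden_dim + input_dim) * 0.1 for _ in range(4))
b_f = b_i = b_o = b_c = np.zeros(hidden_dim)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)        # forget gate: what to erase from the cell state
    i_t = sigmoid(W_i @ z + b_i)        # input gate: what new information to write
    c_hat = np.tanh(W_c @ z + b_c)      # candidate cell contents
    c_t = f_t * c_prev + i_t * c_hat    # updated cell state (long-term memory)
    o_t = sigmoid(W_o @ z + b_o)        # output gate: what to expose as the hidden state
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, c = lstm_step(np.random.randn(input_dim), h, c)

Because the cell state c_t is updated additively and the gates control what is kept, gradients can flow across many timesteps without vanishing, which is exactly what the plain RNN lacks.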
Implementation of RNN
RNNs and LSTMs have many applications, such as speech recognition, language modeling, sentiment analysis, text prediction, and autonomous driving systems. LSTMs are also widely used in various complex tasks, but those topics are too advanced for beginners. Hence, we will use the MNIST dataset to understand the implementation of an RNN first and of an LSTM later.
First, import Keras and the other required libraries:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, InputLayer, SimpleRNN
from keras.optimizers import Adam
Load the MNIST data and normalize the pixel values to the [0, 1] range:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
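It may not be obvious why image data suits a recurrent model: each 28x28 MNIST image is fed to the RNN as a sequence of 28 timesteps (rows), each with 28 features (pixel columns). A quick sanity check of the shapes:

print(X_train.shape)  # (60000, 28, 28): 60000 images, 28 timesteps, 28 features per step
print(y_train.shape)  # (60000,): one integer label (0-9) per image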
Create the RNN model:
model = Sequential()
# 32 recurrent units reading each image as 28 timesteps of 28 features
model.add(SimpleRNN(units=32, input_shape=X_train.shape[1:], activation='relu'))
# 10-way softmax over the digit classes
model.add(Dense(10, activation='softmax'))
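Before training, model.summary() is a handy check of the architecture and parameter counts (the numbers follow directly from the layer sizes chosen above):

model.summary()
# SimpleRNN: (28 inputs + 32 recurrent + 1 bias) * 32 units = 1,952 parameters
# Dense:     (32 inputs + 1 bias) * 10 units = 330 parameters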
Compile and fit the model:
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, validation_data=(X_test, y_test))
Output: model.fit prints the training log, showing the loss and accuracy on the training and validation sets for each of the three epochs.
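Finally, to move from the plain RNN to the LSTM this blog set out to cover, only the recurrent layer needs to change. A minimal sketch (same data and training setup; the 32 units and 3 epochs are illustrative choices, not tuned values):

lstm_model = Sequential()
lstm_model.add(LSTM(units=32, input_shape=X_train.shape[1:]))  # default tanh activation
lstm_model.add(Dense(10, activation='softmax'))

lstm_model.compile(loss='sparse_categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])
lstm_model.fit(X_train, y_train, epochs=3, validation_data=(X_test, y_test))

The gated memory generally lets the LSTM match or exceed the SimpleRNN's accuracy on this task, at the cost of more parameters and slower training.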