Prediction Model using LSTM with Keras
In this tutorial, we will learn to build a recurrent neural network (LSTM) using the Keras library. Keras is a high-level library for constructing neural networks.
There will be the following sections:
- Importing libraries
- Importing Dataset
- Data Preprocessing
- Building an LSTM model
- Training the model on the dataset
- Predicting the test results
What is a recurrent neural network?
Recurrent neural networks (RNNs) are a class of neural networks that keep a hidden state, so the output of previous steps is fed back in as input to the current step. They are mostly used for sequence tasks such as natural language processing (NLP), speech recognition, and time series prediction.
What is LSTM?
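Long Short-Term Memory (LSTM) is a type of recurrent neural network. An LSTM unit contains a memory cell and three gates: an input gate, a forget gate, and an output gate. The gates control what is written to, kept in, and read from the cell.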
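To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM step. It follows the standard textbook equations and is not the Keras implementation; the helper names (sigmoid, lstm_step), the random weights, and the toy sizes are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper, textbook formulation).

    W: (4*units, input_dim), U: (4*units, units), b: (4*units,)
    """
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:units])                # input gate: how much new information to write
    f = sigmoid(z[units:2 * units])        # forget gate: how much of the old cell to keep
    g = np.tanh(z[2 * units:3 * units])    # candidate values for the cell
    o = sigmoid(z[3 * units:4 * units])    # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g               # new cell state
    h_t = o * np.tanh(c_t)                 # new hidden state (the unit's output)
    return h_t, c_t

# Toy sizes and random weights, for illustration only
rng = np.random.default_rng(0)
input_dim, units = 1, 4
W = rng.normal(size=(4 * units, input_dim))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for price in [0.62, 0.64, 0.61]:           # a short scaled price sequence
    h, c = lstm_step(np.array([price]), h, c, W, U, b)

print(h.shape, c.shape)
```

The hidden state h is carried from one step to the next, which is how the network remembers earlier prices in the sequence.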
PREDICTION MODEL using LSTM
We will be building a model to predict the stock price of a company.
1. IMPORTING LIBRARIES
import math
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
Pandas: A fast, powerful, open-source Python package for data manipulation.
NumPy: A Python package for scientific computing.
math: A module in the Python standard library that provides mathematical functions such as ceil.
MinMaxScaler: A scikit-learn transformer that rescales each feature to a given range.
Sequential: A Keras model class that stacks layers one after another.
Dense, LSTM: The Keras layer types we will use in the model.
So, these are the initial libraries you need to have. You can find documentation of pandas, NumPy, and sklearn libraries at the end of this tutorial.
2. IMPORTING DATASET
I am using the price history of an NSE-listed company (NSE-TATAGLOBAL.csv). The code below keeps a single column, the one at index 1 of the CSV, so the model is trained on that one price series.
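dataset = pd.read_csv('NSE-TATAGLOBAL.csv')
# Keep only the column at index 1; the 1:2 slice returns a one-column DataFrame
training_set = dataset.iloc[:, 1:2]
# Convert to a 2D NumPy array of shape (rows, 1); this reuses the name dataset
dataset = training_set.values
# Number of rows (the first 80 percent) used for training
training_data_len = math.ceil(len(dataset) * 0.8)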
The first 80 percent of the rows are used as the training dataset.
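As a quick check of the split arithmetic, here is a small sketch on a made-up list of ten prices. The helper name chronological_split and the numbers are hypothetical; the point is that the split keeps the rows in time order and does not shuffle them.

```python
import math

def chronological_split(rows, train_fraction=0.8):
    """Split rows in time order: the first part for training, the rest for testing."""
    split = math.ceil(len(rows) * train_fraction)
    return rows[:split], rows[split:]

# Ten made-up prices, oldest first
prices = [101, 103, 102, 105, 107, 106, 108, 110, 109, 111]
train, test = chronological_split(prices)

print(len(train), len(test))
```

With ten rows, eight go to training and the last two are held back for testing.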
3. DATA PREPROCESSING
Scaling:
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
train_data = scaled_data[0:training_data_len , :]
print(train_data)
MinMaxScaler rescales the prices to the range 0 to 1, as specified by feature_range. The first training_data_len rows of the scaled data form the training set.
OUTPUT:
[[0.6202352 ]
[0.62226277]
[0.64436334]
...
[0.13260341]
[0.13807786]
[0.15794809]]
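The scaling above follows the min-max formula, scaled = (x - min) / (max - min), and it can be undone later to get prices back. Here is a minimal NumPy sketch of both directions for the (0, 1) range; the helper names and the three sample prices are made up for illustration, and this is not scikit-learn's code.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values to the range 0 to 1 and return the min and max that were used."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return scaled * (hi - lo) + lo

prices = np.array([100.0, 150.0, 200.0])   # made-up prices
scaled, lo, hi = min_max_scale(prices)
restored = min_max_invert(scaled, lo, hi)

print(scaled)
print(restored)
```

The smallest price maps to 0, the largest to 1, and inverting the scaled values returns the original prices.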
Next, we create the input sequences. Each sample uses the previous 80 timesteps to predict the value that follows them.
x_train = []
y_train = []
for i in range(80, len(train_data)):
    # The previous 80 scaled values are the input; the value at step i is the target
    x_train.append(train_data[i-80:i, 0])
    y_train.append(train_data[i, 0])
Next, convert the lists to NumPy arrays and reshape x_train into a 3D array of shape (samples, timesteps, features): x_train.shape[0] samples, 80 timesteps, and one feature at each step.
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
x_train:
print(x_train)
OUTPUT:
[[[0.6202352 ]
[0.62226277]
[0.64436334]
...
[0.75344688]
[0.736618 ]
[0.70721817]]
[[0.62226277]
[0.64436334]
[0.61719384]
...
[0.736618 ]
[0.70721817]
[0.74574209]]
[[0.64436334]
[0.61719384]
[0.61820762]
...
[0.70721817]
[0.74574209]
[0.7648013 ]]
...
[[0.14557989]
[0.14497161]
[0.14801298]
...
[0.14760746]
[0.15369019]
[0.14801298]]
[[0.14497161]
[0.14801298]
[0.14476886]
...
[0.15369019]
[0.14801298]
[0.13260341]]
[[0.14801298]
[0.14476886]
[0.11719384]
...
[0.14801298]
[0.13260341]
[0.13807786]]]
y_train:
print(y_train)
OUTPUT:
[0.74574209 0.7648013 0.75385239 ... 0.13260341 0.13807786 0.15794809]
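The windowing step is easier to follow on a tiny series. The sketch below applies the same sliding-window logic with 3 timesteps instead of 80; the function name make_windows and the toy series are assumptions for illustration.

```python
import numpy as np

def make_windows(series, timesteps):
    """Build (samples, timesteps, 1) inputs and the value that follows each window."""
    x, y = [], []
    for i in range(timesteps, len(series)):
        x.append(series[i - timesteps:i])   # the previous `timesteps` values
        y.append(series[i])                 # the value to predict
    x = np.array(x).reshape(-1, timesteps, 1)
    return x, np.array(y)

series = np.arange(6, dtype=float)          # 0, 1, 2, 3, 4, 5
x, y = make_windows(series, timesteps=3)

print(x.shape)
print(x[:, :, 0])
print(y)
```

Six values with a window of 3 give three samples: [0, 1, 2] predicts 3, [1, 2, 3] predicts 4, and [2, 3, 4] predicts 5. In the same way, a training set with 80-step windows yields len(train_data) - 80 samples.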
4. BUILDING THE LSTM MODEL
Now, we will construct a model with three LSTM layers, one dense hidden layer, and a dense output layer.
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= True))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))
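The first LSTM layer takes input_shape because it is the input layer: 80 timesteps with one feature each. Its return_sequences is True because the next layer is also an LSTM layer and needs the full sequence of outputs, one per timestep. The second layer sets return_sequences to True for the same reason. The third LSTM layer sets it to False, so only its output at the last timestep is passed on to the Dense layers.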
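To see what return_sequences does to the data flowing through the stack, the sketch below works out each layer's output shape and parameter count by hand, in plain Python without Keras. It assumes the default Keras layer settings (bias terms enabled); the helper names lstm_params and dense_params are hypothetical.

```python
def lstm_params(input_dim, units):
    # Four weight sets (input gate, forget gate, cell candidate, output gate),
    # each with input weights, recurrent weights, and a bias vector.
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per unit.
    return input_dim * units + units

timesteps = 80  # matches the 80-step windows built earlier

# (layer, output shape without the batch dimension, trainable parameters)
summary = [
    ("LSTM(50, return_sequences=True)", (timesteps, 50), lstm_params(1, 50)),
    ("LSTM(50, return_sequences=True)", (timesteps, 50), lstm_params(50, 50)),
    ("LSTM(50, return_sequences=False)", (50,), lstm_params(50, 50)),
    ("Dense(25)", (25,), dense_params(50, 25)),
    ("Dense(1)", (1,), dense_params(25, 1)),
]

for name, shape, params in summary:
    print(f"{name:34s} {str(shape):10s} {params}")

total_params = sum(params for _, _, params in summary)
print("total:", total_params)
```

The first two LSTM layers output one 50-value vector per timestep, while the third outputs a single 50-value vector per sample. Calling model.summary() on the real model is the way to confirm these numbers.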
Next, we will compile the model.
model.compile(optimizer='adam', loss='mean_squared_error')
I have used Adam as the optimizer and mean squared error as the loss function.
Thus, the model has been built.
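For reference, mean squared error is the average of the squared differences between the true and the predicted values. Here is a minimal NumPy sketch with made-up scaled prices; the function below is written for illustration and is not the Keras implementation.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Errors are 0.0, 0.1, and -0.2, so the squared errors are 0.0, 0.01, and 0.04
loss = mean_squared_error([0.5, 0.6, 0.7], [0.5, 0.5, 0.9])
print(loss)
```

Because the errors are squared, one large miss raises the loss more than several small ones.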
5. TRAINING THE MODEL ON THE DATASET
To train the model, we pass x_train and y_train to the fit() function. I am using only 1 epoch because training is slow: the model has three LSTM layers, and batch_size=1 means the weights are updated after every single sample.
model.fit(x_train, y_train, batch_size=1, epochs=1)
OUTPUT:
Epoch 1/1
1548/1548 [==============================] - 133s 86ms/step - loss: 0.0040
<keras.callbacks.callbacks.History at 0x7fe26d8eaf28>
Thus, we have trained our model with our dataset.
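The 1548/1548 in the progress bar is the number of batches in one epoch. With batch_size=1 that equals the number of training samples. The sketch below shows the arithmetic; the helper name steps_per_epoch is hypothetical, and the batch size of 32 is only an illustrative alternative, not a setting used in this tutorial.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates (batches) in one pass over the training data."""
    return math.ceil(num_samples / batch_size)

steps_batch_1 = steps_per_epoch(1548, 1)     # the setting used above
steps_batch_32 = steps_per_epoch(1548, 32)   # illustrative alternative

print(steps_batch_1, steps_batch_32)
```

A larger batch size means far fewer steps per epoch, which is one way to shorten training time, though it may change how well the model fits.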
6. PREDICTING THE TEST RESULTS
test_data = scaled_data[training_data_len + 1:, :]
We prepare the test data the same way we prepared the training data: each sample is a window of 80 consecutive scaled values.
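x_test = []
# Each prediction needs 80 earlier values, so the first target is 80 rows into test_data;
# start y_test at the same row so it lines up with the predictions
y_test = dataset[training_data_len + 1 + 80:, :]
for i in range(80, len(test_data)):
    x_test.append(test_data[i-80:i, 0])
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))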
Now, we predict on the test windows using the predict method, then use inverse_transform to convert the scaled predictions back to prices.
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
print(predictions)
OUTPUT:
[[ 97.71862 ]
[ 98.4509 ]
[ 99.20563 ]
[ 99.916046]
[100.537575]
[101.07158 ]
[101.50879 ]
[101.78858 ]
[101.854805]
[101.6909 ]
[101.35759 ]
[100.91639 ]
[100.46959 ]
[100.12367 ]
[ 99.947334]
[ 99.88831 ]
[ 99.85217 ]
[ 99.844055]
[ 99.80163 ]
[ 99.70996 ]
[ 99.49889 ]
[ 99.15326 ]
[ 98.73468 ]
[ 98.32447 ]
[ 97.99761 ]
[ 97.76013 ]
[ 97.65024 ]
[ 97.726845]
[ 97.9334 ]
[ 98.189156]
[ 98.38071 ]
[ 98.649506]
[ 99.03978 ]
[ 99.50269 ]
[ 99.96796 ]
[100.47448 ]
[100.99707 ]
[101.468185]
[101.87214 ]
[102.208015]
[102.56024 ]
[102.94398 ]
[103.35189 ]
[103.65423 ]
[103.83893 ]
[103.98069 ]
[104.13641 ]
[104.20892 ]
[104.23321 ]
[104.2002 ]
[104.14319 ]
[104.06225 ]
[103.87655 ]
[103.50484 ]
[103.239044]
[103.25357 ]
[103.71132 ]
[104.5674 ]
[105.74619 ]
[107.07483 ]
[108.33799 ]
[109.43311 ]
[110.4667 ]
[111.474686]
[112.3064 ]
[112.96142 ]
[113.534706]
[114.05379 ]
[114.502304]
[114.83853 ]
[115.0587 ]
[115.11143 ]
[114.91723 ]
[114.52541 ]
[114.07454 ]
[113.654976]
[113.24799 ]
[112.9232 ]
[112.62125 ]
[112.3348 ]
[112.182495]
[112.05153 ]
[112.0243 ]
[112.093 ]
[112.208786]
[112.40671 ]
[112.48131 ]
[112.34605 ]
[111.89008 ]
[111.16317 ]
[110.25264 ]
[109.21262 ]
[108.14005 ]
[107.08519 ]
[106.192345]
[105.53933 ]
[105.24163 ]
[105.236115]
[105.450935]
[105.85888 ]
[106.41979 ]
[106.92909 ]
[107.31822 ]
[107.62943 ]
[107.80515 ]
[107.792465]
[107.56435 ]
[107.21245 ]
[106.74933 ]
[106.11824 ]
[105.34554 ]
[104.524475]
[103.65504 ]
[102.84883 ]
[102.26724 ]
[102.05765 ]
[102.243996]
[102.70519 ]
[103.33343 ]
[104.04343 ]
[104.75268 ]
[105.456825]
[106.10178 ]
[106.73338 ]
[107.34779 ]
[107.853355]
[108.221344]
[108.45254 ]
[108.57008 ]
[108.658165]
[108.76307 ]
[108.92206 ]
[109.18098 ]
[109.58771 ]
[110.13589 ]
[110.685616]
[111.172424]
[111.52854 ]
[111.75751 ]
[111.84017 ]
[111.88292 ]
[111.93912 ]
[111.974236]
[112.05834 ]
[112.34029 ]
[112.774956]
[113.27403 ]
[113.77154 ]
[114.05832 ]
[113.99121 ]
[113.586914]
[112.94568 ]
[112.21199 ]
[111.48157 ]
[110.85792 ]
[110.332756]
[109.89753 ]
[109.549576]
[109.29335 ]
[109.10546 ]
[108.9615 ]
[108.87493 ]
[108.862175]
[108.9958 ]
[109.08713 ]
[109.05344 ]
[108.86061 ]
[108.37708 ]
[107.65524 ]
[106.80602 ]
[105.951225]
[105.13722 ]
[104.349655]
[103.615234]
[103.153015]
[103.045074]
[103.22854 ]
[103.58782 ]
[104.19318 ]
[104.8473 ]
[105.43034 ]
[105.87804 ]
[106.09947 ]
[105.99642 ]
[105.58811 ]
[105.097824]
[104.7387 ]
[104.627396]
[104.72557 ]
[104.992195]
[105.4015 ]
[105.96034 ]
[106.569756]
[107.34747 ]
[108.2911 ]
[109.35128 ]
[110.3774 ]
[111.31862 ]
[112.11805 ]
[112.75516 ]
[113.25819 ]
[113.637764]
[113.9063 ]
[114.04697 ]
[114.04433 ]
[113.97714 ]
[114.01504 ]
[114.30222 ]
[114.8553 ]
[115.61134 ]
[116.43383 ]
[117.139206]
[117.64049 ]
[117.99163 ]
[118.237045]
[118.41954 ]
[118.62825 ]
[118.88099 ]
[119.20031 ]
[119.65225 ]
[120.22389 ]
[120.80476 ]
[121.32122 ]
[121.741425]
[122.0206 ]
[122.17899 ]
[122.1734 ]
[122.152626]
[122.15675 ]
[122.21184 ]
[122.24368 ]
[122.30823 ]
[122.41776 ]
[122.49144 ]
[122.484825]
[122.41258 ]
[122.26925 ]
[122.25379 ]
[122.40935 ]
[122.647255]
[123.0376 ]
[123.53991 ]
[124.15395 ]
[124.92322 ]
[125.83799 ]
[126.87965 ]
[128.1354 ]
[129.43643 ]
[130.55664 ]
[131.41574 ]
[132.1196 ]
[132.57607 ]
[132.82675 ]
[132.91206 ]
[132.89775 ]
[132.8839 ]
[132.94919 ]
[133.08257 ]
[133.28975 ]
[133.57274 ]
[133.96915 ]
[134.38637 ]
[134.75049 ]
[135.10233 ]
[135.40602 ]
[135.79541 ]
[136.27269 ]
[136.72694 ]
[137.18466 ]
[137.64066 ]
[138.08485 ]
[138.35399 ]
[138.34808 ]
[137.99863 ]
[137.35211 ]
[136.42262 ]
[135.34392 ]
[134.27187 ]
[133.34346 ]
[132.66563 ]
[132.158 ]
[131.70966 ]
[131.32217 ]
[131.04486 ]
[130.81989 ]
[130.62836 ]
[130.49905 ]
[130.46019 ]
[130.5434 ]
[130.80264 ]
[131.18602 ]
[131.63939 ]
[131.98648 ]
[132.14552 ]
[132.09966 ]
[131.94319 ]
[131.66672 ]
[131.16017 ]
[130.504 ]
[129.79497 ]
[129.08365 ]
[128.34822 ]
[127.683975]
[127.00351 ]
[126.237434]
[125.345314]
[124.375626]
[123.45163 ]
[122.76458 ]
[122.2867 ]
[121.96092 ]
[121.763504]
[121.638596]
[121.524895]
[121.461235]
[121.54154 ]
[121.81381 ]
[122.23814 ]
[122.71856 ]
[123.16286 ]
[123.4079 ]
[123.58764 ]
[123.756714]
[124.0065 ]
[124.398125]
[124.86505 ]]
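The tutorial stops at printing the predictions. One common way to judge them, which is not part of the original code, is the root mean squared error (RMSE) against the actual prices for the same days. Here is a minimal sketch with made-up numbers; the function name rmse is an assumption, and in practice the arguments would be the actual test prices and the predictions array, with one actual price per prediction.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally sized sets of values."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same number of values")
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up prices shaped like the (n, 1) arrays in this tutorial
actual = [[100.0], [102.0], [104.0]]
predicted = [[101.0], [102.0], [101.0]]
score = rmse(actual, predicted)
print(score)
```

RMSE is in the same units as the prices, so a lower value means the predictions sit closer to the real prices on average.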
Thus, we have built our prediction model using LSTM.
DOCUMENTATIONS:
Thank you for reading this far.