Multivariate time series model using LSTM in Python for put call ratio prediction

In this blog, we will see how to build a multivariate time series model using an LSTM in Python. Time series prediction plays a major role in day-to-day life, and many real-life datasets contain at least one time-dependent variable.

Here we will learn how to use multiple time-dependent variables to predict another variable, with the put-call ratio of stocks as the example.

Let’s get started!

Steps for the put call ratio prediction using LSTM

https://github.com/MadhumithaSrini/put-call-ratio-time-series-dataset

  • The link above points to the dataset used for reference, which gives the put-call ratio of each stock for 6 days (10th to 15th August).
  • We will first analyse this time series dataset and then build an LSTM model to predict the put-call ratio of all the stocks for 16th August.

 

Analysing the multivariate time series dataset and predicting using LSTM

Look at the Python code below:

#THIS IS AN EXAMPLE OF MULTIVARIATE, MULTISTEP TIME SERIES PREDICTION WITH LSTM 
#import the necessary packages
import numpy as np
import pandas as pd
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

import matplotlib.pyplot as plt
import seaborn as sns

First, we import the libraries needed for analysis, visualisation and building the model.

 

#importing the test dataset that contains only put call ratio
df = pd.read_csv("C:/Users/madhumitha/Downloads/TestDatasetPutCall_TS.csv")
df.columns

Next, we load the time series dataset, which holds the put-call ratio (the time-dependent variable) for different stocks. Change the file path to wherever you have saved the CSV file.

 

#Renaming the columns (one rename call with a single mapping)
df = df.rename(columns={
    'Put-Call Ratio': 'Aug10',
    'Unnamed: 2': 'Aug11',
    'Unnamed: 3': 'Aug12',
    'Unnamed: 4': 'Aug13',
    'Unnamed: 5': 'Aug14',
    'Unnamed: 6': 'Aug15',
})

We have renamed the columns so that each one is labelled with its date, which makes the time stamps easy to follow.

df.columns

OUTPUT:

Index(['Stock Index', 'Aug10', 'Aug11', 'Aug12', 'Aug13', 'Aug14', 'Aug15'], dtype='object')
df.head()

 

#dropping unwanted rows

df = df.drop(index=0,axis=0)
#(or) df = df.drop(df.index[0])

In the above block of code, we drop the first row of the dataframe (index 0), which is not needed. The stock index column, which is alphanumeric and not very important for the model, is dropped in a later step.

 

#cleaning test data
df = df.replace([np.inf, -np.inf], np.nan)
df = df.dropna()

df

Here we replace infinite values with NaN and then drop every row that contains a null value in any of its columns.
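To see what these two lines do, here is a minimal sketch on a small made-up dataframe. The stock names and values are invented for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): one row per stock, one column per day
toy = pd.DataFrame(
    {"Aug10": [0.8, np.inf, 1.2], "Aug11": [0.9, 1.1, np.nan]},
    index=["STOCK_A", "STOCK_B", "STOCK_C"],
)

# Turn +/- infinity into NaN, then drop every row that has a NaN
toy_clean = toy.replace([np.inf, -np.inf], np.nan).dropna()

print(toy_clean)
```

Only the row without an infinite or missing value survives, so a single bad day removes the whole stock from the dataset.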

 

df =df.reset_index()

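Let’s reset the index so that the rows are numbered from 0 again after the deletions. Note that reset_index() keeps the old index as a new column named "index", which we drop in the next step.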

 

df_new = df.drop(columns = ["Stock Index", "index"])
df_new = df_new.T     #having the time series as columns and the time stamps as the row indices
df_new

Now we drop the two non-numeric helper columns and take the transpose, so that each column represents one complete time series (one stock) and each row represents one time stamp. In this layout, the put-call ratios of the different stocks at a given time stamp are the multiple time-dependent inputs, and the goal is to predict their values at the next time stamp.
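The effect of the transpose is easy to check on a tiny example. This is only a sketch with invented stock names and values.

```python
import pandas as pd

# Toy data (hypothetical values): rows are stocks, columns are dates
toy = pd.DataFrame(
    {"Aug10": [0.8, 1.5], "Aug11": [0.9, 1.4], "Aug12": [1.0, 1.3]},
    index=["STOCK_A", "STOCK_B"],
)

# After the transpose, rows are dates (time stamps) and columns are stocks
toy_t = toy.T

print(toy_t)
print(toy_t.shape)  # (time stamps, stocks)
```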

 

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
df_scaled = scaler.fit_transform(df_new.values)
df_scaled

Each row of the transposed data, and so each inner array in the output above, holds the values of all the features at one time stamp. The features are simply the time-dependent variables (here, the stocks), and several of them are considered at every time stamp. MinMaxScaler rescales each column, that is each stock, to the range 0 to 1 across the six time stamps, which usually helps the network train more stably.
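As a rough illustration of what the scaler computes, the sketch below applies the same min-max formula with plain NumPy to a small made-up array. It is not a replacement for MinMaxScaler, just the arithmetic behind feature_range=(0, 1).

```python
import numpy as np

# Toy data (hypothetical values): 3 time stamps (rows) x 2 stocks (columns)
data = np.array([[1.0, 10.0],
                 [2.0, 30.0],
                 [3.0, 20.0]])

# Column-wise min and max, i.e. per stock across the time stamps
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Min-max formula for the range (0, 1)
scaled = (data - col_min) / (col_max - col_min)

print(scaled)
```

Each stock is scaled independently of the others. A column whose values never change would cause a division by zero in this sketch, so it assumes every column varies.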

Building the LSTM model

 

# Build X (inputs) and y (targets). X must be 3-D: (samples, time steps, features)
X_train = []
y_train = []

n_output_steps = 1  # Number of outputs we want to predict into the future
n_input_steps = 1   # Number of past inputs that we want to use to predict the future

for i in range(n_input_steps, len(df_scaled) - n_output_steps +1):
    # inputs: the previous n_input_steps rows, all columns except the last
    X_train.append(df_scaled[i - n_input_steps:i, 0:df_new.shape[1] - 1])
    # target: column 0 at the row being predicted
    y_train.append(df_scaled[i + n_output_steps - 1:i + n_output_steps, 0])

X_train, y_train = np.array(X_train), np.array(y_train)

print('X_train shape == {}.'.format(X_train.shape))   # (no. of samples, no. of time steps, no. of features)
print('y_train shape == {}.'.format(y_train.shape))   # (no. of samples, no. of target values per sample)

We have to reshape the dataset into a three-dimensional input of the form (samples, time steps, features), because an LSTM layer accepts only three-dimensional input. Since we have many columns (features) and only six rows (time steps), we keep the window short: the input time step is 1, so each prediction depends only on the values at the previous time stamp.

We also set the output steps to 1, since we need the put-call ratio only for 16th August (one time step ahead into the future).
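The windowing idea can be sketched on a toy array. Note that this is a variation and not the exact loop used in this post: the hypothetical helper make_windows below uses every column for both the inputs and the targets, whereas the loop above takes all columns except the last for X and only column 0 for y.

```python
import numpy as np

def make_windows(data, n_input_steps=1, n_output_steps=1):
    """Hypothetical helper: split a (time stamps, features) array into
    LSTM inputs X and targets y, using every column for both."""
    X, y = [], []
    for i in range(n_input_steps, len(data) - n_output_steps + 1):
        X.append(data[i - n_input_steps:i, :])      # past rows
        y.append(data[i + n_output_steps - 1, :])   # target row
    return np.array(X), np.array(y)

# Toy data: 6 time stamps x 3 features, filled with 0..17
toy = np.arange(18, dtype=float).reshape(6, 3)
X, y = make_windows(toy)

print(X.shape)  # (samples, time steps, features)
print(y.shape)  # (samples, features)
```

With six time stamps and a window of one step, only five training samples are available, which is worth keeping in mind when judging the model.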

 

# creating the lstm model

from keras.layers import Dropout
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_input_steps, X_train.shape[2])))
model.add(LSTM(100, activation='relu'))
model.add(Dense(2714))             # since we need the prediction of these many stocks
model.compile(optimizer='adam', loss='mse')

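Here we build the LSTM model with two stacked LSTM layers of 100 units each. The first layer sets return_sequences=True so that it passes a full sequence on to the second LSTM layer. We need an output for each of the 2714 stocks, in other words for each of the 2714 different time series, so the output Dense layer has 2714 units. We use “adam” as the optimizer and “mse” (mean squared error) as the loss function. Keep in mind that y_train, as sliced earlier, holds only column 0 of the scaled data; for the network to learn a separate target for every stock, y_train needs one column per stock.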
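To get a feel for the size of this network, the sketch below estimates its number of trainable weights with the standard formulas for LSTM and Dense layers. The input feature count of 2713 is an assumption (2714 stock columns minus the last one, following the slice used for X_train); model.summary() reports the actual numbers for your data.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # one weight per input-output pair plus one bias per output
    return input_dim * units + units

n_features = 2713  # assumption: 2714 stock columns minus the last one
n_outputs = 2714   # units of the output Dense layer

total = (lstm_params(n_features, 100)    # first LSTM layer
         + lstm_params(100, 100)         # second LSTM layer
         + dense_params(100, n_outputs)) # output layer
print(total)
```

Under this assumption the model has well over a million weights, while the dataset provides only a handful of training samples, so the fitted values should be read with some caution.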

 

from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint, TensorBoard

es = EarlyStopping(monitor='val_loss', min_delta=1e-10, patience=10, verbose=1)
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, verbose=1)
mcp = ModelCheckpoint(filepath='weights.h5', monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=True)

tb = TensorBoard('logs')

history = model.fit(X_train, y_train, shuffle=True, epochs=100, callbacks=[es, rlr, mcp, tb], validation_split=0.2, verbose=1, batch_size=20)

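In the above piece of code we fit the model on X_train and y_train for up to 100 epochs, holding out 20% of the samples for validation. EarlyStopping halts training once the validation loss has not improved for 10 consecutive epochs, ReduceLROnPlateau halves the learning rate when the validation loss plateaus, and ModelCheckpoint saves the best weights to weights.h5.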
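The stopping rule can be pictured with a few lines of plain Python. This is a simplified sketch of the patience idea, not Keras's actual implementation, and the function name and loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=1e-10):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified sketch: an epoch counts as an improvement only if the
    validation loss drops by more than min_delta below the best so far.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: improve for 3 epochs, then stay flat
losses = [0.9, 0.5, 0.4] + [0.4] * 12
print(early_stopping_epoch(losses, patience=10))
```

So training stops because the validation loss has stalled for `patience` epochs, not because the loss has fallen below some fixed value.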

 

predictions_future = model.predict(X_train[-n_output_steps:])
predictions_train = model.predict(X_train[n_input_steps:])

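In the above code we predict on the training inputs themselves (predictions_train) and on the last input window (predictions_future). Note that X_train[-n_output_steps:] is the last training sample, whose input is the 14th August row, so its output corresponds to 15th August. To forecast 16th August, the model has to be given the 15th August row (the last row of df_scaled) as input instead.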
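The difference between the last training input and the input needed for a true one-step-ahead forecast can be checked on a toy array. This is a sketch under assumptions: toy_scaled stands in for df_scaled, and all columns are used as features.

```python
import numpy as np

n_input_steps = 1

# Toy stand-in for df_scaled: 6 time stamps (Aug 10..15) x 3 features
toy_scaled = np.arange(18, dtype=float).reshape(6, 3)

# Inputs built as in the training loop: row i-1 is used to predict row i
X_train = np.array([toy_scaled[i - n_input_steps:i]
                    for i in range(n_input_steps, len(toy_scaled))])

# The last training input is the second-to-last time stamp (Aug 14) ...
last_train_input = X_train[-1:]

# ... while a forecast for the next day needs the final rows (Aug 15)
future_input = toy_scaled[-n_input_steps:][np.newaxis, ...]

print(last_train_input)
print(future_input)
```

With the variables from this post, the equivalent input would be df_scaled[-n_input_steps:, 0:df_new.shape[1] - 1][np.newaxis, ...], so that it has the same feature columns as X_train before being passed to model.predict.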

 

print(predictions_train.shape)
predictions_train
OUTPUT: 

(4, 2714)
array([[0.02264719, 0.02422681, 0.02334069, ..., 0.0217242 , 0.02757692,
        0.02045876],
       [0.6620008 , 0.9652636 , 0.8852825 , ..., 0.79204714, 0.73258543,
        0.5228946 ],
       [0.68178076, 0.7280743 , 0.7424579 , ..., 0.59399563, 0.7865799 ,
        0.9612661 ],
       [1.3266679 , 1.7623531 , 1.378633  , ..., 1.5259216 , 1.4811774 ,
        1.1819019 ]], dtype=float32)

These are the predictions made within the training set, with one row per input sample and one column per output unit.

 

print(predictions_future.shape)
predictions_future

Now let’s look at the output for the last input window. Its values are in scaled units and match the last row of predictions_train shown earlier, since both come from the same input.

OUTPUT : 

(1, 2714) 

array([[1.3266679, 1.7623526, 1.3786325, ..., 1.5259216, 1.4811774, 1.1819018]], dtype=float32)

 

y_pred = pd.DataFrame(predictions_future)
y_pred = y_pred.T            
#y_pred = y_pred.loc[0:2414]
y_pred

We now convert the prediction array into a dataframe and transpose it, so that the predictions form a single column with one row per stock.
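The predicted values are still in the scaled units produced by the min-max scaling. As a minimal sketch of how to map them back to put-call ratios, the example below undoes the (0, 1) scaling with NumPy on made-up numbers; all values here are hypothetical.

```python
import numpy as np

# Toy data (hypothetical values): 3 time stamps x 2 stocks, original units
data = np.array([[0.5, 2.0],
                 [1.5, 4.0],
                 [1.0, 3.0]])
col_min, col_max = data.min(axis=0), data.max(axis=0)

# Hypothetical scaled predictions, one value per stock
pred_scaled = np.array([[0.5, 1.2]])

# Undo the (0, 1) min-max scaling column by column
pred_original = pred_scaled * (col_max - col_min) + col_min

print(pred_original)
```

With the fitted scaler from this post, scaler.inverse_transform performs this step, provided the array passed to it has the same number of columns as the data the scaler was fitted on. A scaled value above 1 simply maps to a ratio above the largest value seen for that stock.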

 

plt.plot(y_pred.values);

Let’s plot the predicted values. The x-axis is the position of each stock and the y-axis is its predicted put-call ratio in scaled units.

[Figure: Output of predicted time series (result of time series prediction using LSTM)]

Hurray! We have built a multivariate time series model using an LSTM and used it to predict the put-call ratio for multiple stocks one time step ahead.

 

THANK YOU
