Multivariate time series model using LSTM in Python for put call ratio prediction
In this blog, we will learn how to build a multivariate time series model using LSTM in Python. Time series prediction plays a major role in day-to-day life, and almost all real-life datasets contain at least one time-dependent variable.
Here, we will learn how to handle multiple time-dependent variables to predict another variable, using an example: predicting the put-call ratio of stocks.
Let’s get started!
Steps for the put call ratio prediction using LSTM
https://github.com/MadhumithaSrini/put-call-ratio-time-series-dataset
- The link above provides the dataset for reference. It contains the put-call ratio of each stock for 6 days.
- Let’s first take the time series dataset, analyse it, and then build a time series model using LSTM to predict the put-call ratio of all the stocks on 16 August.
Analysing the multivariate time series dataset and predicting using LSTM
Look at the Python code below:
# This is an example of multivariate, multistep time series prediction with LSTM
# import the necessary packages
import numpy as np
import pandas as pd
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
import matplotlib.pyplot as plt
import seaborn as sns
Here, we import the necessary libraries for analysis, visualisation and building the model.
# importing the test dataset that contains only put call ratio
df = pd.read_csv("C:/Users/madhumitha/Downloads/TestDatasetPutCall_TS.csv")
df.columns
Next, we load the time series dataset, which gives the put-call ratio of different stocks. The put-call ratio is the time-dependent variable.
# Renaming the columns
df = df.rename(columns={
    'Put-Call Ratio': 'Aug10',
    'Unnamed: 2': 'Aug11',
    'Unnamed: 3': 'Aug12',
    'Unnamed: 4': 'Aug13',
    'Unnamed: 5': 'Aug14',
    'Unnamed: 6': 'Aug15',
})
We have changed the column names, so that it will be easy for us to understand the timestamps.
df.columns
OUTPUT:
Index(['Stock Index', 'Aug10', 'Aug11', 'Aug12', 'Aug13', 'Aug14', 'Aug15'], dtype='object')
df.head()
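As a small illustration of the renaming step, here is a sketch on a toy dataframe. The stock names and values are made up for the example; only the column labels mirror the ones in the dataset.

```python
import pandas as pd

# Toy frame with the same kind of header layout as the CSV (values are made up).
toy = pd.DataFrame({
    "Stock Index": ["AAA", "BBB"],
    "Put-Call Ratio": [0.8, 1.1],
    "Unnamed: 2": [0.9, 1.0],
    "Unnamed: 3": [0.7, 1.2],
})

# One rename call with a mapping from old column names to new ones.
new_names = {"Put-Call Ratio": "Aug10", "Unnamed: 2": "Aug11", "Unnamed: 3": "Aug12"}
toy = toy.rename(columns=new_names)
print(list(toy.columns))  # ['Stock Index', 'Aug10', 'Aug11', 'Aug12']
```

Columns that are not listed in the mapping, such as Stock Index, keep their original names.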
# dropping unwanted rows
df = df.drop(index=0, axis=0)
# (or) df = df.drop(df.index[2])
In the above block of code, we drop an unwanted row by its index label. The Stock Index column, which is alphanumeric and not very important for the model, is dropped in a later step.
# cleaning test data
df = df.replace([np.inf, -np.inf], np.nan)
df = df.dropna()
df
Let us remove the rows that contain infinity or null values in any of their columns.
df = df.reset_index()
Let’s reset the index from 0, so that we don’t have any ambiguity in the row index.
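The cleaning and index reset can be tried on a toy dataframe, as in the sketch below. The values are made up, and `reset_index(drop=True)` is used here as an assumption-level variation: it renumbers the rows without adding the extra "index" column that a plain `reset_index()` creates.

```python
import numpy as np
import pandas as pd

# Toy data with one infinite value and one missing value (made-up numbers).
toy = pd.DataFrame({
    "Stock Index": ["AAA", "BBB", "CCC", "DDD"],
    "Aug10": [0.8, np.inf, 1.1, 0.6],
    "Aug11": [0.9, 1.0, np.nan, 0.7],
})

# Turn infinities into NaN, then drop every row that has a NaN in any column.
cleaned = toy.replace([np.inf, -np.inf], np.nan).dropna()

# Renumber the remaining rows from 0 without keeping the old index as a column.
cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```

Only the rows for AAA and DDD survive, and they are renumbered 0 and 1.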
df_new = df.drop(columns=["Stock Index", "index"])
df_new = df_new.T  # having the time series as columns and the time stamps as the row indices
df_new
Now we transpose the dataframe so that each column represents one complete time series (one stock) and each row represents one time stamp. This lets us predict the put-call ratio at the next time stamp for every stock. Thus, at each time stamp, the put-call ratios of the different stocks are the multiple time-dependent inputs.
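To see what the transpose does, here is a sketch on a toy dataframe with three hypothetical stocks and two dates (all values are made up).

```python
import pandas as pd

# Rows are stocks, columns are dates, as in the cleaned dataset.
toy = pd.DataFrame(
    {"Aug10": [0.8, 1.1, 0.6], "Aug11": [0.9, 1.0, 0.7]},
    index=["AAA", "BBB", "CCC"],
)

# After the transpose, rows are dates and each column is one stock's series.
toy_t = toy.T
print(toy.shape, toy_t.shape)  # (3, 2) (2, 3)
print(toy_t)
```

Each row of `toy_t` now holds the values of all stocks at one time stamp, which is the layout the later windowing step works on.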
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
df_scaled = scaler.fit_transform(df_new.values)
df_scaled
Each row of the transposed data, that is, each inner array in the output above, holds the values of the different features at the same time stamp. The features are the time-dependent variables, and multiple features are considered at every time stamp. We have also scaled the values to the range 0 to 1 using MinMaxScaler, which scales each column independently and usually helps the network train.
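For intuition, the scaling to the range 0 to 1 can be written out by hand: each column is shifted by its minimum and divided by its range. The sketch below uses plain NumPy on made-up values and assumes no column is constant (a constant column would give a zero range).

```python
import numpy as np

# 3 time stamps (rows) x 2 series (columns), made-up values.
values = np.array([[0.5, 2.0],
                   [1.0, 4.0],
                   [1.5, 3.0]])

# Column-wise min-max scaling: (x - min) / (max - min).
col_min = values.min(axis=0)
col_max = values.max(axis=0)
scaled = (values - col_min) / (col_max - col_min)
print(scaled)
# [[0.  0. ]
#  [0.5 1. ]
#  [1.  0.5]]
```

Because the minimum and maximum are taken per column, every series ends up spanning 0 to 1 regardless of its original level.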
Building the LSTM model
# Build the dataset as X and y: X becomes 3-D data and y becomes 2-D data
X_train = []
y_train = []
n_output_steps = 1  # Number of outputs we want to predict into the future
n_input_steps = 1   # Number of past inputs that we want to use to predict the future
for i in range(n_input_steps, len(df_scaled) - n_output_steps + 1):
    X_train.append(df_scaled[i - n_input_steps:i, 0:df_new.shape[1] - 1])
    y_train.append(df_scaled[i + n_output_steps - 1:i + n_output_steps, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
print('X_train shape == {}.'.format(X_train.shape))  # no. of samples, no. of time steps, no. of features
print('y_train shape == {}.'.format(y_train.shape))  # no. of samples, no. of output time steps
We have to reshape the dataset into three dimensions, because a Keras LSTM layer expects three-dimensional input of shape (samples, time steps, features). Since we have many columns (features) and only a few rows (time steps), we keep the window short: the input time step is 1, so each prediction depends only on the values at the previous time stamp.
We also set the output steps to 1, since we need the put-call ratio only for 16 August (one time step into the future).
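The windowing idea can be sketched on a small array. The helper `make_windows` below is hypothetical and is not the code used in this post: as an assumption for illustration, it uses every column both as an input feature and as a target, so the shapes are easy to follow.

```python
import numpy as np

def make_windows(data, n_input_steps, n_output_steps):
    """Slice a (time stamps, features) array into input windows and targets."""
    X, y = [], []
    for i in range(n_input_steps, len(data) - n_output_steps + 1):
        # Input: the n_input_steps rows just before position i.
        X.append(data[i - n_input_steps:i, :])
        # Target: the next n_output_steps rows, flattened into one vector.
        y.append(data[i:i + n_output_steps, :].reshape(-1))
    return np.array(X), np.array(y)

# 6 time stamps and 3 series, filled with 0..17 so the rows are easy to spot.
toy = np.arange(18, dtype=float).reshape(6, 3)
X, y = make_windows(toy, n_input_steps=1, n_output_steps=1)
print(X.shape)  # (5, 1, 3) -> samples, time steps, features
print(y.shape)  # (5, 3)    -> samples, targets
```

With 6 time stamps and a window of 1, there are 5 samples: each row is paired with the row that follows it.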
# creating the lstm model
from keras.layers import Dropout
from keras.optimizers import Adam
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_input_steps, X_train.shape[2])))
model.add(LSTM(100, activation='relu'))
model.add(Dense(2714))  # since we need the prediction of these many stocks
model.compile(optimizer='adam', loss='mse')
Here comes the LSTM model, with two LSTM layers followed by a dense output layer. We need an output for each of the 2714 stocks, in other words for each of the 2714 time series, so the dense output layer has 2714 units. We have used “adam” as the optimizer and “mse” (mean squared error) as the loss function.
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint, TensorBoard
es = EarlyStopping(monitor='val_loss', min_delta=1e-10, patience=10, verbose=1)
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, verbose=1)
mcp = ModelCheckpoint(filepath='weights.h5', monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=True)
tb = TensorBoard('logs')
history = model.fit(X_train, y_train, shuffle=True, epochs=100, callbacks=[es, rlr, mcp, tb], validation_split=0.2, verbose=1, batch_size=20)
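In the above piece of code, we fit the model on X_train and y_train for up to 100 epochs. The EarlyStopping callback stops training once the validation loss has not improved for 10 consecutive epochs, ReduceLROnPlateau halves the learning rate when the validation loss stops improving, and ModelCheckpoint saves the best weights to weights.h5.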
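To make the role of patience concrete, here is a simplified sketch of the early-stopping rule in plain Python. The function `stopping_epoch` and the loss values are hypothetical; this is an illustration of the idea, not the Keras implementation.

```python
def stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # The loss improved: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count this epoch against the patience budget.
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses that stop improving after the third epoch.
losses = [0.9, 0.5, 0.4, 0.41, 0.42, 0.40, 0.43]
print(stopping_epoch(losses, patience=3))   # 5
print(stopping_epoch(losses, patience=10))  # None
```

A larger patience tolerates longer plateaus before stopping, at the cost of running more epochs that may not help.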
predictions_future = model.predict(X_train[-n_output_steps:])
predictions_train = model.predict(X_train[n_input_steps:])
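In the above code, we predict values within the training set and also one step ahead. X_train[n_input_steps:] selects every input window after the first, and X_train[-n_output_steps:] selects the last input window, so predictions_future is computed from the most recent window in the data.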
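The two slices passed to `model.predict` can be inspected without a model. The sketch below uses a made-up stand-in array, `X_toy`, with 5 windows, 1 time step and 3 features.

```python
import numpy as np

n_input_steps = 1
n_output_steps = 1

# Stand-in for X_train: 5 windows, 1 time step, 3 features (made-up values).
X_toy = np.arange(15, dtype=float).reshape(5, 1, 3)

last_window = X_toy[-n_output_steps:]   # only the final window
later_windows = X_toy[n_input_steps:]   # every window except the first
print(last_window.shape)    # (1, 1, 3)
print(later_windows.shape)  # (4, 1, 3)
```

Both slices keep the three-dimensional shape that the LSTM expects, and the last entry of `later_windows` is the same window as `last_window`.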
print(predictions_train.shape)
predictions_train
OUTPUT:
(4, 2714)
array([[0.02264719, 0.02422681, 0.02334069, ..., 0.0217242 , 0.02757692, 0.02045876],
       [0.6620008 , 0.9652636 , 0.8852825 , ..., 0.79204714, 0.73258543, 0.5228946 ],
       [0.68178076, 0.7280743 , 0.7424579 , ..., 0.59399563, 0.7865799 , 0.9612661 ],
       [1.3266679 , 1.7623531 , 1.378633  , ..., 1.5259216 , 1.4811774 , 1.1819019 ]], dtype=float32)
Now, we have got the predictions made within the training set.
print(predictions_future.shape)
predictions_future
We have arrived at the solution. We have predicted the values for one time step ahead into the future. Let’s see the output.
OUTPUT:
(1, 2714)
array([[1.3266679, 1.7623526, 1.3786325, ..., 1.5259216, 1.4811774, 1.1819018]], dtype=float32)
y_pred = pd.DataFrame(predictions_future)
y_pred = y_pred.T
# y_pred = y_pred.loc[0:2414]
y_pred
We now convert the predictions from an array into a dataframe and transpose it, so that they form a single column with one row per stock.
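Since the model is trained on scaled values, its predictions are also in scaled units. If the ratios are wanted on the original scale, one option is the scaler's `inverse_transform`. The sketch below uses made-up numbers and assumes that each prediction column corresponds to the column the scaler was fitted on, in the same order.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up history: 3 time stamps (rows) x 2 series (columns).
history = np.array([[0.5, 2.0],
                    [1.0, 4.0],
                    [1.5, 3.0]])
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(history)

# A hypothetical scaled prediction with one value per series.
scaled_prediction = np.array([[0.5, 0.25]])

# Map the scaled values back to the original units of each column.
restored = scaler.inverse_transform(scaled_prediction)
print(restored)  # [[1.  2.5]]
```

The first series ranges from 0.5 to 1.5, so a scaled value of 0.5 maps back to 1.0. The second ranges from 2.0 to 4.0, so 0.25 maps back to 2.5.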
plt.plot(y_pred.values);
Let’s plot the future values and visualise the time series.

Output of predicted time series
Hurray! We have built a multivariate time series model using LSTM and predicted the put-call ratio of multiple stocks one time step into the future.
THANK YOU