Gold Price Prediction using Random Forests in Python

In this tutorial, we will discuss gold price prediction using Random Forests in Python with the help of the scikit-learn machine learning library.

We can predict the gold price using the gold data set from 2008 to 2018.

-> You can download the data set from here: gld_price_data.

We will also analyze the gold prices using graphs and heat maps.

So let’s try to implement the code.

First of all, check that the pandas library is installed in your IDE. If it is not installed, install it. Then import it as pd, together with the other libraries used in this tutorial.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics
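
If you are not sure whether these libraries are installed, a small check along the following lines can help. This is only a sketch that uses the Python standard library; the helper name missing_packages is made up for illustration and is not part of pandas or scikit-learn.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names used in this tutorial (scikit-learn is imported as "sklearn").
required = ["pandas", "matplotlib", "seaborn", "sklearn"]
missing = missing_packages(required)
if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Anything reported as missing can usually be installed with pip. Note that the sklearn module is provided by the package named scikit-learn.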

For further information about these libraries, including sklearn.metrics, refer to their documentation.

Load the data set

#load the data from the csv file (replace the path below with the location of the file on your system)
gold_data=pd.read_csv("C:\\Users\\users name in system\\OneDrive\\Desktop\\gld_price_data.csv")
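
As an optional variation, the Date column can be parsed into real dates while loading. The sketch below assumes the column is called Date, as in the rest of this tutorial; the three rows and the helper name load_prices are made up so that the example runs without the real file.

```python
import io
import pandas as pd

# Made-up rows standing in for the real csv file; the values are illustrative only.
SAMPLE_CSV = """Date,SPX,GLD
1/2/2008,100.0,50.0
1/3/2008,101.5,50.5
1/4/2008,99.0,51.0
"""

def load_prices(source):
    """Read the csv and parse the Date column into datetime values."""
    return pd.read_csv(source, parse_dates=["Date"])

sample = load_prices(io.StringIO(SAMPLE_CSV))
print(sample.dtypes)
```

With the real data you would pass the path of gld_price_data.csv instead of the in-memory text.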

To check that the data set has loaded, print the first five rows and the last five rows of the data frame.

#print the data frame
gold_data.head()
#GLD=gold price; we are going to predict the GLD value

OUTPUT:-

-> To check the last five rows.

gold_data.tail()
#out of the 6 columns, we are predicting the GLD column

OUTPUT:-

-> To check the number of rows and columns in the given data set.

gold_data.shape

OUTPUT:-

(2290, 6)

->Now let’s check the basic information for the data.

gold_data.info()

OUTPUT:-

-> Here, non-null means that no null values are present in the given data set.

-> Let’s check the missing values in the given data.

gold_data.isnull().sum()

OUTPUT:-
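
For comparison, here is what the same check reports when a value really is missing. This is a small sketch on a made-up frame, not on the gold data.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing GLD value, for illustration only.
toy = pd.DataFrame({"GLD": [1.0, np.nan, 3.0], "SPX": [4.0, 5.0, 6.0]})

# isnull() marks each missing cell as True; sum() counts them per column.
missing = toy.isnull().sum()
print(missing)
```

A count of zero for every column, as in the gold data, means nothing has to be filled in or dropped before training.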

-> Let’s generate some statistical values for the data, like the mean, standard deviation, min, and max.

gold_data.describe()

OUTPUT:-

-> The 25% row is the first quartile: a value of 1239 there means that 25 percent of the values in that column are less than 1239.
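
The meaning of the 25% row can be checked on a few made-up numbers. This is a minimal sketch; the values 1 to 8 are chosen only to keep the arithmetic easy.

```python
import pandas as pd

# Eight made-up values, for illustration only.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# The "25%" entry of describe() is the same number as quantile(0.25).
first_quartile = values.describe()["25%"]
share_below = (values < first_quartile).mean()

print(first_quartile, values.quantile(0.25), share_below)
```

Here two of the eight values lie below the first quartile, which is 25 percent of them.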

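We can get the correlation values.

#drop the non-numeric Date column first, because recent pandas versions raise an error on it
correlation=gold_data.drop(columns=['Date']).corr()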

->Now we will construct a heatmap to understand the correlation values.

#constructing the heat map to understand the correlation
plt.figure(figsize=(10,10))
sns.heatmap(correlation,cbar=True,square=True,fmt='.1f',annot=True,annot_kws={'size':10},cmap='Greens')

OUTPUT:-

In the heat map, negatively correlated pairs have negative values and positively correlated pairs have positive values.
For example, a negative value between a column and SPX means that when that column's value decreases, the SPX value tends to increase. This is how to read the correlation heat map and the correlation statistics.
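
To see what positive and negative correlation values look like, here is a small sketch on made-up columns. The column names are hypothetical and chosen to describe how each column moves.

```python
import pandas as pd

# Toy data: one column rises with "a", the other falls as "a" rises.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "rises_with_a": [2, 4, 6, 8, 10],
    "falls_with_a": [10, 8, 6, 4, 2],
})

# corr() computes the Pearson correlation between every pair of columns.
toy_corr = toy.corr()
print(toy_corr["a"])
```

The column that moves together with a gets a correlation close to 1, and the column that moves in the opposite direction gets a correlation close to -1. Real data such as the gold data set falls somewhere in between.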
-> Let’s see the correlation values for gold.

print(correlation['GLD'])
#silver is the only column that is strongly positively correlated with GLD

OUTPUT:-

->Gold price distribution.

sns.displot(gold_data['GLD'],color='green')

OUTPUT:-

Split

X=gold_data.drop(['Date','GLD'],axis=1)#axis=1 drops columns; axis=0 would drop rows
Y=gold_data['GLD']
print(X)

OUTPUT:-

print(Y)

OUTPUT:-
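
The difference between axis=1 and axis=0 in drop can be seen on a tiny made-up frame. This is a sketch for illustration; only the Date and GLD column names come from the gold data.

```python
import pandas as pd

# Toy frame with made-up values.
toy = pd.DataFrame({
    "Date": ["d1", "d2", "d3"],
    "GLD": [1.0, 2.0, 3.0],
    "SPX": [4.0, 5.0, 6.0],
})

# axis=1 removes the named columns and keeps every row.
without_columns = toy.drop(["Date", "GLD"], axis=1)

# axis=0 removes the row with index label 0 and keeps every column.
without_row = toy.drop(0, axis=0)

print(without_columns)
print(without_row)
```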

->Now we will split the data into testing data and training data.

X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2,random_state=2)
#Model training
#random forest regressor
regressor=RandomForestRegressor(n_estimators=100)
regressor.fit(X_train,Y_train) 
#X_train holds the feature columns and Y_train holds the gold prices for the same rows
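
The regressor above is created without a random_state, so the trained forest, and therefore the score, can change a little from run to run. The sketch below uses synthetic data (not the gold data) to show how test_size divides the rows and how fixing random_state on the regressor makes repeated runs give the same predictions. The value random_state=0 is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features, target depends on the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# test_size=0.2 holds out 20 of the 100 rows; random_state fixes which rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)
print(X_train.shape, X_test.shape)

# Fixing random_state on the regressor as well makes repeated fits identical.
model_a = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
model_b = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
same_predictions = np.allclose(model_a.predict(X_test), model_b.predict(X_test))
print(same_predictions)
```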

->Model evaluation.

#model evaluation
#prediction on test data 
#use the trained model to predict on the test data
test_data_prediction=regressor.predict(X_test)
print(test_data_prediction)
#these are the values predicted by the model

OUTPUT:-
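
It may help to know where these predicted values come from. In scikit-learn, a random forest regressor predicts by averaging the predictions of its individual trees. The sketch below checks this on synthetic data (not the gold data) with a small forest of 10 trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 60 rows, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Prediction of the whole forest for the first five rows.
forest_prediction = forest.predict(X[:5])

# Average of the predictions made by each tree in the forest.
tree_average = np.mean([tree.predict(X[:5]) for tree in forest.estimators_], axis=0)

print(forest_prediction)
print(tree_average)
```

The two printed arrays should agree, up to floating point rounding.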

#Now we need to compare the predicted values with the actual values, which is why we are using the metrics module
#R squared score (1.0 would be a perfect fit)
error_score = metrics.r2_score(Y_test,test_data_prediction)
print("R squared score : ",error_score)

OUTPUT:-

R squared score :  0.9891268309653707
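
For reference, R squared is defined as 1 minus the ratio of the squared prediction errors to the squared deviations of the actual values from their mean. The sketch below computes it by hand on made-up numbers and compares the result with metrics.r2_score. The helper name r2_by_hand is hypothetical.

```python
import numpy as np
from sklearn import metrics

def r2_by_hand(y_true, y_pred):
    """Compute R squared as 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared prediction errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared deviations from the mean
    return 1.0 - ss_res / ss_tot

# Made-up actual and predicted values, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]

by_hand = r2_by_hand(actual, predicted)
from_sklearn = metrics.r2_score(actual, predicted)
print(by_hand, from_sklearn)
```

A value close to 1, like the one obtained above for the gold data, means that the predictions explain almost all of the variation in the actual prices.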
#compare the actual values and predicted values
Y_test=list(Y_test)
plt.plot(Y_test,color='blue',label='Actual value')
plt.plot(test_data_prediction,color='green',label='Predicted value')
plt.title('Actual price vs predicted price')
plt.xlabel('Number of values')
plt.ylabel('Gold price')
plt.legend()
plt.show()

Generated OUTPUT:-

#the actual values differ only slightly from the predicted values, in line with the R squared value of about 0.99
#plot the actual values on their own
plt.plot(Y_test,color='blue',label='Actual value')
plt.title('Actual price')
plt.xlabel('Number of values')
plt.ylabel('Gold price')
plt.legend()
plt.show()

OUTPUT:-
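
Besides reading the plots, the gap between actual and predicted values can also be summarised in numbers. The sketch below uses made-up values and a hypothetical helper named residual_summary. With the tutorial's data you would pass Y_test and test_data_prediction instead.

```python
import numpy as np

def residual_summary(actual, predicted):
    """Summarise the differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = actual - predicted
    return {
        # positive means the actual values are higher than predicted on average
        "mean_residual": float(residuals.mean()),
        # largest single miss, ignoring its direction
        "max_abs_residual": float(np.abs(residuals).max()),
    }

# Made-up numbers, for illustration only.
summary = residual_summary([100.0, 110.0, 120.0], [99.0, 110.5, 118.0])
print(summary)
```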
