Global Warming Prediction using Machine Learning in Python

Global warming is affecting the Earth, and its most visible sign is rising temperature. In this tutorial we analyze how temperature has risen and fallen from several angles, from major cities to developed and developing countries, to see what the data says about temperature change.

We then use machine learning to predict the future average temperature for developed and developing countries, performing data analysis along the way to get a brief idea of each insight.

We are performing all operations on the temperature dataset from Kaggle.

Here are the essential Python libraries and datasets:

import pandas as pd
import numpy as np
import datetime as dt
import seaborn as sns
import math
import matplotlib.pyplot as plt
%matplotlib inline

city_df = pd.read_csv('B:/datasets/climate-data/GlobalLandTemperaturesByCity.csv')
country_df = pd.read_csv('B:/datasets/climate-data/GlobalLandTemperaturesByCountry.csv')
global_temp_df = pd.read_csv('B:/datasets/climate-data/GlobalTemperatures.csv')

These are some important libraries:

  • pandas is used for data manipulation.
  • NumPy supports numerical computation on arrays.
  • datetime handles dates and times, such as months and years.
  • seaborn provides statistical data visualization.
  • math provides basic mathematical functions.
  • matplotlib plots the results and the variation in the data.

For reading the dataset file:

We have three datasets, each assigned to its own variable. pd.read_csv reads a CSV file from the local computer or from a URL; we pass it the location of the file.
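
Later steps rely on the dt column being a datetime index. As a minimal sketch, read_csv can do that conversion while loading. The three-row sample below is hypothetical and only mimics the column layout of the city file; it is not real data.

```python
import io
import pandas as pd

# Hypothetical three-row sample mimicking the layout of the city file.
sample_csv = io.StringIO(
    "dt,AverageTemperature,City,Country\n"
    "2000-01-01,17.5,Bhopal,India\n"
    "2000-02-01,,Bhopal,India\n"
    "2000-03-01,25.1,Bhopal,India\n"
)

# parse_dates converts the dt column to datetime while reading;
# index_col makes it the row index in the same step.
sample_df = pd.read_csv(sample_csv, parse_dates=["dt"], index_col="dt")

print(sample_df.dtypes)
print(sample_df.index.month.tolist())
```

The empty field in the second row is read as NaN, which is how missing temperatures appear in a DataFrame.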

city_df.head(5)
  • head(5) returns the first five rows of the city DataFrame.
global_temp_df.shape
  • global_temp_df is the DataFrame for the global dataset; shape returns its number of rows and columns.

(3192,9)

Data Analysis

city_df.Country.unique()
  • unique() returns each distinct country once; repeated values in the column are dropped.
array(['Denmark', 'Turkey', 'Kazakhstan', 'China', 'Spain', 'Germany',
       'Nigeria', 'Iran', 'Russia', 'Canada', "Côte D'Ivoire",
       'United Kingdom', 'Saudi Arabia', 'Japan', 'United States',
       'India', 'Benin', 'United Arab Emirates', 'Mexico', 'Venezuela',
       'Ghana', 'Ethiopia', 'Australia', 'Yemen', 'Indonesia', 'Morocco',
       'Pakistan', 'France', 'Libya', 'Burma', 'Brazil', 'South Africa',
       'Syria', 'Egypt', 'Algeria', 'Netherlands', 'Malaysia', 'Portugal',
       'Ecuador', 'Italy', 'Uzbekistan', 'Philippines', 'Madagascar',
       'Chile', 'Belgium', 'El Salvador', 'Romania', 'Peru', 'Colombia',
       'Tanzania', 'Tunisia', 'Turkmenistan', 'Israel', 'Eritrea',
       'Paraguay', 'Greece', 'New Zealand', 'Vietnam', 'Cameroon', 'Iraq',
       'Afghanistan', 'Argentina', 'Azerbaijan', 'Moldova', 'Mali',
       'Congo (Democratic Republic Of The)', 'Thailand',
       'Central African Republic', 'Bosnia And Herzegovina', 'Bangladesh',
       'Switzerland', 'Equatorial Guinea', 'Cuba', 'Lebanon',
       'Mozambique', 'Serbia', 'Angola', 'Somalia', 'Norway', 'Nepal',
       'Poland', 'Ukraine', 'Guinea Bissau', 'Malawi', 'Burkina Faso',
       'Slovakia', 'Congo', 'Belarus', 'Gambia', 'Czech Republic',
       'Hungary', 'Burundi', 'Zimbabwe', 'Bulgaria', 'Haiti',
       'Puerto Rico', 'Sri Lanka', 'Nicaragua', 'Zambia', 'Honduras',
       'Taiwan', 'Bolivia', 'Guinea', 'Ireland', 'Senegal', 'Latvia',
       'Qatar', 'Albania', 'Tajikistan', 'Kenya', 'Guatemala', 'Finland',
       'Sierra Leone', 'Sweden', 'Botswana', 'Guyana', 'Austria',
       'Uganda', 'Armenia', 'Dominican Republic', 'Jordan', 'Djibouti',
       'Sudan', 'Lithuania', 'Rwanda', 'Jamaica', 'Togo', 'Macedonia',
       'Cyprus', 'Gabon', 'Slovenia', 'Bahrain', 'Swaziland', 'Niger',
       'Lesotho', 'Liberia', 'Uruguay', 'Chad', 'Bahamas', 'Mauritania',
       'Panama', 'Suriname', 'Cambodia', 'Montenegro', 'Mauritius',
       'Papua New Guinea', 'Iceland', 'Croatia', 'Reunion', 'Oman',
       'Costa Rica', 'South Korea', 'Hong Kong', 'Singapore', 'Estonia',
       'Georgia', 'Mongolia', 'Laos', 'Namibia'], dtype=object)

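India = city_df[city_df.Country == 'India'].set_index('dt')
India.index = pd.to_datetime(India.index)
India.head(5)
  • Select the rows whose Country is India and store them in a DataFrame named India. The dt column becomes a datetime index because the grouping steps that follow use India.index.month and India.index.year. The output shows the first five rows for India.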
mean_temp_month = India.groupby([India.index.month.rename('month'),India.City])['AverageTemperature'].mean().reset_index()
mean_temp_month.head(5)
  • mean_temp_month stores three columns: month, City, and the average temperature of each city in that month. It gathers the values we need from the India DataFrame.
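
To make the grouping step concrete, here is a small sketch on hypothetical toy data (the city names are reused, the numbers are made up). It groups by calendar month and city in the same way as the line above.

```python
import pandas as pd

# Hypothetical toy data: two cities, two Januaries and one February.
toy = pd.DataFrame(
    {
        "City": ["Bhopal", "Bombay", "Bhopal", "Bombay", "Bhopal"],
        "AverageTemperature": [18.0, 24.0, 20.0, 26.0, 22.0],
    },
    index=pd.to_datetime(
        ["2000-01-01", "2000-01-01", "2001-01-01", "2001-01-01", "2000-02-01"]
    ),
)

# Group by calendar month (taken from the datetime index) and by city,
# then average the temperature within each (month, City) pair.
monthly = (
    toy.groupby([toy.index.month.rename("month"), "City"])["AverageTemperature"]
    .mean()
    .reset_index()
)
print(monthly)
```

Both January readings for a city are pooled into one mean, regardless of the year they come from.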
main_cities = mean_temp_month[mean_temp_month.City.isin(['Hyderabad','Bhopal','Calcutta','New Delhi','Bombay'])]
main_cities = main_cities.set_index('month')
main_cities.head(5)
  • Here we select the main cities of the country. As an example we pick these five cities and try to work out what affects their temperature.
main_cities.plot(figsize=(12,5))
plt.title('Mean Temperature of main cities ')
plt.ylabel('Average Temperature')
plt.grid(True)
  • Plot the mean monthly temperature of the five selected cities; the graph makes the pattern easier to see.
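
Because main_cities keeps every city in a single AverageTemperature column, plotting it directly most likely draws that column as one series rather than one line per city. If a line per city is wanted, one option is to pivot the data to wide format first. This is a sketch on hypothetical numbers, not the tutorial's own step.

```python
import pandas as pd

# Hypothetical long-format data shaped like main_cities before set_index('month').
long_df = pd.DataFrame(
    {
        "month": [1, 1, 2, 2],
        "City": ["Bhopal", "Bombay", "Bhopal", "Bombay"],
        "AverageTemperature": [18.0, 24.0, 21.0, 25.0],
    }
)

# One row per month, one column per city.
wide_df = long_df.pivot(index="month", columns="City", values="AverageTemperature")
print(wide_df)
```

Calling wide_df.plot(figsize=(12,5)) on the result then draws a separate line for each city column.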
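main_cities = main_cities.fillna(main_cities.mean(numeric_only=True))
main_cities.tail(5)
  • Fill the null values with the mean of the values present in the main_cities DataFrame, so that missing entries do not distort later results. numeric_only=True skips the text column City, which pandas 2.0 and later would otherwise fail on when computing the mean.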
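
The sketch below shows what mean filling does on hypothetical toy values. It also shows a per-city variant, which is an assumption of this example rather than the tutorial's method: it fills each gap with the mean of that city only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing reading for Bhopal.
toy = pd.DataFrame(
    {
        "City": ["Bhopal", "Bhopal", "Bombay", "Bombay"],
        "AverageTemperature": [20.0, np.nan, 26.0, 28.0],
    }
)

# Overall mean of the known values: (20 + 26 + 28) / 3.
overall = toy["AverageTemperature"].fillna(toy["AverageTemperature"].mean())

# Per-city alternative (an assumption, not the tutorial's method):
# each gap is filled with the mean of its own city.
per_city = toy.groupby("City")["AverageTemperature"].transform(
    lambda s: s.fillna(s.mean())
)

print(overall.tolist())
print(per_city.tolist())
```

With the overall mean, the missing Bhopal value is pulled toward Bombay's warmer readings; the per-city fill keeps it at Bhopal's own level.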
India_mean_temperature_yearly = India.groupby([India.index.year.rename('Year'),India.City])['AverageTemperature'].mean().reset_index()
India_mean_temperature_yearly.tail(10)
  • India_mean_temperature_yearly has three columns: Year, City, and the average temperature. Here we analyze the data on a yearly basis.
main_cities_yearly = India_mean_temperature_yearly[India_mean_temperature_yearly.City.isin(['Hyderabad','Bhopal','Calcutta','New Delhi','Bombay'])]
main_cities_yearly = main_cities_yearly.set_index('Year')
main_cities_yearly.head()
  • Similarly, main_cities_yearly stores the yearly data for the five selected cities.
main_cities_yearly.plot(color = 'yellow',figsize =(12,5))
plt.title('Yearly Mean Temperature of Top Cities')
plt.ylabel("Average Temperature")
plt.grid(True)
  • Plot the yearly mean temperature of the selected cities.
global_temp_df = global_temp_df.set_index('dt')
global_temp_df.index = pd.to_datetime(global_temp_df.index)
global_temp_df = global_temp_df.resample('A').mean()
global_temp_df.head(5)
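  • Set the dt column as the index of global_temp_df, convert it to datetime, and resample to one row per year by taking the annual mean. ('A' is the year-end alias; pandas 2.2 and later prefer 'YE'.)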
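
As a sketch of what the annual averaging produces, the example below computes yearly means on hypothetical monthly values. It uses groupby on the index year instead of resample, so it does not depend on which frequency alias the installed pandas version accepts.

```python
import pandas as pd

# Hypothetical monthly readings spanning two years.
monthly = pd.Series(
    [10.0, 12.0, 14.0, 11.0, 13.0, 15.0],
    index=pd.to_datetime(
        ["2000-01-01", "2000-06-01", "2000-12-01",
         "2001-01-01", "2001-06-01", "2001-12-01"]
    ),
    name="LandAverageTemperature",
)

# Average all readings that share a calendar year.
yearly = monthly.groupby(monthly.index.year).mean()
print(yearly)
```

The difference from resample is the index: groupby yields plain integer years, while resample keeps a datetime index with one timestamp per year.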
x = global_temp_df.loc[:,['LandAverageTemperature']]
x.plot(figsize=(12,5))
plt.title('Global Temperature')
plt.xlabel("Year --->")
plt.ylabel("Temperature --->")
plt.grid(True)
  • Plot the global land average temperature against year to see how temperature has changed over time.
country_df = country_df.set_index('dt')
country_df.index = pd.to_datetime(country_df.index)
country_df.head(5)
  • Set dt as a datetime index on country_df so that we can compute the minimum, maximum, and average temperature per country.
country_diff = country_df.groupby([country_df.index.year.rename('year'),'Country']).AverageTemperature.mean().reset_index()
country_diff.head(5)
  • To analyze the country data, country_diff holds three columns: year, Country, and the mean AverageTemperature for that country and year.
country_diff = country_diff.groupby(['Country']).AverageTemperature.agg(['max','min']).reset_index()
country_diff['diff'] = country_diff['max']-country_diff['min']
country_diff.head(5)
  • Compute each country's highest and lowest yearly mean temperature, and store the difference between them in a new column, diff.
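
The aggregation above can be sketched on hypothetical toy data, with two made-up countries "A" and "B" and three yearly means each:

```python
import pandas as pd

# Hypothetical yearly means for two made-up countries.
toy = pd.DataFrame(
    {
        "Country": ["A", "A", "A", "B", "B", "B"],
        "AverageTemperature": [10.0, 12.5, 11.0, 25.0, 25.5, 26.0],
    }
)

# Highest and lowest yearly mean per country, then their spread.
spread = toy.groupby("Country")["AverageTemperature"].agg(["max", "min"]).reset_index()
spread["diff"] = spread["max"] - spread["min"]
print(spread)
```

Country "A" is colder but varies more (diff of 2.5), while the warmer "B" varies less (diff of 1.0), so diff measures variation, not how hot a country is.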
country_temp_max = country_diff.nlargest(8, columns = 'diff')
country_temp_max
  • nlargest picks the eight countries with the largest values in the diff column.
plt.figure(figsize=(12,5))
plt.title('Countries with max Temperature diff')
plt.xlabel('Country')
plt.xticks(rotation = 90)
plt.ylabel('Temperature diff')
plt.plot(country_temp_max['Country'],country_temp_max['diff'], color='r')
plt.grid(True)
  • Plot the countries with the largest temperature difference using matplotlib.
country_temp_min = country_diff.nsmallest(8, columns = 'diff')
country_temp_min
  • Similarly, nsmallest picks the eight countries with the smallest temperature difference.
plt.figure(figsize=(12,5))
plt.title('Countries with min Temperature diff')
plt.xlabel('Country')
plt.xticks(rotation = 90)
plt.ylabel('Temperature diff')
plt.plot(country_temp_min['Country'],country_temp_min['diff'], color='b')
plt.grid(True)
  • Plot the countries with the smallest temperature difference using matplotlib.
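
The nlargest and nsmallest calls used above can be tried on a hypothetical four-row table (made-up country labels and values):

```python
import pandas as pd

# Hypothetical diff values for four made-up countries.
toy = pd.DataFrame(
    {"Country": ["A", "B", "C", "D"], "diff": [2.5, 1.0, 4.0, 0.5]}
)

top2 = toy.nlargest(2, columns="diff")      # rows with the two largest diffs
bottom2 = toy.nsmallest(2, columns="diff")  # rows with the two smallest diffs

print(top2)
print(bottom2)
```

Both calls return whole rows, already sorted by the chosen column, so the Country labels stay attached to their diff values for plotting.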
developed = ['Norway', 'United Kingdom', 'France', 'Germany', 'Japan', 'Canada', 'Switzerland', 'United States', 'Sweden', 'South Korea', 'Australia']
developed_df=country_df[country_df.Country.isin(developed)]
developed_df=developed_df.groupby([developed_df.index.year.rename('Year'),'Country']).AverageTemperature.mean().reset_index()
developed_df.head()
  • developed is a list of 11 developed countries whose temperatures we analyze.
  • developed_df filters these countries out of country_df and holds three columns: Year, Country, and average temperature.
developing = ['China', 'India', 'Colombia', 'Brazil', 'Mexico', 'Indonesia', 'Philippines', 'Maldives', 'Turkey', 'South Africa', 'Libya']
developing_df=country_df[country_df.Country.isin(developing)]
developing_df=developing_df.groupby([developing_df.index.year.rename('Year'),'Country']).AverageTemperature.mean().reset_index()
developing_df.head()
  • Similarly for developing countries. The country is spelled 'Colombia', as in the country list printed earlier; isin silently skips any name it does not match.
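
Because isin ignores names that are absent from the data without raising an error, a quick check like the following can catch a misspelled country. The toy frame and the wanted list are hypothetical.

```python
import pandas as pd

# Hypothetical country column and a selection list with one misspelling.
toy = pd.DataFrame({"Country": ["China", "India", "Colombia", "Brazil"]})
wanted = ["China", "India", "Columbia"]  # 'Columbia' is deliberately misspelled

# isin keeps only rows whose Country appears in the list.
selected = toy[toy.Country.isin(wanted)]

# Names in the list that never occur in the data.
missing = sorted(set(wanted) - set(toy.Country.unique()))

print(selected)
print("not found:", missing)
```

An empty missing list means every requested country is actually present in the dataset.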
fig, axs = plt.subplots(ncols=2,figsize=(12,5))
sns.regplot(x='AverageTemperature',y='Year',fit_reg=True,data=developing_df, ax=axs[0])
axs[0].set(title = 'Developing Countries')
sns.regplot(x='AverageTemperature',y='Year',fit_reg=True,data=developed_df, ax=axs[1])
axs[1].set(title ='Developed Countries');
  • With seaborn we draw two scatter plots with a fitted regression line, one for developing and one for developed countries.
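
A regression line is easier to compare across groups when its slope is stated as a number. As a sketch, np.polyfit with degree 1 fits the same kind of straight line; here year is the predictor and the data are synthetic, built to rise by exactly 0.02 degrees per year.

```python
import numpy as np

# Synthetic data: exactly 0.02 degrees of warming per year.
years = np.array([2000, 2001, 2002, 2003, 2004], dtype=float)
temps = 0.02 * (years - 2000) + 15.0

# Degree-1 polynomial fit returns (slope, intercept) of the best-fit line.
slope, intercept = np.polyfit(years, temps, deg=1)
print(f"trend: {slope:.3f} degrees per year")
```

Running the same fit on the Year and AverageTemperature columns of developing_df and developed_df would give one trend figure per group.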

Machine Learning

developing_df = developing_df[developing_df['Year'] > 1900]

X = developing_df['Year'].values.reshape(-1,1)
Y = developing_df['AverageTemperature']
#split dataset into train and test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

#fit the model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

#predict the test set results
y_pred_1 = regressor.predict(X_test)
y_pred_1
  • Apply a linear regression model to the developing countries to predict the average temperature from the year.
  • First, keep only the years after 1900.
  • Year is the feature X and AverageTemperature is the target Y for training and testing.
  • train_test_split from scikit-learn holds out 30% of the rows as a test set.
  • LinearRegression is fitted on the training portion of X and Y.
  • Finally, predict the average temperature for the years in X_test.
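
The predictions on the test set can be compared with the held-out values to judge the fit. The sketch below does this with mean_absolute_error and r2_score from scikit-learn on synthetic data (a slow upward trend plus random noise); the numbers are made up and the metrics are a suggestion of this example, not part of the original analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: 0.01 degrees per year plus noise (not real temperatures).
rng = np.random.default_rng(0)
years = np.arange(1901, 2014).reshape(-1, 1)
temps = 15.0 + 0.01 * (years.ravel() - 1901) + rng.normal(0, 0.2, size=len(years))

X_train, X_test, y_train, y_test = train_test_split(
    years, temps, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Average absolute error in degrees, and share of variance explained.
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.3f}  R^2: {r2:.3f}")
```

In the tutorial's variables, the same two calls would take y_test and y_pred_1.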
regressor.predict([[2024]])
  • Predict the average temperature of developing countries for the year 2024.
developed_df = developed_df[developed_df['Year'] > 1900]

X = developed_df['Year'].values.reshape(-1,1)
Y = developed_df['AverageTemperature']
#split dataset into train and test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

#fit the model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

#predict the test set results
y_pred_2 = regressor.predict(X_test)
y_pred_2
  • Apply the same linear regression model to the developed countries to predict the average temperature from the year.
  • First, keep only the years after 1900.
  • Year is the feature X and AverageTemperature is the target Y for training and testing.
  • train_test_split from scikit-learn holds out 30% of the rows as a test set.
  • LinearRegression is fitted on the training portion of X and Y.
  • Finally, predict the average temperature for the years in X_test.
regressor.predict([[2024]])
  • Predict the average temperature of developed countries for the year 2024.
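
The two modelling blocks repeat the same steps, so they could be folded into one helper. The function name fit_and_predict, its parameters, and the dropna step are assumptions of this sketch and do not appear in the original code; the toy data is a made-up, perfectly linear series.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_and_predict(df, target_year, start_year=1900):
    """Fit AverageTemperature on Year for rows after start_year; predict target_year."""
    # dropna is an added safeguard (assumption): the model cannot fit NaN targets.
    recent = df[df["Year"] > start_year].dropna(subset=["AverageTemperature"])
    X = recent["Year"].values.reshape(-1, 1)
    y = recent["AverageTemperature"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    return float(model.predict([[target_year]])[0])


# Hypothetical data: a perfectly linear rise of 0.01 degrees per year.
toy = pd.DataFrame({"Year": range(1890, 2014)})
toy["AverageTemperature"] = 10.0 + 0.01 * (toy["Year"] - 1900)

prediction = fit_and_predict(toy, 2024)
print(round(prediction, 2))
```

With the tutorial's data, the calls would be fit_and_predict(developing_df, 2024) and fit_and_predict(developed_df, 2024).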
