Human Activity Recognition using Smartphone Data | Python
Hello Folks! Excited about today’s tutorial on using Neural Networks for classification? Let me tell you about it.
Human Activity Recognition is the task of predicting a person's actions, such as sitting, standing, walking, and many more, from sensor data. The sensors, typically located in a vest or smartphone, record the signals that describe these activities. Analyzing the hundreds of sensor readings in each observation used to be a difficult task, but Deep Learning handles this problem well.
So basically, our task is to recognize the type of activity the user is performing, using Python as the programming language. For this, we will use the concept of Artificial Neural Networks implemented in Python. Let’s get started…
Key Take-aways of the tutorial
- How to download and load the dataset.
- Data visualization techniques for a better understanding of data.
- Data preparation, modeling, and evaluating the model.
- Understanding the concept of Neural Networks.
Libraries/Packages to be installed beforehand
Libraries are useful to implement fundamental concepts into reality. The libraries to be installed are:
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn (sklearn)
- TensorFlow
- Keras
```
pip install pandas matplotlib seaborn scikit-learn tensorflow keras
```

After installing, we import the libraries.
About the dataset
Now, let’s gain some knowledge about the dataset. It contains smartphone accelerometer (acceleration) and gyroscope (angular velocity) readings, recorded in all three spatial dimensions (X, Y, Z) while subjects performed different physical activities. You can download the dataset from here. After downloading, make sure the dataset files are in your current working directory.
To start with, let’s do some Exploratory Data Analysis (EDA) of the given data.
EDA steps
First, we will import the Python libraries.
```python
# import necessary libraries
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline
```
After importing libraries, load the dataset.
```python
# load the data
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

print('Train Data', train.shape, '\n', train.columns)
print('\nTest Data', test.shape)
```
Output:
From the output, we infer that the training data has 7352 observations with 563 attributes, while the test set has 2947 observations. Now we will check for null values, if any.
print("Missing values:",train.isnull().values.any())
Output:
Missing values: False
We got the output False, which means there are no null values in the dataset. Next, we will look at the unique labels in the Activity column and also each label's individual count.
```python
# unique labels and their counts
print('Train labels', train['Activity'].unique(), '\nTest Labels', test['Activity'].unique())
print("------------------------------")
print('Train labels count', train.Activity.value_counts())
print('Test labels count', test.Activity.value_counts())
```
We have 6 activities in all. Let’s plot a bar plot of the value counts with the Python code given below:
```python
sns.set(rc={'figure.figsize': (13, 6)})
ax = sns.countplot(x="Activity", data=train)
plt.xlabel("Activity")
plt.ylabel("Count")
plt.title("Activity Count for Train set")
plt.grid(True)
plt.show()
```
Output:
Next, we will look at how many observations are recorded by each subject.
```python
# observations recorded by each subject
pd.crosstab(train.subject, train.Activity)
```
Output:
From the above table, we observe that the data is almost evenly distributed across all the activities and all the subjects, which is good.
Now, we select subject 15 and compare the activities against the mean body acceleration in the three spatial dimensions to get more insights into the data.
```python
sub15 = train.loc[train['subject'] == 15]

fig = plt.figure(figsize=(32, 24))
ax1 = fig.add_subplot(221)
ax1 = sns.stripplot(x='Activity', y=sub15.iloc[:, 0], data=sub15, jitter=True)
ax2 = fig.add_subplot(222)
ax2 = sns.stripplot(x='Activity', y=sub15.iloc[:, 1], data=sub15, jitter=True)
plt.show()
```
Output:
From the plot, we find that the mean body acceleration is more variable for the walking activities than for the passive ones, especially in the X direction. Now let's look at the maximum acceleration.
```python
fig = plt.figure(figsize=(32, 24))
ax1 = fig.add_subplot(221)
ax1 = sns.stripplot(x='Activity', y='tBodyAcc-max()-X', data=sub15, jitter=True)
ax2 = fig.add_subplot(222)
ax2 = sns.stripplot(x='Activity', y='tBodyAcc-max()-Y', data=sub15, jitter=True)
plt.show()
```
Output:
We can now see the difference in distribution between the active and passive activities, with WALKING_DOWNSTAIRS (values between 0.5 and 0.8) distinct from all the others, especially in the X direction. The passive activities are largely indistinguishable and show no clear pattern in any direction (X, Y, Z).
Feature Scaling
This step deals with pre-processing and preparing the data to feed into the Neural Network. We use the MinMaxScaler function, which rescales every feature to the [0, 1] range.
```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# for training set
scaler.fit(train.iloc[:, 0:562])
mat_train = scaler.transform(train.iloc[:, 0:562])
print(mat_train)
```
Output:
```
[[0.64429225 0.48985291 0.43354743 ... 0.79825103 0.47068654 0.        ]
 [0.63920942 0.49179472 0.4382399  ... 0.79848665 0.47284164 0.        ]
 [0.63982653 0.49026642 0.44326915 ... 0.79872236 0.47544109 0.        ]
 ...
 [0.63669369 0.49149469 0.47748909 ... 0.84506893 0.52040559 1.        ]
 [0.64482708 0.49057848 0.42085971 ... 0.84323381 0.51266974 1.        ]
 [0.67575173 0.49378844 0.39806642 ... 0.84348837 0.51834742 1.        ]]
```
```python
scaler = MinMaxScaler()
# for test set
scaler.fit(test.iloc[:, 0:562])
mat_test = scaler.transform(test.iloc[:, 0:562])
print(mat_test)
```
Output:
```
[[0.6718788  0.55764282 0.52464834 ... 0.62209457 0.46362736 0.        ]
 [0.69470427 0.57426358 0.42707858 ... 0.62446791 0.45014396 0.        ]
 [0.68636345 0.55310221 0.42794829 ... 0.62380956 0.45251181 0.        ]
 ...
 [0.74529355 0.64526771 0.43015674 ... 0.62088108 0.58803909 1.        ]
 [0.65638384 0.62620241 0.44817885 ... 0.61581385 0.59135763 1.        ]
 [0.58994885 0.56560474 0.41032069 ... 0.61537208 0.59163879 1.        ]]
```
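A quick aside: fitting a separate scaler on the test set (as above) lets the test data influence its own scaling. The stricter convention is to fit the scaler on the training data only and reuse it for the test set. A minimal sketch of that variant:

```python
# leakage-free variant: fit the scaler on train only, reuse it for test
scaler = MinMaxScaler()
mat_train = scaler.fit_transform(train.iloc[:, 0:562])
mat_test = scaler.transform(test.iloc[:, 0:562])  # same min/max as train
```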
Our next move is to deal with the categorical variable, i.e., the "Activity" column, by encoding each label as an integer.
```python
# Training data
temp = []
for i in train.Activity:
    if i == "WALKING": temp.append(0)
    if i == "WALKING_UPSTAIRS": temp.append(1)
    if i == "WALKING_DOWNSTAIRS": temp.append(2)
    if i == "SITTING": temp.append(3)
    if i == "STANDING": temp.append(4)
    if i == "LAYING": temp.append(5)
train["n_Activity"] = temp

# Test data
temp = []
for i in test.Activity:
    if i == "WALKING": temp.append(0)
    if i == "WALKING_UPSTAIRS": temp.append(1)
    if i == "WALKING_DOWNSTAIRS": temp.append(2)
    if i == "SITTING": temp.append(3)
    if i == "STANDING": temp.append(4)
    if i == "LAYING": temp.append(5)
test["n_Activity"] = temp

train.drop(["Activity"], axis=1, inplace=True)
test.drop(["Activity"], axis=1, inplace=True)
```
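By the way, the same encoding can be written more compactly with pandas' map. This is just an equivalent sketch of the loops above (run it before dropping the Activity column):

```python
# equivalent, more compact encoding using a label map
label_map = {"WALKING": 0, "WALKING_UPSTAIRS": 1, "WALKING_DOWNSTAIRS": 2,
             "SITTING": 3, "STANDING": 4, "LAYING": 5}
train["n_Activity"] = train.Activity.map(label_map)
test["n_Activity"] = test.Activity.map(label_map)
```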
```python
from keras.utils import to_categorical

y_train = to_categorical(train.n_Activity, num_classes=6)
y_test = to_categorical(test.n_Activity, num_classes=6)

X_train = mat_train
X_test = mat_test

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```
Output:
```
(7352, 562) (7352, 6)
(2947, 562) (2947, 6)
```
Neural Network Model
We are now preparing our Neural Network architecture, which is built from several layers. The Dense layers create fully connected hidden layers of neurons, here using the ‘relu’ activation function. For the output layer we can use ‘sigmoid’ or ‘softmax’ depending on the labels; for mutually exclusive classes like ours, ‘softmax’ is the conventional choice. We use ‘categorical_crossentropy’ as the loss, since there are several categories in the output. Another attribute is ‘batch_size’, which is the number of observations processed before the model's weights are updated (not the number per epoch). We will also set up the necessary callbacks: a model checkpoint and a learning-rate reducer. By adjusting the learning rate and batch_size, we can get the best result.
filepath="HAR_weights.hdf5" from keras.callbacks import ReduceLROnPlateau , ModelCheckpoint lr_reduce = ReduceLROnPlateau(monitor='val_acc', factor=0.1, epsilon=0.0001, patience=1, verbose=1) checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
```python
# importing required packages/libraries
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization
from keras.optimizers import Adam

# building the model
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(Dense(196, activation='relu'))
model.add(Dense(32, activation='relu'))
# 'softmax' is the conventional choice for mutually exclusive classes;
# 'sigmoid' is kept here to match the results reported below
model.add(Dense(6, activation='sigmoid'))

# compiling the model ('lr' was renamed 'learning_rate' in recent Keras)
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=0.0005),
              metrics=['accuracy'])
print(model.summary())
```
Output:
```python
# train and fit the model, passing the callbacks defined above
history = model.fit(X_train, y_train, epochs=25, batch_size=256,
                    validation_data=(X_test, y_test),
                    callbacks=[checkpoint, lr_reduce])
```
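Because the checkpoint callback saves the best weights to HAR_weights.hdf5, we can reload them later for inference. A minimal sketch, assuming the same model definition is in scope:

```python
# reload the best checkpointed weights and predict on new data
model.load_weights("HAR_weights.hdf5")
probs = model.predict(X_test)      # per-class probabilities
preds = np.argmax(probs, axis=1)   # integer labels 0-5
```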
We can visualize the loss and accuracy for each epoch using line plots. The code below plots both.
```python
from pylab import rcParams
rcParams['figure.figsize'] = 10, 4

# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
```
Output:
Model Performance
We use a confusion matrix to evaluate the performance of our model. The diagonal of the confusion matrix holds the correct predictions, so a strong diagonal gives a quick read on model performance.
```python
from sklearn.metrics import confusion_matrix

pred = model.predict(X_test)
pred = np.argmax(pred, axis=1)
y_true = np.argmax(y_test, axis=1)
```
```python
cm = confusion_matrix(y_true, pred)

# plotting here uses the mlxtend package (pip install mlxtend)
from mlxtend.plotting import plot_confusion_matrix
fig, ax = plot_confusion_matrix(conf_mat=cm, figsize=(10, 5))
plt.show()
```
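If you would rather not add mlxtend as a dependency, a rough equivalent using seaborn (which we imported earlier) looks like this:

```python
# alternative confusion-matrix plot with seaborn's heatmap
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix')
plt.show()
```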
```python
from sklearn.metrics import classification_report, accuracy_score

print(classification_report(y_true, pred))
```
Output:
```
              precision    recall  f1-score   support

           0       0.99      0.91      0.95       496
           1       0.97      0.91      0.94       471
           2       0.84      0.98      0.90       420
           3       0.85      0.93      0.89       491
           4       0.96      0.84      0.89       532
           5       0.98      1.00      0.99       537

    accuracy                           0.93      2947
   macro avg       0.93      0.93      0.93      2947
weighted avg       0.93      0.93      0.93      2947
```
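One small, optional tweak: since the integer labels come from our own encoding (0 = WALKING through 5 = LAYING), we can pass the activity names to classification_report to make the rows easier to read:

```python
# optional: label the report rows with the original activity names
activity_names = ["WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                  "SITTING", "STANDING", "LAYING"]
print(classification_report(y_true, pred, target_names=activity_names))
```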
Conclusion
So here we come to the end of our tutorial on Human Activity Recognition using Neural Networks. We have learnt how to implement Artificial Neural Networks for a multi-class classification problem, and today's tutorial proves that Neural Networks are not limited to image datasets.
Hope you liked it and will try it in your own projects. Also, don’t forget that you can tune the parameter values to suit your requirements.
Thank You for your time!