Human Activity Recognition using Machine Learning in Python

Human activity recognition is the task of predicting what a person is doing from a sequence of observations of that person’s actions and their surroundings. It is a core problem in many practical applications, such as smart surveillance, visual search and retrieval, smart robotics, and other tracking systems.

In this tutorial, we will use the OpenCV Python library to build a human activity recognition model that predicts what a person appears to be doing.


To follow along, you will need:

  • Python IDE (Jupyter Notebook or your preferred IDE)
  • ResNet-34 model (action_recognition_kinetics.txt and resnet-34_kinetics.onnx)


So, let’s get this tutorial started.

First, make sure all of the required Python libraries are installed on your system, and then import them as shown below.

import numpy as np
import cv2
from collections import deque

After importing the required libraries, store the important paths and constants in a class named Parameters and create an instance of it.

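class Parameters:
    def __init__(self):
        # Activity labels, one per line in the text file
        with open("model/action_recognition_kinetics.txt") as f:
            self.CLASSES = f.read().strip().split("\n")
        # Path to the pre-trained ResNet-34 model in ONNX format
        self.ACTION_RESNET = 'model/resnet-34_kinetics.onnx'
        # Input video; set to an empty string to use the webcam
        self.VIDEO_PATH = "test_vid.mp4"
        # Number of frames passed to the model per prediction
        self.SAMPLE_DURATION = 16
        # Height and width (in pixels) of each frame fed to the model
        self.SAMPLE_SIZE = 112

param = Parameters()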
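
The labels file is expected to hold one activity name per line. As a minimal sketch under that assumption, the helper below (load_labels is a hypothetical name, not part of the tutorial's code) reads such a file and skips blank lines. The three labels written to the temporary file are made up for the demonstration.

```python
import os
import tempfile


def load_labels(path):
    """Return the non-empty lines of a text file as a list of labels."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration with a temporary file holding three made-up labels
# and one blank line, which the helper should ignore.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("walking\nrunning\n\nsitting\n")
    tmp_path = tmp.name

labels = load_labels(tmp_path)
os.remove(tmp_path)
print(labels)  # ['walking', 'running', 'sitting']
```

Skipping blank lines keeps the position of each label aligned with the index of the model's output scores, provided the file itself lists the labels in the order the model was trained with.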

We store the frames captured over time in a double-ended queue (deque) with a fixed maximum length, so the oldest frame is dropped automatically whenever a new one is added to a full queue. Next, the human activity recognition model is loaded and the mp4 video is opened; if no video path is set, the webcam is used instead.

captures = deque(maxlen=param.SAMPLE_DURATION)
net = cv2.dnn.readNet(model=param.ACTION_RESNET)
vs = cv2.VideoCapture(param.VIDEO_PATH if param.VIDEO_PATH else 0)
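
To see how a bounded deque behaves as a rolling window, here is a minimal sketch that uses integers in place of video frames. The window length of 4 is an arbitrary value chosen for illustration; the tutorial itself uses SAMPLE_DURATION (16).

```python
from collections import deque

# A bounded deque keeps only the most recent `maxlen` items.
window = deque(maxlen=4)  # 4 is illustrative; the tutorial uses 16

for frame_id in range(6):  # integers stand in for video frames
    window.append(frame_id)

# The two oldest items (0 and 1) were discarded automatically.
print(list(window))  # [2, 3, 4, 5]
```

Because the deque discards old items on its own, the main loop only ever needs to append new frames.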

Now we loop over the video stream, reading one frame at a time. Each frame is resized and appended to the deque, and the loop skips ahead until the deque is full. Once it holds SAMPLE_DURATION frames, we create an image blob from them, using SAMPLE_SIZE as both the height and the width. The blob is then reshaped so that it can be fed to the pre-trained human action recognition model. Finally, we take the index of the highest score and display the corresponding activity label on the frame.

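while True:
    # Read the next frame from the video file or webcam
    (grabbed, capture) = vs.read()

    # Stop when the stream ends or a frame cannot be read
    if not grabbed:
        print("[INFO] no capture read from stream - exiting")
        break

    # Resize the frame for display and add it to the rolling window
    capture = cv2.resize(capture, dsize=(550, 400))
    captures.append(capture)

    # Wait until the window holds SAMPLE_DURATION frames
    if len(captures) < param.SAMPLE_DURATION:
        continue

    # Build a blob from the frames: resize and crop to
    # SAMPLE_SIZE x SAMPLE_SIZE, subtract the mean, swap BGR to RGB
    imageBlob = cv2.dnn.blobFromImages(captures, 1.0,
                                       (param.SAMPLE_SIZE, param.SAMPLE_SIZE),
                                       (114.7748, 107.7354, 99.4750),
                                       swapRB=True, crop=True)

    # Reorder to (channels, frames, height, width), then add a batch axis
    imageBlob = np.transpose(imageBlob, (1, 0, 2, 3))
    imageBlob = np.expand_dims(imageBlob, axis=0)

    # Run the model and pick the label with the highest score
    net.setInput(imageBlob)
    outputs = net.forward()
    label = param.CLASSES[np.argmax(outputs)]

    # Draw the predicted label on the frame and show it
    cv2.rectangle(capture, (0, 0), (300, 40), (255, 255, 255), -1)
    cv2.putText(capture, label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX,
                0.8, (0, 0, 0), 2)
    cv2.imshow("Human Activity Recognition", capture)

    # Press "s" to stop
    key = cv2.waitKey(1) & 0xFF
    if key == ord("s"):
        break

# Release the video source and close the display window
vs.release()
cv2.destroyAllWindows()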
It’s all finished, yay! Let us have a look at the results.

So, in this tutorial we’ve learned how to recognize human activities in a video using a pre-trained model and OpenCV. I hope you enjoyed it.
