Anomaly Detection Using PyCaret in Machine Learning

Anomaly detection is the task of identifying items or events that do not conform to an expected pattern or to the other items in a dataset. It applies to problems such as bank fraud, technical glitches, and changes in consumer behavior. There are three types of anomaly detection techniques:

  • Supervised Anomaly detection
  • Unsupervised Anomaly detection
  • Semi-supervised Anomaly detection

We will be covering unsupervised anomaly detection.

It can be implemented using the PyCaret library.

PyCaret is an open-source, low-code, end-to-end machine learning library in Python. Its primary objective is to reduce the cycle time of experiments and make data scientists more productive. Compared to other open-source libraries, it needs only a few lines of code for each step of the workflow, which makes it simple and easy to use.

PyCaret supports six main modules:

  1. Clustering: Group data points with similar characteristics.
  2. Anomaly Detection: Identify rare items or events that raise suspicion.
  3. Classification: Predict categorical class labels, such as binary labels (1 or 0).
  4. Regression: Predict continuous values such as sales, price, etc.
  5. Natural Language Processing: Discover hidden semantic structures in text data (topic modeling).
  6. Association Rule Mining: Find relations between variables in a database.

Here, we will focus on the anomaly detection module.

Let’s start with a step-by-step implementation.

Step 1: Installing PyCaret Python Library

First, we have to install PyCaret. To do so, open the Anaconda Prompt and type the following command:

pip install pycaret

The installation may take some time.

After installing, open the Jupyter notebook.

Step 2: Importing dataset

We need to import the dataset. You can load it from PyCaret’s data repository or with pandas; either approach works.

from pycaret.datasets import get_data
data = get_data('anomaly')  # get_data loads a sample dataset from PyCaret's data repository

Step 3: Setting up Environment

Now it’s time to set up the environment using the setup() function. This function initializes the environment and performs some pre-processing tasks: it checks whether the data types are correct and prepares the dataset for modeling and deployment.

from pycaret.anomaly import *  # pycaret.anomaly is PyCaret's module for anomaly detection
anomaly = setup(data, normalize=True)  # normalize=True rescales the numeric features
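
To give a feel for what normalize = True is for, here is a minimal sketch of z-score scaling, which rescales every numeric column to zero mean and unit standard deviation. It assumes z-score is the normalization applied (the default method, to the best of my knowledge). The helper name zscore_columns and the toy rows are hypothetical, and the code uses only the Python standard library; it illustrates the idea and is not PyCaret's own code.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Rescale each column of a list of numeric rows to mean 0 and std 1.

    Hypothetical helper for illustration only; not part of PyCaret.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two features on very different scales end up directly comparable.
toy_rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = zscore_columns(toy_rows)
```

Without such rescaling, a distance-based detector would be dominated by whichever feature has the largest numeric range.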

Step 4: Train the model

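Now we need to train the anomaly detector. This is done by calling the create_model function with the ID of the model to train. Here we train a k-nearest neighbors detector ('knn') and an isolation forest ('iforest').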

knn = create_model('knn')
print(knn)
iforest = create_model('iforest')
print(iforest)
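
The intuition behind a k-nearest-neighbors detector can be sketched in a few lines: a point that lies far from its neighbors receives a high outlier score. The example below scores each point by the distance to its k-th nearest neighbor. The function name knn_outlier_scores, the choice k=2, and the toy points are assumptions made for illustration; this is a sketch of the general technique and not a reproduction of what create_model('knn') runs internally.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour.

    Hypothetical helper for illustration only; larger scores mean
    the point is more isolated.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(distances[k - 1])
    return scores


# Four points form a tight square; the fifth sits far away from them.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
toy_scores = knn_outlier_scores(toy_points, k=2)
most_anomalous = toy_points[toy_scores.index(max(toy_scores))]
```

In this toy data, the isolated point gets by far the largest score, while the four clustered points all score the same small value.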

Step 5: Test the model

We need to analyze the trained model. The plot_model function takes a trained model object and returns a plot based on the dataset passed during the setup stage.

plot_model(knn, plot='tsne')

Output:

 

plot_model(knn, plot='umap')

Output:

Step 6: Assign Anomaly labels

Finally, we need to assign anomaly labels to the dataset. The assign_model function uses the trained model to flag each data point in the dataset passed during the setup stage as either an outlier or an inlier (1 = outlier, 0 = inlier). It also returns an outlier score for each point: the higher the score, the more abnormal the point.

results = assign_model(knn)
results.head()  # returns the first five rows by default

Output:

As we can observe, the “label” column shows 1 or 0, that is, outlier or inlier. Hence, anomalies can be detected using this approach.
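
One common way to turn outlier scores into the 1/0 labels described above is to flag a fixed share of the highest-scoring points. The sketch below illustrates that idea only. The function name label_top_fraction, its fraction parameter, and the toy scores are hypothetical, and the exact thresholding rule used by the library may differ.

```python
def label_top_fraction(scores, fraction=0.05):
    """Return 1 (outlier) for the highest-scoring share of points, else 0.

    Hypothetical helper for illustration only. At least one point is
    always flagged, and ties at the cutoff are all flagged.
    """
    n_outliers = max(1, round(len(scores) * fraction))
    cutoff = sorted(scores, reverse=True)[n_outliers - 1]
    return [1 if score >= cutoff else 0 for score in scores]


# Ten toy scores, one of which is clearly larger than the rest.
toy_scores = [0.2, 0.1, 0.3, 5.0, 0.25, 0.15, 0.2, 0.1, 0.3, 0.22]
toy_labels = label_top_fraction(toy_scores, fraction=0.1)
```

With ten scores and a fraction of 0.1, exactly one point is flagged: the one with the score of 5.0.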
