Anomaly Detection using PyCaret in Machine Learning
Anomaly detection is the task of identifying items or events that do not conform to an expected pattern or to the other items in a dataset. It applies to problems such as bank fraud, technical glitches, or changes in consumer behavior. There are three types of anomaly detection techniques:
- Supervised Anomaly detection
- Unsupervised Anomaly detection
- Semi-supervised Anomaly detection
We will cover unsupervised anomaly detection, which can be implemented with the PyCaret library.
PyCaret is an open-source, low-code, end-to-end machine learning library in Python. Its primary objective is to reduce the cycle time of experiments and make data scientists more productive. Compared with other open-source libraries, it needs only a few lines of code to go from raw data to a trained model.
PyCaret supports six main modules:
- Clustering: Group data points with similar characteristics.
- Anomaly Detection: Identify rare items or events that raise suspicion.
- Classification: Predict categorical class labels (for example, 1 or 0).
- Regression: Predict continuous values such as sales, price, etc.
- Natural Language Processing: Discover hidden semantic structures in text data (topic modeling).
- Association Rule Mining: Find relations between variables in a database.
Here, we will focus on the anomaly detection module.
Let's start with a step-by-step implementation.
Step 1: Installing PyCaret Python Library
First, install PyCaret. Open the Anaconda Prompt and run the following command:
pip install pycaret
The installation may take some time.
After installing, open the Jupyter notebook.
Step 2: Importing dataset
Next, we need to load the dataset. You can load it from PyCaret's data repository or with pandas; either approach works.
from pycaret.datasets import get_data

# get_data is a function used to load data
data = get_data('anomaly')
Step 3: Setting up Environment
Now it's time to set up the environment with the setup() function. This function initializes the experiment, performs some pre-processing, checks whether the inferred data types are correct, and prepares the dataset for modeling and deployment.
# pycaret.anomaly is the PyCaret module for anomaly detection
from pycaret.anomaly import *

anomaly = setup(data, normalize=True)
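To make the effect of normalize = True in the setup call above more concrete, here is a minimal pure-Python sketch of z-score scaling. It assumes z-score is the normalization method in use; the helper name zscore_columns and the sample rows are made up for illustration and are not part of PyCaret.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Rescale each column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Population standard deviation; fall back to 1.0 for constant columns
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two features on very different scales (illustrative values only)
rows = [
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
]
scaled = zscore_columns(rows)
for row in scaled:
    print(row)
```

After scaling, both columns carry the same weight in any distance calculation, which matters for distance-based detectors such as k-nearest neighbours.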
Step 4: Train the model
Now we need to train the anomaly detector. This is done by calling the create_model function with the ID of the model to train.
# Train a k-nearest neighbours detector
knn = create_model('knn')
print(knn)

# Train an Isolation Forest detector
iforest = create_model('iforest')
print(iforest)
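As a rough intuition for what a k-nearest-neighbours detector computes, the sketch below scores each point by the distance to its k-th nearest neighbour. This is a simplified illustration of the general idea, not PyCaret's actual implementation; the function name knn_outlier_scores and the sample points are hypothetical.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(distances[k - 1])
    return scores


# Four points in a tight cluster and one far away (illustrative values only)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_outlier_scores(points, k=2)
for point, score in zip(points, scores):
    print(point, round(score, 3))
```

The isolated point ends up with a much larger score than the clustered ones, which is the signal the detector uses to flag it.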
Step 5: Test the model
Next, we analyze the trained model. The plot_model function takes a trained model object and returns a plot of the dataset passed during the setup stage.
Step 6: Assign anomaly labels
Finally, we need to assign anomaly labels to the dataset. The assign_model function uses the trained model to flag each data point in the dataset passed during the setup stage as either an outlier or an inlier (1 = outlier, 0 = inlier). The higher a point's outlier score, the more abnormal it is.
results = assign_model(knn)
results.head()  # Returns the first n rows of the data (5 by default)
In the output, the “label” column shows 1 or 0, marking each point as an outlier or an inlier respectively. This is how anomalies can be detected with PyCaret.
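The relationship between outlier scores and the 1/0 labels can be sketched as a simple cutoff: the highest-scoring share of points is labeled 1 and the rest 0. The example below is an assumption-laden illustration, not PyCaret's internal code; the function label_outliers, its fraction argument (the expected share of outliers), and the scores are all hypothetical.

```python
def label_outliers(scores, fraction=0.05):
    """Return 1 for the highest-scoring `fraction` of points, else 0."""
    n_outliers = max(1, round(len(scores) * fraction))
    # The score of the n-th most anomalous point becomes the cutoff
    cutoff = sorted(scores, reverse=True)[n_outliers - 1]
    return [1 if s >= cutoff else 0 for s in scores]


# Illustrative outlier scores: two points stand out clearly
scores = [0.9, 1.1, 1.0, 0.8, 7.5, 1.2, 0.95, 1.05, 6.8, 1.0]
labels = label_outliers(scores, fraction=0.2)
print(labels)
```

Note that with tied scores at the cutoff, this sketch can flag slightly more points than the requested fraction.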