Flair: A Python Library for Token Classification
In this tutorial, you will learn how to perform token classification, an NLP task, on text using the Flair library in Python.
What is Token Classification?
Token classification is a natural language understanding task that assigns a label to individual tokens in a text.
Common token classification subtasks and applications are described below:
- Named Entity Recognition (NER): Locate the entities (such as people, places, or organizations) in a sentence. This can be framed as assigning a label to each token, with one class per entity type and one class for “no entity.”
- Part-of-Speech Tagging (POS): Mark each word in a sentence with its part of speech (noun, verb, adjective, etc.).
- Chunking: Locate the tokens that belong to the same phrase or entity. This task (which can be combined with POS or NER) can be framed as attaching one label (usually B-) to the token at the beginning of a chunk, another label (usually I-) to tokens inside the chunk, and a third label (usually O) to tokens that are not part of any chunk.
- Extraction of Data from Invoices: Named Entity Recognition (NER) models can automatically extract entities of interest from invoices. Optical Character Recognition (OCR) models read the invoice, and their output is passed to an NER model for inference. Important information such as the date, company name, and other listed entities can be retrieved in this manner.
- Named Entity Recognition: The job of NER is to identify the named entities in a text, such as the names of persons, locations, or organizations. Each token is labeled with a named entity class, or with the so-called “O” category if it does not belong to any entity. The task takes text as input and returns the text annotated with named entities as output.
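To make the B-/I-/O labeling described above concrete, here is a minimal sketch in plain Python that groups per-token labels back into entities. The function name `bio_to_spans` and the sample tokens and labels are illustrative assumptions, not part of Flair.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-labelled tokens into (entity_text, entity_type) pairs."""
    spans = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- label always opens a new entity; close any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- label of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- label that does not continue the open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hand-written example labels (not model output).
tokens = ["George", "Washington", "went", "to", "Washington"]
labels = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, labels))
# [('George Washington', 'PER'), ('Washington', 'LOC')]
```

In this sketch an I- label that does not follow a matching B- label is simply ignored; real toolkits may handle such cases differently.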
Introduction: Flair Library
A very simple framework for state-of-the-art NLP. It was developed by Humboldt University of Berlin and friends.
- A powerful NLP library. Flair lets you apply state-of-the-art natural language processing (NLP) models to your text, such as NER, PoS tagging, special support for biomedical data, word sense disambiguation, and classification, with support for a rapidly growing number of languages.
- A text embedding library. Flair’s simple interfaces let you use and combine different word and document embeddings, including Flair embeddings, BERT embeddings, and ELMo embeddings.
- A PyTorch NLP framework. Flair builds directly on PyTorch, making it easy to train your own models and experiment with new approaches using Flair embeddings and classes.
English NER in Flair
This is the large 4-class English NER model that ships with Flair.
F1-Score: 94.36 (corrected CoNLL-03)
Predicts 4 tags: PER (person name), LOC (location name), ORG (organization name), and MISC (other name).
Most Flair sequence tagging models (part-of-speech tagging, named entity recognition, and so on) are now hosted on the HuggingFace model hub. You can browse the models, see detailed information on how they were trained, and even try each one out online.
Setup and Prerequisites
The project uses Gradio 3.0+ and Python 3.6+. If you don’t already have Python 3.6 or later, install it first. Ubuntu users can follow this link. Then, in your preferred virtual environment, run the following:
pip install flair
pip install gradio==3.0.19
First, we import the libraries required for token classification. We will use a Gradio interface to enter a sentence and display the predicted output.
from flair.data import Sentence
from flair.models import SequenceTagger
import gradio as gd
Now let’s write a custom function that does the following:
- Load the NER tagger.
- Make a Sentence.
- Run NER over the Sentence.
- Return the NER spans.
def tokenclass(text):
    tagger = SequenceTagger.load("flair/ner-english")  # Load the NER tagger
    sentence = Sentence(text)  # Make a Sentence
    tagger.predict(sentence)  # Run NER over the Sentence (annotates it in place)
    return sentence.get_spans("ner")  # Return the predicted NER spans
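The function above reloads the model on every call, which can be slow when it is called repeatedly, for example from a web interface. One possible way to avoid this is to load the tagger once and reuse it. The sketch below uses `functools.lru_cache` for that. The names `get_tagger` and `tokenclass_cached` are hypothetical helpers introduced here, and the Flair imports are placed inside the functions only so the block stands on its own.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_tagger():
    """Load the NER tagger on first use and reuse it afterwards."""
    # Imported lazily; assumes Flair is installed as described above.
    from flair.models import SequenceTagger

    return SequenceTagger.load("flair/ner-english")


def tokenclass_cached(text):
    """Same steps as tokenclass, but without reloading the model each time."""
    from flair.data import Sentence

    sentence = Sentence(text)       # Make a Sentence
    get_tagger().predict(sentence)  # Run NER with the cached tagger
    return sentence.get_spans("ner")
```

With this variant, only the first call pays the model-loading cost. Later calls reuse the tagger held in the cache.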
Let’s pass an input sentence to the function we just defined.
s = tokenclass("George went to washington")
s
2022-06-19 06:30:41,766 loading file /root/.flair/models/ner-english/4f4cdab26f24cb98b732b389e6cebc646c36f54cfd6e0b7d3b90b25656e4262f.8baa8ae8795f4df80b28e7f7b61d788ecbb057d1dc85aacb316f1bd02837a4a4
2022-06-19 06:30:43,475 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
[Span[0:1]: "George" → PER (0.9959), Span[3:4]: "washington" → LOC (0.9162)]
So, the entities “George” (labeled as a person) and “washington” (labeled as a location) are found in the sentence “George went to washington”.
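The log above reports a dictionary with 20 tags, even though the model distinguishes only 4 entity classes. This is because the tags follow a BIOES-style scheme, in which each class is combined with a position prefix: B (begin), I (inside), E (end), and S (single-token entity). The short sketch below groups the tags copied from the log to show this. The variable names are illustrative.

```python
from collections import defaultdict

# Tag dictionary copied from the log output above.
tag_dictionary = [
    "<unk>", "O", "S-ORG", "S-MISC", "B-PER", "E-PER", "S-LOC", "B-ORG",
    "E-ORG", "I-PER", "S-PER", "B-MISC", "I-MISC", "E-MISC", "I-ORG",
    "B-LOC", "E-LOC", "I-LOC", "<START>", "<STOP>",
]

prefixes_by_class = defaultdict(set)
other_tags = []
for tag in tag_dictionary:
    prefix, sep, entity_class = tag.partition("-")
    if sep and prefix in {"B", "I", "E", "S"}:
        # Position prefix + entity class, e.g. "B-PER".
        prefixes_by_class[entity_class].add(prefix)
    else:
        # "O" (no entity) and the special tags.
        other_tags.append(tag)

entity_classes = sorted(prefixes_by_class)
print(entity_classes)  # ['LOC', 'MISC', 'ORG', 'PER']
print(other_tags)      # ['<unk>', 'O', '<START>', '<STOP>']
```

Four classes times four prefixes gives 16 tags. Adding O and the three special tags gives the 20 entries in the log.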
Now let’s call the function we created earlier from a Gradio Interface to predict the NER tags. We will:
- Create a Textbox for entering the input sentence and render the output as text.
- Define title = “Token Classification” in the Interface.
- Define a description in the Interface.
- Launch the web server that serves the UI for the Interface.
gd.Interface(
    fn=tokenclass,
    inputs=gd.Textbox(lines=2, placeholder="Enter text here.."),
    outputs="text",
    title="Token Classification",
    description="Token classification is a natural language understanding task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging.",
).launch(debug=True)
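Because the output component is plain text, it may be more readable to turn the list of spans into one line per entity before returning it. The sketch below is one possible formatter. It assumes each span exposes `text`, `tag`, and `score` attributes, which should be checked against the installed Flair version. `format_spans` and the stand-in class `DemoSpan` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DemoSpan:
    """Hypothetical stand-in for a Flair span, used only to try the formatter."""
    text: str
    tag: str
    score: float


def format_spans(spans) -> str:
    """Render spans as one 'text -> TAG (score)' line per entity."""
    if not spans:
        return "No entities found."
    return "\n".join(f"{s.text} -> {s.tag} ({s.score:.4f})" for s in spans)


# Values taken from the example output shown earlier.
demo = [DemoSpan("George", "PER", 0.9959), DemoSpan("washington", "LOC", 0.9162)]
print(format_spans(demo))
# George -> PER (0.9959)
# washington -> LOC (0.9162)
```

To use it in the interface, the function passed as fn could wrap the prediction, for example `lambda text: format_spans(tokenclass(text))`.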