Word Cloud Formation in a Given Shape with Python

Introduction

A word cloud is a popular way to analyse and present text data: the more often a word occurs, the larger it is drawn. It is more fun and interesting, though, when the words fill a specific shape rather than the usual rectangular box.

The data used to build the word cloud is a set of hotel reviews, consisting of nearly 6,300 customer reviews.

https://valueml.com/wp-content/uploads/2021/05/Data.xlsx

Let us start with the Python code for forming a word cloud in a given shape:

import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib as mpl
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline

Data Cleaning

Remove empty reviews from the data frame. Remove all URLs and punctuation marks.

import string
df = pd.read_excel('Data.xlsx')
# Inspect which rows have no review text
df[df['Review'].isna()].index
# Drop those rows by condition instead of hard-coding their indices
df = df.dropna(subset=['Review'])

def clean_text(text):
    '''Make text lowercase, remove text in square brackets, links, HTML tags
    and punctuation, and remove words containing numbers.'''
    text = str(text).lower()
    # Raw strings keep the regex escapes (\[, \S, \w, \d) from being read as string escapes
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    # Replace newlines with a space so words on adjacent lines do not run together
    text = re.sub(r'\n', ' ', text)
    text = re.sub(r'\w*\d\w*', '', text)
    return text
df['Review'] = df['Review'].apply(clean_text)
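
To see what each cleaning step does, here is a minimal sketch that applies the same kind of regular expressions one at a time to a made-up review. The helper name clean_steps, the STEPS list and the sample string are assumptions for illustration only; newlines are replaced with a space here so that adjacent words do not run together.

```python
import re
import string

# Patterns in the order they are applied; (label, pattern, replacement)
STEPS = [
    ("square brackets", r"\[.*?\]", ""),
    ("links", r"https?://\S+|www\.\S+", ""),
    ("HTML tags", r"<.*?>+", ""),
    ("punctuation", "[%s]" % re.escape(string.punctuation), ""),
    ("newlines", r"\n", " "),
    ("words with digits", r"\w*\d\w*", ""),
]

def clean_steps(text):
    """Return a list of (label, text) pairs showing the text after each step.

    Hypothetical helper for illustration; not part of the original code.
    """
    text = str(text).lower()
    trace = [("lowercase", text)]
    for label, pattern, replacement in STEPS:
        text = re.sub(pattern, replacement, text)
        trace.append((label, text))
    return trace

# Made-up review containing brackets, a link, HTML tags and a number
sample = "Great stay [edited]! See https://example.com <b>Room 204</b> was clean."
for label, result in clean_steps(sample):
    print(f"{label:18} -> {result!r}")
```

Removing tokens leaves runs of spaces behind. That is harmless here, because the text is later split on whitespace when the words are counted.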

Data Visualisation using Word Cloud

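For the mask, you can use any image that shows a dark shape on a white background. WordCloud treats pure white pixels as masked out and draws the words only on the remaining area. I saved the image as twitter.jpeg in the current directory.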
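
JPEG compression often leaves the background slightly off-white, and only pure white (255) is masked out, so the words may spill outside the shape. The following is a minimal sketch of one way to deal with that, assuming a cut-off of 240; the function name whiten_background and the threshold value are illustrative choices, not part of the wordcloud library.

```python
import numpy as np

def whiten_background(mask, threshold=240):
    """Set near-white pixels to pure white (255).

    `threshold` is an assumed cut-off; tune it for your own image.
    Works for grayscale (2-D) and colour (3-D) arrays.
    """
    mask = np.asarray(mask, dtype=np.uint8).copy()
    if mask.ndim == 3:
        # A pixel counts as near-white when all of R, G and B reach the threshold
        near_white = (mask[..., :3] >= threshold).all(axis=-1)
    else:
        near_white = mask >= threshold
    mask[near_white] = 255
    return mask

# Synthetic 4x4 grayscale image: a dark 2x2 shape on an off-white (250) background
demo = np.full((4, 4), 250, dtype=np.uint8)
demo[1:3, 1:3] = 0
cleaned = whiten_background(demo)
print(cleaned)
```

The cleaned array can then be passed as the mask argument of WordCloud in place of the raw image array.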

Now let us perform the following simple steps:

# Join all reviews into a single string for analysis
reviews_text = " ".join(df['Review'])

# Open the mask image as a NumPy array
mask = np.array(Image.open("twitter.jpeg"))

# Create the WordCloud object; words are drawn only where the mask is not white
wc = WordCloud(background_color="white", mask=mask)

# Generate the cloud from the combined review text
wc.generate(reviews_text)

plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.show()
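
Word sizes in the cloud follow word frequency, so a quick way to sanity-check the picture is to count the words yourself. This is a rough sketch using only the standard library; the stopword set STOP, the helper top_words and the sample text are assumptions for illustration (the wordcloud package ships its own, larger STOPWORDS set).

```python
from collections import Counter

# Small illustrative stopword list (assumption); not the list wordcloud uses
STOP = {"the", "was", "and", "a", "is", "very"}

def top_words(text, n=3, stopwords=STOP):
    """Return the n most common non-stopword tokens of an already-cleaned string."""
    counts = Counter(word for word in text.split() if word not in stopwords)
    return counts.most_common(n)

# Made-up cleaned text standing in for the combined reviews
sample_text = "the room was clean and the staff was friendly clean room great staff room"
print(top_words(sample_text))
```

A dictionary of such counts can also be handed to WordCloud.generate_from_frequencies if you prefer to control the counting yourself.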

 
