Word Cloud formation with a given shape with Python
Introduction
A word cloud is a popular way to explore and present text data. It becomes more fun and interesting when the words fill a specific shape rather than the usual rectangular box.
The word cloud here is built from a hotel review dataset containing nearly 6300 customer reviews:
https://valueml.com/wp-content/uploads/2021/05/Data.xlsx
Now let's start with the Python code for building a word cloud in a given shape:
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib as mpl
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
%matplotlib inline
Data Cleaning
Remove empty reviews from the data frame. Then convert the text to lowercase and strip URLs, HTML tags, punctuation marks, and words containing numbers.
import string

# Load the reviews
df = pd.read_excel('Data.xlsx')

# Drop the rows whose review is missing (the same rows that
# df[df['Review'].isna()].index would list)
df.dropna(subset=['Review'], inplace=True)

def clean_text(text):
    '''Make text lowercase, remove text in square brackets, remove links,
    remove punctuation and remove words containing numbers.'''
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                  # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)    # links
    text = re.sub(r'<.*?>+', '', text)                   # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                       # newlines
    text = re.sub(r'\w*\d\w*', '', text)                 # words containing digits
    return text

df['Review'] = df['Review'].apply(clean_text)
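To see what the cleaning step does, here is a minimal sketch that runs the same substitutions on a single made-up review. The sample sentence and the example.com link are assumptions for illustration only; they are not taken from the dataset.

```python
import re
import string

def clean_text(text):
    """Same cleaning steps as in the article, repeated here so the sketch runs on its own."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                  # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)    # links
    text = re.sub(r'<.*?>+', '', text)                   # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                       # newlines
    text = re.sub(r'\w*\d\w*', '', text)                 # words containing digits
    return text

# Hypothetical review text, invented for this example
sample = "Great stay [edited]! Booked via https://example.com, room 204 was clean."
cleaned = clean_text(sample)

# Extra spaces are left where tokens were removed, so split into words to inspect
print(cleaned.split())
# ['great', 'stay', 'booked', 'via', 'room', 'was', 'clean']
```

Note that the newline is removed without inserting a space, so the words on either side of a line break get joined. If your reviews span several lines, replacing the newline with a space instead may be a better fit.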
Data Visualisation using Word Cloud
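You can use any image that shows a dark shape on a white background as the mask. The words are drawn on the non-white pixels, and the pure white areas are left empty. I saved mine as twitter.jpeg in the current directory.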
Now let us perform the following simple steps:
# Join all reviews into one string for analysis
reviews_text = " ".join(df['Review'])

# Open the mask image as an array
mask = np.array(Image.open("twitter.jpeg"))

# Create a WordCloud object that draws inside the mask
wc = WordCloud(background_color="white", mask=mask)

# Pass the string to generate the cloud
wc.generate(reviews_text)

plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.show()
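The mask used above works because the wordcloud library treats pure white pixels (value 255) as masked out and draws on everything else. JPEG compression often leaves the background slightly off-white, which can let words spill outside the shape. The following is a minimal sketch of one way to tidy a mask first. The helper name binarize_mask and the threshold of 200 are assumptions chosen for illustration, not part of the original code or of the wordcloud API.

```python
import numpy as np

def binarize_mask(mask, threshold=200):
    """Return a copy where pixels brighter than `threshold` become 255 and all others 0.

    Hypothetical helper: the threshold value is an arbitrary starting point.
    """
    mask = np.asarray(mask)
    # Collapse RGB(A) images to a single brightness channel
    if mask.ndim == 3:
        mask = mask[:, :, :3].mean(axis=-1)
    return np.where(mask > threshold, 255, 0).astype(np.uint8)

# Tiny synthetic "image": a dark square on an almost-white background
demo = np.full((4, 4), 250, dtype=np.uint8)
demo[1:3, 1:3] = 30

clean_mask = binarize_mask(demo)
print(clean_mask)
```

With a real image, the same helper could be applied as mask = binarize_mask(np.array(Image.open("twitter.jpeg"))) before passing the mask to WordCloud. You may need to adjust the threshold for your image.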