ResagratiaResagratiaResagratia
+234 806 299 1199
contact@resagratia.com

Visualizing WhatsApp Chats using Python and Power BI – Part 2

  • Home
  • Business
  • Visualizing WhatsApp Chats using Python and Power BI – Part 2
Resagratia WhatsApp Analytics

In the first part of this tutorial, we exported our Whatsapp chat data, cleaned and transformed it into a dataframe and saved our cleaned data in CSV format. Now we will import this CSV file in Power BI and derive insights by creating awesome visuals.

PREREQUISITES

  1. Power BI Desktop Application
  2. WordCloud custom visual

To install the Word Cloud custom visual, first, you have to sign in to your Microsoft Power BI account on Power BI Desktop. On the Home tab, navigate to custom visuals and click on the From Marketplace icon. Search for Wordcloud and select Add.

NOTE: If you don’t have a business email, you cannot sign in to the Power BI service and hence cannot import custom visuals from the marketplace. Alternatively, you can download the WordCloud visual here https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA104380752?tab=Overview.

Next to the From Marketplace option above, select the From File option and import your downloaded Wordcloud file from the folder where you saved it.

Learn Data Analysis with Excel

DATA MODELING

First, we import the CSV file we created from the first tutorial. Click on the Get Data icon, select Text/CSV and click connect

Navigate to your project folder where you saved the “WhatsappChat.csv” file and select it. You should see a table containing our data. Click on Transform Data. This will transfer our data directly to the Power Query editor where we can perform some data transformations.

We remove the changed type step as we will manually “change type” for each column in our table.

The first column is not useful as it will not be needed in our visualization. Right-click on the first column and select Remove.

Next, we change the type for each column. Right-click on the Date column, click the Change Type option and select Date.

Do the same for the other columns. Select Time for the Time column, Text for the Name, Content and weekday columns, Date/Time for the DateTime column and Whole Number for the Letter_Count, Word_Count and Hour columns.

Remember when we exported our chat data from Whatsapp, we selected the  WITHOUT MEDIA option, Python indicates all those instances where media files were removed with a “<Media omitted>” message.

We will create a new column named “Media” showing message type as either “media” or “text”. Navigate to the Add Column tab and click on Conditional Column.

Format the values as shown below then click OK:

A new Media column is added showing media for rows in the Content column containing “<Media omitted>” and text for non-media files.

Next, we transform the Content column to lower-case to ensure uniformity of words when creating our WordCloud visual. Right-click on the Content column, click on Transform and select lowercase.

Check the Word_Count column, you will notice that Python adds an additional word count for space at the end of each message (Word_count exceeds the actual number of words in each message by 1). We will create a new column where we will subtract 1 for each row in Word_count.

Navigate to Add Column and select Custom Column. 

Fill in the fields as shown below and select OK.

Right-click and remove the Word_Count column then rename the newly created column from Custom to Word_Count. Change the type to “Whole Number“.

Next, we will create a table to sort days in the correct order i.e Sunday to Saturday.

Navigate to the Home tab and select Enter Data.

Fill in the column and rows as shown below, set name as sort weekday and select OK.

We now have two tables: WhatsappChat and sort weekday. Select Close & Apply.

This takes us to our report page. We will create a Date Table and Measures Table to help us create our visuals seamlessly and add more dimensions to our visualization.

Learn Data Analysis with Excel

First, we will disable Auto Date/Time when loading data in the current file. Navigate to the File tab, go to Options and Settings and select Options.

In the Options window, go to Current File and select Data Load. Turn off Auto Date/Time and select OK.

Now navigate to the Modeling tab and select New Table. Insert the following DAX code in the window provided and hit ENTER:

Dates Table =
var baseCalendar = CALENDAR(MIN(WhatsappChat[Date]),MAX(WhatsappChat[Date]))
return
GENERATE(baseCalendar,
var myDate = [Date]
var Year = YEAR(myDate)
var MonthNo = MONTH(myDate)
var MonthName = FORMAT(myDate,"mmm")
var Day = DAY(myDate)
return
ROW("Year",Year,"Month No",MonthNo,"Month Name",MonthName,"Day",Day)
)

Next, click the ellipsis on the Dates Table and select Mark as a date table.

This opens the window below. Select Date and click OK.

Now select the Day column, go to Default Summarization and select Don’t summarize.

Do the same for the Month No and Year columns.

Navigate to Model and create a relationship between WhatsappChat Table and Dates Table by dragging Date in the Dates Table and dropping it on Date in the WhatsappChat Table.

Next, we will create a Measures table. Measures are calculations used in Data Analysis. Navigate to Home tab and select Enter Data:

Rename the table to Measures Table and select Load.

Now we can add Measures to the Measures Table. Right-click on the Measures Table and select New measure.

Now enter the following DAX formula in the window that pops up:

ave messages per day =
var MessageCountByDate =
  GROUPBY(WhatsappChat,'Dates Table'[Date],"Messages Count",COUNTX(CURRENTGROUP(),WhatsappChat[Content]))
return
AVERAGEX(MessageCountByDate,[Messages Count])

Now we create other Measures in the same way. Right-click, select New measure and enter the formula:

Average Letters Per Message = AVERAGE(WhatsappChat[Letter_Count])

Repeat the process and enter the formula:

Average Words Per Message = CALCULATE(AVERAGE(WhatsappChat[Word_Count]))

Repeat again and enter the formula:

count Messages = COUNT(WhatsappChat[Content])

Repeat again and enter the formula:

No of Days = DATEDIFF(MIN('Dates Table'[Date]),MAX('Dates Table'[Date]),DAY)

Repeat again and enter the formula:

sum Letters = SUM(WhatsappChat[Letter_Count])

Repeat again and enter the formula:

sum Words = SUM(WhatsappChat[Word_Count])

Delete Column1 from the Measures Table. Your Fields view should look like this:

Now we can start creating visuals. First I’ll insert a logo of RESAGRATIA (If your Whatsapp group has a logo you can insert this). On the Home tab, select Image and load the logo.

Next, I’ll insert a Text box and write some text describing my analysis. On the Home tab select Text box and format it to your preference.

Next, we will create Card visuals showing Total Active Members, No of Days, Total Messages, Total Words, and Total Letters.

Navigate to the Visualizations pane and select Card.

Now navigate to the WhatsappChat table and select Name. On the Visualization pane under the Fields icon, click the drop-down next to First Name and select Count (Distinct).

You should have this:

Now we rename the category from Count of Name to Total Active Members. Click the dropdown next to Count of Name in the Fields pane, select Rename and enter Total Active Members.

Learn Data Analysis with Excel

Navigate to the Visualizations pane and select another Card visual. On the Measures Table select No of Days. This gives us:

Insert another Card visual and select count Messages from the Measures Table.

Insert another Card visual and select sum Words from the Measures Table.

Insert another Card visual and select sum Letters from the Measures Table.

Rename accordingly as shown in the image below. Your report should look like this:


Next, we will create a Stacked bar chart visual showing Top 10 Active Members. Navigate to the Visualizations pane and select Stacked bar chart. Select Name and Content from the WhatsappChat Table. In the Fields pane, format as shown below:

Now we will filter this visual by Top 10. On the Filter pane, select the dropdown next to Name, format as shown below and click Apply. You will have to drag Content from the WhatsappChat Table to the By value tab and select Count (Distinct).

Next, we can make some edits to our visual like renaming the Title, disabling the X-axis, turning Data labels on etc.

Next, we will create a WordCloud visual using the custom visual we installed earlier. Navigate to the Visualizations pane and select WordCloud 2.0.0. Go to the WhatsApp chat Table and select Content.

Now navigate to the Format pane and under Stopwords turn Default Stopwords On. In the Words tab, enter letters and words that are not informative and relevant (enter each letter/word followed by a space and then the next letter/word). Rename Title to Most Frequently Used Words.

Next, we will create a Stacked column chart showing Messages Type. Navigate to the Visualizations pane and select Stacked column chart. Select count Messages from the Measures Table and Media from the WhatsappChat Table and in the Fields pane format as shown below:

Rename the visual to Message Type and change the colour type to your preference.

Rename the page to Overview and open a New Page by clicking the + icon.

First copy the logo and Text box from the Overview page to the new page.

Next, we will insert a Slicer of the names of members in the group. Navigate to the Visualizations pane and select Slicer. Select Name from the WhatsappChat Table.

Click the dropdown icon and select Dropdown.

I encourage you to play around with the formatting options and choose the format that suits you best. In the Format pane, turn off the Slicer header and set Title on. Rename title to Select User.

Next, we will create an Area chart that shows the trend of messages in the group. Navigate to the Visualizations pane and select Area chart. Select count Messages from the Measures Table and Date from the Dates Table and format as shown below:

In the Format pane, carry out the following transformations; Turn Y-axis off, turn off the title in X-axis, set Data colors to your preferred colour, turn Data labels on and set position to under, rename Title to Daily Messages Sent. You should have something similar to this:

Next, we will insert Card visuals. Navigate to the Visualizations pane and insert a Card visual. Select ave messages per day from the Measures Table. Just like we did earlier, rename from ave messages per day to Average Messages Per Day. In the Format pane reduce the text size of the Data labels to fit the visual.

Now we insert two more Card visuals. Insert a Card visual and select Average Words Per Message from the Measures Table. Insert another Card visual and select Average Letters Per Message from the Measures Table. You should have this:

Next, we will create a Line chart showing Total Messages by Hour of Day. Navigate to the Visualizations pane and insert a Line Chart visual. Select count Messages from the Measures Table and Hour from the WhatsappChat Table and format as shown below:

Rename Axis in the Fields pane from Hour to 24 Hour Period. In the Format pane, click the Shapes option and turn on Stepped.

For X-axis, in the Start option put 0 and in End put 23. Click Y-axis and turn off Title and turn off Gridlines. In the Title option, rename to Total Messages by Hour of the Day. You should have this:

Lastly, we will insert a Stacked column chart visual showing Most Active Day of Week by Total Messages. Go to the Data view and select the sort weekday table. Click on the weekday column, now click the Sort by Column option on the window above and select order. This ensures our visual is arranged in order starting from Sunday and ending on Saturday.

Learn Data Analysis with Excel

Now navigate to the Visualizations pane and insert a Stacked column chart visual. Select count Messages from the Measures Table and weekday from the sort weekday table and format as shown below:

In the Format pane, turn off Y-axis and turn off Title in X-axis. Turn Data label on and set position as Inside end. Now rename Title from count Messages by weekday to Most Active Day of Week by Total Messages. Rename the page to User Usage Analysis. You should have something similar to this:

Now you can publish your report to Power BI, pin it to a dashboard and share your dashboard wherever you want to.

Having completed this tutorial, you have gained amazing skills in extracting Whatsapp text data, cleaning and transforming this data to CSV format using Python and creating awesome visuals in Power BI.

Do you have any questions, suggestions, or is there any topic you would love me to create a topic on? Kindly leave a comment below.

Author: Obinna Iheanachor | LinkedIn


To find out more about what we do, subscribe to our insights below. You can reach out to us directly for training and analytics consulting needs.

Receive Fresh, Weekly Analytics Content

Join 1K+ subscribers to receive original, freshly curated content every week.

Related article – Case Study: Analytics for a Polythene Manufacturing Company

Related article – Insights Dashboards – Nigeria’s 2020 Proposed Budget

The previous article – Visualizing WhatsApp Chats using Python and Power BI