This Data Challenge and tutorial will focus on getting you to learn how to calculate age and age group distribution using DAX in Power BI. The goal of our data challenge series is to motivate you, our users and readers, to imbibe an attitude of continuous learning and to solve some common problems encountered in the field of data analytics and data science.
This challenge is a relatively easy one. We’ll provide a dataset containing customers’ date of birth and gender and your task will be to design a histogram to better communicate insights on the customer base.
The problem statement
The dataset provided contains attributes on customer ID, date of birth and customer gender.
Your task is use Power BI to:
- Extract the appropriate age of each customer from the date of birth
- Categorize/group the customers by their age range as thus: 18-30, 31-40, 41-50,51-60 and Over 60.
- Visualize the data using a histogram, where each bar contains the frequency of occurrence of each Age group and is also split along gender lines to show the proportion of each gender.
- Also, make sure the Age group is ordered from the lowest Age group (18 – 30) to the highest (over 60). An image to guide you has been provided below.
- Finally, identify the Age group with the highest male-female ratio.
Download the dataset
After downloading the dataset above, first, try your hands at the task before proceeding to view the step-by-step solution below.
The solution to the data challenge
Step 1 – Import data into Power BI
Import the data into Power BI query editor and ensure that your columns are in the required data type as shown below:
Ensure to format the columns as:
- Customer_ID as Numerical
- Date_Of_Birth as Date
- Gender as Text
Step 2 – Calculate Age in DAX, Power BI
Close & Apply and Import the data into Power BI
It may be straightforward to quickly apply the DATEDIFF function in DAX, however, that results in lower accuracy because the function simply calculates the difference in years and not the difference between the actual dates.
For instance, with DATEDIFF, you can create a calculated column with the DAX formula like this:
Age = DATEDIFF(Customers_Sample_Data[Date_Of_Birth],TODAY(),YEAR)
The DATEDIFF function produces the output below.
Now, if you take a look at the customer with ID = 1, the birthday is not until November 11 (today’s date is September 9, 2020) but the formula already returns the age of 40. That is erroneous.
Accurate Age = INT(YEARFRAC(Customers_Sample_Data[Date_Of_Birth],TODAY()))
Which results in the accurate output below:
If you notice, the formulas that we have used to far resemble a lot of Excel functions that you might already be familiar with. I think that’s the beauty of DAX – it’s like an assembly of Excel functions and SQL query language. Simply, it’s beautiful.
Step 3 – Use the SWITCH statement in DAX
Now we need to create the age group classification to create our histogram. We are simply binning our age column into the following groups: 18-30, 31-40, 41-50, 51-60, and over 60.
Next, we’ll create a calculated column using the SWITCH statement in DAX.
Age Group = SWITCH( TRUE(), Customers_Sample_Data[Accurate Age]<=30, "18 - 30", Customers_Sample_Data[Accurate Age]<=40, "31 - 40", Customers_Sample_Data[Accurate Age]<=50, "41 - 50", Customers_Sample_Data[Accurate Age]<=60, "51 - 60", "Over 60" )
The SWITCH statement is essentially a container for multiple IF statements. So, you can write NESTED IF syntax, similar to how it’s written in Excel, to achieve the same output as using the SWITCH statement.
The final thing is to be able to sort the Age Group according to their natural order i.e. from 18-30 to Over 60
Step 4 – Create an Index Table in Power BI
Within your report designer, create a new table to type in new values manually.
Then create a table with two(2) columns – Age Group and Index. Ensure that the Age Group values match exactly how it’s written in your formula from above i.e. 18 – 30, 31 – 40 etc. We’ll call this table the Index table.
Alternatively, you can import another table like this directly from Excel. Your choice. Importing from Excel is a more advisable option should you wish to modify your values.
After creating or importing the new table, you would have a relationship that exists between the sample Customers dataset and your newly created Index table. The relationship will be a 1-to-many relationship active on the Age Group column.
In your model view within Power BI, drag-n-drop the Age Group column from the Index table to the Age Group column in the Customers dataset to create your data model as shown below:
Sweet! You’re almost done. Kudos on making it this far.
Step 5 – Customise visuals in Power BI
Design your histogram visual by choosing a Stacked Column chart. Place the Age Group column from your Index table into the Axis field. For the values, you can use the count of records from the Customer dataset. To split and stack the chart, place the Gender column in the legend field well. Your chart fields should look like the image below.
Ensure to sort your Age Group column by the Index column like so:
Then ensure to sort the visual by the Age Group field which has already been sorted by the Index column in the Index table.
If this still sounds confusing, why not consider joining our next Analytics Masterclass cohort here.
The final output should look like this:
And that’s it. Not too difficult eh?
You got this. Catch you at the next Data Analytics Challenge. Until then, happy data sleuthing!
Check out this 3-hour crash course designed for you.
Check out our new How-To series page for video tutorials in DAX, Power BI visualizations, Power Query and Excel.
To find out more about what we do, subscribe to our insights below or you can reach out to us directly.
Related article – Visualizing WhatsApp Data using Python and Power BI