Instructor: Benny Ifeanyi (LinkedIn)
Power Query is a data connection technology and ETL (Extract, Transform, and Load) tool that is used to simplify the process of connecting to, shaping, transforming, and combining data from various sources. With Power Query, you can connect to various data sources, including databases, web services, and text files. Once connected, you can use the intuitive user interface to shape and clean the data, such as filtering and removing duplicates, and then load the data into your workbook or data model.
Power Query also offers advanced features such as merging and aggregating data from multiple sources, creating calculated columns, and transforming data using formulas. It is also a data cleaning tool that help to handle dirty data by removing missing key elements, extra characters, poorly compiled and formatted data, poor data entries, duplicates or miss-spelling, data type error and invalid data and transforming it into an organized data for further analysis.
Additionally, you can save your data transformation steps as a query, making it easy to reuse the steps in the future. This can in turn help to save your time and effort in working with data, especially when dealing with large, complex data sets from multiple sources.
It is worthy of note that Power Query was first introduced as a feature in Microsoft Excel 2010 and can now be found in the following application for usage:
For the purpose of this project, the dataset used is found here
Central to this tutorial, I will be cleaning this dataset and transforming it to the clean data showed in the image below.
Dirty Data
Clean Data
To clean and transform the data and get the clean format above, we will go through the following process
In Power Query, transposing data refers to the process of rotating a table or range of data so that the rows become columns and the columns become rows. This can be useful when the original data is not in the desired orientation for analysis or visualization.
To transpose this data, click the transform tab and then the circled transpose
There is a pattern in column 1 with respect to the different segment in the dataset. So I will fill down to get the values in place.
The segment column (column 1) is then filled downward to get this:
Note: Unlike other tools were you can undo easily, power query as an applied steps tab on the bottom right corner where you can undo specific steps carried out on a data. You can also move a step back to see previous changes with this feature.
Click the Home tab, then click on the Use first row as Headers
This automatically replace the existing column header with the actual column header as shown in the clean data
This Filter menu pops up which allows me to filter with text that doesn’t contain total.
Note: In Power Query, unpivoting other columns means converting multiple columns in a table from a wide format to a long format, where the column headers become rows in a single column, and their corresponding values are stacked in another column. This transformation is useful when you have a table with multiple columns containing similar types of data, and you want to combine them into a single column for further analysis.
So I will select the segment and Ship mode column and unpivot other columns
Note: Then the column other columns are converted into long format as all the order_ID are transformed into a single column named "attribute" (order_ID) and the corresponding values in another column named “values” (Cost)
Note: Power Query uses a functional programming language called "M" (also known as "Power Query Formula Language") to perform data transformations. M is used to write custom functions, create queries, and manipulate data within Power Query.
Creating a function in Power Query involves defining a reusable set of data transformations that can be applied to similar data structures or data sources. This allows you to write a set of transformations once and then apply them to multiple tables or files with similar data structures.
To create a function in Power Query, you can use the "view" tab and select "Advanced Editor", you can define the function using the M language or take existing function from a previous transformation. The function can take input parameters, such as the table or file to be transformed, and return a transformed table.
To create this function, follow the following steps:
1. Remove the first two lines of code after the let function. This is because this code is used to import the dataset from a folder and can be limiting for data from a different source.
2. Change the data source name and note that Power Query Language is case sensitive. I will call mine (Cleaned) for the purpose of this tutorial.
The final code will look like this.
You will get a view that looks like this:
Once the function is defined, you can apply it to similar data structures by calling it from other queries or using it in the "Invoke Custom Function" feature. This can save time and effort when working with large datasets or multiple data sources with similar structures.
You can get data from GitHub into Power Query by using the "From Web" feature in Power Query. Here are the steps:
This is the link to the GitHub Repository I will be getting the data from here
Note: This is a view of the data from GitHub that I will be transforming in Power Query
Note: Click on Raw and copy the link which will be used to get the data from web.
To start to this follow the steps below from your Microsoft Excel Sofware:
1. Open Power Query by clicking on the "Data" tab in Excel and then selecting "Get Data" > "From Other Sources" > "From Web".
2. In the "From Web" dialog box, enter the URL of the GitHub repository you want to get data from.
3. . Click OK and wait for the "Navigator" dialog box to appear.
4. In the "Navigator" dialog box, select the table or file you want to import and click "Transform" to import the data into Power Query.
5. So this is the dataset successfully imported into Power Query
Note: Select “Space as Delimiter” and split at “Right-most Delimiter”. Perform the same operation for the other split columns but for date, split at “Left-most Delimiter”. Then rename the columns appropriately.
I will be combining the data in this folder with Power Query in Power BI in a bid to use them for further analysis and visualization.
Note: This can be manually done by opening and appending the excel files at the end of one another but I will walk us through this can be done using Power Query in Power BI.
Let’s get started…
Note: Instead of just load and transform button in the navigator, you will get an additional button to combine since you have multiple data. Then click on Combine and transform.
Note: This dataset doesn’t need any form of transformation, so will just load into my Power BI Desktop
Power Query is a powerful data transformation tool that is available in Microsoft Excel, Power BI, and other Microsoft applications. It allows users to easily extract, transform, and load data from various sources, including databases, files, and web pages.
Power Query is a valuable tool for data analysts, business users, and anyone who needs to work with data on a regular basis. Its ease of use, flexibility, and powerful features make it a must-have tool for anyone who needs to perform data transformations and analysis.
If you want to get started with data analytics and looking to improving your skills, you can check out our Learning Track
Empowering individuals and businesses with the tools to harness data, drive innovation, and achieve excellence in a digital world.
Copyright 2025Resagratia. All Rights Reserved.