Tons and tons of data remain largely unexploited. The key to unlocking and achieving innovative feats and designing the most ingenious solutions may lie in being able to extract insights from the data.
With the availability of and accessibility to a wealth of tools and resources that can help us better understand ourselves, it’s time we leveraged such opportunities in designing better systems for ourselves. It’s time we demanded improved transparency on governance, processes, and access to data on the performance of the MDAs (ministries, departments, and agencies).
This initiative aims to provide a platform to promote public discourse and deliberation on the Nigerian budget by taking advantage of visual storytelling, data analytics, and open data.
A host of professionals – some data analysts and engineers – have contributed to this body of work. This will be the largest set of structured analytical data detailing the FGN 2020 Executive Budget Proposal. A detailed list of the contributors can be found at the end of this article.
If you need to catch up on our previous articles that covered the extraction and automation process of creating structured data from the 2000-page budget document, kindly review part 1, part 2 and part 3 of the process.
Connect to the open budget dataset
Kindly download the attached text file below with detailed links to the cloud databases.
Download the text file>>
There are five individual links to separate tables in the downloaded file. Open the text file and you will find links to:
- Total Expenditure Items
- Capital Projects
- Summary by MDAs
- Summary by sub-MDAs
- Budget Schema
The links look like this:
Copy the link for Total Expenditure Items and connect to the web data source from within Power BI or your preferred business intelligence (BI) tool. Essentially, the database is hosted on Google Sheets and we will leverage Google’s architecture to enable collaboration on this project.
Then paste the copied Expenditure link into the dialog box and click OK. Connect as anonymous to the Google Sheets as there is no need to sign in.
Next, right-click on the document and select Excel as the output.
An error will be thrown out stating that the table is not in the required format. This error is expected and it can easily be handled by replacing the last characters in the link (ods) with xlsx, the code for Excel.
If you can’t see this formula bar, go to View menu and check the Formula Bar option.
After the replacement, you should now see a list of tables contained within the Expenditure database. Each table corresponds to a single MDA and there are 48 of them, thanks to the incredible efforts of the analysts and professionals who contributed to this project.
Continue by expanding the Data column to combine all the tables into a single one. This is a neat trick here that can save you a lot of time in combining multiple files. This method also ensures any new tables are automatically included in the combine operation.
Next, we remove other unwanted columns and to focus on the 5 columns of interest.
From this point, a select number of transformations will be applied as listed below:
- Make the first row as headers
- Select the Code column and filter out/uncheck “Code”
- Add a custom column which is a list of zeroes (“0”)
- Merge the custom column with the MDA Code column. This operation adds 0 to the preceding character of the MDA Code
- Remove the previous Code column
- Extract first 4 characters from the new MDA Code column and rename as “Code”
- Ensure to change the formatting of the Item Code column to text then add a custom column to calculate the length of the Item Code column
- Filter for Item Codes with length = 8
The next sets of images detail the transformations to be performed. After making the first row as the header, ensure to remove the Changed Type transformation that is applied automatically.
Then from the Code column, uncheck “Code” from the list.
Add a custom column filled with zeroes (“0”) as text.
Merge the custom column with the MDA code and rename it as MDA code.
After merging, remove the Code column.
Next, select the recently modified MDA Code column and extract the first 4 characters, counting from the left. This will serve as the new Code column.
Change the formatting of the Item Code column to type text.
Then create a custom column to calculate the length of the Item Code. Item codes either have a length of 1, 2, 4, 6, or 8. We will then filter for Item Codes with length = 8.
Item Codes with length = 8 are the most detailed items for describing the expenditure items. Essentially, they’re the building blocks for the budget. The schema will cover the groupings and classifications of these item codes to ensure there are no duplicate entries and hence, accurate analysis can be carried out.
After filtering the item codes and removing the length column, the expenditure items data should look like this:
We just covered the connection to and transformation of the Expenditure items for the 48 MDAs in the Nigerian budget. Next up, connect to the other databases in the downloaded text file to include the Summary by MDAs, Summary by sub-MDAs and the Budget Schema. For Capital Projects, there are no Item Codes, which means fewer transformation will be needed to get the data in the most usable format.
Close and apply your transformations from Power Query into Power BI and edit your Data Model to a star schema like the one below:
This concludes the connection to the open analytics budget database. From this point, you can design some insightful and thought-provoking dashboards to bring to life the vast amounts of information contained within the budget.
Write beautiful DAX codes, create compelling narratives and visualisations that touch on practical issues within the government. This data is available to enable you vividly exercise your imaginations by exploring the intricate details of the budget. We hope to compare and corroborate this data with the final budget and the 2020 audited statements from the Office of the Auditor General of the Federation.
What does the “sharing formula” for revenue allocation mean? Can we decipher a pattern contained within the 2020 executive budget proposal? Are there details from which we can further deduce unprecedented insights?
A couple of professionals and analysts worked on the backend and automation of the data that feeds the database of MDAs. Their valued contributions, involving many hours of work, have made this project possible.
We have an upcoming event: A Visual Presentation of the FGN 2020 Proposed Budget coming up on Friday, December 6, 2019 at the Rise Labs. This event is hosted by Rise Networks.
To find out more and register, join the tribe, subscribe to our latest insights and watch this space!
Next article – A Visual Presentation of Insights – Part 1
The previous article – Breaking down the FGN executive budget – Part 3