Pagination in Power Query (Part 2)
This post is the 4th in a “mini-series” where I’m documenting how I’m working with some different APIs for custom reporting. Today’s post is about ClickUp, an app I use for managing projects, but like post #2 about Harvest, it’s less about ClickUp then it is about iterating over an unknown number of pages of results.
The specific nuance here is ClickUp returns a list every time, it does not return null when you hit a page with no records. There are several examples of List.Generate that are great, but they all appear to assume the condition for iterating is to stop once nulls are reached which was not applicable in my case.
Bottom line: if only everyone providing an API would tell you how many pages of data you have, pagination would be SO MUCH SIMPLER!
The previous posts in this series are:
- Connecting to Harvest in Power Query
- Pagination in Power Query (Part 1 - dealing with a known number of pages)
- Connecting to ClickUp in Power Query
In this post, I will describe the following items:
- Creating a function that accepts a Page Number to loop through
- Using List.Generate to loop through the pages similar to an If/Then or Do/While loop
- Transforming the results into data
- How to validate the results
Background
In this post, I’m not covering “when to use pagination” as I covered that in the Harvest post that mirrors this one, post #2 above. Instead I’m jumping right into the details.
The short version of the starting point is ClickUp’s API returns virtually identical results no matter what endpoint you are using, one that does not give you any clue as to how many records are in it. Most notably, it doesn’t return a null if there are no results, which is problematic since it does fit the scenarios I’ve read/watched that trigger based on “once you get a null result, stop the loop”. That didn’t work for this situation.
At the core, what we need to do here is get a page of results, check how many records are in it, and if there are more than 0 records, repeat.
I found some excellent examples that helped me ultimately guide me in the right direction but as a relative newbie in Power BI, I still struggled to figure out how to make this work for me in this situation. The last section of this is more about the “how did I figure it out” which I’m hoping the thought process and troubleshooting I did may help someone else.
One video from “The Power User”: Pagination and DO/WHILE in Power BI / Power Query - YouTube was the most helpful resource I found. They reference a blog from Matt Masson that seemed to be exactly what I was hoping to find (based on the title of the blog post) but it appears his site has a WordPress issue that prevents the content from loading. In the end, I pieced things together based mostly on the techniques in the video, but it took a while!
Creating a function to get the page results
The first step was to create a function that would accept a page number input, that would call the necessary endpoint and return that page’s results. The code for that is this, for the “task” endpoint (see the text in RelativePath):
|
|
PS if you’re new to functions, this can be copied and pasted into a blank query in Advanced Editor. When saved, it becomes a function, and has an fx icon to visually differentiate it from a query. I named this function “GetTaskPages”. The function name is used in the calls from other queries so making it meaningful is useful later.
The function itself starts with naming the parameter (“PageNo”), and the “=>” is the syntax to indicate it is a function. The rest of this should look familiar, it follows the same pattern I’ve used in the previous posts of using Web.Contents with RelativePath, and Query to reach the API. The line “Headers=” is new. (I covered that a bit in the previous post (post #3 of the series) and won’t repeat it all here).
Using List.Generate to loop through the pages
This part you will need to excuse my explanation if I am not describing this 100% perfectly. I think I understand it now but then again, I’m still learning so how I think this behaves might not be technically accurate. Bear with me… :)
The base code for the query to loop through the pages is something like this. The results after this would be a list of pages similar to post #2 with Harvest, where from that point onwards, you can convert it to a table, expand the column etc. to continue with transformation steps as you see fit.
|
|
Code overview
The code itself does not resemble looping like I’m used to seeing it (with a “For X / Next X” kind of structure). It is a query with a function inside it, identified by the “=>” after List.Generate(().
List.Generate has 4 elements to it:
- The initial value
- In my case there are 2 pieces - (a) telling it the Source is the contents of calling my function, with page 0 as the parameter to pass in; and (b) creating a variable called Page and setting its initial value at 0.
- Side note: ClickUp’s page numbering starts at 0 not 1, hence the 0. If I had to do this with Harvest’s API, I would be putting a 1 in both places instead since that is the start of their page numbering.
- The condition to continue looping
- THIS is the part I could not figure out initially. This was often a “while not null” type of statement.
- In my case, as long as the list returned has 1 or more records in it, it was a valid page of records, so this is counting the records in the returned list.
- However, since the result returned was 1 row every time with 1 list value, I struggled to figure this syntax out and that’s where I’ll explain more at the end of this post.
- The increment portion of the loop (kind of like the “Next X” part)
- This gets the next value using Page+1 and increments the variable “Page” by 1
- The 4th part is optional and it’s the selector of what to return.
- This is simply returning the results of the query
Transforming the results into data
Here is what the results look like for my task list: 3 pages of results in all. My next question is: how can I verify the results are correct? With Harvest, I knew to expect 9 pages and 874 records so it was easy to validate. Here I wasn’t sure how to check if it was working correctly, more on that later in the post.
In the meantime, with this, transforming this to a table of data is similar to how I did this on the previous blogs: convert this result to a table, expand “Column1” from records to list (i.e., the page results), then expand the list to new rows, so there is 1 row per record (per task).
How to validate the results
Here is the “explanation” part of the blog which you can skip if everything above makes sense. As I mention earlier, all other examples I read or watched appear to be either very complex or based on eventually getting “null” as a result.
Where my thought process started was knowing how to do pagination like I did for Harvest, the “problem” I thought I needed to solve was finding out how many pages were in the query.
I was wrong.
I’m not looping through a page count; what is needed is to loop through until a certain criteria is met. In the video I link to, the criteria is as long as the result is not null, keep going. In my scenario, as long as the page has 1 or more records, keep going. I was already looping through all the pages with the List.Generate function so the # of pages in the end is irrelevant.
So, breaking down the List.Generate syntax itself, line 1 and 3 make perfect sense to me. Paraphrased, they are “start with getting the first page of results” and “increment the page number and get the next page”. Got it.
The second part of List.Generate was where I was stuck. Here is how I eventually figured out what to do and what the syntax needed to be. I knew (now) it needed to be a “count” based function but I was stuck on the fact that every time I ran the query it returns 1 result: a list. Wouldn’t a “count” return a value of 1 each time? Don’t I need to drill down into the list to count the records first? This is the light bulb moment or it was for me… short answer: NO, you don’t need to drill into the list to count the records.
The next challenge was “what” am I counting exactly? I could not get the right syntax of what is in List.Count initially. This took a bit of trial and error.
First I called the function GetTaskPages via a blank query with something like the code below. Just call the function, nothing else. Start with the first page (0 in my case, 1 in some cases depending on the API).
|
|
Next, view the results of the query. It’s a single row list referred to ask “tasks” (where the arrow is pointing). This is the key for the count formula later so watch for this value in your data if you need to use this technique.
Now drill into the “List” to see if records are returned and how many (for validation). Undo that step and change the page number parameter value to 1, then 2, etc. then use something you know has no data. I used page 99, I could have used page 3. Every result looked like this, even if the list was empty.
Each time, I changed the page number then I drilled down and counted the records to know what to expect overall. For page “0”, I got 100 records. For page “1”, same. For page “2”, I got 12 records. AHA, I’ve found the last page and now I know I have 212 task records. For page “99” or anything beyond “3”, I got this: a list header but zero records.
Now let’s go back to the 2nd element of List.Generate: the criteria to keep looping. I needed to count the records in the list, without drilling into it first like I did manually. Therefore, I needed to provide “what” I’m counting AND the data field name. I had trouble with brackets (what to use where) and wasn’t sure the syntax but trial and error resulted in this code below.
- [Source] is because in my query, I’m calling this “Source” in my query.
- [tasks] because in this endpoint, it is returning tasks (which is shown in both screenshots to understand where to look for this). ** if your result says “data”, you would “data” etc.
|
|
To confirm my own understanding, I changed the term “Source” in my query with “MyResults” and the query still worked. See below, the exact same query as earlier in this blog except I swapped out the word Source in my query with MyResults. Here’s the new line 2 of List.Generate:
|
|
I was hoping to do another example on a different endpoint to prove that the “term” tasks was because the endpoint was tasks. However, I attempted to use this same logic with other ClickUp API endpoints but they don’t appear to have page limits. I.E., I tried “spaces” and “time_entries”, no matter what page number I pass in, I get all the results each time (>100 records in some cases) so looping attempts resulted in an infinite loop! Oops! So, I believe I know why I needed to use the word “tasks” in this case as the field name to count, but have no way to vet that for sure.
Summary
In closing, I hope this helps someone out there with pagination, whether it is specific to ClickUp or another similar example of anything where you don’t know how many pages of results are being returned.
*This post is locked for comments