Solved

Fetching reports about a shop for a two year span

assembledbrands

Hey all,

 

I am a developer integrating with Shopify for the first time. The application my company is building needs multiple years of financial information about a retail company to help determine whether we should lend them money. We hope to fetch that information from Shopify programmatically, with the retailer's permission granted through OAuth. Ideally we want the same information that the "Sales over time" and "First-time vs returning customers" reports display in the Analytics tab of the myshopify.com UI. However, from reading the API documentation and contacting general Shopify support, it appears there is no API for fetching pre-made reports.

 

The support representative told me instead to fetch orders from a Shopify shop and then produce any type of report that we would need, since we have the raw order transaction data.  I was able to get this working using the following three API documentation pages:  

 

OAuth: https://help.shopify.com/en/api/getting-started/authentication/oauth

Orders: https://help.shopify.com/en/api/reference/orders/order

Request "all orders" access: https://help.shopify.com/en/api/getting-started/authentication/public-authentication#orders-permissi...

 

My code would direct users who went to our website through the Shopify OAuth flow and then fetch orders in a paginated way with the resulting access token.  We would fetch orders 250 at a time (the max per request) and request the next batch of orders using the "since_id" query parameter. 
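For readers following along, here is a minimal sketch of the since_id loop described above. It is not the actual code from our application: `fetch_batch` is a hypothetical callable that would wrap the GET request to the orders endpoint with the `limit` and `since_id` query parameters, and the sketch assumes each batch contains the lowest ids greater than `since_id`.

```python
from typing import Callable, Dict, Iterator, List

Order = Dict[str, object]


def iter_orders(
    fetch_batch: Callable[[int, int], List[Order]],
    limit: int = 250,
) -> Iterator[Order]:
    """Yield every order by repeatedly asking for orders with id > since_id.

    fetch_batch(since_id, limit) is a hypothetical helper supplied by the
    caller; it is assumed to return at most `limit` orders, lowest ids first.
    """
    since_id = 0
    while True:
        batch = fetch_batch(since_id, limit)
        if not batch:
            return
        yield from batch
        # The next request starts after the highest id seen so far.
        since_id = max(int(order["id"]) for order in batch)
        # A short batch means there is nothing left to fetch.
        if len(batch) < limit:
            return
```

Passing the fetch function in keeps the cursor logic separate from the HTTP code, so the loop can be exercised against an in-memory list before it is pointed at a real shop.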

 

This all went well until we got the "read_all_orders" access approved and I tested our application with a shop that had 30,000 orders in the last 2 years. It took 15 seconds on average to fetch 250 orders, which comes to about 30 minutes (120 requests) to fetch all 30k orders for this test shop (the second time I tested it, Shopify's caching kicked in and sped it up a decent amount). On top of that, my first three attempts at simply letting that process run resulted in one of the requests returning a 500 error halfway through with a body that just said: "{"errors":"Internal Server Error"}"
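One way to keep a single 500 from aborting a 30 minute run is to retry the failing request with a growing delay. The sketch below is only an illustration under stated assumptions: `TransientError` is a hypothetical exception that the caller's own fetch code would raise on a 5xx response, and the attempt count and delays are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: raised by the caller's fetch code on a 5xx response."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Because the since_id loop only advances after a batch succeeds, retrying one request does not lose or repeat any orders.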

 

While we don't make our users wait with a spinner on our website, nor does our use case require this data to be fetched instantaneously, 30 minutes does feel like a long time. So I began looking into ways to speed this up, and my first thought was parallelizing the 250-order chunks. I found this documentation, which explains the rate limiting pretty well:

 

Rate limits: https://help.shopify.com/en/api/reference/rest-admin-api-rate-limits

 

My plan was to call the "orders/count.json" API first to see how many chunks of 250 there were for this shop and then kick off as many requests as needed in parallel while staying in compliance with the rate limits defined above. However, the official documentation for the "2019-04" version of the orders API doesn't define a "page" query param that would accept values like "1" or "2", and instead only has a cursor-style pagination query param, "since_id".
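To make the numbers concrete, here is a rough back-of-the-envelope model of that plan. The 15 seconds per request is the figure I measured above; the 2 requests per second ceiling is an assumption standing in for whatever the rate limits page specifies for the shop, and the model ignores retries and caching.

```python
import math


def page_count(order_count: int, limit: int = 250) -> int:
    """Number of requests needed to cover order_count orders."""
    return math.ceil(order_count / limit)


def estimated_seconds(
    order_count: int,
    seconds_per_request: float = 15.0,
    parallel_requests: int = 1,
    max_requests_per_second: float = 2.0,  # assumed rate limit, not measured
    limit: int = 250,
) -> float:
    """Rough total time: the slower of the latency and rate-limit bounds."""
    pages = page_count(order_count, limit)
    # Time if the only cost is waiting on responses, spread across workers.
    latency_bound = pages * seconds_per_request / parallel_requests
    # Time if requests returned instantly but the rate limit still applied.
    rate_bound = pages / max_requests_per_second
    return max(latency_bound, rate_bound)


print(estimated_seconds(30_000) / 60)   # sequential, 30k orders, in minutes
print(estimated_seconds(150_000) / 60)  # sequential, 150k orders, in minutes
```

Under these assumptions the per-request latency, not the rate limit, dominates until quite a few requests are in flight at once.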

 

So at this point, here are the ideas I can think of:

 

1) Fetch the orders as I currently do (without parallelization) and change our application to accept that some processes may take 1-3 hours. We know of one shop with 150k orders in the last 2 years that we would want to know about.

 

2) Attempt to parallelize the requests. Is the "page" query param actually gone, or is there a way to achieve the same behavior? And is this even worth it given the rate limits for large shops with lots of orders?

 

We are ultimately trying to get the data displayed in the reports that the Shopify UI shows in the Analytics tabs.  Am I approaching this problem correctly?  Is there a simpler solution?

 

Thanks in advance!!

 

 

Accepted Solution (1)

Josh
Shopify Staff

This is an accepted solution.

Hey there, 

 

I do think that you're going about this the best way currently available. This is an issue that we're aware of, and we are considering different methods of pagination in the future.

 

There is a page parameter available, which I think is worth mentioning, but I generally discourage using it: it's much less efficient than since_id, and I think you'll find it is a lot more error-prone as well, especially when you get into higher page numbers. Additionally, if the offset generated by a request including the 'page' parameter is over 100,000 (so over 400 pages if using a limit of 250), the request will fail outright. So typically since_id is the better option to go with.
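As a quick check of that limit, the sketch below works out whether a shop fits under it. It conservatively treats page 400 (100,000 / 250) as the last usable page; the exact boundary depends on how the server computes the offset, which isn't spelled out here, so take that as an assumption. The function names are made up for illustration.

```python
import math


def last_safe_page(limit: int = 250, max_offset: int = 100_000) -> int:
    """Highest page number treated as safe under the offset cap."""
    return max_offset // limit


def can_use_page_param(
    order_count: int, limit: int = 250, max_offset: int = 100_000
) -> bool:
    """True if every page needed for order_count stays within the cap."""
    pages_needed = math.ceil(order_count / limit)
    return pages_needed <= last_safe_page(limit, max_offset)


print(can_use_page_param(30_000))   # the 30k test shop needs 120 pages
print(can_use_page_param(150_000))  # the 150k shop would need 600 pages
```

So the 30k test shop would fit, while the 150k shop mentioned in the question would run past the cap and would have to fall back to since_id for the remainder.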

 

That said, if you want to send parallel requests then you'd need the page parameter. If you do decide to go this route, you should expect to encounter more errors and slower requests, and it is not ideal for larger shops.
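If you do go the parallel route, a minimal sketch could look like the following. `fetch_page` is a hypothetical callable that would wrap the GET request with the `page` and `limit` parameters; `max_workers` only caps how many requests are in flight at once and is not a rate limiter, so throttling and retries would still need to be handled inside `fetch_page`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Order = Dict[str, object]


def fetch_all_pages(
    fetch_page: Callable[[int], List[Order]],
    total_pages: int,
    max_workers: int = 4,
) -> List[Order]:
    """Fetch pages 1..total_pages concurrently and return them in page order.

    fetch_page(page) is a hypothetical helper supplied by the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so page 1 comes first
        # even if a later page happens to finish earlier.
        batches = pool.map(fetch_page, range(1, total_pages + 1))
        return [order for batch in batches for order in batch]
```

Keep in mind that with page-based requests, orders created while the run is in progress can shift the page boundaries, so de-duplicating the result by order id afterwards is a sensible precaution.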

 

 

 

 

Josh | Shopify 
