Extracting Data using Python and REST API - Consistent ConnectionResetError

Solved
Highlighted
Shopify Partner
6 0 0

So I am building a data pipeline from Shopify by running a script in order to extract the orders, line items and their respective price, quantity, tax bracket & category (taken from product tags) and cost. This is so we can take the data to visualize into dashboards for different team members. This is a proof of concept for management so we can't use any paid automated tools.  

The psuedocode is found below. 

We have over 14,000 orders since we started so the script takes several hours to run and I keep facing a ConnectionResetError. I can't help but feel there is a simpler way to extract the data required. All retail prices and costs and inclusive of the sales tax (therefore the profitability report is incorrect as it subtract Net Costs without tax from the costs with tax). 

Pseudocode:

Get orders url. 

Loop through the orders in the json dictionary. 

For each order: 

Loop through each line item

For each line item -->

1) Retrieve the item_id, price and quantity. 

2) Use the item_id and variant_id to retrieve a) product tags and b) product inventory_id 

3) Use product tags to retrieve the tax bracket and category

4) use inventory_id to retrieve the product cost. 

Save everything into a pandas dataframe. 

 

I'm genuinely stumped so any help would be highly appreciated. 

 

 

 

 

 

0 Likes
Highlighted
Tourist
24 0 1

Are you processing those 14,000 orders in linear time? If so, I would recommend trying to break it down to asynchronous tasks which can be run in parallel.

Edit: though with parallel connections, you'll need to be more cautious of hitting rate limit thresholds

0 Likes
Highlighted
Tourist
24 0 1

Since you're using Python, check out the multithreading module in the standard library. I've used the Pool class to process millions of log lines in parallel and finished processing in under 30 minutes for over 150 files of the same size. Somewhat of a learning curve, but highly recommended.

0 Likes
Highlighted
Shopify Partner
6 0 0

Im not sure I follow. Isn't multithreading running different functions concurrently? But what would I run at the same time? 

0 Likes
Highlighted
Tourist
24 0 1

Multithreading can also run the same function concurrently. You'd break your workload of 14,000 into however many threads the host processor can support (might take some tweaking, but I find that 2-4 is a safe number to start with). Each thread will handle a unique portion of that original workload, run the same function, and collate the results for you at the end.

 

0 Likes
Highlighted
Shopify Staff
Shopify Staff
176 21 27

This is an accepted solution.

Hey @Maqda7 

You might want to consider doing this in GraphQL. The orders object will have connections to the line items which will have connections to the various inventory and tax details you would need. Then you can perform a bulk query so you don't need to worry about rate limiting.

Kevin_A | Developer Support @ Shopify
 - Was my reply helpful? Click Like to let me know! 
 - Was your question answered? Click Accept as Solution 

0 Likes
Highlighted
Shopify Partner
6 0 0

Hello @Kevin_A

 

Looks like its time to learn GraphQL, thank you! 

0 Likes