I am currently in development and have implemented both create and update webhooks for Orders, plus a once-per-day cron job for Orders in case a webhook was missed or an error occurred.
I am wondering what y'all do.
Should I be worried about API limits at all?
Let's say a store can do 10,000 orders in a single day, am I at risk of hitting the API limit?
API limits only apply to the REST API (webhooks are not affected), and they concern the rate of calls rather than the number of orders. To fetch 10k orders at 250 per page, you'll only need 40 calls, and if you keep it to no more than 2 requests per second you should be fine.
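A minimal sketch of the arithmetic above, using only the figures mentioned in this thread (250 orders per page, at most 2 requests per second). The function names are made up for illustration.

```python
import math


def calls_needed(total_orders: int, page_size: int = 250) -> int:
    """Number of paginated requests needed to fetch every order."""
    return math.ceil(total_orders / page_size)


def min_seconds(calls: int, max_rate_per_sec: float = 2.0) -> float:
    """Shortest run time if we never exceed the given request rate."""
    return calls / max_rate_per_sec


calls = calls_needed(10_000)
print(calls, "calls, at least", min_seconds(calls), "seconds")
```

So the cost is counted per request (per page), not per order returned.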
If I made one call to get all Orders from the previous day, say the last day had 1000 orders. Would this be 1000 requests? Or would this be considered one?
I have a cron job that runs every 24 hours to pick up any orders that were missed because a webhook did not fire on an action, etc.
As for incoming webhooks: since you have more experience than I do, would you consider using background jobs for incoming webhooks a good idea in my case? My app relies on webhooks and their information. I am worried about Redis going down, etc., and losing data.
I keep reading that webhooks are recommended to be used with background jobs, but I am unsure about this for my situation.
In my scenario...
A webhook comes in.
It takes 0.04 seconds to give me a:
Completed 204 No Content in 34ms (ActiveRecord: 0.0ms)
Within my console
And the entire CreateJob finishes a little less than 2 seconds after the incoming webhook starts.
Is the 204 the status Shopify is looking for, or is it waiting for the entire file to finish?
What would you do in this case?
We're building a similar system.
Our approach relies on webhooks on the one hand:
- Capturing webhooks and responding 200 ASAP
- Put the webhook in a queue
- Handle the messages in the queue to feed data to our repository
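The three steps above could look roughly like the sketch below. It assumes an in-memory `queue.Queue` and a plain dict standing in for the repository, purely for illustration; a real setup would need a durable queue so that messages survive a process restart. All names here (`receive_webhook`, `worker`, `repository`) are hypothetical.

```python
import json
import queue
import threading
from typing import Dict, Optional

# In-memory stand-ins (assumptions): a real system would use a durable queue
# and a real datastore.
webhook_queue: "queue.Queue[Optional[dict]]" = queue.Queue()
repository: Dict[int, dict] = {}


def receive_webhook(raw_body: bytes) -> int:
    """Capture the webhook, enqueue it, and return the HTTP status right away."""
    payload = json.loads(raw_body)
    webhook_queue.put(payload)
    return 200  # respond before any slow processing happens


def worker() -> None:
    """Drain the queue and feed each payload into the repository."""
    while True:
        payload = webhook_queue.get()
        try:
            if payload is None:  # sentinel: stop the worker
                return
            repository[payload["id"]] = payload
        finally:
            webhook_queue.task_done()


def start_worker() -> threading.Thread:
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The point of the split is that the response time no longer depends on how long the processing takes.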
And a CRON job on the other hand (as a fallback solution because we assume there is a chance webhooks are missed or not sent).
It would run at a scheduled interval and will implement the following logic:
- Get list of entities (cycle through pages, 1 call per page)
- Compare updated_at against our data, update data if newer data is found
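A rough sketch of that fallback logic, assuming each entity is a dict with an `id` and an ISO 8601 `updated_at` string, and that `pages` is any iterable yielding one list of entities per API call. The `reconcile` function and the dict-based store are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def reconcile(local: Dict[int, dict], pages: Iterable[List[dict]]) -> int:
    """Update the local store with any remote entity that is new or newer.

    Returns the number of entities that were inserted or replaced.
    """
    changed = 0
    for page in pages:  # one API call per page
        for remote in page:
            current = local.get(remote["id"])
            remote_ts = datetime.fromisoformat(remote["updated_at"])
            if current is None or remote_ts > datetime.fromisoformat(current["updated_at"]):
                local[remote["id"]] = remote
                changed += 1
    return changed
```

Running it twice over the same pages should change nothing the second time, which makes it safe to schedule repeatedly.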
Finally, our façade layer will:
- Check if it finds data in our cache first and respond based on that
- If data is missing it will get the data from the REST API and update our cache
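The façade behaviour described above is essentially a cache-aside lookup. A minimal sketch, assuming a dict as the cache and a caller-supplied `fetch_from_api` function in place of the real REST call (both are placeholders, not part of any library):

```python
from typing import Callable, Dict, Optional


class OrderFacade:
    """Serve from the cache when possible, otherwise fetch and remember."""

    def __init__(self, cache: Dict[int, dict],
                 fetch_from_api: Callable[[int], Optional[dict]]) -> None:
        self.cache = cache
        self.fetch_from_api = fetch_from_api  # hypothetical REST API call

    def get(self, order_id: int) -> Optional[dict]:
        cached = self.cache.get(order_id)
        if cached is not None:
            return cached  # cache hit: no API call
        fresh = self.fetch_from_api(order_id)
        if fresh is not None:
            self.cache[order_id] = fresh  # update the cache for next time
        return fresh
```

Only cache misses cost an API request, so rate limits matter mainly when the cache is cold or unavailable.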
This approach would work well with most entities available through the REST API of Shopify.
Some entities, however, do not have the required webhooks (example: PriceRule/Discount).
That is very similar, and it's good to see other devs are writing similar code.
You don't have any worries about Redis or your background worker DB going down temporarily at all?
I'm starting to think that using a background job for the webhook may not be necessary, or may even be a risk not worth taking.
When I receive the webhook:
I get 204 success in .04 seconds
All of the webhook file's actions complete in under 2 seconds.
With Shopify sending at most 1 webhook every 5 seconds (not confirmed as fact, just what I heard; please let me know if you know otherwise), I may be in the clear with no background worker.
With regards to handling webhooks:
It's never a good idea to handle those synchronously.
Sending them to a queue has many advantages: 1/ reduce the risk of your webhook handler going down when a large volume of webhooks is received, 2/ be able to recover when one of the dependencies of your webhook goes down, 3/ access to retry mechanisms (depends on your implementation) if your webhook handler fails to handle specific webhook payloads you did not anticipate.
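To illustrate the third point, here is one possible retry scheme, sketched under the assumption that failed messages are re-queued up to a fixed number of attempts and then set aside for inspection. The `process_with_retries` function and the `max_attempts` parameter are made up for this example; real queue services offer their own retry and dead-letter features.

```python
from collections import deque
from typing import Callable, Iterable, List


def process_with_retries(messages: Iterable[dict],
                         handler: Callable[[dict], None],
                         max_attempts: int = 3) -> List[dict]:
    """Run handler on each message, retrying failures up to max_attempts.

    Returns the messages that still failed after the last attempt.
    """
    pending = deque((message, 1) for message in messages)
    dead_letters: List[dict] = []
    while pending:
        message, attempt = pending.popleft()
        try:
            handler(message)
        except Exception:
            if attempt < max_attempts:
                pending.append((message, attempt + 1))  # try again later
            else:
                dead_letters.append(message)  # keep for manual inspection
    return dead_letters
```

With synchronous handling, a payload that raises an unexpected error is simply lost unless the sender retries; with a queue, it stays available until it is handled or explicitly set aside.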
With regards to our cache mechanism going down:
We no longer rely on Redis for caching. While it is fast we were unable to make Microsoft's Azure implementation scale to our needs. Instead we now use CosmosDB (also Microsoft Azure stack), which has better support for scaling and georeplication at very similar speeds, which are predictable: O(1). In theory any caching mechanism should be "optional" in the sense that your implementation should not rely on it. Our implementation falls back to the REST API in the unlikely case our caching layer fails (potentially resulting in 429s, so it's not strictly "optional").
With regards to our background workers going down:
Not really worried. We use queues and serverless computing (Microsoft Azure functions) to handle webhooks. In theory both scale infinitely and the performance is predictable. If implemented properly the risk is limited.