I recently switched from HTTP to EventBridge for receiving webhooks, and I've noticed consistent delays. For example, they average about 20 seconds for `orders/create` and about 12 seconds for `orders/paid`.
I send EventBridge events to a Lambda function, which calculates how long it took to receive the payload: the difference between the EventBridge event's `event['time']` and the payload's `created_at` field (or the matching timestamp field for other topics). Code for that is below. I then generate a metric in DataDog to track this over time.
According to https://docs.aws.amazon.com/eventbridge/latest/userguide/aws-events.html, the `event['time']` attribute is likely set by Shopify, so using this time (rather than `Time.now`) should take any Lambda-side issues out of the equation (e.g. cold starts, concurrency, function code).
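To make the reasoning above concrete, here is a minimal sketch that splits the end-to-end delay into the part before EventBridge stamps the event (the "upstream" segment, which is what the metric in this post measures) and the part after it (Lambda invocation, cold starts, function code). The helper name `split_delay` and the sample timestamps are hypothetical, and it assumes both `event['time']` and the payload timestamp are ISO 8601 strings that include a time zone offset. In AWS's documented sample events, `time` has one-second resolution, so each measurement is probably only accurate to about a second.

```ruby
require 'time'

# Hypothetical helper: splits the end-to-end delay (in ms) into two segments.
#   upstream_ms:   payload timestamp -> EventBridge event time (Shopify/EventBridge side)
#   downstream_ms: EventBridge event time -> moment the Lambda handles the event
def split_delay(payload_timestamp, event_time_str, received_at = Time.now)
  created = Time.iso8601(payload_timestamp)
  event_time = Time.iso8601(event_time_str)
  {
    upstream_ms: ((event_time - created) * 1000).round,
    downstream_ms: ((received_at - event_time) * 1000).round
  }
end

# Sample values (made up). The offsets differ, but Time subtraction
# compares absolute instants, so the time zones cancel out.
received = Time.iso8601('2023-05-01T14:30:21Z')
delays = split_delay('2023-05-01T07:30:00-07:00', '2023-05-01T14:30:20Z', received)
puts "upstream: #{delays[:upstream_ms]} ms, downstream: #{delays[:downstream_ms]} ms"
# prints "upstream: 20000 ms, downstream: 1000 ms"
```

If the upstream segment dominates, the delay is happening before the event reaches the Lambda, which is the point of using `event['time']` instead of `Time.now`.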
How I calculate the delay:
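```ruby
require 'time'

# Delay in ms between the payload's timestamp and the EventBridge event time,
# or nil when the topic has no usable timestamp. event_time must be a Time.
def calculate_event_delay(event_time, topic, payload)
  action = topic.split('/').last

  # Pick the payload field that reflects when the triggering change happened
  timestamp =
    case action
    when 'create' then payload['created_at']
    when 'update', 'updated', 'paid', 'delete', 'edited' then payload['updated_at']
    when 'cancelled' then payload['cancelled_at']
    when 'uninstalled' then nil
    end

  return nil unless timestamp

  (event_time - Time.parse(timestamp)) * 1000
end
```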
Here, `event_time` is the timestamp taken directly from the EventBridge event (`event['time']`).
Here are my metrics in DataDog:
Both of these topics should be delivered in near real time, unlike `customers/update` or `products/update` webhooks, which I understand can be throttled so that apps don't get spammed.
I'd really appreciate it if this could be looked into, because my app relies on timely `orders/create` webhooks: I have a widget on the Order Status Page (confirmation page).
Hi @Jeff-Blake,
When did you notice the increase in delivery times? Delivery latency should be roughly the same for HTTP and EventBridge. We're rolling out some improvements to reduce latency without increasing the risk of duplicates, and you should start seeing lower delivery times for both HTTP and EventBridge starting today.
Hey Mike. Thanks for responding.
Unfortunately, I only started tracking this recently. After switching to EventBridge, I noticed that my Order Status Page widget was simply not loading fast enough, even though my app does all the processing it needs within 300-500 ms. The widget polls every 2 seconds, and with HTTP it has mostly worked well for the past ~600 days, loading within 1-3 seconds. After 10 polls (20 seconds), my app kicks off a job to look up the order manually, and I recently saw a huge spike in those jobs, which led me to start tracking this metric. I never noticed any serious latency with HTTP, and I would have heard complaints if that had been the case.
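The polling behaviour described above comes down to simple arithmetic: with a 2-second interval and a cap of 10 polls, the widget gives up after 20 seconds, so any webhook whose delivery delay plus processing time exceeds that triggers the fallback job. The sketch below is a hypothetical model, not the app's actual code: the method name `polls_needed`, the 0.5 s processing time (the upper end of the 300-500 ms mentioned) and the assumption that poll *k* happens *k* intervals after page load are all illustrative choices.

```ruby
# Hypothetical model of the Order Status Page widget's polling.
POLL_INTERVAL_S = 2
MAX_POLLS = 10

# Returns the poll on which the order is first ready, or nil if the widget
# gives up first (and the fallback lookup job runs instead).
def polls_needed(webhook_delay_s, processing_s: 0.5)
  ready_at = webhook_delay_s + processing_s
  polls = (ready_at / POLL_INTERVAL_S).ceil
  polls <= MAX_POLLS ? polls : nil
end

[2, 13, 20].each do |delay|
  polls = polls_needed(delay)
  if polls
    puts "#{delay}s delay: ready on poll #{polls}"
  else
    puts "#{delay}s delay: fallback job after #{MAX_POLLS * POLL_INTERVAL_S}s"
  end
end
```

Under these assumptions, a ~20 s average delay pushes a large share of orders past the 10-poll cap, which would be consistent with the spike in fallback jobs, while a ~13 s delay still resolves within the cap but well after the 1-3 s seen with HTTP.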
I just checked my metrics, and it looks like you guys made improvements around 7:30 AM PST! This is great, but the latency still seems much higher than it ought to be (or than it was with HTTP).
Past 1 hour (average around 13s):
Hope this helps and if you need any more information let me know.
We strive for low latency on delivery, but there are a lot of factors that can cause delays along the way, the size of the order being one of them: larger orders (and therefore larger payloads) may contribute to a slight delay in delivery.
You should see most webhooks arrive within 10 seconds of the event being triggered. If there's a failure on the destination (e.g. a destination API endpoint is failing or timing out), the webhook will be retried several times, with each attempt adding further delay.
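One way to check the "most webhooks within 10 seconds" expectation against your own measurements is to look at the share of delays under 10 seconds and a high percentile, rather than only the average shown on the dashboards. A minimal sketch, assuming the delays are available as a list of milliseconds; `delay_summary` and the sample values are hypothetical, and the percentile uses the nearest-rank method.

```ruby
# Nearest-rank percentile: smallest value with at least pct% of samples
# at or below it. pct is an integer (e.g. 95); sorted must be non-empty.
def percentile(sorted, pct)
  rank = (pct * sorted.length + 99) / 100 # integer ceiling of pct% of n
  sorted[[rank, 1].max - 1]
end

# Hypothetical summary of measured webhook delays (in milliseconds).
def delay_summary(delays_ms, threshold_ms: 10_000)
  return nil if delays_ms.empty?

  sorted = delays_ms.sort
  within = sorted.count { |d| d <= threshold_ms }
  {
    within_threshold_pct: (100.0 * within / sorted.length).round(1),
    p50_ms: percentile(sorted, 50),
    p95_ms: percentile(sorted, 95)
  }
end

# Made-up sample values for illustration only.
samples = [4_000, 6_500, 8_000, 9_500, 11_000, 12_000, 13_500, 15_000, 21_000, 30_000]
summary = delay_summary(samples)
puts "within 10s: #{summary[:within_threshold_pct]}%, " \
     "p50: #{summary[:p50_ms]} ms, p95: #{summary[:p95_ms]} ms"
```

A high p95 combined with a reasonable median would point toward occasional retries or large payloads, whereas a uniformly high median would suggest a general delivery delay.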