We are hoping to keep our app in sync with the merchant’s product catalog.
We’ll use a webhook to notify us when a product is deleted from the catalog, but we also want a fallback process to reconcile products in case of a communication error or a missed webhook.
We understand that when a merchant deletes a product, it is hard deleted.
Can anyone recommend a bulletproof way to validate and reconcile the product catalog at any point in time?
The only implementation that comes to mind for such a reconciliation would be to poll the products endpoint to retrieve all active ids. You can then compare that with what you have stored on your servers to determine which products have been deleted.
The easiest way to retrieve all products would be to use the since_id parameter for pagination. Your first call would be of the form:
GET /admin/products.json?since_id=1&limit=250
You would then take the largest product id returned in that request and pass it in as the since_id parameter of the subsequent call, repeating until you have retrieved all products. Any products stored locally that weren’t returned during those API calls must have been deleted from the store.
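As a rough illustration of the since_id loop described above, here is a minimal Python sketch. The names fetch_page, fetch_all_product_ids and find_deleted are hypothetical, and fetch_page stands in for whatever HTTP client you use to call GET /admin/products.json with since_id and limit. It starts from since_id=0 (an assumption, so that no id is skipped) and does not handle authentication or rate limiting.

```python
from typing import Callable, Dict, List, Set

# Hypothetical callable: given (since_id, limit), return one page of products
# as dicts that each carry an "id" key, e.g. the "products" array of
# GET /admin/products.json?since_id=...&limit=...
FetchPage = Callable[[int, int], List[Dict]]


def fetch_all_product_ids(fetch_page: FetchPage, limit: int = 250) -> Set[int]:
    """Walk the catalog with since_id pagination and collect every product id."""
    ids: Set[int] = set()
    since_id = 0  # assumption: start below the smallest possible id
    while True:
        page = fetch_page(since_id, limit)
        if not page:
            break  # an empty page means nothing is left
        page_ids = [product["id"] for product in page]
        ids.update(page_ids)
        since_id = max(page_ids)  # largest id seen becomes the next since_id
        if len(page) < limit:
            break  # a short page is the last page
    return ids


def find_deleted(local_ids: Set[int], remote_ids: Set[int]) -> Set[int]:
    """Ids stored locally that the store no longer returns."""
    return local_ids - remote_ids
```

Calling find_deleted(local_ids, fetch_all_product_ids(fetch_page)) would give the ids to remove locally, provided every page request succeeded.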
Isn’t there a risk that if not all the products are returned for some reason, such as a communication error, we might accidentally delete products which are actually still active?
As long as the API call is successful (i.e. a 200 is returned), there should never be a case where an active product is not included in the list when you’re using the since_id parameter.
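One way to act on that guarantee is to discard the whole reconciliation run if any page comes back with a status other than 200, so that a partial list is never used to decide deletions. The sketch below assumes a hypothetical fetch_page callable that returns a (status_code, products) tuple; the function name reconcile_or_skip is also made up for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical callable: given (since_id, limit), return (HTTP status, products).
FetchPage = Callable[[int, int], Tuple[int, List[Dict]]]


def reconcile_or_skip(
    local_ids: Set[int], fetch_page: FetchPage, limit: int = 250
) -> Optional[Set[int]]:
    """Return the ids to delete locally, or None if the crawl was incomplete."""
    remote_ids: Set[int] = set()
    since_id = 0  # assumption: start below the smallest possible id
    while True:
        status, products = fetch_page(since_id, limit)
        if status != 200:
            # Any failed page makes the remote list untrustworthy: delete nothing.
            return None
        if not products:
            break
        page_ids = [product["id"] for product in products]
        remote_ids.update(page_ids)
        since_id = max(page_ids)
    return local_ids - remote_ids
```

A None result means "try again later", which keeps a communication error from being mistaken for a mass deletion.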
There are instances when products could be skipped when page is used for pagination, but for that reason, among other performance issues, we don’t recommend using it.
The best way to check for deleted products will still be a combination of subscribing to the products/delete webhook and occasionally polling our API for a list of products on the store.
If you would rather not deal with paginating results when querying for a list of products, you can always use a Bulk Operation Query to run an asynchronous GraphQL query that returns ALL products on the store at once, without needing to worry about pagination or API rate limits.
Here’s a Shopify.dev article with more information on how to run Bulk Operation Queries.
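Once a bulk operation finishes, its result is downloaded as a JSONL file with one JSON object per line. The sketch below shows one way the product ids might be pulled out of that file. It assumes the query selected the id field and that ids arrive as global ids of the form gid://shopify/Product/<number>; the function name product_ids_from_jsonl is hypothetical, and starting the bulk operation and downloading the file are not shown.

```python
import json
from typing import Iterable, Set

PRODUCT_GID_PREFIX = "gid://shopify/Product/"  # assumed global id format


def product_ids_from_jsonl(lines: Iterable[str]) -> Set[int]:
    """Extract numeric product ids from the lines of a bulk operation result."""
    ids: Set[int] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        gid = record.get("id", "")
        # Skip nested objects (variants, images, ...) that share the file.
        if gid.startswith(PRODUCT_GID_PREFIX):
            ids.add(int(gid[len(PRODUCT_GID_PREFIX):]))
    return ids
```

The resulting set can be compared against your stored ids in the same way as the paginated result.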
Here is my code to pull all product IDs using a bulk query. The bulk query itself takes about 5 seconds to complete, during which time a new product may have been created; it would be missing from the result and subsequently deleted on our side. I’m sure you will also agree that this is a lot of complex code for not very much gain. This race condition is guaranteed to eventually trip someone up.
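One possible guard against that race, offered only as a sketch: record when the snapshot was started, and never treat a product as deleted if your app first learned about it at or after that moment (minus a safety margin). The function deletable_ids, the local_first_seen mapping and the 5 minute grace period are all hypothetical choices, not something the API provides.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


def deletable_ids(
    local_first_seen: Dict[int, datetime],
    remote_ids: Set[int],
    snapshot_started_at: datetime,
    grace: timedelta = timedelta(minutes=5),
) -> Set[int]:
    """Ids missing from the snapshot that were already known well before it began."""
    cutoff = snapshot_started_at - grace
    return {
        product_id
        for product_id, first_seen in local_first_seen.items()
        # Products first seen after the cutoff may simply be too new to
        # appear in the snapshot, so they are left for the next run.
        if product_id not in remote_ids and first_seen < cutoff
    }
```

Products created while the query was running are then left alone until the next reconciliation run, at which point they should appear in the snapshot.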