GraphQL Bulk Operation CANCELED after long time running

ingjuliob
Shopify Partner

I ran a bulk operation to fetch all products from a store (8.5 million products), and after running for more than 48 hours it ended as CANCELED without any error.

I was checking the status every 5 minutes, and at some point it changed to CANCELED:

{
  id: 'gid://shopify/BulkOperation/416348766468',
  status: 'RUNNING',
  errorCode: null,
  createdAt: '2023-01-17T17:54:23Z',
  completedAt: null,
  objectCount: '26291427',
  fileSize: null,
  url: null,
  partialDataUrl: null
}

5 minutes later:

{
  id: 'gid://shopify/BulkOperation/416348766468',
  status: 'CANCELED',
  errorCode: null,
  createdAt: '2023-01-17T17:54:23Z',
  completedAt: null,
  objectCount: '26291427',
  fileSize: null,
  url: null,
  partialDataUrl: null
}
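
A polling loop like the one described has to decide, for each snapshot, whether the job has finished and whether it finished cleanly. A minimal sketch, assuming snapshots shaped like the two objects above (the field names are the ones printed in the question); `describeBulkOperation` is a hypothetical helper, not a Shopify API:

```javascript
// Terminal values of Shopify's BulkOperationStatus enum; CREATED, RUNNING and
// CANCELING mean the job is still in progress.
const TERMINAL_STATUSES = new Set(["COMPLETED", "CANCELED", "FAILED", "EXPIRED"]);

// Hypothetical helper: summarise one polled BulkOperation snapshot.
function describeBulkOperation(op) {
  const done = TERMINAL_STATUSES.has(op.status);
  return {
    done,
    succeeded: op.status === "COMPLETED",
    // objectCount is an UnsignedInt64 that arrives as a string, so parse it with BigInt.
    objectCount: BigInt(op.objectCount ?? "0"),
    // Finished, but neither COMPLETED nor carrying an errorCode: the case in this thread.
    unexplained: done && op.status !== "COMPLETED" && op.errorCode === null,
    // Populated for some failed operations; null in both snapshots above.
    partialDataUrl: op.partialDataUrl,
  };
}

const snapshot = {
  id: "gid://shopify/BulkOperation/416348766468",
  status: "CANCELED",
  errorCode: null,
  objectCount: "26291427",
  url: null,
  partialDataUrl: null,
};
console.log(describeBulkOperation(snapshot));
```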


Note: as the documentation says, objectCount includes every object retrieved by the query (not only the products, but also variants, collections, etc.).
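
In the JSONL file that a finished bulk query produces, each item of a nested connection (a variant, for example) is written as its own line with a `__parentId` field pointing at its parent, while top-level records such as products have none. That is how objectCount (26,291,427 here) can exceed the 8.5 million products. A small sketch that tallies a downloaded result by that rule; `tallyBulkJsonl` is a hypothetical helper, the sample lines are made up, and reading the type from the `gid://shopify/<Type>/<id>` form of `id` assumes the query selected `id` on every node:

```javascript
// Hypothetical helper: count top-level vs. nested records in bulk-operation JSONL.
// Nested connection items carry __parentId; top-level items (products here) do not.
function tallyBulkJsonl(text) {
  const counts = { topLevel: 0, nested: 0, byType: {} };
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // ignore the trailing newline and blank lines
    const record = JSON.parse(line);
    if (record.__parentId === undefined) counts.topLevel += 1;
    else counts.nested += 1;
    // Assumes every node selected `id`, e.g. "gid://shopify/ProductVariant/1" -> "ProductVariant".
    const type = typeof record.id === "string" ? record.id.split("/")[3] : "unknown";
    counts.byType[type] = (counts.byType[type] ?? 0) + 1;
  }
  return counts;
}

// Made-up sample: two products, the first with two variants.
const sample = [
  '{"id":"gid://shopify/Product/1","title":"Shirt"}',
  '{"id":"gid://shopify/ProductVariant/11","__parentId":"gid://shopify/Product/1"}',
  '{"id":"gid://shopify/ProductVariant/12","__parentId":"gid://shopify/Product/1"}',
  '{"id":"gid://shopify/Product/2","title":"Hat"}',
].join("\n");
console.log(tallyBulkJsonl(sample));
```

For a file with tens of millions of lines you would stream it line by line (for example with Node's `readline` over `fs.createReadStream`) rather than load it into a single string.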

Has anyone faced the same issue?

If so, did you find a solution?

Could someone from Shopify take a look at the bulk operation ID?

Thanks in advance!

Replies (2)

ShopifyDevSup
Shopify Staff

Hi @ingjuliob,

I couldn't find any details on the specific event or time at which your bulk operation was cancelled, but cancellation would be the expected behaviour if the job had stalled.
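
The two snapshots in the question report the same objectCount ('26291427') five minutes apart, which is at least consistent with a stall, although two samples prove little. If you keep polling, a sketch like the following could flag a job whose count stops growing; `createStallDetector` is hypothetical and its threshold is an arbitrary assumption, not a Shopify-documented stall criterion:

```javascript
// Hypothetical stall detector: returns true once a RUNNING job's objectCount
// has failed to grow for `maxUnchangedPolls` consecutive polls.
// The default of 6 polls (30 minutes at a 5-minute interval) is an arbitrary assumption.
function createStallDetector(maxUnchangedPolls = 6) {
  let lastCount = null;
  let unchangedPolls = 0;
  return function observe(op) {
    if (op.status !== "RUNNING") {
      // Reset whenever the job is not running (created, canceling, finished, ...).
      lastCount = null;
      unchangedPolls = 0;
      return false;
    }
    const count = BigInt(op.objectCount);
    unchangedPolls = lastCount !== null && count <= lastCount ? unchangedPolls + 1 : 0;
    lastCount = count;
    return unchangedPolls >= maxUnchangedPolls;
  };
}

const observe = createStallDetector(2);
observe({ status: "RUNNING", objectCount: "26291427" }); // false: first sample
observe({ status: "RUNNING", objectCount: "26291427" }); // false: 1 unchanged poll
observe({ status: "RUNNING", objectCount: "26291427" }); // true: 2 unchanged polls
```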

Instead of trying to export such a large amount of data all at once, have you tried using query arguments on the products connection (such as created_at or updated_at) to break the data into smaller chunks? That should make it easier to request all of the products across multiple bulkOperationRunQuery mutations.
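
One way to apply this suggestion is to generate one bulkOperationRunQuery mutation per created_at window. A minimal sketch, assuming calendar-month windows and an illustrative field selection (`id title createdAt`); the helper names are hypothetical, the exact date format accepted by the search filter is worth checking against Shopify's search syntax docs, and since, as far as I know, a shop could run only one bulk query at a time on the API versions current then, the mutations would be submitted one after another:

```javascript
// Format a Date as an ISO 8601 UTC timestamp without milliseconds.
const iso = (d) => d.toISOString().replace(/\.\d{3}Z$/, "Z");

// Hypothetical helper: split [start, end) into calendar-month windows (UTC).
function monthlyWindows(startIso, endIso) {
  const windows = [];
  const end = new Date(endIso);
  let cursor = new Date(startIso);
  while (cursor < end) {
    const next = new Date(Date.UTC(cursor.getUTCFullYear(), cursor.getUTCMonth() + 1, 1));
    windows.push([iso(cursor), iso(next < end ? next : end)]);
    cursor = next;
  }
  return windows;
}

// Hypothetical helper: one bulk mutation restricted to a created_at window.
// The field selection is illustrative; select whatever your export needs.
function buildBulkProductsMutation(fromIso, toIso) {
  const filter = `created_at:>='${fromIso}' AND created_at:<'${toIso}'`;
  return `mutation {
  bulkOperationRunQuery(
    query: """
    {
      products(query: "${filter}") {
        edges { node { id title createdAt } }
      }
    }
    """
  ) {
    bulkOperation { id status }
    userErrors { field message }
  }
}`;
}

for (const [from, to] of monthlyWindows("2023-01-01T00:00:00Z", "2023-03-15T00:00:00Z")) {
  console.log(buildBulkProductsMutation(from, to));
}
```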

We also recommend subscribing to the bulk_operations/finish webhook topic rather than polling every 5 minutes for the duration of the job. 
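
A sketch of the receiving side of that webhook in Node.js, assuming the raw request body and your app's shared secret are available (how you obtain them depends on your web framework). It checks the `X-Shopify-Hmac-Sha256` header, which Shopify sets to the base64 HMAC-SHA256 of the raw body, and reads the fields that appear in Shopify's documented example payload for this topic (`admin_graphql_api_id`, `status`, `error_code`); both function names are hypothetical:

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: verify the X-Shopify-Hmac-Sha256 header, which carries the
// base64-encoded HMAC-SHA256 of the raw (unparsed) request body, keyed with the app secret.
function isValidShopifyWebhook(rawBody, hmacHeader, appSecret) {
  const expected = crypto.createHmac("sha256", appSecret).update(rawBody).digest();
  const received = Buffer.from(hmacHeader ?? "", "base64");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && crypto.timingSafeEqual(received, expected);
}

// Hypothetical helper: pull the useful fields out of a bulk_operations/finish payload.
function handleBulkOperationFinish(rawBody) {
  const payload = JSON.parse(rawBody);
  return {
    id: payload.admin_graphql_api_id, // the BulkOperation GID
    status: payload.status, // lowercase in Shopify's example payload, e.g. "completed"
    errorCode: payload.error_code, // null when no error code was reported
    needsFollowUp: payload.status !== "completed",
  };
}
```

The body has to be captured before any JSON parsing, because re-serialising parsed JSON can change the bytes and break the HMAC check. As far as I know the payload does not include the download URL, so a follow-up query on the returned ID (for example `node(id: ...) { ... on BulkOperation { url partialDataUrl } }`) is still needed to fetch the results.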

Developer Support @ Shopify

ingjuliob
Shopify Partner

Thank you for your prompt and helpful response.

Do you have an idea of what a recommended chunk size would be?

Or do you have any statistics on jobs being stalled or cancelled? Knowing whether there is an average data volume at which jobs get stuck and are then cancelled would be quite useful.


Regarding your suggestions:


It seems that filtering on created_at/updated_at as suggested does not guarantee that each chunk is small enough to keep the job from stalling and being cancelled, since customers can create products at any time and often do so through bulk data imports.
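
One way to address this is to size windows from counts instead of fixing them in advance: count the products in a window and halve it until every window is under a limit. A minimal sketch; `countInRange` is a hypothetical callback you would back with whatever product count your API version offers for a created_at range (it is synchronous here for brevity; a real API call would be async), and `maxPerChunk` and `minWidthMs` are arbitrary. A burst of imports sharing one timestamp can never be split by time alone, which is exactly the data-dump case above, so the planner returns such windows oversize for you to split another way:

```javascript
// Hypothetical planner: recursively halve [fromMs, toMs) until each window holds
// at most maxPerChunk products, or is minWidthMs wide and cannot usefully shrink.
// Returns [fromMs, toMs, count] triples; empty windows are dropped.
function planWindows(fromMs, toMs, maxPerChunk, countInRange, minWidthMs = 1000) {
  const count = countInRange(fromMs, toMs);
  if (count === 0) return [];
  if (count <= maxPerChunk || toMs - fromMs <= minWidthMs) {
    // The width condition may return an oversize window (e.g. a bulk import
    // that created many products within the same instant).
    return [[fromMs, toMs, count]];
  }
  const mid = fromMs + Math.floor((toMs - fromMs) / 2);
  return [
    ...planWindows(fromMs, mid, maxPerChunk, countInRange, minWidthMs),
    ...planWindows(mid, toMs, maxPerChunk, countInRange, minWidthMs),
  ];
}

// Made-up creation timestamps (ms) standing in for a real count query.
const createdAt = [100, 200, 300, 400, 5000, 5000, 5000];
const count = (from, to) => createdAt.filter((t) => t >= from && t < to).length;
const plan = planWindows(0, 8000, 2, count, 1);
// plan is [[0, 250, 2], [250, 500, 2], [5000, 5001, 3]]; the last window is the burst.
```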


Instead, do you think a query like products(query: "id:>=8078324100000 and id:<=8078324200000") would guarantee that no more than 100K products are returned, given that product IDs are unique across all of Shopify?
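
If Shopify's product search accepts range comparisons on `id` as this query assumes (worth confirming against the search syntax docs), the ranges can be generated mechanically. A minimal sketch with a hypothetical `idRangeFilters` helper. It uses half-open ranges (`>=` lower, `<` upper) so that adjacent chunks never both include the boundary ID, which consecutive inclusive `>=`/`<=` pairs would do (the range above spans 100,001 IDs). Under the premise that IDs are unique across all of Shopify, a range of width W holds at most W of this store's products:

```javascript
// Hypothetical helper: split [startId, endId) into half-open ranges of `width`
// IDs and return a products search string for each range.
// BigInt keeps very large numeric IDs exact.
function idRangeFilters(startId, endId, width) {
  const filters = [];
  for (let lo = startId; lo < endId; lo += width) {
    const hi = lo + width < endId ? lo + width : endId;
    filters.push(`id:>=${lo} AND id:<${hi}`);
  }
  return filters;
}

// Logs two filters: id:>=8078324100000 AND id:<8078324200000,
// then id:>=8078324200000 AND id:<8078324300000.
console.log(idRangeFilters(8078324100000n, 8078324300000n, 100000n));
```

The overall bounds could come from the store's lowest and highest product IDs, for example by querying `products(first: 1, sortKey: ID)` with and without `reverse: true`. Because a store's IDs are spread over its whole history, many ranges may come back nearly empty, so the width is a trade-off between job size and number of jobs.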

As for getting the bulk operation result: yes, I was already planning to use the bulk_operations/finish webhook instead of polling at a fixed interval; the polling was just a test.

Thank you very much!