Streaming issues with line reading of bulk operations via JSONL / API 2019-10

New Member

Hi there,

I'm currently implementing a feature that bulk imports orders, converts them to CSV, and uploads them to a custom storage endpoint.

However, the documentation states that nested connections are no longer nested in the JSONL output. As a result, a product, lineItem, etc. that belongs to an order might not directly follow that order in the file.

JSONL is perfect for streaming data and keeping memory usage low. But how do I know how much data I need to load, at a minimum, to be sure I've covered all relations for a specific item? Is it possible that a connection for the entity on line #1 appears on the very last line of the file? That would defeat the purpose of streaming, since I would still have to load the whole file into memory to restore the relations.

It would be great if you could provide feedback on this topic.

Thanks a lot!

Best, Bastian

Shopify Staff

Hey @bastian12,

It is possible for a connection on line #1 to appear at the bottom of the file, but the JSONL format still saves you from loading all the data into memory at once. Because every object is on its own line, you can parse the file line by line using far less memory. You'll still need to restore the connections using __parentId, but you can do that one object at a time as well.

To ensure you've covered all relations, you can query currentBulkOperation and look at its objectCount. This tells you how many objects are contained in the file, so you can check that you've parsed the same number of objects when restoring the connections.
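
A minimal Python sketch of the line-by-line parse plus the count check described above. It assumes that each non-empty line is one object and that the total matches objectCount, as stated in this reply. The names stream_objects, parse_and_verify and count_children are hypothetical helpers, not part of any Shopify library. The objectCount value may come back from the API as a string, so convert it to an integer before comparing.

```python
import json
from collections import Counter


def stream_objects(lines):
    """Yield one parsed object per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def parse_and_verify(lines, expected_count, handle):
    """Pass each object to `handle`, then compare the total with expected_count."""
    seen = 0
    for obj in stream_objects(lines):
        handle(obj)
        seen += 1
    if seen != expected_count:
        raise ValueError(f"parsed {seen} objects, expected {expected_count}")
    return seen


# Example handler (hypothetical): count children per parent id.
# Only ids and counters are kept in memory, never the full objects.
child_counts = Counter()


def count_children(obj):
    parent_id = obj.get("__parentId")
    if parent_id is not None:
        child_counts[parent_id] += 1
```

With a downloaded result file this could be called as parse_and_verify(open("bulk.jsonl", encoding="utf-8"), int(object_count), count_children), where the file name and object_count are placeholders for your own values.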

JB | Developer Support @ Shopify

Shopify Partner

Hi @_JB,

We have a use case where we download all products along with their connections (variants, metafields, images, ...) to sync to our system. We need to assemble each product's data into a single object. With the current JSONL structure, objects are flattened. So even though we can parse line by line, to assemble a whole product object we must first parse the entire file into temporary storage as a buffer. This buffer can be either local memory or a DB.

  • If it is local memory, the whole benefit of JSONL is lost...
  • If it is a DB (Redis, a NoSQL DB), it costs more computation, makes the process more complicated, and slows down performance.

In conclusion, it would be best for this use case if each product's whole data were stored on a single line. Am I wrong? Is there something I missed?
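
For reference, the DB-as-buffer option mentioned above could look roughly like the following Python sketch. It uses SQLite from the standard library instead of Redis or a NoSQL store, purely as an assumption to keep the example self-contained. The table layout and the helper names load_into_sqlite and iter_products are hypothetical, and only direct children (one level of __parentId) are attached.

```python
import json
import sqlite3


def load_into_sqlite(lines, db_path=":memory:"):
    """Stream JSONL lines into a SQLite table, one row per object."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects (id TEXT, parent_id TEXT, body TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_parent ON objects (parent_id)")
    objs = (json.loads(line) for line in lines if line.strip())
    rows = ((o.get("id"), o.get("__parentId"), json.dumps(o)) for o in objs)
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def iter_products(conn):
    """Yield each root object with its direct children attached."""
    roots = conn.execute(
        "SELECT id, body FROM objects WHERE parent_id IS NULL ORDER BY rowid"
    )
    for root_id, body in roots:
        product = json.loads(body)
        child_rows = conn.execute(
            "SELECT body FROM objects WHERE parent_id = ? ORDER BY rowid",
            (root_id,),
        )
        product["children"] = [json.loads(row[0]) for row in child_rows]
        yield product
```

Passing a file path as db_path keeps the buffer on disk, so memory use stays roughly at one product at a time, at the cost of the extra writes and lookups described in the second bullet.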

 

New Member

It is possible for a connection on line #1 to appear at the bottom of the file

@_JB - this makes it almost impossible to maintain a low memory footprint. I don't see any problem with using JSONL itself, but I do see trouble when things can get out of order. Say we have a file like the following:

{"id": "a"}
{"id": "b", "__parentId": "a"}
{"id": "c", "__parentId": "a"}
{"id": "d"}
{"id": "e", "__parentId": "d"}
{"id": "f", "__parentId": "d"}
{"id": "g"}
... 500MB Later ...
{"id": "h", "__parentId": "a"}
{"id": "i", "__parentId": "d"}

To reconstruct root-level objects, we would need to keep them all in memory while parsing, unless I'm missing something really clever. 
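
One possible way to bound the memory for a file like the one above is a two-pass read, sketched here in Python. This is an illustration only and not necessarily what anyone in this thread ended up doing. The first pass records, per root id, the index of the last line that belongs to it. The second pass buffers a root only until that index is reached. The sketch assumes a single level of __parentId and that parents appear before their children. The function names and the read_lines callable are hypothetical.

```python
import json


def parse(lines):
    """Yield one parsed object per non-empty JSONL line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def assemble_roots(read_lines):
    """Yield complete root objects; `read_lines` returns a fresh line iterable."""
    # Pass 1: remember, per root id, the index of the last line belonging to it.
    last_seen = {}
    for index, obj in enumerate(parse(read_lines())):
        root_id = obj.get("__parentId", obj.get("id"))
        last_seen[root_id] = index

    # Pass 2: buffer only the roots that are still waiting for children.
    pending = {}
    for index, obj in enumerate(parse(read_lines())):
        parent_id = obj.get("__parentId")
        if parent_id is None:
            root_id = obj["id"]
            pending[root_id] = dict(obj, children=[])
        else:
            root_id = parent_id
            pending[root_id]["children"].append(obj)
        if last_seen[root_id] == index:
            # No later line refers to this root, so it is complete.
            yield pending.pop(root_id)
```

For a file on disk, read_lines could be something like lambda: open("bulk.jsonl", encoding="utf-8"), with the file name as a placeholder. In the example above, 'g' is emitted right away, while 'a' and 'd' stay buffered until their late children arrive, so memory depends on how many roots have stragglers rather than on the file size. The first pass still keeps one integer per root id.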

New Member
To reconstruct root-level objects, we would need to keep them all in memory while parsing, unless I'm missing something really clever. 

Never mind! I actually was missing something really clever after all!

Shopify Partner

@Adverbly,

I would very much like to know what you missed, because maybe I'm missing it too! Would you mind explaining it, please?
