Hey Rob,
Couple of things here.
Yes, it's easy to run multiple requests in parallel with `page`. However, past a certain depth, even the maximum number of parallel `page` requests will be slower than sequential `since_id` requests, because `since_id` translates to an indexed seek while a deep `page` offset forces the database to read past every preceding row.
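For context, sequential `since_id` paging looks roughly like this. It's a minimal sketch, not any particular API's contract: the URL, the `orders` response key, and the `limit` of 250 are all assumptions for illustration.

```python
import requests

def fetch_all(base_url, limit=250):
    """Walk a collection sequentially using since_id cursor paging."""
    since_id = 0
    results = []
    while True:
        # Hypothetical endpoint; adjust params to match your actual API.
        resp = requests.get(base_url, params={"since_id": since_id, "limit": limit})
        resp.raise_for_status()
        batch = resp.json()["orders"]  # assumed response key
        if not batch:
            break
        results.extend(batch)
        # Ids are assumed to be ascending within a page, so the last id
        # in the batch becomes the cursor for the next request.
        since_id = batch[-1]["id"]
    return results
```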
Second, it is possible to run multiple parallel requests with `since_id` using a probabilistic parallel scan. In the simplest terms: first, determine the minimum and maximum resource ids for the dataset you're trying to retrieve (2 API calls). Then split that id range into equal parts, one per parallel request you want to make. This gives each parallel request block its own starting point.
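To make that concrete, here's a sketch of the block setup. It assumes you've already fetched the min and max ids (e.g. one request sorted ascending and one sorted descending, each with a limit of 1); `make_blocks` is a hypothetical helper, not part of any API.

```python
def make_blocks(min_id, max_id, workers):
    """Split [min_id, max_id] into `workers` contiguous id ranges."""
    span = (max_id - min_id) // workers
    blocks = []
    for i in range(workers):
        start = min_id + i * span
        # The last block absorbs any rounding remainder.
        end = max_id if i == workers - 1 else start + span - 1
        blocks.append((start, end))
    return blocks

# Using the numbers from the worked example below:
# make_blocks(1000, 9000, 4)
#   -> [(1000, 2999), (3000, 4999), (5000, 6999), (7000, 9000)]
```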
The assumption here is that resources are spread evenly across the blocks, which obviously won't be the case. So you'll have to implement logic that reallocates thread pool resources to a remaining block whenever a block completes.
For example, I make 2 API calls and determine that I want to retrieve all orders between ids 1000 and 9000. I'm running 4 requests in parallel, so I split my id range by 4: (9000 - 1000) / 4 = 2000. That gives me starting ids of 1000, 3000, 5000, and 7000 for my request blocks. Say at some point thread 2 terminates because it has retrieved all orders from 3000-4999. I then check which id each of the remaining requests is on to decide where to reallocate the thread. Assuming those values are 1400, 6000, and 8250, block 1 still has the most resources left to retrieve (2000 - 400 = 1600), so I'd start a new parallel request at the halfway mark of that block's remainder, i.e. at id 1400 + (2999 - 1400) / 2 ≈ 2200.
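Here's one way that reallocation step could look in code. The `positions` mapping and the helper name are hypothetical: each still-running block maps to its current cursor id and end id, and when a thread finishes we split whichever block has the most ids left.

```python
def split_busiest_block(positions):
    """Split the block with the largest remaining id range in half.

    positions: dict of block number -> (current_id, end_id).
    Returns the (start_id, end_id) range for the new parallel request.
    """
    # Pick the block with the most ids still to scan.
    block, (current, end) = max(
        positions.items(), key=lambda kv: kv[1][1] - kv[1][0]
    )
    midpoint = current + (end - current) // 2
    # The existing thread now stops at the midpoint; the new request
    # takes over from just past it to the old end.
    positions[block] = (current, midpoint)
    return midpoint + 1, end

# With the remaining blocks from the example above:
positions = {1: (1400, 2999), 3: (6000, 6999), 4: (8250, 9000)}
print(split_busiest_block(positions))  # -> (2200, 2999): block 1 splits at ~2200
```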
The exact logic for how the algorithm reallocates resources is entirely up to you; the above is just one example of how you could accomplish it.