Parallel Chunk Start Loop with multiple GET Requests to multi-thread requests?

Hi all,
I’m using the GET Request node to call an API that requires a one-second sleep between calls, up to 25,000 requests per day. The API runs an analysis and returns a big JSON payload, so each round trip takes about 25 seconds. I was hoping that Parallel Chunk Start could take a single list of REST HTTP URLs and distribute them across multiple GET Request nodes (each with a slightly different delay), then collect everything back together at the Parallel Chunk End (multi port) node. This API does not accept batch requests. Am I even close to understanding this correctly? Probably not, because all 3 GET Request nodes receive the same set of URLs, so they are just processing the same set in parallel, which is completely pointless. If I have 3,000 URLs, I want 3 unique sets of URLs to process, hopefully to speed things up.
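In case it helps to spell out what I mean outside of KNIME, conceptually I’m after something like this rough Python sketch, where the example.com URLs, the chunk count of 3, and the 1-second delay are all placeholders for my actual setup:

```python
import threading
import time

import requests

def fetch_set(urls, results, delay=1.0):
    """One 'GET Request branch': works through its own unique set of URLs."""
    for url in urls:
        results.append(requests.get(url, timeout=60))
        time.sleep(delay)  # the API's required 1-second spacing

urls = [f"https://api.example.com/item/{i}" for i in range(3000)]  # placeholder URLs

# Give each of the 3 branches a unique, non-overlapping slice of the list.
url_sets = [urls[i::3] for i in range(3)]
results_per_set = [[] for _ in url_sets]

threads = [
    threading.Thread(target=fetch_set, args=(subset, bucket))
    for subset, bucket in zip(url_sets, results_per_set)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_responses = [r for bucket in results_per_set for r in bucket]
```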

It’s not working but fun ideating tho. :slight_smile:
Thank you so much!!


Hmm, does the concurrency setting in GET Request achieve what I’m trying to do??

I have used the Parallel Chunk Loop with GET Request before. It is enough to put the GET Request node in the loop once (between the loop start and end); the multiple instances are then generated automatically when you run the loop. The number of instances can be set in the configuration of the Parallel Chunk Start with the custom chunk count.
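Outside of KNIME, the pattern is roughly the following Python sketch (the placeholder URLs and the chunk count of 3 are assumptions): you define the loop body once, the chunk count is just the number of parallel workers, and there is no need to split the URL list yourself.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

CHUNK_COUNT = 3  # plays the role of the "custom chunk count" setting

def fetch(url):
    """The loop body, defined once: one GET per URL plus the required pause."""
    response = requests.get(url, timeout=60)
    time.sleep(1)  # 1-second spacing required by the API
    return response

urls = [f"https://api.example.com/item/{i}" for i in range(3000)]  # placeholder URLs

# The executor fans the single fetch() out over CHUNK_COUNT workers,
# much like the loop generating its branch instances automatically.
with ThreadPoolExecutor(max_workers=CHUNK_COUNT) as pool:
    responses = list(pool.map(fetch, urls))
```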


Hi @daniela_digles,
That’s awesome. So you are using Parallel Chunk Start, then GET Request (just one node), then Parallel Chunk End (not the multi-port version)? Are you using the “Use automatic chunk count” setting in Parallel Chunk Start when you process thousands of GET Request URLs?

Thank you for your help and sharing knowledge.

Hi @alabamian2,

yes, exactly!
I would be careful with the automatic chunk count, as it takes your system’s processor count into account, but not the availability of the service. With APIs that allow a high number of simultaneous requests this works fine, but if you create too many queries at the same time they might block you at some point. So first check whether they have a maximum number of requests per short time window in addition to the 25,000 requests per day you mentioned.
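To be safe, the throttle should also be shared across the parallel branches rather than applied per branch, so the total request rate never exceeds what the service allows. A rough Python sketch of that idea, assuming a global limit of 1 request per second:

```python
import threading
import time

import requests

class MinIntervalLimiter:
    """Enforce a minimum gap between requests across all worker threads."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if delay > 0:
            time.sleep(delay)

limiter = MinIntervalLimiter(min_interval=1.0)  # assumed global 1 request/second

def fetch(url):
    limiter.wait()  # shared throttle: adding workers never raises the total rate
    return requests.get(url, timeout=60)
```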


Hello @daniela_digles,
Got it. The API just says you have to sleep 1 second between calls, so I’ll do that. And it takes about 25 seconds for the JSON to come back. I tested a single GET Request node versus GET Request inside a Parallel Chunk loop, and wow, what a difference!!
Thanks for your help, @daniela_digles!!

You’re welcome!
Yes, then it sounds fine. Just check the status column to see whether you got a 200 for each request.


@daniela_digles, are you using try catch by any chance?

For checking the 200, or in general? I used it a while ago, but I’m not really an expert with it. In this case, I would rather use a Row Splitter to manually look at the cases that are not 200.
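In code terms it is just a split on the status code (continuing the Python sketch from earlier, where `responses` is the list of requests.Response objects):

```python
# Analogous to a Row Splitter on the status column.
ok = [r for r in responses if r.status_code == 200]
not_ok = [r for r in responses if r.status_code != 200]

for r in not_ok:
    print(r.url, r.status_code)  # inspect (or retry) the non-200 cases manually
```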


OK, I’ll do that, and I can hook it up to send a Gmail notification if there’s an error. I’ll play with try/catch too.
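Roughly, this is what I have in mind for the try/catch part, as a Python sketch (the `urls` list is the same placeholder as in the earlier sketches, and feeding the error rows into an email step is my own assumption):

```python
import requests

def fetch_safely(url):
    """Catch failures per URL so one bad request doesn't stop the whole run."""
    try:
        response = requests.get(url, timeout=60)
        response.raise_for_status()  # treat non-200 statuses as errors too
        return {"url": url, "status": response.status_code, "body": response.text}
    except requests.RequestException as exc:
        return {"url": url, "status": None, "error": str(exc)}

results = [fetch_safely(u) for u in urls]
errors = [r for r in results if "error" in r]
# 'errors' would be the table I feed into the email notification step.
```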

Hope you have a nice weekend, @daniela_digles. Thanks again.

