How to implement Try Catch for GET Requests with Parallel Chunk Start and End

Hi everyone,
Hope your weekend is going well. I am setting up GET Requests with Parallel Chunk Start and End, and my test run last night errored out with a timeout after 4 hours. I am now trying to set up Try (Data Ports) and Catch Errors (Data Ports). I searched around and found these resources.

What I am struggling with is how to use Catch Errors (Data Ports) to retry the GET Request for an errored row several times, then log the error and move on to the next row. Ideally, I also want to capture the errors in a data table and email it to myself so that I can inspect the particular GET Requests that errored out. Maybe the resources I’m enquiring about no longer exist, in which case I can remove them from the source list. The Google API I’m calling showed an error rate as high as 10%, wow.

Mr. @armingrudd had this and it looks close but slightly different because I need to move to the next one.

EDIT: OK, can I use Variable to Table Column on the error output port to capture the errors in a table? Then do I put those rows back into the loop?

Thank you very much and hope your weekend is going well.

By the way, this is where I started…

Hi @alabamian2,

How about using the Get Request Plus component, which I have shared in my KNIME Hub Space and NodePit Space?

With this component you can set the number of retry attempts, and you will also get the missed requests at the second output port.
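For readers who want to see the idea outside KNIME, here is a minimal Python sketch of the same pattern: retry each request a fixed number of times, then record the failure and move on. It is not the implementation of Get Request Plus. The names `get_with_retries`, `demo_fetch`, and the `example.invalid` URLs are hypothetical, and the fetch function is simulated so the sketch runs without network access.

```python
import time

def get_with_retries(urls, fetch, max_attempts=3, delay_s=0.0):
    """Call fetch(url) for each URL, retrying on any exception.

    Returns (results, missed): results holds (url, body) pairs,
    missed holds (url, last_error) pairs for URLs that never succeeded.
    """
    results, missed = [], []
    for url in urls:
        last_error = None
        for attempt in range(1, max_attempts + 1):
            try:
                results.append((url, fetch(url)))
                break  # success: stop retrying this URL
            except Exception as exc:
                last_error = exc
                if attempt < max_attempts:
                    time.sleep(delay_s)  # optional pause before the next try
        else:
            # All attempts failed: log the error and move on to the next URL.
            missed.append((url, repr(last_error)))
    return results, missed

# Simulated fetch (assumption for illustration): one URL always fails,
# one fails on its first call only, one always succeeds.
calls = {}

def demo_fetch(url):
    calls[url] = calls.get(url, 0) + 1
    if "dead" in url:
        raise ConnectionError("simulated failure")
    if "flaky" in url and calls[url] < 2:
        raise TimeoutError("simulated timeout")
    return '{"ok": true}'

urls = [
    "https://example.invalid/ok",
    "https://example.invalid/flaky",
    "https://example.invalid/dead",
]
results, missed = get_with_retries(urls, demo_fetch, max_attempts=3)
print(len(results), "succeeded;", len(missed), "missed")
```

The `missed` list plays the role of the second output port: it can be written to a file or emailed for later inspection.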

:blush:


Hi @armingrudd,
Yes, I will try this. Thank you so much!!

Hi @armingrudd,
Your Get Request Plus already does what I was trying to do, no? If I wrap your Get Request Plus in my Parallel Chunk workflow, wouldn’t that be running concurrent requests inside parallel chunks? I should just use yours and set the concurrency, which accomplishes what I was trying to do with the workflow below. Am I understanding this correctly? If so, building my own workflow was a great experience, but I should get better at searching for existing solutions like your Get Request Plus. :slight_smile:

Yes, you can send multiple requests concurrently.

:blush:


@armingrudd, thank you. This makes a lot of sense.

I’m working on incorporating your Get Request Plus now. Once it is running, I cannot seem to stop it. Cancel, reset, and F9 do nothing; it either has to run through all the requests or I have to close the workflow file. Am I doing something incorrectly? I’m connecting Get Request Plus directly to the nodes that create the REST GET URLs. I don’t need to use a loop or Try/Catch, correct? I’m not able to cancel the GET Request node inside the component either.


You can cancel it, but it takes some time to stop. Disconnect the component and then you can do anything inside it.

Right, you don’t need Try/Catch if you are not looking for errors other than failed requests.

:blush:


Disconnect as in deleting the line that connects it to the previous node? It lets me select the line, but hitting the Delete key doesn’t do anything.


I’m testing with just 20 URLs, and each call takes about 20 seconds to return JSON. Get Request Plus just keeps running now. Normally 20 requests go quickly.

EDIT: It did finish and outputted the data correctly. …testing some more… THANK YOU!!


Maybe I’m misunderstanding the Concurrency setting. Does a Concurrency of 1 mean it runs just one instance of the node (the workflow component), and 2 mean two instances of the node, or of the workflow inside it, run simultaneously? What I was trying to achieve was to run multiple GET Requests at once to speed up the data collection. I tried to research the “Concurrency” setting but couldn’t find much. Inside the component, I saw a Recursive Loop but no Parallel Chunk Loop, which I thought would be needed to run multiple instances simultaneously. I’m running a test with 200 URLs now, and changing Concurrency doesn’t seem to speed up the overall collection process. I’m going to increase the Concurrency to maybe 5 and see. I’ll report back. Thank you Sir, @armingrudd.

EDIT: Parallel Chunk did do this, so maybe I should put your Get Request Plus inside the Parallel Chunk loop to handle the retries and the failed-request output? Will test that.

Concurrency is the number of requests sent at once.
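To illustrate what "requests sent at once" means in general terms, here is a small Python sketch that caps the number of in-flight calls with a thread pool. It is an analogy, not a description of how the component is built. The names `fetch_all` and `slow_fetch` are hypothetical, and the fetch is simulated with a short sleep instead of a real HTTP call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, concurrency=4):
    """Run fetch(url) for every URL with at most `concurrency` calls in flight.

    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch, urls))

# Simulated slow request (assumption for illustration) that also tracks
# how many calls were running at the same moment.
lock = threading.Lock()
stats = {"in_flight": 0, "peak": 0}

def slow_fetch(url):
    with lock:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
    time.sleep(0.05)  # stands in for network latency
    with lock:
        stats["in_flight"] -= 1
    return url.upper()

urls = [f"u{i}" for i in range(8)]
bodies = fetch_all(urls, slow_fetch, concurrency=4)
print("peak simultaneous requests:", stats["peak"])
```

With a concurrency of 4, the 8 simulated requests finish in roughly two batches instead of eight sequential waits. A real API may still limit the gain, for example through server-side rate limits.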

:blush:

OK, I had it at 2 so I’ll double that to 4 and measure how fast it goes.
Thank you again, Sir. Much appreciated, Mr. @armingrudd.


Hi @armingrudd,
I ended up doing this and am still working on it.

Now I’m trying to use the Missed output port to write the missed rows to Excel. I added some incorrectly formatted URLs so that the API will either fail or return an error. While I can see the failed rows at the Missed output port inside the Parallel Chunk loop, I’m seeing this error from Excel Writer. I assume it’s because I am not able to execute the Create Date&Time Range node. My intention was for each parallel instance of Get Request Plus to write out its Missed rows, if any. How could I best write the Missed rows to Excel? Thank you so much for your Get Request Plus component and your help.


Hi @armingrudd,
I used Empty Table Switch and I’m saving the Missed rows to Excel now. :slight_smile: Now I’m seeing some cases where the node collects ? (NULL?) as the response, and those rows come through the regular Output port rather than Missed. Should I put a Rule Engine right after the Output port and send those rows back into the Get Request Plus input?
Thank you so much.
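The routing idea in this post can be sketched in Python as a simple split: rows with an empty response go to a retry list, everything else is kept. This assumes the `?` cells are missing values, which the thread does not confirm. The function name `split_empty` and the `url`/`body` column names are hypothetical.

```python
def split_empty(rows):
    """Separate rows with a usable body from rows whose body is missing.

    Each row is a dict with (assumed) keys "url" and "body".
    Returns (ok_rows, retry_rows).
    """
    ok_rows, retry_rows = [], []
    for row in rows:
        body = row.get("body")
        if body is None or str(body).strip() == "":
            retry_rows.append(row)  # candidate for another request attempt
        else:
            ok_rows.append(row)
    return ok_rows, retry_rows

rows = [
    {"url": "a", "body": '{"x": 1}'},
    {"url": "b", "body": None},   # missing response
    {"url": "c", "body": "   "},  # whitespace only, treated as missing
]
ok_rows, retry_rows = split_empty(rows)
print([r["url"] for r in ok_rows], [r["url"] for r in retry_rows])
```

The URLs in `retry_rows` could then be fed back into the request step, ideally with a cap on the number of rounds so that a permanently empty response does not loop forever.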


Hi Sir,
This is where I’m at right now.

I am feeding 7 records that will fail into this Parallel Chunk loop with Get Request Plus, and I am trying to write the Missed rows to Excel or a table. I have the chunk count set to a custom value of 6. This setup only captures the Missed row from the last record. Am I not getting the other 6 Missed rows because those are processed in the parallel chunks? Maybe slow, but making progress!! :slight_smile: Thank you!!
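The goal described here, every chunk contributing its missed rows to one combined table, can be sketched in Python as follows. This shows the intended result only and does not explain how the KNIME nodes behave; the thread does not establish why only one row was captured. All names (`process_chunk`, `run_chunks`, `missed_to_csv`, `always_fail`) are hypothetical, and the failing fetch is simulated.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk, fetch):
    """Fetch every URL in one chunk; return (results, missed) for that chunk."""
    results, missed = [], []
    for url in chunk:
        try:
            results.append((url, fetch(url)))
        except Exception as exc:
            missed.append((url, repr(exc)))
    return results, missed

def run_chunks(urls, fetch, n_chunks):
    """Split urls into n_chunks, process them in parallel, merge all outputs."""
    chunks = [urls[i::n_chunks] for i in range(n_chunks)]
    all_results, all_missed = [], []
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for results, missed in pool.map(lambda c: process_chunk(c, fetch), chunks):
            all_results.extend(results)
            all_missed.extend(missed)  # append, so no chunk overwrites another
    return all_results, all_missed

def missed_to_csv(missed):
    """Write the combined missed rows once, after all chunks have finished."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "error"])
    writer.writerows(missed)
    return buf.getvalue()

# Simulated fetch (assumption for illustration): every request fails.
def always_fail(url):
    raise ValueError("malformed url")

urls = [f"bad-url-{i}" for i in range(7)]
all_results, all_missed = run_chunks(urls, always_fail, n_chunks=6)
csv_text = missed_to_csv(all_missed)
print(len(all_missed), "missed rows collected")
```

The design choice is to collect the missed rows from all chunks first and write the file a single time, rather than having each chunk write to the same file on its own.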

