File reader freezes

Hi All,
I have a workflow (see attached) that takes 20 links from a long list of links (~17,000, stored in a Table Creator node) and downloads them. It extracts specific values from each downloaded table and builds a new table from the extracted values. The table is updated using nodes such as “Concatenate” and appended to a file with a “CSV Writer” node. Suddenly, the File Reader node stops while reading a link. The URL that cannot be read works perfectly when opened on its own. The link at which it stops is not always the same; it is different every time. If I reset and execute again from that subset of links, downloading continues without any issue until it stops at another random link.

Could you please help me bypass or resolve this issue?


LINKS_download.knwf (275.7 KB)

The knime console reports:

ERROR File Reader 5:104:66 Execute failed: Can’t access ‘http://geodesy.unr.edu/NGLStationPages/stations/COC2.sta’. (Read timed out)
WARN File Reader 5:104:66 Can’t access ‘http://geodesy.unr.edu/NGLStationPages/stations/COC2.sta’. (Read timed out)

Hi @amars -

I wasn’t able to reproduce your problem, which I suspect is on the server side. In my testing, I ran 100 URLs at a time.

That said, you might implement the KNIME Try/Catch nodes and see if that helps. If there’s a hiccup from the server, these nodes should at least allow the loop to keep running, and you can review your results at the end to see whether there are missing values you need to re-run. Maybe something like this?

[Screenshot: KNIME Analytics Platform, 2020-01-13 15:21:07]
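
For readers who think in code rather than nodes, the Try/Catch idea can be sketched roughly as below. This is only an illustration in Python, not how the KNIME nodes are implemented; `run_with_catch` and the `fetch` callable are hypothetical names standing in for the loop body and the File Reader step.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_with_catch(
    urls: Iterable[str],
    fetch: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Fetch every URL; record failures instead of aborting the loop."""
    results: Dict[str, str] = {}
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:  # deliberately broad: any error is just logged
            failures.append((url, str(exc)))
    return results, failures
```

After the loop finishes, `failures` lists the URLs that need to be re-run, which corresponds to reviewing the results for missing values at the end.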


The Try/Catch nodes are a fantastic idea! I have just integrated them into the workflow and will let you know how it goes. So far so good.

Thank you, ScottF, for your prompt response and for trying to reproduce the problem. The problem usually appears after about 150 URLs in a run, sometimes after about 500. My best run reached about 2,600 URLs, but since then the problem has been happening more often, roughly every 200 URLs.


Unfortunately, it stopped at the 1,266th URL. It ran continuously until it stopped at the “File Reader” node with this message:
“ERROR File Reader 6:111 Execution failed in Try-Catch block: Spec must not be null!
ERROR Loop End 6:109 Active Scope End node in inactive branch not allowed.”

Again, I had to double-click (configure) the “File Reader” node and then click OK. After that, the “File Reader” node is able to continue, but the interruption is annoying.

Is there any other solution or modification that I may have missed to resolve this issue with “file reader” node?
Thanks,

Have you thought about first downloading the files into a folder, maybe with a short wait in between, and catching errors? You might also configure the workflow so that it keeps trying until the list is complete.
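
As a rough sketch of that idea outside KNIME (Python, purely illustrative): pause briefly between requests, catch errors, and retry the failed links in rounds until the list is complete or a round limit is hit. The names `download_until_complete`, `download`, `pause_s` and `max_rounds` are assumptions made for this example.

```python
import time
from typing import Callable, List


def download_until_complete(
    urls: List[str],
    download: Callable[[str], None],
    pause_s: float = 0.5,
    max_rounds: int = 5,
) -> List[str]:
    """Retry failed URLs round by round; return any that never succeeded."""
    pending = list(urls)
    for _ in range(max_rounds):
        if not pending:
            break
        still_failing = []
        for url in pending:
            try:
                download(url)
            except OSError:  # includes timeouts and urllib.error.URLError
                still_failing.append(url)
            time.sleep(pause_s)  # short wait between requests
        pending = still_failing
    return pending
```

An empty return value means every link was eventually downloaded; anything left in the list failed in all rounds and probably deserves a manual look.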

@amars,

Sorry, my initial answer was meant for another post :). Anyway, you could try the following, though I’m not certain it will work: use a CSV Reader instead and set a proper timeout.

Best
Mark


Any specific configuration suggestions? I think I have tried everything, but I may well have missed some options.

CSV reader! Interesting, I will try.

CSV reader gives me this error:
ERROR CSV Reader 6:155 Execution failed in Try-Catch block: Too few data elements (line: 3 (Row0), source: ‘http://geodesy.unr.edu/NGLStationPages/stations/00NA.sta’)

I set up a workflow that first downloads the .sta files to a local folder and then imports them and does the transformations. To make sure all files get downloaded after a freeze, it scans the download folder and keeps only the links whose files have not yet been downloaded. The same file list is also used for the transformations.
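
The folder-scan step might look roughly like this in Python. It is a sketch of the matching logic only, not an export of the workflow; `pending_urls` and `download_dir` are made-up names, and it assumes each file is saved under the last path segment of its URL (for example `COC2.sta`).

```python
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def pending_urls(urls: List[str], download_dir: Path) -> List[str]:
    """Return the URLs whose target file is not yet in download_dir."""
    already = {p.name for p in download_dir.iterdir() if p.is_file()}
    return [u for u in urls if Path(urlparse(u).path).name not in already]
```

One caveat with this approach: a file that was only partially written before a freeze would still count as downloaded, so it may be worth checking file sizes as well.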

For the demonstration, I have just downloaded a few files. Will let this run and maybe come back later. You could give it a try.

POSSIBLE SOLUTION (works for me/tested for 17,000 url links at once)
Here is a possible solution that works for me. In the settings of the “File Reader” node, under the “Flow Variables” tab, there is a hidden timeout variable, which I set to 10 s. Some may not agree with this approach, but I created a workflow variable “TimeOut10sec” with a constant value of “10” (in KNIME Explorer, right-click the workflow and select Workflow Variables) and used it to override that setting. The File Reader seems to have a default of “1” second (not visible anywhere), so raising the value through the workflow variable lets the download continue with no issues in the “File Reader” node.
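
For comparison, the same idea of passing an explicit read timeout instead of relying on a short default can be sketched in Python with the standard library. This says nothing about how the File Reader node works internally; `read_text` and `TIMEOUT_S` are names invented for this example.

```python
from urllib.request import urlopen

TIMEOUT_S = 10  # mirrors the 10 s value used above; tune as needed


def read_text(url: str, timeout_s: float = TIMEOUT_S) -> str:
    """Read a URL as text, passing an explicit timeout to urlopen."""
    with urlopen(url, timeout=timeout_s) as response:
        return response.read().decode("utf-8", errors="replace")
```

With `urlopen`, the timeout applies to blocking network operations such as connecting, so a slow server raises an error after `timeout_s` seconds rather than hanging indefinitely.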

Suggestion to Knime developers:
I would suggest surfacing this hidden variable as a timeout setting under the Settings tab.


Great, it worked out. Attached you will find the results from the download and loop.

list_gps_coords3X.table.zip (487.1 KB)

mlauber71- Interesting! I would like to check it out. Could you please upload the workflow? The file that you have uploaded is just the table.
Thank you


The workflow itself is on the hub if you want to test it yourself.

The knwf file with all the downloaded files is too large; I could share it via Dropbox if you need it.

Hi there @amars,

Glad you managed to find a solution.

If this wasn’t planned, it sure is now. Thanks!

Br,
Ivan

