"Stream closed" error

Hello there,

We have developed a workflow that tries to predict anomalies using previously trained isolation forest models. It divides the data into several groups and applies the corresponding model to each group in a loop. It works perfectly on KNIME Analytics Platform, but when scheduled on our KNIME Server it fails after a number of iterations (more than 150). It returns a “stream closed” error, which we cannot identify or understand.
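
For reference, the per-group prediction logic is roughly equivalent to the following simplified sketch (the grouping column, model paths, and function name are illustrative, not our actual code):

```python
# Simplified sketch of the per-group prediction loop.
# Column names, model paths, and the function name are illustrative only.
import pickle
import pandas as pd

def predict_anomalies(data: pd.DataFrame) -> pd.DataFrame:
    results = []
    for group, group_data in data.groupby("group_id"):  # hypothetical grouping column
        # Load the isolation forest model previously trained for this group
        with open(f"models/iforest_{group}.pkl", "rb") as f:
            model = pickle.load(f)
        scored = group_data.copy()
        # scikit-learn's IsolationForest.predict returns -1 for anomalies, 1 for normal rows
        scored["anomaly"] = model.predict(group_data.drop(columns=["group_id"]))
        results.append(scored)
    return pd.concat(results, ignore_index=True)
```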

We would appreciate any help you can provide us.

Regards.

Hi,
Which node exactly fails, and do you have any console output for us? The content of the KNIME log (View -> Open KNIME Log) would also be interesting. Do you use the local H2O context for the isolation forest? Maybe a screenshot of your workflow would also help us understand what is going on.
Kind regards
Alexander

Hello Alexander,

In this workflow Isolation Forest is implemented via Python nodes, so for prediction we use a “Python Predictor” node inside that loop. The workflow works perfectly in KNIME Analytics Platform, and it fails while accessing server files, not while predicting, so it is not a workflow problem but a server problem. This morning the job stopped after 121 iterations due to an error on a Table Reader node when it tried to read a table of previous predictions for that data group (iteration) saved on the server (the new predictions had already been calculated without any problem). The total number of iterations is about 300. We use “knime://knime-server” URLs.

The workflow is too big to understand anything from a screenshot, but we can provide you with server or local logs by email.

Regards.

Hello @jricgar,
thank you for the clarification. Just to summarize and check that I understood correctly: the Table Reader stops with the above-mentioned “stream closed” error? Are you using distributed executors, or does the KNIME executor run on the same machine as the server? Which versions are the executor and the server?
Kind regards
Alexander

Yes, every day we get that error (or a similar one, like “read timeout”) when we try to execute this job on the KNIME Server. The job does the following:

  • Iterates over different data groups. Each one is analysed independently.
  • Predicts anomalies with yesterday’s data for each independent data group and then appends this prediction to a saved table with anomaly categorization (Y/N), which contains the anomalies detected during the training process and old predictions. To do that, it has to read and write an updated prediction table (see the sketch after this list). These Table Reader and Table Writer nodes are the problematic ones.
  • If we execute this workflow locally on KAP, my machine is able to run this process without problems, even though it has to access yesterday’s data and the old predictions table on the server where they are saved.
  • But when executed on KNIME Server, it always fails after some successful iterations.
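
In plain Python terms, the problematic step in each iteration is roughly the following (an illustrative sketch only; the real workflow uses Table Reader and Table Writer nodes with “knime://knime-server” URLs, and the path and function names here are made up):

```python
# Illustrative sketch of what one loop iteration does with the prediction table.
# In the real workflow this is done by Table Reader / Table Writer nodes reading
# knime://knime-server URLs; the path, function, and column names are made up.
import pandas as pd

def update_prediction_table(table_path: str, new_predictions: pd.DataFrame) -> None:
    # Read the saved table with old predictions and anomaly categorization (Y/N)
    history = pd.read_csv(table_path)
    # Append yesterday's predictions for this data group
    updated = pd.concat([history, new_predictions], ignore_index=True)
    # Write the updated table back to the server
    updated.to_csv(table_path, index=False)
```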

Both the KNIME executor (v4.0.2) and the KNIME Server (v4.9.2) are running on the same machine.

Regards.

Hello,
I only found one mention of “stream closed” in the log file, and it was only a warning, not an error. Did this only occur once so far, and are you sure that this is the cause of the workflow failure? I noticed that there are a lot of errors concerning the Chromium driver and image generation from plots in your log. Have you installed libgtk? You can do that with sudo apt install -y libgtk-3-0 on Ubuntu or sudo yum install libXScrnSaver on RHEL 7/CentOS.

If you are sure that the “stream closed” warning from the Table Reader causes the failure, we have to investigate further; unfortunately, we can’t get more information from the log file. Are any other workflow jobs accessing the same file, so that there might be a problem with too many file accesses? Or is the loop reading the file really fast? If so, could you insert a Wait… node into your workflow that slows the loop down a bit to see if that helps (of course this is not a permanent solution, I just want to narrow down the problem)? Sorry to have no quick solution here!
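
If you want to test the timing hypothesis from within the workflow instead, you could also wrap the read in a small retry with a pause, for example in a Python node. This is only a rough sketch to narrow down the problem, with made-up function and parameter names, not a recommended fix:

```python
# Rough sketch: retry a file read with a short pause between attempts, to check
# whether the "stream closed" error is a transient timing issue. The function
# name, error handling, and defaults are illustrative only.
import time
import pandas as pd

def read_with_retry(path: str, attempts: int = 3, pause_s: float = 5.0) -> pd.DataFrame:
    last_error = None
    for attempt in range(attempts):
        try:
            return pd.read_csv(path)
        except OSError as err:  # e.g. a stream closed while the file is in use
            last_error = err
            time.sleep(pause_s)  # wait a bit before trying again
    raise last_error
```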

Kind regards
Alexander

Hello Alexander,

This problem happens from time to time. For example, the scheduled jobs for this workflow had been working properly since last week (when we restarted the server to solve some performance problems) until today, when the job failed again with a “socket closed” error at iteration number 17:

But after closing and reopening the job, I now see this:

The error has disappeared. The warning message on the Missing Value node isn’t related to the problem (just some string columns have missing values). If I try to continue with the execution, I obviously get this message:


And after restarting the loop execution, some nodes fail when they try to read files on the server, which is strange because we are now back in the first loop iteration, which worked previously, and the file is still available and visible on the server:

The loop is not that fast, so I would rule out that cause. I have requested the server logs from my IT team and will send them to you ASAP.

Regards.