Knime 4.2 Severe Performance Regression

Good morning,

I happened to notice that the GroupBy node, when doing a simple count of Y per X on just 122,049 rows, takes several minutes just to reach 25 %. The XPath node also takes much longer than before to even start.

Something is clearly not right. I have already downloaded KNIME again, reinstalled all extensions, and reconfigured it from scratch.


Update: I tried to reproduce the issue with the Test Data Generator node, and it seems to be related to the actual data I generated myself. Inspecting the workflow folder shows that the collected data is abnormally large (>200 MB), even though I am using Don’t Save Start / End nodes.

I am trying to pin down which data cells might be at fault.

Update: I saved the data to a CSV file and read it back in. After that, performance is blazing fast. The data itself is also totally fine. Something is not right with the node, though.

It seems challenging to help you while you are still exploring and diagnosing the problem. Do you know why the data is that “large” for just 120k rows (assuming there are only a few columns)?

Sometimes it also helps to extract runtime information, e.g. using VisualVM, to see what KNIME does while it’s processing the data (feel free to share the dump here and I can try to ask more useful questions).

Hi @wiswedel,

Upon saving, the CSV was no more than 15 MB. The only point at which a lot of data is generated is during the crawl of a few thousand pages using the #community-extensions:palladian-selenium nodes.

However, I enclosed them in Don’t Save Start & End nodes and removed the results / XML via XPath, etc. I am now running the workflow for the fifth time and the issue persists. Even saving the mere 15 MB via the CSV Writer takes a very long time.

I am going to inspect the temp data saved in the workflow directory, as I strongly believe something is not right with it.
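
In case it helps anyone doing the same inspection, here is a rough sketch of how the largest files in a workflow directory could be listed with plain Python (outside of KNIME). The function name `largest_files` and the `top_n` parameter are made up for this example, and the directory passed on the command line is whatever your workflow folder happens to be; nothing here relies on KNIME's internal file layout.

```python
import os
import sys


def largest_files(root, top_n=20):
    """Return the top_n largest files under root as (size_in_bytes, path) tuples."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    # Largest first; ties are ordered by path.
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    # Assumption: the workflow folder is passed as the first argument;
    # falls back to the current directory otherwise.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for size, path in largest_files(root):
        print(f"{size / 1_000_000:10.1f} MB  {path}")
```

Running it against the workflow folder should show which node's stored data accounts for most of the >200 MB, which narrows down where to look next.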