In KNIME versions 3.7 and 4.1, the Reference Column Filter node is very slow when the non-reference table on port 0 has a large number of columns. Specifically, I have a few hundred rows and tens of thousands of columns (and am filtering down to just a few columns). Running the Reference Column Filter node takes 20 minutes or longer in this situation. By contrast, I can transpose the table, use the Reference Row Filter node, then transpose the table back in less than 2 minutes. I am hoping the Reference Column Filter node can be optimized so this isn’t something we need to work around. Thanks!
Hi @bfrutchey -
Sorry for the trouble here - those are some interesting details about the timing. Let me pass them on to the dev team and see if they have any ideas. We’re always keen to improve performance where we can!
Thanks for bringing this up! I attempted to reproduce the issue you described. Using the Reference Column Filter node on a table with 50,000 columns and 500 rows, retaining only 10 columns took about 60 - 90 seconds in my KNIME AP 4.1. While that isn’t as bad as the 20 minutes you encountered, it is still terrible.
Taking a closer look, it turns out the runtime of the node scales unnecessarily badly with the number of removed columns. (More precisely, the node’s runtime was quadratic in the number of removed columns, i.e., doubling the number of to-be-removed columns increased its runtime by a factor of four.)
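To illustrate what quadratic scaling means here, below is a minimal Python sketch. It is not the KNIME implementation (the thread does not show the actual code or the cause of the slowdown), and the function names `filter_columns_quadratic` and `filter_columns_linear` are hypothetical. It only contrasts one common way such behavior can arise, removing unwanted columns one at a time from an array-backed list, with a single pass that collects the columns to keep. Column names are assumed to be unique.

```python
def filter_columns_quadratic(columns, reference):
    """Drop every column not named in `reference`, one removal at a time.

    Each list.remove() call scans and shifts the remaining elements, so
    the total work grows roughly with (removed columns) * (total columns).
    Illustrative only; this is an assumed pattern, not KNIME's code.
    """
    keep = set(reference)
    result = list(columns)
    for name in columns:
        if name not in keep:
            result.remove(name)  # linear-time removal inside a loop
    return result


def filter_columns_linear(columns, reference):
    """Build the kept columns in a single pass.

    Set membership is O(1) on average, so the total work grows linearly
    with the number of columns.
    """
    keep = set(reference)
    return [name for name in columns if name in keep]


if __name__ == "__main__":
    columns = [f"col{i}" for i in range(20_000)]
    reference = ["col3", "col500", "col19999"]
    # Both approaches return the same columns, in their original order.
    assert filter_columns_quadratic(columns, reference) == filter_columns_linear(
        columns, reference
    )
```

Both functions return the same result; only the amount of work differs. With the first variant, doubling the number of removed columns roughly quadruples the work, which matches the scaling described above.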
I have since implemented a fix that should be available with the next minor release at the latest. With that fix, in my AP, the node takes less than a second to run where before it took longer than a minute.
Looking forward to the fix! I was running with ~187K columns and ~500 rows, filtered down to 4 columns. Glad it was such an easy optimization!
The fix should even make it into the next bugfix release.
And it is here
We released today, so if you update, it should get back to its old speed.