Add 2nd "False" output to all FILTER nodes

iCFO · October 21, 2021, 2:51pm

I don’t understand the reasoning behind a single output on any filter node… I think that the Alteryx approach of a TRUE and FALSE output on every filter node makes much more sense. If a user only wants the TRUE output to continue through a workflow, then they simply leave the FALSE output unattached. Quite often I find myself going back to a previously used filter and needing that FALSE output for testing or alternate processing. It would be nice to at least have the option to show/use the FALSE outputs. It is extremely rare that a single output filter node is of any use to me.

I pretty much exclusively use the “Splitter Nodes” when I build from scratch on my own because of this issue, but there can be UI advantages to the “Filter node” settings, and editing existing workflows that use filter nodes would be much quicker if they had the option of outputting the FALSE rows or columns.

It seems like it would be an easy-to-implement upgrade to the existing nodes, and it would pay serious dividends in workflow construction flexibility.

iCFO · October 21, 2021, 3:47pm

This would also help to clean up a lot of the redundant nodes. Filter and Splitter are basically the same thing except Splitter is much more flexible because of the addition of the FALSE output. Why double the node count for the same tasks and add clutter to the node repository?

aworker · October 21, 2021, 4:07pm

Hi @iCFO

Theoretically, it is true that the KNIME -Filter- nodes could be replaced by their equivalent -Splitter- nodes. Having said this, I can say from experience that if you do not need two outputs, the -Filter- nodes are much faster than the -Splitter- nodes, so in that case I prefer the -Filter- nodes. I guess they are faster because KNIME needs to allocate a second table for the second output in the case of the -Splitter- nodes.

Thus, my common practice is to almost always use the -Splitter- node when I’m developing a workflow, so that I can check what comes out of the two output ports. Later, I replace them with -Filter- nodes if the 2nd output turns out not to be needed. It is part of my optimization process in terms of RAM usage but also of disk storage.

Maybe KNIME developers had other reasons for offering these two options. Personally, I find them quite useful.

Best

Ael

iCFO · October 21, 2021, 5:53pm

It makes sense that a filter node would perform faster than a splitter node because it isn’t processing and passing the FALSE output rows / columns. The real question is how much it would slow down a filter node to have the “option” of including the FALSE output. It seems to me that it could be designed to only process that FALSE output table when that option is selected, which would avoid the slower performance when only a single output is needed.

If so, a single quick setting would offer the flexibility to optimize performance by deactivating / removing the FALSE data channel and output, and avoid the hassle of swapping splitter nodes with filter nodes and adjusting settings as a final optimization step.

iCFO · October 21, 2021, 6:01pm

It seems like the code already exists. Basically that 2nd output option would be a setting that toggles between “process as a filter node” and “process as a splitter node”. There may be a slight reduction in processing speed when processing as a single output Filter node, but I think it could be negligible if done correctly.
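
To make the idea concrete, here is a rough sketch in plain Python of what I mean by a toggle. This is not KNIME code and says nothing about how the real nodes are built; `filter_rows` and `keep_false` are made-up names for illustration only. The point is just that the FALSE table is only allocated and filled when the option is switched on.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def filter_rows(
    rows: Iterable[T],
    predicate: Callable[[T], bool],
    keep_false: bool = False,  # hypothetical "2nd output" setting
) -> Tuple[List[T], Optional[List[T]]]:
    """Return (true_rows, false_rows); false_rows is None when keep_false is off."""
    matched: List[T] = []
    # Only allocate the second table when the option is enabled.
    rejected: Optional[List[T]] = [] if keep_false else None
    for row in rows:
        if predicate(row):
            matched.append(row)
        elif rejected is not None:
            rejected.append(row)
    return matched, rejected


# Behaves like a Filter: the FALSE rows are dropped and never stored.
true_only, no_false = filter_rows([1, 2, 3, 4], lambda x: x % 2 == 0)

# Behaves like a Splitter: both outputs are available.
true_rows, false_rows = filter_rows([1, 2, 3, 4], lambda x: x % 2 == 0, keep_false=True)
```

With the option off, the only extra cost per non-matching row is one `is not None` check, which is why I would expect the overhead to be small.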

aworker · October 21, 2021, 6:12pm

KNIME is already moving in that direction. For instance, KNIME developers have recently implemented a variable number of inputs instead of a fixed one on nodes where this was suitable, e.g. the -Concatenate- node. Maybe your suggestion will be implemented in a near-future release.