Lesson Learned: Always include all columns when joining

Sometimes I don’t like the way KNIME changes the settings of downstream nodes when upstream nodes change their input data. I know this is how KNIME works, and it’d be difficult to decide where to keep this behavior and where not.

I don’t like, for example, what happens when I open the configuration of a Row Filter node while the input data does not contain the column the node was configured to filter by. It simply sets itself to another column. Pretty confusing.

By contrast, I like the way the Column Filter behaves in this case. It simply marks missing columns red, as you can see here:

Now let’s consider a complex workflow full of logic, nodes, and components, with a Joiner node joining two tables of tens of columns each. At design time, we decided to filter the output and chose a number of columns to keep. So we un-checked the Always include all columns option and picked a column or two to include. Let’s have a look at a simplified example:


Let’s imagine our logic is based on joining and filtering, so we have lots of such joiners. At some point, we might rename table columns. Changing their case is a good example. (Btw, it has worked well for me to stick strictly to either lower or upper case throughout the whole workflow.) Well, let’s change the case and look at the node’s configuration. This seems OK. It’s wrong, but we know what to fix:

But this is a problem. We have no clue. The node lost all its Column Selection settings and we have no idea how to configure it again:

So my conclusion is never to filter columns in a Joiner node. Always include all columns and use Column Filter nodes for column filtering instead, unless you can always re-create your workflow by heart.
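
KNIME is configured through dialogs rather than code, so the following is only an analogy in plain Python, not anything KNIME runs. It sketches the pattern I mean: the join keeps every column, and a separate filtering step reports which of its configured columns have gone missing instead of forgetting them. The function names `join_all_columns` and `column_filter` and the sample data are made up for illustration.

```python
def join_all_columns(left, right, key):
    """Inner-join two lists of row dicts on `key`, keeping every column."""
    # Index the right table by key value for quick lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(rrow)
            merged.update(lrow)  # left values win on name clashes
            joined.append(merged)
    return joined


def column_filter(rows, wanted):
    """Keep only `wanted` columns and report configured columns that are missing.

    Assumes every row has the same columns as the first row.
    """
    available = set(rows[0]) if rows else set()
    missing = [c for c in wanted if c not in available]
    kept = [c for c in wanted if c in available]
    filtered = [{c: row[c] for c in kept} for row in rows]
    return filtered, missing


# Hypothetical sample tables.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 30}, {"id": 2, "total": 45}]

joined = join_all_columns(customers, orders, "id")
result, missing = column_filter(joined, ["name", "total"])

# Simulate an upstream rename to upper case: the filter's own column list
# is untouched, so it can still say exactly which columns it lost.
renamed = [{k.upper(): v for k, v in row.items()} for row in joined]
result2, missing2 = column_filter(renamed, ["name", "total"])
```

After the rename, `missing2` lists the two configured names, which is the information the Joiner’s column selection no longer shows in the scenario above.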

Possible workarounds:

  • Never rename anything. Well, some people act this way. Keep away from them.
  • Back up your workflows using a VCS, or make backup copies of your workflow prior to any refactoring.
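
For the backup copies, a minimal sketch in Python, assuming the workflow is stored as an ordinary folder on disk inside a local workspace. The function name `backup_workflow` and the paths in the usage note are hypothetical, not part of KNIME.

```python
import shutil
import time
from pathlib import Path


def backup_workflow(workflow_dir, backup_root):
    """Copy a workflow folder to a timestamped folder under `backup_root`."""
    source = Path(workflow_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"No workflow folder at {source}")
    # Timestamp in the folder name keeps successive backups apart.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree creates the target (and missing parents) and raises if it already exists.
    shutil.copytree(source, target)
    return target
```

You would call it as, say, `backup_workflow("knime-workspace/my_workflow", "knime-backups")` right before a refactoring session. Two backups within the same second would collide on the folder name, and the second call would raise.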

Check out the new Joiner, Joiner (Labs). It comes with the advanced column filter panel.

Best, Iris

Hi Iris,

Thanks a lot for your quick response.

I’ll have a look at the Joiner you recommended. Just a question. Are Labs nodes production ready?

Thanks,
Jan

Hello @jan_lender,

check this comment about nodes from Labs:

And regarding Row Filter, you can check Row Splitter (Labs) (I know, Labs again :smiley:), which doesn’t just switch to another column when the original one is missing.

Br,
Ivan

Hi Ivan,

Thank you.

As for Row Filter (and the Row Splitter (Labs) you pointed out), I personally prefer Rule-based Row Filter / Splitter over Row Filter / Splitter. The rule-based ones are more powerful and flexible and do not suffer from the issue I described in this topic. I understand there could be some performance differences. Well, if so, I would sacrifice performance in favor of reliability here.

Regards,
Jan
