Joiner node configuration size

I have been working with a workflow that uses a very large number of columns (as many as 200K). This has exposed several performance issues across a variety of nodes. Several nodes have configuration dialogs that load the entire list of columns into a selection list, which can take a very long time, instead of loading just the first “page” of columns and then paging as the user scrolls or interacts. This is understandable behavior. At times some nodes even save long lists of columns in their configuration files, which can make those files very large. This is not surprising.
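To illustrate the paging idea mentioned above, here is a minimal sketch in Python. It is not KNIME code and does not reflect how any KNIME dialog is actually implemented; the class `PagedColumnList` and its methods are hypothetical names chosen for the example. The point is only that a dialog could materialize one page of column names at a time rather than all 200K up front.

```python
from typing import List, Sequence


class PagedColumnList:
    """Hypothetical model behind a column selection list.

    Only the pages requested so far are copied into the visible list,
    so opening the dialog costs one page, not the full column count.
    """

    def __init__(self, column_names: Sequence[str], page_size: int = 100) -> None:
        if page_size <= 0:
            raise ValueError("page_size must be positive")
        self._columns = column_names
        self._page_size = page_size
        self._loaded: List[str] = []

    @property
    def loaded(self) -> List[str]:
        """Column names currently shown in the list (a copy)."""
        return list(self._loaded)

    def has_more(self) -> bool:
        """True while there are columns that have not been loaded yet."""
        return len(self._loaded) < len(self._columns)

    def load_next_page(self) -> List[str]:
        """Load the next page, e.g. when the user scrolls to the bottom."""
        start = len(self._loaded)
        page = list(self._columns[start:start + self._page_size])
        self._loaded.extend(page)
        return page


# Assumed example data: 200K generated column names.
columns = [f"col_{i}" for i in range(200_000)]
model = PagedColumnList(columns, page_size=100)

first_page = model.load_next_page()   # work done when the dialog opens
print(len(first_page), model.has_more())  # prints: 100 True
```

In a real dialog, `load_next_page` would be triggered by a scroll or search event; the page size of 100 is an arbitrary choice for the sketch.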

However, the Joiner node does something strange when its input ports have large numbers of columns - it takes a very long time to open the configuration dialog, and similarly a very long time to open its containing workflow for the first time… but the configuration files are very small. This performance issue goes away if I uncheck the “Always Include all Columns” box on the Column Selection tab and use the long list of columns explicitly. I can only guess that the upstream node(s) - a table reader in this case - need to provide the Joiner with all the columns during configuration when the “Always Include All Columns” box is checked, and that the wait is caused by this processing.

Everything still works; it is just frustrating to wait on the Joiner delays. Hopefully this is something to consider as wide row support is built into KNIME.

Hi @bfrutchey,

Similar to the Reference Column Filter node, which we talked about in the Reference Column Filter Performance post, it looks like configuring the Joiner node also scales unnecessarily badly with wide tables (for very similar reasons, too).

Our development team is currently discussing various performance improvements to the Joiner node, and this is definitely an item that we will discuss and address in an upcoming release. Thanks for bringing it to our attention. Keep it coming! :slight_smile:

I’ll update this post once we’ve implemented a fix.

Regards,

Marc

