Slow Joiner performance on tables with null values

Hey everyone

I encountered a seemingly strange behaviour with the Joiner node and I assume this has been introduced recently. I’m working on Knime 4.2.0.

If I left outer join a data table (~250000) records to a small dictionary table in order, the Joiner node can get really slow if the data table contains a lot of null values on the joining column. If these null values are however replaced with empty strings, the performance is fast as usual.

I created a demo workflow with random data and attached it. I hope that helps.
Best regards
Jan

Demo_Joinerproblem.knwf (15.3 KB)

Hi @JanSteinwand

that is because missing values do match nothing. We did not touch the joiner code, so there should be no difference to 4.1. I will check with our development team if we can further improve this case.

We did however introduce a brand new Joiner (Labs) with 4.2. and this outperforms our previous joiner. Below you see it only need 600ms instead of the 30s before.

image

Hi Iris

Thanks for your reply. I understand that missing values are not matching anything, however I don’t get why this should affect the performance of the joining so negatively. Essentially from my understanding that just means skipping the row without trying to match it to the second table.
Right now my workaround to this problem is to clone the joining column and fill the nulls with values that can’t match to the second table. That doesn’t seem like a “healthy” way of doing it.

Did you have a look at the attached example? The empty strings also don’t match to any records, however the performance is way faster.

Regarding the Joiner (Labs) node: I’ve seen that one and I like it. It just seems to lack the option to “Match any of the joining columns”. The new Joiner (Labs) node seems to always “Match all of the joining columns” or did I not understand this right?

Thanks and best regards
Jan

Hello @JanSteinwand,

You are right. It is on the list (internal reference: AP-14577). Have added +1 to it. Someone will update this topic once it has been resolved.

Regarding joining with missing values. Tnx for provided workflow. I have checked and indeed the execution difference is around 20 times(!). But I have tried new Joiner node and seems this was addressed there as joining with missing values is even slightly faster now than joining with empty ones. Workflow attached.
Demo_Joinerproblem_ipazin.knwf (22.8 KB)

To sum it up: you have a workaround for this until “Match any” functionality is implemented :smiley:

Also missing values in joiner was discussed here:

Br,
Ivan

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.

Hello @JanSteinwand

in case you haven’t seen with KNIME version 4.4.0 Joiner node is out of Labs and features match any of the following option. Give it a try and any feedback is welcome.

Br,
Ivan

1 Like