I encountered a seemingly strange behaviour with the Joiner node and I assume this has been introduced recently. I’m working on Knime 4.2.0.
If I left outer join a data table (~250000) records to a small dictionary table in order, the Joiner node can get really slow if the data table contains a lot of null values on the joining column. If these null values are however replaced with empty strings, the performance is fast as usual.
I created a demo workflow with random data and attached it. I hope that helps.
Thanks for your reply. I understand that missing values are not matching anything, however I don’t get why this should affect the performance of the joining so negatively. Essentially from my understanding that just means skipping the row without trying to match it to the second table.
Right now my workaround to this problem is to clone the joining column and fill the nulls with values that can’t match to the second table. That doesn’t seem like a “healthy” way of doing it.
Did you have a look at the attached example? The empty strings also don’t match to any records, however the performance is way faster.
Regarding the Joiner (Labs) node: I’ve seen that one and I like it. It just seems to lack the option to “Match any of the joining columns”. The new Joiner (Labs) node seems to always “Match all of the joining columns” or did I not understand this right?
You are right. It is on the list (internal reference: AP-14577). Have added +1 to it. Someone will update this topic once it has been resolved.
Regarding joining with missing values. Tnx for provided workflow. I have checked and indeed the execution difference is around 20 times(!). But I have tried new Joiner node and seems this was addressed there as joining with missing values is even slightly faster now than joining with empty ones. Workflow attached. Demo_Joinerproblem_ipazin.knwf (22.8 KB)
To sum it up: you have a workaround for this until “Match any” functionality is implemented