When I use the Joiner node to perform a Left Outer Join on two tables where the right table has no match for some rows, the rows that end up with missing values are moved to the end of the output table. This is pretty annoying, and I don't understand the reason for, or the benefit of, this behaviour.
Do you have any idea how to solve or prevent this? Adding a sort step is not an option, because KNIME stops with a Java Heap Space error on very big tables (see my other thread).
Thank you for your help!
This shouldn't happen. Are you sure the rows at the end of your left table don't have missing values in the column you are using as the join column?
Absolutely sure, please see the simple workflow I attached to my initial post.
In my opinion, this is the correct behavior of the Joiner node.
If the node does not find a match in the right table for a row of the left table, it puts that row at the end.
Regarding the Java Heap Space error: in the knime.ini file, increase the -Xmx512m value, for instance to -Xmx2048m (depending on how much memory your machine has).
Another alternative: right-click the Sorter node -> Configure -> Memory Policy -> write table to disc.
Hope this helps.
Nico, thank you for your time! I don't see why it is correct. The node description says:
"A Left Outer Join will fill up the columns that come from the right table with missing values if no matching row exists in the right table."
That is exactly what I expect when using the node. But instead it fills them up with missing values and also moves them to the end of the table.
I agree, the node description does not mention any re-sorting of rows.
In my use cases it has never been a problem. Maybe the KNIME developers will see your message and improve this :/
Thanks for the example flow, I've logged a bug and tagged this post.
Aaron, thank you very much!
Do you know when this will be fixed? I ask because it would save us some hours...
Let me quickly comment on this "problem" as well. We don't consider the sorting behavior of the Joiner a bug. It's a natural outcome of the joining algorithm, which first produces the matching rows and then, in post-processing steps, the left-overs from the left and right partitions. I agree that a "retain sort order" option would be a nice addition, but that's a feature rather than a bug fix.
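To illustrate why the unmatched rows end up last, here is a minimal Python sketch of a join that works the way described above: the matching rows are produced in a first pass, and the left-overs are appended in a post-processing step. This is an assumption-based illustration, not KNIME's actual Joiner code; the function name left_outer_join, the list-of-dicts table representation and the use of None for missing values are all hypothetical. Only the left-over handling for a Left Outer Join is shown; a full outer join would additionally append the unmatched right rows.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_outer_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Hypothetical two-phase join; not KNIME's real implementation."""
    # Columns contributed by the right table (everything except the join key).
    right_cols: List[str] = []
    for r in right:
        for col in r:
            if col != key and col not in right_cols:
                right_cols.append(col)

    # Hash the right table by join key.
    index: Dict[Any, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)

    output: List[Row] = []
    unmatched: List[Row] = []
    # Phase 1: emit joined rows for every left row that has a match.
    for row in left:
        matches = index.get(row[key])
        if not matches:
            unmatched.append(row)  # deferred to post-processing
            continue
        for r in matches:
            joined = dict(row)
            joined.update({c: r.get(c) for c in right_cols})
            output.append(joined)

    # Phase 2 (post-processing): append left-overs padded with missing values.
    for row in unmatched:
        padded = dict(row)
        padded.update({c: None for c in right_cols})
        output.append(padded)
    return output
```

With left rows 1, 2, 3 and right rows 1 and 3, row 2 has no match and therefore comes out last, giving the order 1, 3, 2, which is the effect reported in this thread.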
Currently, you can work around this by using an ID generator node (e.g. a Math Formula node appending the row index) upstream of the Joiner and a Sorter node downstream of it that sorts on that ID. We will use the same trick when we add the "retain sort order" option.
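The workaround can be sketched outside KNIME in Python, under a few assumptions: join_like_joiner is a hypothetical stand-in that mimics the "matches first, left-overs last" ordering described in the previous post, and _row_id is a made-up name for the ID column that the Math Formula node would append. The sketch shows only the idea (number the rows, join, sort on the number, drop the helper column); it is not how KNIME implements it.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
ROW_ID = "_row_id"  # hypothetical helper column name


def join_like_joiner(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Stand-in join: matched rows first, unmatched left rows appended last."""
    index: Dict[Any, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    right_cols = list(dict.fromkeys(c for r in right for c in r if c != key))
    matched: List[Row] = []
    unmatched: List[Row] = []
    for row in left:
        for r in index.get(row[key], []):
            matched.append({**row, **{c: r.get(c) for c in right_cols}})
        if row[key] not in index:
            # No match: fill right-table columns with missing values (None).
            unmatched.append({**row, **{c: None for c in right_cols}})
    return matched + unmatched


def join_retaining_order(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Step 1 (upstream of the join): append the row index as an ID column.
    numbered = [{**row, ROW_ID: i} for i, row in enumerate(left)]
    # Step 2: run the join, which moves unmatched rows to the end.
    joined = join_like_joiner(numbered, right, key)
    # Step 3 (downstream): sort on the ID column. Python's sort is stable, so
    # several matches for the same left row keep their relative order.
    joined.sort(key=lambda r: r[ROW_ID])
    # Drop the helper column again.
    return [{k: v for k, v in r.items() if k != ROW_ID} for r in joined]
```

For left rows 1, 2, 3 and right rows 1 and 3, the plain join returns 1, 3, 2, while join_retaining_order returns 1, 2, 3 with a missing value for row 2. The price is the extra sort on the full joined table, which is exactly the step that ran into trouble on very big tables earlier in the thread.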
Just for completeness: the Sorter memory issue you mentioned in your first post has been resolved (it turned out to be a lack of disc space, not a memory problem). Thanks also for following up on that in the other thread (http://tech.knime.org/forum/knime-general/solved-sorter-node-and-java-heap-space).