I am in the process of implementing quite a large workflow, and I am facing issues related to the size of the tables involved. I am currently joining two tables, but the Joiner node is taking too long to process and eventually freezes at either 32% or 64%. I have changed the memory policy from "Cache tables in memory" to "Write tables to disc". This has slightly improved the percentage completed by the node, but it still gets stuck and is quite slow.
Please let me know if you need additional information.
The estimated size of the table is quite large: over a million records with 50 columns. If the size of the data is the problem, is there a workaround for this?
You could try the sorted join approach, where you presort both tables individually on the join column before joining.
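To see why presorting helps, here is a minimal Python sketch of the sort-merge join idea: once both inputs are ordered on the key, the join only needs two forward-moving cursors instead of a large in-memory hash table. The function name and row layout are hypothetical illustrations, not KNIME internals.

```python
def sorted_merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists of rows that are ALREADY sorted on the join key.

    Hypothetical sketch of a sort-merge join: advance two cursors in
    lockstep, emitting the cross-product of rows only where keys match.
    Memory use stays low because neither side is fully materialized
    into a lookup structure.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ki, kj = key(left[i]), key(right[j])
        if ki < kj:
            i += 1
        elif ki > kj:
            j += 1
        else:
            # Collect the run of equal keys on each side, then pair them up.
            i2 = i
            while i2 < len(left) and key(left[i2]) == ki:
                i2 += 1
            j2 = j
            while j2 < len(right) and key(right[j2]) == ki:
                j2 += 1
            for lrow in left[i:i2]:
                for rrow in right[j:j2]:
                    out.append(lrow + rrow[1:])  # drop the duplicated key column
            i, j = i2, j2
    return out
```

The trade-off is one upfront sort per table in exchange for a single streaming pass over both inputs, which is usually a good deal when the tables do not fit comfortably in memory.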
Can you give us an idea of how large it is in GB, and what the original format is?
If you have enough disc space, one idea could be to try the local big data environment and Hive. It could take some time, but Hive should be able to handle that volume. Set all nodes to write their tables to disc.
Hi there @parthak,
What KNIME version are you running, and how much memory is assigned to KNIME? You can check this blog post on how to assign more memory to KNIME and, in general, how to optimize your workflows.
First of all, I recommend checking whether you are performing an n:m join, because each duplicate set of keys produces a multiple of the rows. You can do this check easily with the GroupBy node on the join column.
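The row blow-up from an n:m join can be predicted before running the join: a key appearing n times on the left and m times on the right contributes n × m output rows. This small sketch (hypothetical helper name; in KNIME you would use a GroupBy with a count aggregation instead) shows the arithmetic:

```python
from collections import Counter

def inner_join_row_count(left_keys, right_keys):
    """Predict how many rows an inner join on these key columns produces.

    For each distinct key, the join emits (count on left) * (count on
    right) rows, so duplicates on BOTH sides multiply. This is the same
    diagnosis a GroupBy "count" on the join column gives you in KNIME.
    """
    left, right = Counter(left_keys), Counter(right_keys)
    return sum(n * right[k] for k, n in left.items())
```

For example, a key duplicated 1,000 times on each side alone yields a million output rows, which is often why a Joiner node on "a million records" stalls even though each input seems manageable.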
Thank you for the reply.
I can try to use this, but the problem is that I cannot find this node in the node repository. Does that mean I am using an outdated version of KNIME? I downloaded it from the website, so I am not too sure.