Reducing run time

@h_petheram my first question would be: how large is your data (as a CSV or Parquet file), and how much memory have you assigned to KNIME (https://www.knime.com/faq#q4_2)? I would suggest something like 2/3 of your machine's RAM.
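The memory limit is set via the `-Xmx` line in the `knime.ini` file in your KNIME installation folder. A minimal sketch (the value `10g` is just an example for a 16 GB machine; pick roughly 2/3 of your own RAM):

```ini
; knime.ini (excerpt) - only the -Xmx line is shown here
; Example: on a machine with 16 GB RAM, allow KNIME ~10 GB
-Xmx10g
```

Restart KNIME after changing this value for it to take effect.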

Then: which version of KNIME are you using? The Joiner node has been redesigned and now offers configuration options such as performing the merge in memory (or not) - see Options / Performance. Another idea could be to place a Cache node right before the join.

Then there is actually a way to avoid having the data saved with the workflow - though this may not help you that much.

Another thing to try would be the columnar storage format, which might also improve performance.

Then I can offer my collection of articles and links about KNIME and performance. Another candidate in a corporate environment has been an aggressive virus scanner, which one might be able to tame.

Then, in general, streaming is an option when it comes to performance (operations are sent through a pipeline row by row instead of executing everything node by node). This will not help much with joins, though, since a join needs the whole dataset.
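To illustrate the streaming idea outside of KNIME (this is a toy Python sketch of the concept, not KNIME's actual engine), each step below consumes and yields one row at a time, so the full table never has to sit in memory between operations:

```python
import csv
import io

# Each stage is a generator: it pulls one row from the previous
# stage, transforms it, and passes it on - a pipeline per row.

def read_rows(text):
    # Parse CSV text into dict rows, one at a time
    yield from csv.DictReader(io.StringIO(text))

def to_float(rows, col):
    # Convert one column to float, row by row
    for row in rows:
        row[col] = float(row[col])
        yield row

def keep_if(rows, pred):
    # Filter rows lazily
    for row in rows:
        if pred(row):
            yield row

data = "name,price\napple,1.2\npear,0.8\nplum,2.5\n"
pipeline = keep_if(to_float(read_rows(data), "price"),
                   lambda r: r["price"] > 1.0)
print([r["name"] for r in pipeline])  # → ['apple', 'plum']
```

A join, by contrast, must see every row of at least one input table before it can emit results, which is why it cannot be streamed this way.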
