Optimize join of two tables with a unique lookup value

Hi all,

I am looking for the most optimal way to join two large tables. I have a transactional table with 10mio+ rows and a second table with related master data with 1.5mio+ rows in it. The second table has unique rows with a string as the key for the table.

I am looking for a way to run this join as fast as possible, and I am not sure if there are any alternatives. Previously when working with other ETL systems one could build a lookup table that is optimized for unique values.

Any input, or am I “just” stuck with the normal joiner for this?

Many thanks
Martin

Hi Martin,
What you can try is put your rows in the second table into a collection cell with the Create Collection Column node and then use the Cell Replacer to join it to the first table. Then you can use the Split Collection Column node to “unwrap” the collection column again. Create Collection Column and Cell Replacer are streamable, with can give you a performance boost. Split Collection Column unfortunately is not, but if you can replace it by a Java Snippet or Column Expressions, which are streamable, the whole thing can be streamed and that is pretty fast.
Kind regards,
Alexander

3 Likes

The new Join node should be optimised for performance. This points you might also want to consider:

  • see if you could force the whole join to be done in memory (settings at the node). You will of course have to have enough of that
  • you could set up your task so that the join is the only thing going on in your particular KNIME workflow and all the others are closed.
  • you could try and use a cache node before the actual join and see if this does help

Then you might want to explore my collection of more general performance topics. In general there will be a price to pay wither in more RAM or time and disk space. You will have to see what you might be able to optimise.

1 Like

Thank you for the input. We will run a few tests to see where we get the best performance and revert back here when we have some final conclusions

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.