Slow Joiner Node with Few Rows

Hello everyone,

I have a problem with a Joiner node.

Table1 consists of 847k rows and 47 columns
Table2 consists of 1200 rows and 9 columns

Target:
The task is a simple left outer join that adds one column from Table2 to Table1. Duplicates have already been removed from Table2.

Challenge:
The join takes far too long: after 3 days the node had reached only 52%. So far I have never had problems, even with larger joins. As a test, I removed all columns from Table1 except the joining column, but that did not help. Changing the “Maximum number of open files” setting did not noticeably improve performance either.

Does anyone have an idea how I can speed up the whole thing?

Hi @Java

Welcome to the KNIME forum!

Strange. Did you try the new Joiner (Labs)? It should be faster.

Besides this, could you please tell us what you are using as the joining column? For instance, what is the column type: number, string, or a special type such as molecule, protein, image, or document? Could you perhaps share some data in a minimal workflow? Thanks.

Hope this helps.

Best

Ael

Hi aworker,

The Joiner (Labs) works as expected :slight_smile: I did not realize there were such big differences. Thank you very much, I can continue working with it.

Apart from that, I would still be interested to know why the previous Joiner did not work.

The Joining Column of Table1 looks like this:
String Values (10745, 10P45, 0001A)

Table2
Matching Column
String Values (10745, 1045P, 0001A)
Column to join
String Values (Tire, Engine, Rest)
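
For readers following along, here is a minimal Python sketch of the left outer join described above, using the sample values from this post. The column names (`key`, `match`, `part`) and the pairing of the Table2 values are assumptions made for illustration, and this is not how either KNIME node is implemented. It only shows the expected result: every Table1 row is kept, and a key without a partner gets a missing value.

```python
# Sample values from the post; column names and the row pairing in table2
# are assumptions for illustration only.
table1 = [{"key": "10745"}, {"key": "10P45"}, {"key": "0001A"}]
table2 = [
    {"match": "10745", "part": "Tire"},
    {"match": "1045P", "part": "Engine"},
    {"match": "0001A", "part": "Rest"},
]


def left_outer_join(left, right, left_key, right_key, add_col):
    """Index the small table once, then scan the big table a single time."""
    # Assumes the keys in `right` are unique (duplicates already removed).
    lookup = {row[right_key]: row[add_col] for row in right}
    # dict.get returns None for keys without a match (the "missing value").
    return [{**row, add_col: lookup.get(row[left_key])} for row in left]


joined = left_outer_join(table1, table2, "key", "match", "part")
for row in joined:
    print(row)
```

With these values, `10745` and `0001A` pick up `Tire` and `Rest`, while `10P45` has no match in Table2 and ends up with `None`.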

Best regards
Java

Hello @Java!

Welcome to the KNIME Community, and glad the new Joiner works better.

It’s hard to tell without seeing the data and the workflow itself. Do you maybe have long RowIDs? I remember seeing this cause issues when joining. If you can share your workflow, someone can check it :wink:

Br,
Ivan

Hi @ipazin,

The longest RowID is Row847135. If I find time, I will try to anonymize the data so I can share the workflow.

But thanks for your help. :+1: Great Forum :wink:

Hello @Java,

Then it’s not the RowID issue. OK.

Br,
Ivan

Hi @Java

Glad it helped, and thanks for validating the answer :wink:!

@ipazin’s hint is one example of why the Joiner node can sometimes be slow. I would also mention, for instance:

  • Joining on non-standard types, for instance chemo- or bioinformatics data.
  • Joining on data that is by nature too big to serve as a join key, for instance images.
  • Joining on non-integer numerical data (i.e. with decimals), because the values are not guaranteed to match exactly.
  • Joining without first checking whether the result would be intractable in memory, for instance producing M x N rows, where M and N are the row counts of two “huge” tables.
  • Joining when missing values are present in the joining columns.

These are just a few. That said, there are workarounds that solve some of these problems, or at least detect them before running a join.
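
As a rough illustration of detecting some of these problems before joining, the Python sketch below counts missing keys and duplicate keys, and estimates how many rows a left outer join would produce. The function name `join_precheck` and the sample key lists are hypothetical, and the sketch assumes that missing keys (`None`) never match anything, which may differ from how a particular tool treats them. In KNIME itself, similar checks could be built from standard nodes rather than code.

```python
from collections import Counter


def join_precheck(left_keys, right_keys):
    """Summarize join-key problems and estimate the left outer join size.

    Assumption: missing keys (None) never match, so each such left row
    is kept once with missing values on the right-hand side.
    """
    left_counts = Counter(k for k in left_keys if k is not None)
    right_counts = Counter(k for k in right_keys if k is not None)

    # Each left row with a matching key yields one row per right-side match.
    matched_rows = sum(
        n * right_counts[k] for k, n in left_counts.items() if k in right_counts
    )
    # Left rows without a partner are still kept once each.
    unmatched_rows = sum(
        n for k, n in left_counts.items() if k not in right_counts
    )
    missing_left = sum(1 for k in left_keys if k is None)

    return {
        "missing_left": missing_left,
        "missing_right": sum(1 for k in right_keys if k is None),
        "duplicate_right_keys": sum(1 for n in right_counts.values() if n > 1),
        "estimated_rows": matched_rows + unmatched_rows + missing_left,
    }


# Hypothetical key lists: one missing key on the left, one duplicate on the right.
left = ["10745", "10P45", "0001A", None]
right = ["10745", "1045P", "0001A", "0001A"]
report = join_precheck(left, right)
print(report)
```

Here the duplicate `0001A` on the right doubles that row in the output, so four input rows become an estimated five. On two large tables with heavily duplicated keys, the same arithmetic shows the M x N blow-up before the join is ever started.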

Looking forward to the anonymized workflow so we can help further if possible.

All the best,

Ael
