KNIME big tables join

Alkaline · March 11, 2021, 10:54am

Hi,

I want to join different tables:

1.4 million
0.06 million
0.04 million
0.01 million
2.1 million
0.4 million

My laptop (16 GB RAM, i7-6820 @ 2.7 GHz, AMD R7 M370) can complete the first or second join but then cannot continue, even when left running over the weekend.

I looked here and optimized my knime.ini.

Any further tips? What matters most for performance when I look for a new laptop: RAM, CPU, or graphics card?

Iris · March 11, 2021, 3:01pm

Did you try using our Joiner (Labs) node? We did a lot of optimization work on it, and it is a lot faster.

Keep in mind that this can be a huge resulting table. Do you need the full joined table afterwards?
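
To get a feeling for how large the joined table could become before actually running the join, the output row count of an inner join can be estimated from how often each key value occurs on both sides. Below is a minimal Python sketch, assuming the key columns have been exported as plain lists. The function name and the sample keys are hypothetical and not part of KNIME.

```python
from collections import Counter

def estimate_inner_join_rows(left_keys, right_keys):
    """Return the number of rows an inner join on these keys would produce.

    Each key contributes (occurrences on the left) * (occurrences on the right),
    so duplicated keys on both sides multiply the output size.
    """
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return sum(n * right_counts[key] for key, n in left_counts.items())

# Hypothetical sample keys: "x1" is duplicated on the left, "x2" on the right.
left = ["x1", "x1", "x2"]
right = ["x1", "x2", "x2", "x9"]
estimated_rows = estimate_inner_join_rows(left, right)
print(estimated_rows)
```

If the estimate is far larger than either input table, the keys are not unique, and the join is fanning out. That may be one possible reason why a chain of joins stops making progress.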

AnotherFraudUser · March 16, 2021, 3:47am

Hi @Alkaline,

how are you joining your data?
Each dataset with a cross join?

Something you could try is to reset the RowID after each join, so you do not get stacked IDs such as Row1_Row2_X.
How many columns do you have? Maybe you could remove columns before the large joins and add them back with a lookup afterwards.
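
The "remove columns first, look them up later" idea can be sketched outside KNIME as follows. This is only an illustration in plain Python, assuming rows are dictionaries and that the key is unique in the table used for the final lookup. All function names, column names, and sample rows are made up for the example.

```python
from collections import defaultdict

def slim(rows, key, keep):
    """Keep only the join key and the columns needed during the join."""
    return [{col: row[col] for col in (key, *keep)} for row in rows]

def inner_join(left, right, key):
    """Hash-based inner join of two lists of dict rows on a shared key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def lookup(rows, details, key):
    """Attach the wide columns afterwards; assumes key is unique in details."""
    by_key = {d[key]: d for d in details}
    return [{**row, **by_key.get(row[key], {})} for row in rows]

# Hypothetical data: "note" stands for a wide column not needed for joining.
orders = [
    {"id": "A-1", "qty": 2, "note": "long text"},
    {"id": "B-2", "qty": 5, "note": "more text"},
]
prices = [{"id": "A-1", "price": 10.0}, {"id": "C-3", "price": 7.5}]

joined = inner_join(slim(orders, "id", ["qty"]), prices, "id")
result = lookup(joined, orders, "id")
print(result)
```

The intermediate table `joined` only carries the narrow columns, so less data is moved through each join. The wide columns are attached once at the end.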

Did you enable "keep in memory" in the Joiner configuration?

But I think Iris's suggestion is a good start.

Regarding your question about specs: it depends on what you are doing. I think in most cases memory and CPU will be the relevant factors, although some learning modules might make use of the GPU as well. But why not test it with your own use case? Start your heavy workload and check, e.g. in the Task Manager, which resource is used the most.

mlauber71 · March 16, 2021, 5:46am

@Alkaline could you tell us more about the nature of the joins? Do they involve single IDs, and are these IDs strings or numbers (or a combination)?

Is there an error message? Can you check whether you are running into problems with Java heap space?

Then you could try telling KNIME to write everything to disk (in case you are experiencing memory problems). Or you could do one join per workflow and save the result in a table.

Further hints about KNIME and performance can be found here.

Another thing you could try is to install a local database such as PostgreSQL or MariaDB, see whether it can perform the join, and use KNIME as a front end.
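
As a rough illustration of letting a database do the join, here is a small Python sketch. It uses SQLite instead of the server databases mentioned above, purely because SQLite ships with Python and needs no installation, so treat it as a stand-in. The table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data with a string identifier as the join key.
customers = [("A-1", "Alice"), ("B-2", "Bob"), ("C-3", "Cara")]
orders = [("A-1", 20), ("A-1", 35), ("C-3", 5)]

# ":memory:" keeps the example self-contained; a file path would keep
# the data on disk instead of in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT, name TEXT)")
conn.execute("CREATE TABLE orders (id TEXT, amount INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

# An index on the join key lets the database avoid scanning the whole table.
conn.execute("CREATE INDEX idx_orders_id ON orders (id)")

rows = conn.execute(
    """
    SELECT c.id, c.name, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.id = c.id
    ORDER BY c.id, o.amount
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

The same SQL would in principle work against a server database. The point is that the join, including any spilling to disk, is handled by the database engine rather than by the workflow tool.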

Alkaline · March 17, 2021, 9:21am

Hi,

Thank you all for the good suggestions. I will try the Joiner (Labs).

I have these tables, and the common identifier is a string (letters, numbers and symbols). The tables do not have many rows, each one 4-5. I want to bring all this information together into one table. After the second or third join there is no more progress; the percentage no longer increases. I will try writing everything to disk, as I suspect it might be a memory problem (just a feeling). I will also look into the metacollection :).

Daniel_Weikert · March 17, 2021, 5:36pm

Best of luck, and feel free to post your update here. It will surely be interesting to many of us.
BR

system · September 16, 2021, 5:37am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.