Group by on a large file

Hello,
How can I group 130 million records by 4 columns? I want to keep the rest of the columns as well (taking the first value per group).
Currently, when I do this, the heap fills up and KNIME hangs.

Is it possible to index?

Thanks for your guidance

That is a ton of rows… Did you try the “Write tables to disc” option in the Memory Policy tab? How much RAM does KNIME have access to? Assuming you have additional resources available on your system, you may need to allocate more RAM to KNIME by adjusting your knime.ini file.


I tried that, but the problem was not solved.

RAM: 64 GB DDR4
HDD: 4 TB

How much of that RAM is available to KNIME in the knime.ini file? Maybe try raising it to 50 GB, or more:

-Xmx50g
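
For reference, the -Xmx entry goes in the knime.ini file in the KNIME installation folder, below the -vmargs line, and KNIME needs a restart for the change to take effect. Assuming an otherwise default install, that part of the file should look roughly like this:

    -vmargs
    -Xmx50g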


Hi @mikep2020

Maybe this -GroupBy- problem can be divided into chunks using the -Chunk Loop- node, splitting it into several smaller, less memory-hungry -GroupBy- steps.

This way, less memory would be needed per iteration, and the second, final grouping over the partial results would need less memory as well (see the sketch below).
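
To illustrate the idea outside of KNIME, here is a minimal sketch in Python/pandas of the same chunked, two-pass grouping. The file name, chunk size, and column names are hypothetical, and it assumes a CSV source and a “first value per group” aggregation:

    import pandas as pd

    KEYS = ["col_a", "col_b", "col_c", "col_d"]  # the 4 grouping columns (hypothetical names)
    partials = []

    # Pass 1: stream the file in chunks and keep the first value per group
    # for every remaining column within each chunk.
    for chunk in pd.read_csv("big_file.csv", chunksize=1_000_000):
        partials.append(chunk.groupby(KEYS, as_index=False).first())

    # Pass 2: the concatenated partial results are far smaller than the input,
    # so a single final grouping fits in memory; taking "first" again (the
    # chunks arrive in file order) yields the global first value per group.
    result = pd.concat(partials, ignore_index=True).groupby(KEYS, as_index=False).first()
    result.to_csv("grouped.csv", index=False)

In KNIME, the same pattern would be -Chunk Loop Start- → -GroupBy- → -Loop End-, followed by one more -GroupBy- on the collected output.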

Could you please upload a small example of your initial data, and a small mock-up of what you need to achieve, so that we can help you further?

Hope it helps.
Best
Ael


The problem is solved. Thank you, dear friends!

