I need to sort a table with 36 million rows and 83 columns. The table columns mostly consist of double values. The table needs to be sorted by a string type column, followed by another string type column and finally by an integer type column.
Execution always fails with a Java heap space error.
To fix this problem I have tried to increase the heap space stepwise until
although my computer only has 8 GB of memory.
Do you have any ideas how to fix this problem?
In the memory policy tab of the Sorter node, does changing the option to "Write to disc" help?
Simon, thank you for your answer!
I made a small mistake: I have got 36 million rows, not 3.6 million as mentioned before (I updated my initial post). The option "Write tables to disc" does not help, I tried this before.
I tried reproducing it, but it works without any problems. I'm using the "data generator" node (36 million rows x 83 columns) and then sorting by the class column plus one of the number columns. The sorting completes in less than 5 hours (the CSV file is about 54 GB) on a quite old desktop machine.
It also works with much less memory (-Xmx2G) but needs a little longer (5:20h ... more partitions are generated). I would claim it also works with 1G but I guess that's not the point?
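To illustrate why less memory just means more partitions and a longer run time, here is a minimal sketch of a generic external merge sort. It is not KNIME's actual Sorter code; the class name `ExternalSortSketch`, the method names and the `chunkSize` parameter are made up for the example, and rows are simplified to single-line strings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of an external merge sort, not KNIME's implementation.
class ExternalSortSketch {

    // Sorts at most chunkSize rows at a time in memory and writes each
    // sorted chunk to its own temporary file (one "partition" per chunk).
    // A smaller chunkSize (less heap) therefore produces more partitions.
    static List<Path> writeSortedPartitions(Iterator<String> rows, int chunkSize)
            throws IOException {
        List<Path> partitions = new ArrayList<>();
        List<String> chunk = new ArrayList<>(chunkSize);
        while (rows.hasNext()) {
            chunk.add(rows.next());
            if (chunk.size() == chunkSize || !rows.hasNext()) {
                Collections.sort(chunk);
                Path file = Files.createTempFile("sort-partition", ".txt");
                Files.write(file, chunk); // one row per line, UTF-8
                partitions.add(file);
                chunk.clear();
            }
        }
        return partitions;
    }

    // One open partition together with the row currently at its head.
    static final class Head implements Comparable<Head> {
        final String row;
        final BufferedReader reader;

        Head(String row, BufferedReader reader) {
            this.row = row;
            this.reader = reader;
        }

        @Override
        public int compareTo(Head other) {
            return row.compareTo(other.row);
        }
    }

    // k-way merge: only one row per partition is held in memory at a time.
    // The result is collected in a list only to keep the example short;
    // a real sorter would stream it back to disk.
    static List<String> merge(List<Path> partitions) throws IOException {
        PriorityQueue<Head> heads = new PriorityQueue<>();
        for (Path p : partitions) {
            BufferedReader reader = Files.newBufferedReader(p);
            String first = reader.readLine();
            if (first != null) {
                heads.add(new Head(first, reader));
            } else {
                reader.close();
            }
        }
        List<String> sorted = new ArrayList<>();
        while (!heads.isEmpty()) {
            Head smallest = heads.poll();
            sorted.add(smallest.row);
            String next = smallest.reader.readLine();
            if (next != null) {
                heads.add(new Head(next, smallest.reader));
            } else {
                smallest.reader.close();
            }
        }
        for (Path p : partitions) {
            Files.delete(p); // temporary files are removed once merged
        }
        return sorted;
    }
}
```

Note that the partitions live on disk until the merge has finished, so a sort like this needs free temporary disk space of roughly the size of the table, regardless of the heap setting.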
Can you give more details on your setup (KNIME version?), try reproducing it with the output of the data generator node, or provide us with your data file (yes, we will send you an FTP account)?
The hard disk where KNIME stores its temporary files was running out of space while the node was executing. The moment the Java heap space error occurs, the temporary files are deleted, so I hadn't noticed that problem.
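For anyone running into the same thing, a quick way to check the free space before starting a long sort is sketched below. It assumes the temporary files go to the JVM's `java.io.tmpdir` (KNIME may be configured to use a different directory), and the class `TempSpaceCheck` and its `estimateBytes` helper are hypothetical. The estimate of rows x columns x 8 bytes per double is only a rough lower bound that ignores the string columns and any storage overhead; for 36 million rows and 83 columns it comes to about 23.9 GB.

```java
import java.io.File;

// Hypothetical helper, not part of KNIME.
class TempSpaceCheck {

    // Bytes the JVM can still write to the partition that holds dir.
    static long usableBytes(File dir) {
        return dir.getUsableSpace();
    }

    // Rough lower bound for one copy of the table:
    // every cell is counted as an 8-byte double.
    static long estimateBytes(long rows, int columns) {
        return rows * columns * 8L;
    }

    public static void main(String[] args) {
        // Assumption: temporary files are written to java.io.tmpdir.
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        long needed = estimateBytes(36_000_000L, 83);
        long free = usableBytes(tmp);
        System.out.printf("Temp dir: %s%n", tmp.getAbsolutePath());
        System.out.printf("Estimated need: %.1f GB, free: %.1f GB%n",
                needed / 1e9, free / 1e9);
        if (free < needed) {
            System.out.println("Warning: the sort will probably run out of temp space.");
        }
    }
}
```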