RAM needed for Table Editor with large table

#1

Hello everybody,

We have a large table with about 20 columns and 230,000 rows (saved in CSV format, ~950 MB).
We need to use the Table Editor to view the whole table and let the user edit some of the information (a manual process, unfortunately).

At the moment we get an OOM error every time we try to view the whole table. Does someone here have an idea how much RAM is needed to use the Table Editor with such a large table?

Thanks in advance,

Thanh Thanh

0 Likes

#2

Hi @tttpham,

In general, the memory footprint depends on the types of your columns, so it is hard to estimate. On top of that, there can be memory clashes with other workflows running in parallel (imagine another heavy workflow that eats up 95% of the available memory).

Where do you face this issue? If it happens on the Server, did you try to increase the memory available to the executor (via the -Xmx parameter in the knime.ini of the executor installation)? What is the load on the server when this problem occurs?
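
For reference, the heap limit is the -Xmx line in the executor's knime.ini, below the -vmargs marker; a minimal excerpt (12g is just an example value) would look roughly like this:

  -vmargs
  -Xmx12g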

Having more information will help us understand how to move forward.

Cheers,
Mischa

1 Like

#3

Hi Mischa @lisovyi,

I have this error on KNIME Server. At the moment, only this one workflow is being executed.
I've just re-run the test with -Xmx12g, and now I get a Java heap space error…
As for my table, all the columns are of type String.

Thanks

Best regards,
Thanh Thanh

1 Like

#4

Hi,

The problem here is that strings can be very long and therefore take a lot of space: each cell could contain anything from a single character to an entire book. That makes it difficult to estimate in advance how much memory one needs. In principle, the disk space could be a good proxy, but I'm not sure there is a 1-to-1 correspondence.

How much memory does your server machine have? Could you monitor the server to see how much free RAM there is before you start the workflow? The problem could be that KNIME keeps requesting additional RAM for processing as long as its current usage is below the configured maximum (12 GB in your example), but there are not enough free resources on the machine to satisfy those requests.
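
If you have shell access to the machine, a quick way to monitor this (just a sketch using standard Linux tools) is:

  free -h              # snapshot of total/used/free memory in human-readable units
  watch -n 5 free -h   # refresh the same view every 5 seconds while the workflow runs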

Cheers,
Mischa

0 Likes

#5

Hi Mischa,
Thanks for your quick reply. I know that we need more RAM for this, but I just don't know how much, so I'm trying my luck here with this question :slight_smile:

And here is what I have before starting the workflow:
total used free shared buff/cache available
Mem: 15G 3,7G 8,2G 240K 3,7G 11G
Swap: 2,0G 38M 1,9G

And when it crashes, here is what I have:
total used free shared buff/cache available
Mem: 15G 11G 176M 240K 4,1G 3,9G
Swap: 2,0G 38M 1,9G

Thanks :),

Thanh Thanh

0 Likes

#6

Hi,

Thanks for the memory details. OK, it makes sense now. To my understanding, you run out of memory because your maximum allocatable memory is larger than what you actually have free (12 GB > 8 GB). I managed to reproduce your problem with a random ~900 MB file of strings, and in my case execution plateaued at ~6 GB of RAM (it was already a large number even before the Table Editor).

I’m not aware of a way to find the memory used by objects stored in individual nodes. Three alternative ideas:

  • Try to run your workflow in a local Analytics Platform, ideally on a machine where you can allocate a lot of memory. Enable the heap status bar (File -> Preferences -> General -> Show heap status) and watch how it evolves and at what value it plateaus. The catch is that the total memory footprint can already grow large over the preceding nodes, before the data even reaches the Table Editor.
  • Set the executor memory to something that fits your machine (maybe 6 GB, if only 8 GB is free when the system is idle). Then execute the workflow while gradually increasing the number of rows you read from the file: start with 1% of the rows, then 5%, … Find where it breaks, i.e. where the 6 GB is exhausted, and get a rough estimate of the required RAM by rescaling 6 GB with the last working fraction (see the sketch after this list).
  • A naive question: do you really need to present all 230k rows and 20 columns to the user at once? Can the user realistically go through ALL those rows? Maybe you could iterate and present only chunks of the data, or a subset of the relevant rows/columns?
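
To put a number on the second idea, a back-of-the-envelope rescaling could look like the following Python sketch (both input values are placeholders; use the executor limit and the last row fraction that still executed successfully on your side):

  # Rough estimate of the RAM needed for the full table, obtained by
  # rescaling the executor memory limit with the largest row fraction
  # that still executed successfully. Both inputs are placeholders.
  executor_limit_gb = 6.0        # -Xmx given to the executor during the test
  last_working_fraction = 0.40   # e.g. 40% of the rows still worked, 50% crashed

  estimated_total_gb = executor_limit_gb / last_working_fraction
  print(f"Rough RAM needed for the full table: ~{estimated_total_gb:.1f} GB")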

Cheers,
Mischa

1 Like

#7

Hi Mischa @lisovyi,

Sorry for the late response. I just managed to follow your suggestion and get an estimate.
What works locally for me: about 4 GB for 50,000 rows with 20 columns. With this, the Table Editor shows the data and I can edit it.
However, when I increase the number of rows to 75,000 or above, I see it consume more RAM (still within the limit I set for KNIME); the node is marked as executed, but the navigator is not shown and there are no errors.
So I'll go with 4 GB per 50,000 rows, which scales to roughly 4 GB × (230,000 / 50,000) ≈ 18.4 GB for the full table, and ask for the estimated 25 GB of RAM to keep some headroom. I'll see if it does the job…

As for your question: yes, the users want to have all the data at once, with all the column filters available, so that they can analyze the corpus as a whole and validate each row.

Thanks and have a nice day,

Thanh Thanh

2 Likes