RAM needed for Table Editor with large table

#1

Hello everybody,

We have a large table with about 20 columns and 230,000 rows (saved in CSV format, ~950 MB).
We need to use the Table Editor to view the whole table and let the user edit some of the information (a manual process, unfortunately).

At the moment we get an OOM error every time we try to view the whole table. Does someone here have an idea how much RAM is needed to use the Table Editor with such a large table?

Thanks in advance,

Thanh Thanh

0 Likes

#2

Hi @tttpham,

In general, the memory footprint depends on the types of your columns, so it is hard to estimate. On top of that, there can be memory clashes with other workflows running in parallel (imagine another heavy workflow that eats up 95% of the available memory).

Where do you face this issue? If it happens on the Server, did you try to increase the memory available to the executor (via the -Xmx parameter in the knime.ini of the executor installation)? What is the load on the server when this problem occurs?
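
For reference, the heap limit is the -Xmx line in the executor's knime.ini, below the -vmargs marker; a minimal excerpt (12g is just an example value) would look roughly like this:

  -vmargs
  -Xmx12g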

Having more information will help us understand how to move forward.

Cheers,
Mischa

1 Like

#3

Hi Mischa @lisovyi,

I have this error on KNIME Server. At the moment, only this one workflow is being executed.
I've just re-run the test with -Xmx12g, and now I get a Java heap space error…
As for my table, all the columns are of type String.

Thanks

Best regards,
Thanh Thanh

1 Like

#4

Hi,

The problem here is that strings can be very long and therefore take a lot of space: each cell could contain anything from a single character to an entire book. That makes it difficult to estimate in advance how much memory one needs. In principle, the disk space could be a good proxy, but I'm not sure there is a 1-to-1 correspondence.

How much memory does your server machine have? Could you monitor the server to see how much free RAM there is before you start the workflow? The problem could be that KNIME keeps requesting additional RAM for processing as long as its current usage is below the configured maximum (12 GB in your example), but there are not enough free resources on the machine to satisfy those requests.
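
If you have shell access to the machine, a quick way to monitor this (just a sketch using standard Linux tools) is:

  free -h              # snapshot of total/used/free memory in human-readable units
  watch -n 5 free -h   # refresh the same view every 5 seconds while the workflow runs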

Cheers,
Mischa

0 Likes

#5

Hi Mischa,
Thanks for your quick reply. I know that we need more RAM for this, but I just don't know how much, so I'm trying my luck here with this question :slight_smile:

And here is what I have before starting the workflow:
total used free shared buff/cache available
Mem: 15G 3,7G 8,2G 240K 3,7G 11G
Swap: 2,0G 38M 1,9G

And when it crashes, here is what I have:
total used free shared buff/cache available
Mem: 15G 11G 176M 240K 4,1G 3,9G
Swap: 2,0G 38M 1,9G

Thanks :),

Thanh Thanh

0 Likes

#6

Hi,

Thanks for the memory details. OK, it makes sense now. To my understanding, you run out of memory because your maximum allocatable memory is larger than what you actually have free (12 GB > 8 GB). I managed to reproduce your problem with a random ~900 MB file of strings, and in my case execution plateaued at ~6 GB of RAM (it was already a large number even before the Table Editor).

I’m not aware of a way to find the memory used by objects stored in individual nodes. Three alternative ideas:

  • Try to run your workflow in a local Analytics Platform, ideally on a machine where you can allocate a lot of memory. Enable the heap status bar (File -> Preferences -> General -> Show heap status) and watch how it evolves and at what value it plateaus. The catch is that the total memory footprint can already grow large over the preceding nodes, before the data even reaches the Table Editor.
  • Set the executor memory to something that fits your machine (maybe 6 GB, if only 8 GB is free when the system is idle). Then execute the workflow while gradually increasing the number of rows you read from the file: start with 1% of the rows, then 5%, … Find where it breaks, i.e. where the 6 GB is exhausted, and get a rough estimate of the required RAM by rescaling 6 GB with the last working fraction (see the sketch after this list).
  • A naive question: do you really need to present all 230k rows and 20 columns to the user at once? Can the user realistically go through ALL those rows? Maybe you could iterate and present only chunks of the data, or a subset of the relevant rows/columns?
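
To put a number on the second idea, a back-of-the-envelope rescaling could look like the following Python sketch (both input values are placeholders; use the executor limit and the last row fraction that still executed successfully on your side):

  # Rough estimate of the RAM needed for the full table, obtained by
  # rescaling the executor memory limit with the largest row fraction
  # that still executed successfully. Both inputs are placeholders.
  executor_limit_gb = 6.0        # -Xmx given to the executor during the test
  last_working_fraction = 0.40   # e.g. 40% of the rows still worked, 50% crashed

  estimated_total_gb = executor_limit_gb / last_working_fraction
  print(f"Rough RAM needed for the full table: ~{estimated_total_gb:.1f} GB")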

Cheers,
Mischa

1 Like

#7

Hi Mischa @lisovyi,

Sorry for the late response. I just managed to follow your suggestion and get an estimate.
What works locally for me: about 4 GB for 50,000 rows with 20 columns. With this, the Table Editor shows the data and I can edit it.
However, when I increase the number of rows to 75,000 or above, I see it consume more RAM (still within the limit I set for KNIME); the node is marked as executed, but the navigator is not shown and there are no errors.
So I'll go with 4 GB per 50,000 rows, which scales to roughly 4 GB × (230,000 / 50,000) ≈ 18.4 GB for the full table, and ask for the estimated 25 GB of RAM to keep some headroom. I'll see if it does the job…

As for your question: yes, the users want to have all the data at once, with all the column filters available, so that they can analyze the corpus as a whole and validate each row.

Thanks and have a nice day,

Thanh Thanh

2 Likes