Server memory efficiency

Hi all.
We have a medium server on an AWS t3.xlarge instance (4 vCPUs and 16 GB RAM).

We run quite a complex flow. Is there any way to make the server more memory efficient? For example, dropping tables stored in memory earlier in the flow?

Hello @tiaandp,

The usual approach (other than adding more computing power) is to make a “complex flow” less complex. I don’t know how advanced a user you are or how much you are able to share, so I’m not sure how feasible this approach is, but it is worth thinking about. (There are many topics and discussions around workflow optimization, and plenty of users willing to help.)

Regarding making the Server more memory efficient - I don’t think there are any special settings for Server that differ from KNIME Analytics Platform, but I’m not a Server expert, so I might be wrong on that one. What I have seen is KNIMErs making use of the garbage collector nodes from the Vernalis KNIME Nodes extension to free memory (but this is more related to the flow optimization mentioned above).

Additionally, here is a little background on compute/storage considerations for KNIME Server Small/Medium on AWS (just for reference):
https://docs.knime.com/latest/aws_marketplace_server_guide/index.html#_knime_server_smallmedium

Br,
Ivan


Hi @tiaandp,

In addition to the topics Ivan already mentioned (thanks for this), you could configure nodes that process large amounts of data to use the memory policy “Write tables to disc” instead of storing them in memory. The in-memory option will also fall back to disc storage once a table exceeds a certain number of cells. This limit can be configured; for details, please have a look at this.
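If I remember correctly, that limit is controlled by a JVM property in knime.ini; the sketch below shows the property name as I recall it from the FAQ, so please verify it against the linked page before relying on it:

# knime.ini - per-table cell threshold above which KNIME writes the table
# to disc (property name recalled from the KNIME FAQ - please verify)
-Dorg.knime.container.cellsinmemory=100000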

Best,
Michael


Thanks so much everyone.

@MichaelRespondek is there an easy way to change the setting for all selected nodes?

We have about 500 nodes in the workflow, so changing them all individually would take quite a while.

As you are executing this on AWS, I assume you are running Linux.

Then you could use a sed command to change all of the memory policy settings of a specific workflow. Please be aware that you would be changing a workflow directly at the file system level, so you should back up the workflow folder first.

The following was tested using bash as the shell. Change directory to the specific workflow folder; make sure that you are located in the workflow folder and not in a workflow group, so that the settings are applied to this workflow only.
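For example (the paths below are placeholders for illustration - adjust them to your workflow repository):

# back up the workflow folder first (placeholder path - adjust)
cp -a /srv/knime_server/workflow_repository/MyWorkflow /srv/knime_server/workflow_repository/MyWorkflow.bak

# change into the workflow folder itself, not the surrounding workflow group
cd /srv/knime_server/workflow_repository/MyWorkflow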
Run this command:

# this bash command walks the workflow and sets every node's memory policy to always write tables to disc (use CacheInMemory instead to achieve the opposite)
find . -name settings.xml -exec sed -i -e 's/CacheSmallInMemory/CacheOnDisc/g' {} \;
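Afterwards you can check that the replacement went through, for example:

# no output here means no node is left on the old default policy
grep -rl 'CacheSmallInMemory' --include=settings.xml .

# list the settings files that now use the on-disc policy
grep -rl 'CacheOnDisc' --include=settings.xml .

Note that the sed command only rewrites nodes on the default policy (CacheSmallInMemory); nodes explicitly set to CacheInMemory stay untouched.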

To achieve this for all workflows, you can use the knime.ini parameter mentioned in the FAQ linked in my first reply and set it to the value 0, so that all tables are stored on disc in general.
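If I recall the FAQ correctly, that would be this line in knime.ini (same caveat as above - please verify the property name against the FAQ):

# knime.ini - a threshold of 0 cells sends every table to disc
-Dorg.knime.container.cellsinmemory=0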

Note: using disc storage for all tables can prolong execution times, as disc access is slower than memory access. But it might help you cope with a relatively small amount of memory.

Best,
Michael

