CPUUtilization high and jobs failed - AWS - knime server 4.9

lisovyi · November 8, 2019, 8:48am

Hi Navin,

thanks for the further details; the screenshots are particularly useful for getting insight into the situation. The first screenshot from yesterday shows memory usage at ~2.5 GB, which is not very informative, as you have pointed out. But the screenshot from today looks very interesting. It does show that you hit the memory limit on your machine. However, I do not see swap memory above 85%, but rather virtual memory. Is that what you meant in this comment? In this screenshot you see a sharp cut-off in the memory footprint. This happens when the executor crashes: it tries to allocate memory up to the limit that you specified with the Xmx argument in knime.ini (22 GB, if I remember correctly) and fails to do so, because there is not enough memory on the machine. This seems to happen because other processes occupy ~12.6 GB of virtual memory in total, so only 32 - 12.6 = 19.4 GB is left, which is less than 22 GB. Could you comment on the situation in which the previous screenshot was taken? Had you restarted the server by then, with another workflow running, or was the previous situation kept as is? The last two screenshots were posted 30 minutes apart, and it is not clear what happened in between.
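To make the memory arithmetic above explicit, here is a minimal Python sketch. The function name `executor_heap_fits` is made up for illustration, and the numbers are the ones read off the screenshots (32 GB total, ~12.6 GB used by other processes, 22 GB Xmx). It ignores the memory the JVM needs on top of the heap, so the real headroom is somewhat smaller.

```python
def executor_heap_fits(total_gb, used_by_others_gb, xmx_gb):
    """Check whether the configured maximum heap fits into what is left on the machine.

    Returns a (fits, free_gb) tuple. Non-heap JVM memory is not accounted for,
    so treat a positive answer as optimistic.
    """
    free_gb = total_gb - used_by_others_gb
    return free_gb >= xmx_gb, free_gb


# Values taken from the discussion above; adjust them to your own machine.
fits, free_gb = executor_heap_fits(total_gb=32.0, used_by_others_gb=12.6, xmx_gb=22.0)
print(f"free: {free_gb:.1f} GB, Xmx fits: {fits}")
```

With these numbers the check fails, which matches the crash seen in the screenshot.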

In general, you need to understand what eats into the memory and competes with the server for memory resources. Note that you should not expect the executor to release memory back to the system while it is up. It runs in a JVM, which has its own internal memory management, and it will release memory only when it is shut down. A quick search turns up several discussion threads, and this comment seems to be a good and simple summary: Reset/Free WF memory

At this stage, while you investigate what else eats into the memory, I suggest that you reduce the maximum memory available to the executor to avoid executor crashes.
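As a rough illustration of that change, the sketch below rewrites the `-Xmx` line in the text of an ini file. It is only an assumption-laden example: the helper `set_xmx` is hypothetical, the sample ini content is made up, and `16g` is just an example value that leaves some margin below the ~19.4 GB that appears to be free. Editing the `-Xmx` line in knime.ini by hand and restarting the executor achieves the same thing.

```python
import re


def set_xmx(ini_text, new_xmx):
    """Return ini_text with its -Xmx line replaced by -Xmx<new_xmx>.

    Raises ValueError if the text contains no -Xmx line.
    """
    pattern = re.compile(r"^-Xmx\S+$", flags=re.MULTILINE)
    if not pattern.search(ini_text):
        raise ValueError("no -Xmx line found")
    return pattern.sub(f"-Xmx{new_xmx}", ini_text)


# Made-up sample content; a real knime.ini contains more lines than this.
sample_ini = "-vmargs\n-Xmx22g\n-Dsome.hypothetical.option=true\n"

# 16g is an example value, not a recommendation for your machine.
updated = set_xmx(sample_ini, "16g")
print(updated)
```

The right value depends on what the other processes on the machine turn out to need.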

Cheers,
Mischa