CPU utilization high and jobs failed - AWS - KNIME Server 4.9

#7

localhost.2019-10-18.log (918.2 KB)

5:20 AM IST and 9:55 AM IST


#8

Hi,

The high CPU moments that you have mentioned coincide with KNIME Server reboots. Could it be that you have a set of very CPU-intense workflows running, which drive up the CPU utilisation, and your machine automatically kills the process that consumes the most CPU?

Side remark: I also see that the executor is dying because it cannot allocate memory (you promise it up to 22 GB, but apparently at that moment there is no free memory left on the system, even though the 22 GB maximum has not been reached yet). I suggest you adjust either the maximum RAM of the executor, the RAM of the machine, or the amount of data processed in the workflows that run at that time.

Best,
Mischa


#9

I guess 22 GB is more than enough…
And yes, I have set it up so that if CPU utilization goes beyond 90%, the process is killed and the machine reboots.

But how can it reach 90% utilization?

What does max RAM mean?
I still don't have a solution for this.
Can you help?


Error - JVM terminated - Knime Closed automatically
#10

Hi,

My comment was not that 22 GB is not enough, but that it is too much: you seem to run some other memory-demanding application in parallel, and your machine does not have enough memory to fit all running processes. The maximum RAM is the maximum amount of memory that the JVM can occupy; it is controlled by the -Xmx parameter in the knime.ini of the executor.
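For illustration, a sketch of what the relevant entries in the executor's knime.ini look like (the 22 GB figure is the one mentioned in this thread; the -Xms line, which sets the initial heap size, is an optional, illustrative addition):

```
-Xms2g
-Xmx22g
```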

Is there a way to check when such a kill was issued? If yes, could you please find out the times on the 17th and the 18th, and we can compare those to the logged server reboots.
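On a Linux machine (an assumption; the thread only says it runs on AWS), the kernel's OOM killer logs every kill, so the kill times can be recovered by grepping the kernel log. A sketch; the echoed line below is a hypothetical sample standing in for the real log:

```shell
# On the real machine you would run something like:
#   dmesg -T | grep -i 'out of memory'
#   journalctl -k | grep -i 'out of memory'
# Hypothetical sample kernel-log line, piped through the same grep for illustration:
echo "Oct 18 05:20:13 ip-10-0-0-1 kernel: Out of memory: Killed process 4321 (java)" \
  | grep -i 'out of memory'
```

The timestamp printed on each matching line is the moment the process was killed.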

The percentage of CPU utilization depends on what workflows you run, on your machine specs, and on what other processes run in parallel with the server. We cannot comment on it; only the user and the machine admins know what is going on.

Best regards,
Mischa


#11

Hi Navin,

a quick check-in, did you manage to get your machine under control?

Cheers,
Mischa


#12

Not yet… There is no parallel workflow running at the same time; I have given a separate time slot to each workflow.
One workflow at a time.


#13

Indeed, that’s a good idea. Just to be 100% sure: how do you make sure that the execution windows do not overlap? Workflows can take very different times to finish, depending on the workflow structure and on the input data.


#14

I know it's good, but… KNIME is supposed to play a role both ways…

I guess I found the issue:
swap memory is not freed after workflow execution is done, even after closing KNIME. I need to reboot for this…
Every time I check, swap memory is above 85%.
Then I reboot the machine, it comes back to normal, and jobs run fine…

But after many job executions, swap memory is above 85% again.

Any solution for this?
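To confirm the swap numbers on a Linux machine (an assumption about the AWS image), /proc/meminfo can be checked directly. A sketch; the printf below stands in for real output as a hypothetical sample:

```shell
# On the real machine:  grep -i '^swap' /proc/meminfo    (or simply: free -h)
# Hypothetical sample output, piped through the same grep for illustration:
printf 'SwapTotal:   2097148 kB\nSwapFree:     204800 kB\n' | grep -i '^swap'
```

SwapFree close to zero while SwapTotal is large would match the "swap above 85%" observation.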


#15

[screenshot: memory usage]

As we can see in this image, memory usage is 2.7 GB, but sometimes it goes above 28 GB.


#16

[screenshot]

#17

As this is on an AWS machine, I can't see any swap space.
And currently it's not releasing memory for the next workflow…


#18

Can anyone help me understand
why so many KNIME processes are running, as shown in the screenshot?


#19

Hi Navin,

thanks for the further details; the screenshots are particularly useful for getting insight into the situation. The first screenshot from yesterday shows memory usage at ~2.5 GB, which is not very insightful, as you have pointed out. But the screenshot from today looks very interesting. It indeed shows that you hit the memory limit on your machine. However, I do not see swap memory above 85%, but rather virtual memory. Is that what you meant in that comment?

In this screenshot you see a sharp cut-off in the memory footprint. This happens when the executor crashes: it tries to allocate memory up to the limit you specified with the -Xmx argument in knime.ini (22 GB, if I remember right) and fails, because there is not enough memory on the machine. This seems to happen because you have other processes occupying ~12.6 GB of virtual memory in total, and 32 - 12.6 < 22. Could you comment on the situation in the previous screenshot? Had you restarted the server by then with another workflow running, or was the situation kept as is? The two last screenshots were posted 30 minutes apart, and it is not clear what happened in between.

In general, you need to understand what eats into the memory and competes with the server for memory resources. Note that you should not expect the executor to release memory back to the system while the executor is up. It runs in a JVM, which has internal memory management and will release memory only upon closing. Quick googling gives several discussion threads, and this comment seems to be a good and simple summary: Reset/Free WF memory

At this stage, while you investigate what else eats into memory, I suggest that you reduce the maximum memory available to the executor to avoid executor crashes.

Cheers,
Mischa


#20

After a server reboot, memory goes back to normal…
Then I start using it again…
Once memory reaches the limit, I need to reboot again to avoid a crash.

I have given 26 GB of memory out of 32 GB.
Is there any solution where I can clear memory without rebooting the machine?


#21

Shutting down the executor (which is easiest to do by shutting down the server, using the dedicated script in the bin folder) will also release the memory occupied by the JVM.


#22

If I stop the executor, then the running workflow will stop…
It's the same as a reboot…

What I want is this: suppose I am running a workflow; after the workflow succeeds, its memory should be released for the next workflow.


#23

Indeed, any running workflows would be stopped, so in this way it is similar to rebooting.

There is a parameter in the server configuration file which controls how long a job stays in memory after completion, or while waiting for a user interaction on the WebPortal, before being swapped to disk. The parameter is com.knime.server.job.max_time_in_memory, and the default value is 60 minutes. You can set it to 1m to trigger swapping after 1 minute. See the “Job swapping” section of the Server Admin Guide for details. Note that this will release the memory internally within the JVM, but the memory will not be released to the system while the JVM is up.
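A sketch of the corresponding line in the server configuration file (knime-server.config), using the 1m value suggested above:

```
com.knime.server.job.max_time_in_memory=1m
```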

Best,
Mischa


#24

Hi Navin,

As long as the job has not been terminated, KNIME may still hold on to some of the tables in memory. This is intended behavior and not necessarily a bad thing since, for instance, you might want to have a look at some intermediate or final results of the workflow using a table view, in which case it is beneficial to have the data in memory still.

KNIME 4.0 introduced a new table caching strategy that attempts to keep some least recently accessed tables in memory. While this strategy will make your average workflow execution much faster, it will use available memory more liberally and has been observed to severely increase the load of garbage collection on some systems / for some workflows. These changes could be responsible for the increased CPU utilization you are observing. For more details, see this forum post.

If you are having trouble as a result of this change, rest assured that we are working on taking load off the garbage collector. If you need a quick fix, you can switch to the less memory-consuming table caching strategy of KNIME 3.7 and earlier by putting the line

-Dknime.table.cache=SMALL

into your knime.ini.

Best,

Marc


#25

com.knime.server.job.max_time_in_memory
Done, but no change.
I tried it, but the memory is not released to the system while the JVM is up.

-Dknime.table.cache=SMALL
Done, but no change.

Can anyone help me with where the values are set for

Xms -
Xmx -

Would updating Java be helpful? How do I do it?

I have set
com.knime.server.executor.max_lifetime=6h
Will it be helpful? Will it abort my running job?


#26

Hi Navin,

as said, the JVM will not release the memory back to the system while the executor is up. But what we wanted to achieve with com.knime.server.job.max_time_in_memory was to release memory within the JVM upon job completion, so that the JVM can re-allocate it to another job on the same executor.

com.knime.server.executor.max_lifetime allows you to configure the executor lifetime. More specifically, the docs say: "Specifies the time in minutes after which an executor is retired and a new instance is created (defaults to 1d), negative numbers disable." Running jobs do not get killed, and the retired executor is kept alive to let activity on it finish. The consequence is that a second, “active” executor is started when the old one stops accepting jobs. The downside for you is that the old one can block X GB of memory, and the new active one will not be able to get the memory it needs, as the total is constrained by the machine's resources. So at this stage I do not see it as a helpful solution.

Xmx is specified in the same file as -Dknime.table.cache, which @marc-bux recommended, namely in the knime.ini file of the executor. Reducing the memory allocatable by the JVM is a good solution until one understands what else consumes memory on your machine, or until the machine has more memory :slight_smile:
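Putting the two suggestions together, an illustrative sketch of the executor's knime.ini (the reduced 16 GB value is a hypothetical example for a 32 GB machine with ~12 GB used elsewhere, not a figure recommended in this thread):

```
-Xmx16g
-Dknime.table.cache=SMALL
```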

I do not see how updating Java would help; could you elaborate on the reasoning?

Cheers,
Mischa
