Stanford Tagger - KNIME 2.9 - Java Heap Space

Hello,

first, thanks for providing KNIME 2.9 with additional features.

I did the update yesterday and saw the following behaviour regarding memory handling with the Stanford Tagger node on a rather small corpus (17 documents):

Version 2.8: works fine

Version 2.9: much slower and ends up with

ERROR     Stanford tagger                    Execute failed: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space

Both versions used the same knime.ini setting:

-XX:MaxPermSize=512m

I even doubled it to 1024m, but got the same error message.

I reverted back to 2.8 and it works fine again.
Thanks for any help

Bernd
Hi Bernd,

thank you for your post. I see your point and can reproduce the problem. How many parallel threads are you using with the Stanford Tagger node? For each thread, the external tagger model has to be loaded into memory. Unfortunately, some of the Stanford models are really big; moreover, there seems to be a bug where models are loaded more than once per thread and memory is not deallocated properly.

When memory runs short, garbage collection kicks in, which slows down the whole process. This is why the workflow is slower when you run it with 2.9. When you increase the Xmx to 1500m, it should work again. If you want to make use of more threads, you need to increase the memory further.
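The per-thread model loading described above means the usable heap is the key constraint. A quick, generic way to verify how much heap the JVM was actually granted (a plain Java sketch, not KNIME-specific; the class name is mine) is to query the Runtime:

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxMb = rt.maxMemory() / (1024 * 1024);     // upper limit set via -Xmx
        long totalMb = rt.totalMemory() / (1024 * 1024); // heap currently allocated
        long freeMb = rt.freeMemory() / (1024 * 1024);   // free space within the allocated heap
        System.out.println("Max heap (MB):  " + maxMb);
        System.out.println("Allocated (MB): " + totalMb);
        System.out.println("Free (MB):      " + freeMb);
    }
}
```

If the reported maximum does not match what you set in knime.ini, the setting is not being picked up.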

Anyhow, this is definitely an issue that I will take care of.

Cheers, Kilian

I forgot to mention that you need to set Xmx:

-Xmx1500m

not MaxPermSize.
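For reference, the relevant knime.ini lines might look like this (a sketch; -Xmx sizes the Java heap that the tagger models are loaded into, while -XX:MaxPermSize only affects the permanent generation and does not help here):

```ini
-Xmx1500m
-XX:MaxPermSize=512m
```

Each option must be on its own line in knime.ini, after the -vmargs line.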

Hello Kilian,

thanks for your fast reply.

FYI:

I used 1 parallel tagging process with the German hgc model.

knime.ini setting: -Xmx2048m

The process still failed due to lack of memory.

I doubled it to -Xmx4048m, and now it runs through, so it is really memory-hungry.

Thanks for your effort.

Bernd

Hi Bernd,

the German fast model needs less memory than the German hgc model. However, I will investigate the issue more closely.

Cheers, Kilian