Hi everyone. I’m desperately hoping that someone can help me here, because I can’t for the life of me see a solution and I’m running out of options.
I’ve been using KNIME for a while now to build classifiers for high-dimensional data (~400 features, ~300 observations). The worksheets I’m running are highly iterative: they involve forward feature selection, hyperparameter optimisation and bootstrapping. Unfortunately, when I try to run these worksheets, I very quickly run out of memory. This isn’t on a laptop or desktop: I quickly run out of memory on a server with 320GB of RAM.
I’ve searched through all the previous posts on memory management. I have switched on the garbage collector button at the bottom of the KNIME window, I have included the Vernalis Heavy Garbage Collector node in my worksheets, and I have set the -Xmx2048m switch in knime.ini. None of these things work! (Help!) The percentage of server RAM occupied by the Java virtual machine still grows and grows once the worksheet has started, until the server has no choice but to kill KNIME (with the error Memory pressure relief: total: res = 122757712/12275712/0, res+swap=7475200/7475200/0).
If I were coding the analysis myself, it wouldn’t have any memory problems, as I could simply reuse the same vectors/matrices in each iteration. The only explanation I can think of for the huge, growing memory demand is that all the tables created on each iteration of every loop are retained in memory. 1000 bootstraps × 50 hyperparameter optimisations × 1000 forward feature selection (FFS) steps = ~50,000,000 iterations, so I can see how this would clog up the server’s memory if the data from all 50,000,000 iterations were retained.
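To put rough numbers on that hypothesis, here is a back-of-envelope sketch in Python. It assumes, purely for illustration, that each retained table is a dense 300 × 400 block of 8-byte doubles; KNIME’s actual table storage may be larger or smaller. The names N_BOOTSTRAPS, total_iterations and table_bytes are hypothetical.

```python
# Back-of-envelope sketch of the nested-loop retention hypothesis.
# Assumption: each table is a dense matrix of 8-byte doubles; KNIME's real
# storage format differs, so the byte figures are illustrative only.

N_BOOTSTRAPS = 1000
N_HYPERPARAM_SETS = 50
N_FFS_STEPS = 1000

N_OBSERVATIONS = 300
N_FEATURES = 400
BYTES_PER_VALUE = 8  # assumption: one float64 per cell


def total_iterations(bootstraps: int, hyperparams: int, ffs_steps: int) -> int:
    """Number of innermost-loop executions for three fully nested loops."""
    return bootstraps * hyperparams * ffs_steps


def table_bytes(rows: int, cols: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Approximate size in bytes of one dense numeric table."""
    return rows * cols * bytes_per_value


if __name__ == "__main__":
    iterations = total_iterations(N_BOOTSTRAPS, N_HYPERPARAM_SETS, N_FFS_STEPS)
    one_table = table_bytes(N_OBSERVATIONS, N_FEATURES)
    print(f"iterations:           {iterations:,}")
    # Reusing one buffer keeps memory at roughly one table, however many loops run.
    print(f"reuse one buffer:     {one_table / 1e6:.2f} MB")
    # Keeping every iteration's table grows linearly with the iteration count.
    print(f"retain every table:   {iterations * one_table / 1e12:,.0f} TB")
```

Run as a script, it reports 50,000,000 iterations, about 0.96 MB per table, and about 48 TB if every iteration’s table were kept, versus roughly one table’s worth when a single buffer is reused. Even with generous error bars on the per-table size, retention at that scale would exhaust 320GB long before the loops finish.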
This leads me to ask about garbage collection. I know nothing about how the Java VM works, so how does it know which data structures to mark for garbage collection? Is there a way within KNIME to specify, at the end of a loop, that the variables created inside the loop can be garbage-collected?
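For context, Java’s garbage collectors do not rely on data structures being marked explicitly: they trace which objects are still reachable from a set of roots (thread stacks, static fields and so on) and reclaim everything else. An object therefore becomes collectable only once nothing live refers to it any more, and System.gc() is only a request to run a collection, not a way to free a particular object. The sketch below illustrates the same reachability idea with Python’s cycle collector rather than the JVM, so treat it as an analogy; Table and run_iteration are hypothetical names.

```python
import gc
import weakref


class Table:
    """Hypothetical stand-in for a per-iteration data structure."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.partner = None  # used to build a reference cycle


def run_iteration(i: int) -> weakref.ref:
    # Two objects that refer to each other form a cycle, so reference
    # counting alone cannot free them; only a tracing pass can.
    a = Table(f"a{i}")
    b = Table(f"b{i}")
    a.partner = b
    b.partner = a
    # Return only a weak reference: it does not keep `a` reachable.
    return weakref.ref(a)


if __name__ == "__main__":
    gc.disable()  # keep the demo deterministic: no automatic collections
    probe = run_iteration(0)
    # The locals `a` and `b` are out of scope, but the cycle keeps them alive
    # until a collector traces from the roots and finds them unreachable.
    print("before collect, still in memory:", probe() is not None)
    gc.collect()
    print("after collect, still in memory: ", probe() is not None)
    gc.enable()
```

Run as a script, it prints True before gc.collect() and False after: the two objects outlived run_iteration because they referred to each other, and were reclaimed only once the collector found that nothing reachable pointed at them. The practical consequence, in Java as here, is that memory is freed by dropping references, not by marking objects.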
Help! If I can’t find a solution, I’ll have to abandon KNIME and start coding the analysis in Python.
Thanks for reading this long post. The contents of my knime.ini are below if it helps.
Thanks for any help you can provide.
Steve.
(base) [steve@stratmed0 knime_4.6.0]$ more knime.ini
-startup
plugins/org.eclipse.equinox.launcher_1.6.100.v20201223-0822.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.2.100.v20210209-1541
-vm
plugins/org.knime.binary.jre.linux.x86_64_17.0.3.20220429/jre/bin
-vmargs
-Darrow.enable_unsafe_memory_access=false
-Darrow.memory.debug.allocator=false
-Darrow.enable_null_check_for_get=false
--add-opens=java.security.jgss/sun.security.jgss.krb5=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.jgss=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.jgss.spi=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.krb5.internal=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-XX:+UseG1GC
-Dsun.net.client.defaultReadTimeout=0
-XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot
-Dknime.xml.disable_external_entities=true
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.nio.channels=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio=ALL-UNNAMED
--add-opens=java.desktop/javax.swing.plaf.basic=ALL-UNNAMED
--add-opens=java.base/sun.net.www.protocol.http=ALL-UNNAMED
-Xmx2048m
-Dorg.eclipse.swt.internal.gtk.disablePrinting
(base) [steve@stratmed0 knime_4.6.0]$
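One detail in this file is worth quantifying. The -Xmx option sets only the maximum size of the Java heap; the resident size of the JVM process also includes non-heap memory (class metadata, thread stacks, JIT-compiled code, native and direct buffers), so the process can grow past that figure. The Python sketch below, built around a hypothetical max_heap_bytes helper, reads the -Xmx value from knime.ini-style lines and converts it to bytes, showing that -Xmx2048m caps the heap at 2 GiB.

```python
import re

# HotSpot size suffixes are binary multiples: k = 1024 bytes, m = 1024 k, g = 1024 m.
# A value with no suffix is taken as plain bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
_XMX = re.compile(r"^-Xmx(\d+)([kKmMgG]?)$")


def max_heap_bytes(ini_lines):
    """Return the -Xmx setting from knime.ini-style lines in bytes, or None.

    If -Xmx appears more than once, this sketch keeps the last occurrence.
    """
    result = None
    for line in ini_lines:
        match = _XMX.match(line.strip())
        if match:
            digits, suffix = match.groups()
            result = int(digits) * _UNITS[suffix.lower()]
    return result


if __name__ == "__main__":
    # A few lines copied from the knime.ini listing above.
    sample = ["-server", "-XX:+UseG1GC", "-Xmx2048m"]
    heap = max_heap_bytes(sample)
    print(f"{heap} bytes = {heap / 1024**3:.1f} GiB")
```

Run as a script, it prints 2147483648 bytes = 2.0 GiB. If the KNIME process’s share of the 320GB climbs far beyond that, the extra would have to come from memory outside the Java heap.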