MoSS data limits?

Hi All,

I've tried MoSS on several data sets and I'm happy with the results. Because I want to make better use of the Active/inactive functionality, I tried to enter our complete screening set (80k molecules) for a certain project. I made sure all nodes were set to "write table to disc", since I found that some nodes will fail if you leave this at the default setting.

After running for some time, the MoSS node fails with the message: java.lang.OutOfMemoryError: Java heap space
Changing the memory settings of KNIME using -XX:MaxPermSize=1024M does not help.
According to the task manager, the memory used is only about 320M anyway.

Here is the relevant part of the log file:

!ENTRY org.eclipse.ui 4 0 2008-02-05 16:56:51.056
!MESSAGE Failed to execute runnable (java.lang.OutOfMemoryError: Java heap space)
!STACK 0
org.eclipse.swt.SWTException: Failed to execute runnable (java.lang.OutOfMemoryError: Java heap space)
at org.eclipse.swt.SWT.error(SWT.java:3374)
at org.eclipse.swt.SWT.error(SWT.java:3297)
at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:126)
at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3325)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2971)
at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:1930)
at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:1894)
at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:422)
at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
at org.knime.product.rcp.KNIMEApplication.run(KNIMEApplication.java:76)
at org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:78)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:92)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:68)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:400)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.eclipse.core.launcher.Main.invokeFramework(Main.java:336)
at org.eclipse.core.launcher.Main.basicRun(Main.java:280)
at org.eclipse.core.launcher.Main.run(Main.java:977)
at org.eclipse.core.launcher.Main.main(Main.java:952)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.<init>(HashMap.java:203)
at org.eclipse.draw2d.DeferredUpdateManager.repairDamage(DeferredUpdateManager.java:280)
at org.eclipse.draw2d.DeferredUpdateManager.performUpdate(DeferredUpdateManager.java:179)
at org.eclipse.draw2d.DeferredUpdateManager$UpdateRequest.run(DeferredUpdateManager.java:46)
at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:123)
... 20 more
!SESSION 2008-02-05 17:14:23.828 -----------------------------------------------
eclipse.buildId=unknown
java.version=1.5.0_13
java.vendor=Sun Microsystems Inc.

Hi Peter,

peterem wrote:

I've tried MoSS on several data sets and I'm happy with the results. Because I want to make better use of the Active/inactive functionality, I tried to enter our complete screening set (80k molecules) for a certain project. I made sure all nodes were set to "write table to disc", since I found that some nodes will fail if you leave this at the default setting.

After running for some time, the MoSS node fails with the message: java.lang.OutOfMemoryError: Java heap space
Changing the memory settings of KNIME using -XX:MaxPermSize=1024M does not help.
According to the task manager, the memory used is only about 320M anyway.


First, you need to set -Xmx1024m and not -XX:MaxPermSize=1024M. Second, changing the node's buffering strategy will not help, as MoSS needs quite a lot of memory while mining the molecules; the results themselves will be rather small in the end. If setting -Xmx1024m does not help, you may either increase the minimum focus support (memory requirements grow almost exponentially as the minimum support goes down) or set the "Maximum embeddings used" option to something > 0. This will hurt runtime a bit, but needs less memory.
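For example, the end of your knime.ini could look like the following (the launcher lines above -vmargs vary between installations; the important part is that every JVM argument goes on its own line after -vmargs):

-vmargs
-Xmx1024m

Passing it on the command line as knime -vmargs -Xmx1024m should work as well.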

Regards,

Thorsten

thor wrote:
Hi Peter,

First, you need to set -Xmx1024m and not -XX:MaxPermSize=1024M. Second, changing the node's buffering strategy will not help, as MoSS needs quite a lot of memory while mining the molecules; the results themselves will be rather small in the end. If setting -Xmx1024m does not help, you may either increase the minimum focus support (memory requirements grow almost exponentially as the minimum support goes down) or set the "Maximum embeddings used" option to something > 0. This will hurt runtime a bit, but needs less memory.

Regards,

Thorsten

Aha, -Xmx1024m looks much more familiar to me. I got the unusual parameter setting somewhere from the documentation or the forum, and I found it a bit weird (and of course I can't find it anymore :oops: )

About the node buffering strategy: I'm sure you are right, but it did help on another node. I think it was one of the CDK nodes, but I'm not sure. If you want to know, I can try to reproduce it for you.

I will try the options you suggest and report back with the results.

Thanks,

Peter

peterem wrote:
About the node buffering strategy: I'm sure you are right, but it did help on another node. I think it was one of the CDK nodes, but I'm not sure. If you want to know, I can try to reproduce it for you.

The node buffering strategy affects how the result data of a node is handled: it can be held completely in memory, written completely to disk, or written to disk once a certain number of produced cells is exceeded. The MoSS node creates its result table at the very end, after the complete mining process has finished, so changing the buffering will very likely not change anything (unless a huge number of fragments has been found, but then it will crash before that point anyway). Other nodes write to the output table continuously during their execution; such nodes may be affected by the buffering strategy, but even then only if the result table is quite big (a very large number of cells and/or big cells, e.g. molecules). The sketch below illustrates the difference. Hope this clarifies it a bit :wink:
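To make the difference concrete, here is a small self-contained Java sketch of the two patterns. This is not the real KNIME API; all class and method names are invented for illustration:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only; not the real KNIME API. All names are invented.
public class BufferingSketch {

    // Stands in for a node's output buffer. With the "write to disc"
    // strategy, every added row could be flushed to disk immediately.
    static class ResultBuffer {
        void addRow(String row) {
            // imagine the row being appended to a temp file here
        }
    }

    // Pattern A: the node produces rows one by one while it works,
    // so the buffering strategy can keep the heap footprint small.
    static void streamingNode(List<String> molecules, ResultBuffer out) {
        for (String m : molecules) {
            out.addRow("processed:" + m); // row can leave the heap early
        }
    }

    // Pattern B (MoSS-like): all mining happens in memory first and the
    // table is written only at the very end, so the memory peak occurs
    // during mining, before the buffer can help at all.
    static void miningNode(List<String> molecules, ResultBuffer out) {
        List<String> fragments = new ArrayList<String>();
        for (String m : molecules) {
            fragments.add("fragment-of:" + m); // heap usage peaks here
        }
        for (String f : fragments) {
            out.addRow(f); // writing starts only after the peak
        }
    }

    public static void main(String[] args) {
        List<String> molecules = Arrays.asList("C1CCCCC1", "c1ccccc1O");
        ResultBuffer buffer = new ResultBuffer();
        streamingNode(molecules, buffer);
        miningNode(molecules, buffer);
    }
}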

Regards,

Thorsten

Hi Thorsten,

I have tried a lot of things now, but it's still not working. I even transferred the whole thing to Linux.
I set some of the parameters to ridiculous values in order to reduce the number of fragments kept in memory:

I set the minimum focus support up to 80% (really not useful, but just to try to get it working),
Min fragment size to 22 (max still at the default of 100), and
Max embeddings used = 100 (which is the minimum).

Do you have other suggestions?

Thanks,

Peter

Hi Peter,

That is strange; 80,000 molecules should be doable with 1GB of RAM, at least if you restrict the search to fragments that contain at least one hetero-atom. Pure-carbon fragments may screw the whole thing up, but by default they are ignored anyway. When exactly does the node fail with the OutOfMemoryError: during reading/parsing of the molecules or during mining? (The progress message should indicate that.)

Regards,

Thorsten

Hi Thorsten,

It seems to be running OK now.
I had not applied the -Xmx1024m parameter properly; after adding it to knime.ini, it finally does not crash anymore. (Tonight I'm going to run it to the end, but for now I need my PC.)

Thanks for your help,

Peter