KNIME performance on Azure Files

We recently migrated our company file system to Azure Files. Performance generally seems very good, with no noticeable difference from the data center we were using previously.
The one big exception is the KNIME GUI, which is much slower than before. It takes a long time to read the workflows from the file system, and the GUI regularly freezes when running workflows. Opening and closing workflows also takes longer.

My temporary files are saved to the local disk so I don’t see why it should freeze so much. Any ideas how I might debug this to see the reason?

Example from the Error log:

eclipse.buildId=unknown
java.version=17.0.5
java.vendor=Eclipse Adoptium
BootLoader constants: OS=win32, ARCH=x86_64, WS=win32, NL=en_US
Command-line arguments:  -os win32 -ws win32 -arch x86_64

org.eclipse.ui.monitoring
Info
Tue Jun 20 08:34:03 CEST 2023
Sample at 08:33:59.692 (+5,320s)
Thread 'main' tid=1 (WAITING)

Stack Trace
	at java.base@17.0.5/jdk.internal.misc.Unsafe.park(Native Method)
	at java.base@17.0.5/java.util.concurrent.locks.LockSupport.park(Unknown Source)
	at java.base@17.0.5/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
	at java.base@17.0.5/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
	at java.base@17.0.5/java.util.concurrent.locks.ReentrantLock$Sync.lock(Unknown Source)
	at java.base@17.0.5/java.util.concurrent.locks.ReentrantLock.lock(Unknown Source)
	at org.knime.core.node.workflow.WorkflowLock.lock(WorkflowLock.java:122)
	at org.knime.core.node.workflow.WorkflowManager.lock(WorkflowManager.java:651)
	at org.knime.core.node.workflow.WorkflowManager.getIncomingConnectionsFor(WorkflowManager.java:1719)
	at org.knime.core.ui.wrapper.WorkflowManagerWrapper.getIncomingConnectionsFor(WorkflowManagerWrapper.java:247)
	at org.knime.workbench.editor2.commands.DeleteCommand.<init>(DeleteCommand.java:165)
	at org.knime.workbench.editor2.actions.NodeConnectionContainerDeleteAction.createDeleteCommand(NodeConnectionContainerDeleteAction.java:110)
	at org.eclipse.gef.ui.actions.DeleteAction.calculateEnabled(DeleteAction.java:78)
	at org.eclipse.gef.ui.actions.WorkbenchPartAction.refresh(WorkbenchPartAction.java:131)
	at org.eclipse.gef.ui.actions.SelectionAction.handleSelectionChanged(SelectionAction.java:89)
	at org.eclipse.gef.ui.actions.SelectionAction.setSelection(SelectionAction.java:101)
	at org.eclipse.gef.ui.actions.SelectionAction.update(SelectionAction.java:124)
	at org.eclipse.gef.ui.parts.GraphicalEditor.updateActions(GraphicalEditor.java:458)
	at org.knime.workbench.editor2.WorkflowEditor.updateActions(WorkflowEditor.java:1698)
	at org.knime.workbench.editor2.WorkflowEditor.selectionChanged(WorkflowEditor.java:2847)
	at org.eclipse.ui.internal.e4.compatibility.SelectionService.notifyListeners(SelectionService.java:266)
	at org.eclipse.ui.internal.e4.compatibility.SelectionService.handleSelectionChanged(SelectionService.java:98)
	at org.eclipse.ui.internal.e4.compatibility.SelectionService.lambda$0(SelectionService.java:72)
	at org.eclipse.ui.internal.e4.compatibility.SelectionService$$Lambda$501/0x0000000801328428.selectionChanged(Unknown Source)
	at org.eclipse.e4.ui.internal.workbench.SelectionAggregator$1.run(SelectionAggregator.java:123)
	at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
	at org.eclipse.e4.ui.internal.workbench.SelectionAggregator.notifyListeners(SelectionAggregator.java:120)
	at org.eclipse.e4.ui.internal.workbench.SelectionAggregator$5.lambda$0(SelectionAggregator.java:220)
	at org.eclipse.e4.ui.internal.workbench.SelectionAggregator$5$$Lambda$1539/0x00000008012f1608.run(Unknown Source)
	at org.eclipse.e4.core.contexts.RunAndTrack.runExternalCode(RunAndTrack.java:59)
	at org.eclipse.e4.ui.internal.workbench.SelectionAggregator$5.changed(SelectionAggregator.java:220)
	at org.eclipse.e4.core.internal.contexts.TrackableComputationExt.update(TrackableComputationExt.java:105)
	at org.eclipse.e4.core.internal.contexts.EclipseContext.processScheduled(EclipseContext.java:356)
	at org.eclipse.e4.core.internal.contexts.EclipseContext.set(EclipseContext.java:372)
	at org.eclipse.e4.ui.internal.workbench.SelectionServiceImpl.setSelection(SelectionServiceImpl.java:34)
	at org.eclipse.ui.internal.e4.compatibility.CompatibilityPart.selectionChanged(CompatibilityPart.java:471)
	at org.eclipse.gef.ui.parts.AbstractEditPartViewer.fireSelectionChanged(AbstractEditPartViewer.java:247)
	at org.eclipse.gef.ui.parts.AbstractEditPartViewer$1.run(AbstractEditPartViewer.java:131)
	at org.eclipse.gef.SelectionManager.fireSelectionChanged(SelectionManager.java:156)
	at org.eclipse.gef.SelectionManager.setSelection(SelectionManager.java:314)
	at org.eclipse.gef.ui.parts.AbstractEditPartViewer.setSelection(AbstractEditPartViewer.java:751)
	at org.knime.workbench.editor2.actions.delegates.AbstractEditorAction$SelectionRunnable.run(AbstractEditorAction.java:178)
	at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:40)
	at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:132)
	at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:4043)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3648)
	at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine$5.run(PartRenderingEngine.java:1155)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:338)
	at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(PartRenderingEngine.java:1046)
	at org.eclipse.e4.ui.internal.workbench.E4Workbench.createAndRunUI(E4Workbench.java:155)
	at org.eclipse.ui.internal.Workbench.lambda$3(Workbench.java:644)
	at org.eclipse.ui.internal.Workbench$$Lambda$259/0x0000000800ee6720.run(Unknown Source)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:338)
	at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:551)
	at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:156)
	at org.knime.product.rcp.KNIMEApplication.start(KNIMEApplication.java:191)
	at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:203)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:136)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:104)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:402)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:255)
	at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base@17.0.5/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base@17.0.5/java.lang.reflect.Method.invoke(Unknown Source)
	at app//org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:659)
	at app//org.eclipse.equinox.launcher.Main.basicRun(Main.java:596)
	at app//org.eclipse.equinox.launcher.Main.run(Main.java:1467)


Hi @bobpeers,
That’s curious. What seems to be happening here is that KNIME’s main thread (the thread that drives the GUI) is waiting to acquire the workflow lock while it builds a delete command for a node connection:

at org.knime.workbench.editor2.actions.NodeConnectionContainerDeleteAction.createDeleteCommand(NodeConnectionContainerDeleteAction.java:110)

Here the thread has to wait because some other thread holds the lock at the moment, and this is causing the delay. Now we would need to find out which other thread is preventing this thread from continuing. To do that, we need to run a command from the command line. jstack is a tool that can be used to print the state of all threads in a Java program, including information about locks. To use it, you first need to find the process ID of KNIME (you can use the jps command for that) and then you can run:

jstack -l [process-ID] > jstack-knime.txt

This will create a new file called jstack-knime.txt with the necessary information. Can you execute this command on your command line right when a freeze occurs?
Kind regards,
Alexander

Hi @AlexanderFillbrunn,

I can try that, but where do I find the jstack command? Assuming I’m using the bundled Java, I don’t see that program in there.

Hi @bobpeers,
Good point. You’d need a Java JDK for that, e.g. from here. The JDK contains both jps and jstack.
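With a JDK installed, the jps and jstack steps can also be scripted so that several dumps are taken in a row while the GUI is frozen. This is a minimal Python sketch, assuming both tools are on the PATH. The function names and the output file naming are made up for this example, and the KNIME process still has to be picked by hand from the jps listing, because how it shows up there is not something the sketch relies on.

```python
import shutil
import subprocess
import time
from pathlib import Path


def parse_jps(output: str) -> list[tuple[int, str]]:
    """Turn `jps -l` output ("<pid> <main class or jar>") into (pid, name) pairs."""
    entries = []
    for line in output.splitlines():
        parts = line.strip().split(maxsplit=1)
        if parts and parts[0].isdigit():
            entries.append((int(parts[0]), parts[1] if len(parts) > 1 else ""))
    return entries


def list_jvms() -> list[tuple[int, str]]:
    """List the running JVMs so that the KNIME process ID can be looked up."""
    jps = shutil.which("jps")
    if jps is None:
        raise FileNotFoundError("jps not found on PATH; it ships with a JDK")
    result = subprocess.run([jps, "-l"], capture_output=True, text=True, check=True)
    return parse_jps(result.stdout)


def dump_threads(pid: int, count: int = 3, pause_s: float = 2.0,
                 out_dir: Path = Path(".")) -> list[Path]:
    """Run `jstack -l <pid>` several times so at least one dump falls inside the freeze."""
    jstack = shutil.which("jstack")
    if jstack is None:
        raise FileNotFoundError("jstack not found on PATH; it ships with a JDK")
    written = []
    for i in range(count):
        result = subprocess.run([jstack, "-l", str(pid)],
                                capture_output=True, text=True, check=True)
        target = out_dir / f"jstack-knime-{i}.txt"  # hypothetical file naming
        target.write_text(result.stdout)
        written.append(target)
        if i < count - 1:
            time.sleep(pause_s)
    return written
```

Calling `list_jvms()` first and then `dump_threads(pid)` with the chosen process ID mirrors the manual steps described above.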
Kind regards,
Alexander

OK, so I opened KNIME and expanded the KNIME Explorer tree to see the workflows, which always triggers a freeze. The log from jstack is attached.
Appreciate the help 🙂

knime.txt (56.0 KB)

Hi @bobpeers,
Thank you for providing the file. The likely offender can be found right in the first thread, the main thread.

at org.eclipse.core.internal.filesystem.local.LocalFileNatives.internalGetFileInfoW(Native Method)
at org.eclipse.core.internal.filesystem.local.LocalFileNatives.fetchFileInfo(LocalFileNatives.java:116)
at org.eclipse.core.internal.filesystem.local.LocalFileHandler.fetchFileInfo(LocalFileHandler.java:30)
at org.eclipse.core.internal.filesystem.local.LocalFileNativesManager.fetchFileInfo(LocalFileNativesManager.java:65)
at org.eclipse.core.internal.filesystem.local.LocalFile.fetchInfo(LocalFile.java:161)
at org.eclipse.core.filesystem.provider.FileStore.fetchInfo(FileStore.java:260)

Here in the last line you see fetchInfo being called, and this call originates in our LocalWorkspaceFileInfo class. That means every time KNIME needs some info about a file in the workspace (does it exist, what permissions does it have, when was it last updated, etc.), it calls this method, and I assume a network request then goes out to Azure to retrieve the information. This causes the other thread to block as well, leading to the frozen UI.

Right now, I’m afraid all you can really do about this is move your workspace to your local disk, so that the fetchInfo call gets the info from the OS and no call to an external service via the network is necessary.

I have informed our developers about this issue. They tell me that there won’t be any changes to the KNIME AP 4.7 release line, but the code changed in 5.0, and we will ask our QA team to make sure the new implementation performs better with workspaces on a network drive.
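To check whether file metadata lookups really are slow on the share, one option is to time them outside of KNIME. The Python sketch below is only an approximation: it uses `os.stat`, not the Eclipse native call from the stack trace, and the function names are made up for this example. Client-side caching by the OS can also hide some of the latency on repeated runs.

```python
import os
import statistics
import time
from pathlib import Path


def time_metadata_calls(root: Path, limit: int = 500) -> list[float]:
    """Time one os.stat() call per entry under root; durations are in milliseconds."""
    durations_ms = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                os.stat(path)
            except OSError:
                continue  # entry vanished or is not accessible; skip it
            durations_ms.append((time.perf_counter() - start) * 1000.0)
            if len(durations_ms) >= limit:
                return durations_ms
    return durations_ms


def summarize(durations_ms: list[float]) -> dict[str, float]:
    """Condense the timings into a few numbers that are easy to compare."""
    if not durations_ms:
        return {"count": 0, "median_ms": 0.0, "max_ms": 0.0, "total_ms": 0.0}
    return {
        "count": len(durations_ms),
        "median_ms": statistics.median(durations_ms),
        "max_ms": max(durations_ms),
        "total_ms": sum(durations_ms),
    }
```

Running `summarize(time_metadata_calls(Path(...)))` once against the workspace on the Azure Files share and once against a copy on the local disk gives a direct comparison. A much higher median on the share would be consistent with the explanation above.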
Kind regards,
Alexander

Thanks for looking into this. It’s strange that this wasn’t an issue before moving to Azure Files since I was also using a network location for the workflows before. Seems something must be different on Azure.

I’ll consider moving the workspace to the local disk until 5.0 is production ready.
