Data save in Workspace

Hello,
As KNIME stores intermediate data on disk under the workspace folder, is there any setting to disable this and keep the data in memory only?
To save disk space, I know I can remove the folder after the workflow has finished, but I am also looking for ways to improve speed during node-to-node processing.

Thanks.

BR,
Zy

hello @Zyw,

When you save the workflow, you can check the reset box in the bottom left.
There are also some nodes mentioned in this thread that may help you.

br,
sven

Hello @sven-abx

Thanks for your reply. I tested it and it works well.

By the way, what about releasing memory? I sometimes work with long workflows and big data. Once the data has been passed on to the next node, is there any way to release the memory held by the previous node? Thank you.

Regards,
Ziyang

@Zyw you could try the Run Heavy Garbage Collector node or another garbage collection node.

Additional performance measures are collected here

hello @Zyw,

As far as I know, nodes should “release memory” on their own, because the data currently in use is held in memory and each node works on the data of the previous node. If you would like to reduce memory usage, you can switch to “Write tables to disc” in the “Memory Policy” tab of the node.

But I don’t know how big the impact is, because I have never really used it.

br,
sven

Hello @Zyw,

If you are looking for a speed increase and don’t need intermediate results, you may want to check out the KNIME Streaming Execution (Beta) extension:

Here is a blog post about it:

Also, KNIME 4.3.0 introduced an extension that boosts performance: the KNIME Columnar Table Backend. Here is a blog post about it:

And for completeness, here is a link to the documentation on knime.ini configuration options, with additional explanations:

https://docs.knime.com/latest/analytics_platform_workbench_guide/index.html#configuring-knime-analytics-platform

Hope this helps!

Br,
Ivan

Hello @ipazin
The streaming function is great. This is what I was looking for.
I tested it on a workflow, and it cut the processing time by 60%.
And thanks for the blog post; now I understand how KNIME processes data node by node.
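
To picture the difference, here is a rough Python analogy. It is not KNIME code and not how KNIME is implemented; the function names and the two toy steps are made up for illustration. The first function finishes each step over the whole table before moving on, which is roughly what default node-by-node execution looks like, while the second pushes one row through all steps before reading the next, so no intermediate table is kept.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict
Step = Callable[[Row], Row]


def node_by_node(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Run each step over the full table; every step produces a complete intermediate table."""
    table = list(rows)
    for step in steps:
        table = [step(row) for row in table]  # full intermediate result per step
    return table


def streamed(rows: Iterable[Row], steps: List[Step]) -> Iterator[Row]:
    """Push one row through all steps before reading the next; no intermediate tables."""
    for row in rows:
        for step in steps:
            row = step(row)
        yield row


# Hypothetical steps standing in for two row-wise nodes.
def double_x(row: Row) -> Row:
    return {**row, "y": row["x"] * 2}


def add_one(row: Row) -> Row:
    return {**row, "z": row["y"] + 1}


SAMPLE_ROWS = [{"x": 1}, {"x": 2}]
STEPS = [double_x, add_one]
```

Both variants give the same final rows; only the streamed one avoids holding the intermediate results.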

However, when I tried it on another workflow, a warning showed up, as below:

Original flow: [image]

Component with streaming: [image]

So loop nodes are not supported by Simple Streaming, right?

Regards,
Zy

Hello @Zyw,

Loops are not supported in streaming mode. Thinking about it, that makes sense, doesn’t it? See here how to parallelize a Group Loop (parallelization seems to be what loops actually need):

Br,
Ivan

As a general comment, loops are extremely slow in KNIME, so the best performance optimization is to find a way to avoid them. What are you doing with the double Group Loop Start and the Math node? Can’t you do that with one or two GroupBy nodes?
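
As a rough analogy in plain Python (not KNIME code, and not a model of where KNIME's loop overhead actually comes from), the rewrite has this shape. The per-group mean is only a hypothetical stand-in, since the thread does not say what the Math node computes, and the function names and sample data are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample table: (group key, value) pairs.
SAMPLE: List[Tuple[str, float]] = [
    ("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0),
]


def mean_per_group_loop(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """Loop-style: filter the whole table once per group, then compute."""
    result = {}
    for key in sorted({k for k, _ in rows}):
        values = [v for k, v in rows if k == key]  # one full pass per group
        result[key] = sum(values) / len(values)
    return result


def mean_per_group_single_pass(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """GroupBy-style: one pass, accumulating sum and count per group."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Both functions return the same per-group result; the second does it without iterating group by group, which is the idea behind replacing the loop with GroupBy nodes.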

Hello @kienerj
Yes, it can be done with GroupBy nodes. I was testing the streaming function, so I tried different approaches.
Thanks.

Zy

Hello @ipazin
Thank you. :grinning: