Data save in Workspace

Zyw · February 19, 2021, 9:35am

Hello,
As KNIME stores intermediate data on disk under the workspace folder, is there any setting to disable this and keep the data only in memory?
To save disk space, I know I can remove the folder after the workflow has finished. But I am looking for ways to improve speed during node-to-node processing.

Thanks.

BR,
Zy

sven-abx · February 19, 2021, 11:09am

hello @Zyw,

When you save the workflow, you can check the reset box in the bottom left.
There are also some nodes mentioned in this thread which may help you.

br,
sven

Zyw · February 20, 2021, 1:32am

Hello @sven-abx

Thanks for your reply. I tested it and it works well.

By the way, what about releasing memory? Sometimes I work with long workflows and big data. Once the data has been passed on to the next node, is there any way to release the memory held by the previous node? Thank you.

Regards,
Ziyang

mlauber71 · February 20, 2021, 6:16am

@Zyw you could try the Run Heavy Garbage Collector node or another garbage collection node.

Additional performance measures are collected here

sven-abx · February 22, 2021, 10:01am

hello @Zyw,

As far as I know, nodes should “release memory” on their own, because the data currently in use is held in memory and each node works on the data of the previous node. If you would like to reduce memory usage, you can switch to “write tables to disc” in the “Memory Policy” tab of the node.

But I don’t know how big the impact is, because I have never really used it.

br,
sven

ipazin · February 22, 2021, 10:34am

Hello @Zyw,

If you are looking for a speed increase and don’t need intermediate results, maybe you should check out the KNIME Streaming Execution (Beta) extension:

Here is a blog post about it:

Also, KNIME 4.3.0 introduced an extension that boosts performance, the KNIME Columnar Table Backend. Here is a blog post about it:

And for completeness, I’m adding a link to the documentation on knime.ini configuration options, with additional explanations:

https://docs.knime.com/latest/analytics_platform_workbench_guide/index.html#configuring-knime-analytics-platform

Hope this helps!

Br,
Ivan
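
To make the streaming idea above a bit more concrete, here is a small conceptual sketch in plain Python. It is not KNIME code and does not reflect how KNIME is implemented internally; the "nodes" (`make_rows`, `filter_rows`, `add_total`) and the column names are invented purely for illustration. It contrasts node-by-node execution, where each step builds a full intermediate table, with a streamed pipeline, where each row flows through all steps and no intermediate table is kept.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, float]

def make_rows(n: int) -> Iterator[Row]:
    """Hypothetical source node: yields n rows one at a time."""
    for i in range(n):
        yield {"price": float(i), "qty": float(i % 3)}

# --- Node-by-node execution: every step materializes a full table ---
def filter_rows_batch(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["qty"] > 0]

def add_total_batch(rows: List[Row]) -> List[Row]:
    return [{**r, "total": r["price"] * r["qty"]} for r in rows]

def run_batch(n: int) -> float:
    table = list(make_rows(n))            # intermediate table 1
    filtered = filter_rows_batch(table)   # intermediate table 2
    totals = add_total_batch(filtered)    # intermediate table 3
    return sum(r["total"] for r in totals)

# --- Streamed execution: rows pass through all steps one by one ---
def filter_rows(rows: Iterable[Row]) -> Iterator[Row]:
    for r in rows:
        if r["qty"] > 0:
            yield r

def add_total(rows: Iterable[Row]) -> Iterator[Row]:
    for r in rows:
        yield {**r, "total": r["price"] * r["qty"]}

def run_streamed(n: int) -> float:
    # No intermediate list is ever built; only the running sum is kept.
    return sum(r["total"] for r in add_total(filter_rows(make_rows(n))))
```

Both functions return the same result. The streamed variant never holds more than one row at a time, which is also why there are no intermediate tables to inspect afterwards.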

Zyw · February 25, 2021, 6:22am

Hello @ipazin
The streaming function is great. This is what I was looking for.
I tested it on a workflow, and it saved 60% of the processing time.
And thanks for the blog post; now I understand how KNIME processes data node by node.

However, when I tried another one, a warning showed up, as below:

Original flow:

Component with Stream:

So loop nodes are not supported by simple streaming, right?

Regards,
Zy

ipazin · February 26, 2021, 10:03pm

Hello @Zyw,

Loops are not supported in streaming mode. Now that I think about it, that makes sense, doesn’t it? See here how to parallelize a Group Loop (it seems parallelization is actually what loops need):

Br,
Ivan

kienerj · March 1, 2021, 11:21am

As a general comment, loops are extremely slow in KNIME. So the best performance optimization one can do is to find a way to avoid loops altogether. What are you doing with a double Group Loop Start and a Math node? Can’t you do that with one or two GroupBy nodes?
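
As a rough illustration of that point in plain Python rather than KNIME (the `group` and `value` columns are made up, and this does not model KNIME's own per-iteration overhead), compare a loop that re-scans the table once per group with a single grouped pass, which is roughly what a GroupBy-style aggregation amounts to:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Rows of (group, value); hypothetical columns for illustration only.
Table = List[Tuple[str, float]]

def sum_per_group_loop(table: Table) -> Dict[str, float]:
    """Loop style: one full scan of the table per distinct group."""
    result: Dict[str, float] = {}
    for group in {g for g, _ in table}:
        result[group] = sum(v for g, v in table if g == group)
    return result

def sum_per_group_single_pass(table: Table) -> Dict[str, float]:
    """GroupBy style: one scan, accumulating per group as rows arrive."""
    totals: Dict[str, float] = defaultdict(float)
    for group, value in table:
        totals[group] += value
    return dict(totals)
```

Both give the same sums, but the loop version does work proportional to rows times groups, while the single pass touches each row once.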

Zyw · March 2, 2021, 2:34am

Hello @kienerj
Yes, it can be done with GroupBy nodes. I was testing the streaming function, so I tried different ways.
Thanks.

Zy

Zyw · March 2, 2021, 2:35am

Hello @ipazin
Thank you.
