I had installed version 4.3 and over two days it became less and less efficient.
I have installed 4.5 to see if it would improve. I have two flows in the same workflow, with about 50 nodes doing grouping, column combining, filtering and pivoting; nothing special. If I reset the nodes, the whole workflow runs in about a minute, no more.
Once it has finished, if I just click on a node, KNIME starts consuming 70% CPU and doesn't open the node's configuration dialog until 15-20 seconds have passed.
Intel i7 8th gen, 16 GB RAM, 6 GB reserved for KNIME memory.
Thanks for your quick response,
Yes, it is Windows 11.
Two additional pieces of info:
I'm parsing a JSON file (30 MB): JSON Reader -> JSON Path -> Ungroup.
It seems something is now going wrong: after ungrouping the first 30,000 items correctly, the rest comes out totally disorganized, which I guess could be producing huge files.
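For anyone trying to reproduce this outside KNIME, here is a rough sketch of that node chain in Python/pandas; the file name and the "items" key are made up, just to show the shape of the operation:

```python
import json
import pandas as pd

# JSON Reader: load the whole 30 MB file into one object
with open("data.json", encoding="utf-8") as f:  # hypothetical file name
    doc = json.load(f)

# JSON Path: select the list of records you actually need
items = doc["items"]  # hypothetical key, stands in for the JSON Path query

# Ungroup: one row per list element
df = pd.json_normalize(items)
print(len(df), "rows")
```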
On the other hand, sometimes KNIME works well, but after opening the result tables several times to audit the results, things start to go super slow, even just to keep opening those tables.
I will read the posts you suggested, even if I already know some of them.
@Gonzo I have no experience with Windows 11 as of now.
These things come to my mind when you mention a large ungroup (or any other complex operation for that matter):
the Cache node. In my experience, before a complex operation at the end of a long data manipulation stream, KNIME sometimes 'likes' a Cache node to bring all the changes back into one place
then you could see if forcing the node to write its output to disk helps. You could also try a different underlying compression in the knime.ini (-Dknime.compress.io=[SNAPPY|GZIP|NONE]), maybe GZIP instead of SNAPPY (this might come at a cost in speed and will become the default for every workflow; see the snippet after this list). Also see this entry.
try the columnar table backend. KNIME has not yet made it the standard, but you can activate it for individual workflows
and then you could think about splitting the ungroup into chunks (if your data allows that)
A combination of these, along with some more RAM and garbage collection (1 | 2) at the start, might bring you across the finish line.
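To make the knime.ini point concrete (the snippet referenced above): the compression flag goes in as an extra line after -vmargs, next to the memory setting. The exact values below are just an example, assuming you want to try GZIP and have the RAM to raise the heap:

```
-vmargs
-Xmx8g
-Dknime.compress.io=GZIP
```

-Xmx is the line that reserves memory for KNIME (your 6 GB corresponds to -Xmx6g); everything else in the file stays as it is.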
I'm using a "JSON Path" node; as my data is already a list, I have to pass it through that node before ungrouping.
The "JSON Path" node comes by default with a "$..*" parameter, and I just left it untouched. The result was that the node worked... almost well: no error, but not well.
Just writing "$.*" instead of "$..*" has solved the problem.
In addition, the output tables, which originally had 1.3M lines, were being multiplied to 6x that number of lines; now they stay clean.
I'll update if I've also solved the memory problem; in any case, it should be less critical now.
Hi @Gonzo, it may be a good idea to understand what the JSON Path statements mean.
"$..*" means take everything. I think KNIME adds this by default so that if you don't do anything inside the node, it simply acts as a pass-through. Similarly, with nodes such as Python Script, Java Snippet, etc., by default KNIME outputs what the node receives as input, so they act as pass-throughs in case you don't do anything.
That being said, if you do not need all the data, then do not use the default $..* path. Use only the paths that you need.
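To make the difference concrete, here is a tiny made-up document:

```json
{ "orders": [ { "id": 1 }, { "id": 2 } ] }
```

On this document, $.* matches only the direct child of the root (the orders array, one match), while $..* matches every nested value: the array, both objects, and both ids (five matches). That duplication is exactly what multiplies rows once you ungroup.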
Make sure you understand what you are doing, especially if you are going to re-use the workflow with another input. While this works with your current data, it might not work with other data if you are going by trial and error.