parallel node, low CPU

Hi,
I am having issues with the Parallel Chunk node.
I used to run it in a previous version of the software and it would occupy 100% of the CPU (if memory allowed).
But lately it barely makes a dent, and workflows are much slower.
[screenshots]

This is typical CPU usage:

This is my machine, running Windows 10 Pro with KNIME 4.5.1:
[screenshot]

Does anybody know why?
Cheers,
MG

Hi, can you try this: on the screen you showed, click the Memory Policy tab, enable the “write to disk” option, then click “Apply” > “OK” and re-run the node. You might want to do this for all nodes that are slow.

If this doesn't help, wait until other people suggest something different, e.g. Java heap space.

Never had good experience with the parallel loop nodes.

However, are the nodes you use not multi-threaded themselves? If not, I would complain to ChemAxon/Infocom. I mean, you pay for this stuff, so expecting proper performance in 2022 isn't too much to ask. RDKit nodes are multi-threaded where possible.

Hi,
I have used them in the past with the parallel node and it worked like a charm.
I think something changed with a newer version of the software.
I tried to find a fix, e.g. changing the heap size, but nothing helps.
Maybe somebody has a workflow that definitely works with the parallel node and is able to occupy the full CPU.
That way I would at least be able to confirm that it is not a parallel node issue but rather a ChemAxon issue.
I think I will also contact ChemAxon, just in case.
Cheers,
MG
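
One way to rule out the machine itself, independent of KNIME, is a small chunk-parallel test script. The sketch below is not a KNIME workflow: it only mimics the idea of splitting rows into chunks and processing each chunk in its own worker process. The function names and the workload in `burn_cpu` are hypothetical stand-ins. If this cannot load all cores while it runs, the limit is more likely in the OS or hardware setup than in any particular node.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(rows, n_chunks):
    """Split rows into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def burn_cpu(chunk):
    """Purely CPU-bound stand-in for per-row work (hypothetical workload)."""
    total = 0
    for value in chunk:
        for k in range(2000):
            total += (value * k) % 7
    return total


def run_parallel(rows, n_workers):
    """Process each chunk in its own worker process and sum the results."""
    chunks = split_into_chunks(rows, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(burn_cpu, chunks))


if __name__ == "__main__":
    # Increase the row count to keep the cores busy long enough to watch.
    rows = list(range(5_000))
    workers = os.cpu_count() or 1
    started = time.perf_counter()
    result = run_parallel(rows, workers)
    elapsed = time.perf_counter() - started
    print(f"{workers} workers, result={result}, {elapsed:.1f}s")
```

Watch the CPU graph in Task Manager while it runs; with a larger row count, all logical cores should be busy for the duration.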

My point was merely that the nodes themselves should already be multi-threaded. Have you tried that? If they are not, it's rather a poor implementation, probably caused by the license checking. Maybe that changed and you need to pay more to make use of more cores, a common theme with proprietary software nowadays.

Hi,
I have contacted them and it works on their end.
I will try installing a newer version of KNIME, as their test was on 4.5.2 and I am using 4.5.1.

“my point was merely that the nodes themselves should already be multi-threaded” -
You are right, according to them it should be parallelized already, so there is no need for the parallel node.
I tested just the Standardizer node and it was still barely using my CPU (while it used the full CPU in their case).

“pay more to make use of more cores” - I had not thought of that, I will ask them to clarify.

I have tried this and sadly it did not solve the problem.

I have done some testing and it seems I made a mistake in starting this thread.
When used with RDKit, the parallel node did utilise the full CPU.
Hence the Parallel Chunk node works fine; it is a specific component of the ChemAxon Standardizer node that is lagging.
When running each option separately, I noticed that only Tautomerization and Mesomerization do not use the full CPU, whereas the other options do. These two options create a bottleneck that makes the whole process very slow. I will report it to ChemAxon and delete this thread, as the title is misleading.
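
To illustrate why one or two poorly parallelized options can drag the whole node down, here is a rough estimate based on Amdahl's law. The fractions are made-up example values, not measurements of the Standardizer node, and `estimated_speedup` is a hypothetical helper name.

```python
def estimated_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only part of the work scales.

    parallel_fraction: share of single-core runtime that uses all cores (0..1).
    cores: number of cores available to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Example values only: how the speedup on 8 cores drops as the
    # share of work stuck on a single core grows.
    for fraction in (1.0, 0.9, 0.5):
        speedup = estimated_speedup(fraction, cores=8)
        print(f"parallel share {fraction:.0%}: about {speedup:.1f}x on 8 cores")
```

The point is that the serial share dominates quickly: even if most options scale well, the steps that stay on one core set the ceiling for the whole run.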

Note that, where possible, RDKit nodes already use all threads anyway, so there is no need for the parallel loop.

I was not aware; the last time I looked at optimising workflows was a few years back. Since then I did not bother to check, so it seems I will have to rethink most of them.
Thanks for pointing it out :+1:

The issue seems to be more complex, as the node works fine on my office computer.
The problem is not license dependent, as their licenses do not operate like that.
After speaking with ChemAxon, they confirmed that licenses work per user, not per process.
I was also given an unlimited-user license for testing, and there was still no difference.
This is very baffling.
