How to improve row filter performance?

Hi,
I have to run a row filter about 200 times inside a loop, and the filtering takes up most of the loop's run time. In total the filtering takes about 10 seconds, which is 0.05 s per filter pass. That does not seem like much, but it adds up and is too much for a live operation.
I have already done this:

  • parallelized the loop, so I now have 8 parallel loops (the number of CPU cores)
  • put a “Cache” node before the loops. This improves performance by a factor of 6-12!
  • used a “Row Filter” instead of a “Row Splitter”
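
For readers who want to picture the first point, here is a minimal plain-Python sketch of how groups could be dealt out to 8 workers. It is not KNIME code: the helper `partition_round_robin` and the integer group identifiers are hypothetical, and the count of 220 groups is the figure given later in this thread.

```python
def partition_round_robin(items, workers):
    """Deal items out to `workers` chunks so chunk sizes differ by at most one."""
    return [items[i::workers] for i in range(workers)]


# Hypothetical group identifiers; the thread mentions 220 groups and 8 cores.
group_ids = list(range(220))
chunks = partition_round_robin(group_ids, 8)

# Each worker would then process only the groups in its own chunk.
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```

With an even split like this, no worker gets more than one group more than any other, so the slowest branch is determined by group size rather than by group count.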

Do you have other ideas?

Hello @spider,

So you run the loop around 200 times, with a Row Filter and some other operations inside it? Anyway, I would try to create a loopless solution. Is that possible?

Br,
Ivan

I do interpolation (via the Missing Value node), and the loops are there to separate the different signals in the same data table. If I interpolate without loops, I get interpolation errors at the “border” of each signal.
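
To make the “border” problem concrete, here is a small plain-Python sketch, not the Missing Value node itself. The helper names `interpolate_linear` and `interpolate_per_group` and the sample data are made up for illustration, and the rows are assumed to be sorted by signal.

```python
from itertools import groupby


def interpolate_linear(values):
    """Fill None gaps that lie between two known values by linear interpolation."""
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


def interpolate_per_group(keys, values):
    """Interpolate each run of equal keys separately (rows must be sorted by key)."""
    result = []
    position = 0
    for _, run in groupby(keys):
        size = len(list(run))
        result.extend(interpolate_linear(values[position:position + size]))
        position += size
    return result


# Hypothetical data: two signals stacked in one table, each with one gap.
signals = ["A", "A", "A", "B", "B", "B"]
values = [1.0, None, 3.0, None, 20.0, 30.0]

naive = interpolate_linear(values)                  # ignores the signal column
per_group = interpolate_per_group(signals, values)  # respects signal borders
print(naive)
print(per_group)
```

Without grouping, the gap at the start of signal B is filled from the last value of signal A, which is the kind of border error described above. With grouping, that gap stays empty because signal B has no earlier value to interpolate from.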

Hi @spider, can you tell us what you are filtering on? Maybe creating an index, or a shorter version of the field you filter on, would speed up the filtering.
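
As a rough illustration of why an index could help, here is a plain-Python sketch. It only shows the general cost model and makes no claim about how KNIME's Row Filter works internally; the functions `filter_rows` and `build_index` and the sample rows are hypothetical.

```python
from collections import defaultdict


def filter_rows(rows, key):
    """Scan the whole table for one key; repeating this per group rescans everything."""
    return [row for row in rows if row[0] == key]


def build_index(rows):
    """Single pass over the table that collects the rows of every group at once."""
    index = defaultdict(list)
    for row in rows:
        index[row[0]].append(row)
    return index


# Hypothetical rows of the form (signal, value).
rows = [("A", 1.0), ("B", 5.0), ("A", 2.0), ("C", 9.0), ("B", 6.0)]

index = build_index(rows)
print(index["A"])              # same rows as filter_rows(rows, "A")
print(filter_rows(rows, "A"))
```

Filtering once per group touches every row once for each group, while building the index touches every row only once in total. Whether that translates into a real gain inside a workflow depends on the node implementations.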

What about Group Loop?

Hello @spider,

I see. I guess you are filtering because you are not using the Group Loop mentioned by @izaychik63?

Br,
Ivan

I already use “Group Loop Start”. But a single one is too slow and takes ~25 seconds, so I use 8 parallel group loops.

Hello @spider,

So your issue is similar to, or the same as, the one here?

I guess you are simulating a parallel Group Loop?

Br,
Ivan

Can you share a workflow?
Maybe a loop can be avoided?
br

Hi @spider, as I said, if you explain what data you are filtering on, perhaps we can help optimize the filtering.

Do you mean an index on one or more columns, as in a database? I did not know you could do that in KNIME. Or do you mean the Rank node, as you already suggested in “Add index column with respect to group - #2 by morpheus”? Does it really give a speed advantage for filtering?

I made some improvements based on your hints.
I have 500,000 rows in 220 groups and do the linear interpolation for each group. I already tried moving the Sorter out of the loop, but that gave no performance improvement.
Grouping with the Rank node brings no improvement once I count the time the Rank node itself takes, and the Cache node now has no advantage anymore either.
The whole thing takes 10-12 seconds.
