Execute loops in parallel

phick · January 8, 2024, 2:10pm

Hi all,
I’m working on a workflow that needs to carry out two tasks, each using a chunk loop. The first task is very quick but has to run on 1 row at a time, while the second is slower but can process large numbers of rows at once.

At the moment, the workflow runs the first loop until it is complete and only then does it start on the second loop. Is it possible for the second chunk loop to start execution as soon as enough rows are fed out of the first chunk loop? The workflow will ultimately be run on some large inputs and it feels as if running the loops in parallel could greatly reduce the overall execution time.

It’s possible this will just result in an overall slowdown, but I’d like to test it with some benchmarking nodes on my tests before deploying it on the large real datasets…

AlexanderFillbrunn · February 1, 2024, 1:14pm

Hi,
Unfortunately, there isn’t really a good solution for this. You could write the data into a local database (e.g. SQLite) in one loop and then continuously read from the DB in another loop, but it’s a hack. KNIME’s concept of a node completing before the next one starts makes this tough.
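To illustrate the idea outside KNIME, here is a minimal Python sketch of the "write in one loop, poll in another" pattern using the standard `sqlite3` module. It is not a KNIME workflow: the table name `staged`, the `CHUNK_SIZE` value, the doubling and summing used as stand-ins for the real work, and the function names are all hypothetical, and a `threading.Event` stands in for whatever "first loop finished" signal you would use in practice.

```python
import os
import sqlite3
import tempfile
import threading
import time

CHUNK_SIZE = 4  # hypothetical batch size for the slow second stage


def fast_stage(db_path, rows):
    """First loop: handle one row at a time and stage each result in SQLite."""
    conn = sqlite3.connect(db_path, timeout=30)
    for row in rows:
        # Doubling is a placeholder for the quick per-row task.
        conn.execute("INSERT INTO staged (value) VALUES (?)", (row * 2,))
        conn.commit()  # commit per row so the reader can see it right away
    conn.close()


def slow_stage(db_path, done, results):
    """Second loop: poll the staging table and process rows in chunks."""
    conn = sqlite3.connect(db_path, timeout=30)
    last_id = 0
    while True:
        # Read the flag before querying so rows committed earlier are not missed.
        finished = done.is_set()
        batch = conn.execute(
            "SELECT id, value FROM staged WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, CHUNK_SIZE),
        ).fetchall()
        if len(batch) == CHUNK_SIZE or (finished and batch):
            # Summing is a placeholder for the slow chunk-wise task.
            results.append(sum(value for _, value in batch))
            last_id = batch[-1][0]
        elif finished:
            break  # writer is done and nothing is left to read
        else:
            time.sleep(0.01)  # not enough rows yet, wait and poll again
    conn.close()


def run_pipeline(rows):
    """Run both stages concurrently against a temporary SQLite file."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "staging.sqlite")
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE staged (id INTEGER PRIMARY KEY AUTOINCREMENT, value INTEGER)"
        )
        conn.commit()
        conn.close()

        done = threading.Event()
        results = []
        reader = threading.Thread(target=slow_stage, args=(db_path, done, results))
        reader.start()
        fast_stage(db_path, rows)
        done.set()  # tell the reader that no more rows will arrive
        reader.join()
        return results


if __name__ == "__main__":
    print(run_pipeline(range(10)))
```

The reader only processes a chunk once it is full, or once the writer has signalled completion, so the chunk boundaries do not depend on timing. How to start the two loops concurrently inside KNIME is not covered by this sketch, and the polling and per-row commits add overhead of their own, so it is worth benchmarking against the plain sequential version.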
Kind regards,
Alexander

phick · February 1, 2024, 3:43pm

As it happened, the issue was far smaller than I’d feared. Once I set it up with my large data sets, the performance as a linear sequence was fine and I’ve been able to do what I needed.
I was probably just overthinking it!
