Multiport support for Parallel Chunk Loops

We would like to replace Zip Loops by Parallel Chunk Loops in order to parallelise our workflows.
ZipLoopStart/End have multiple ports, but ParallelChunkStart does not. How can we replace ZipLoopStart?
Could a parallelised loop become the default loop in KNIME, rather than merely an add-on?

To be honest, I never used the OpenMS Extensions before, so my suggestions come without warranty :wink:

There are three obvious obstacles in changing from ZipLoop to Parallel Chunk loops:

  1. Different port types: Parallel Chunk Loops can only handle KNIME tables as input and output, whereas ZipLoop, and OpenMS nodes in general, work with URI ports. I simply replaced the OpenMS Input Files nodes with standard List File nodes. Later in the loop, you can convert the table data back into URI ports with a URI to Port node, so that the other OpenMS nodes can access the files.
  2. The Parallel Chunk Start node has only one input and one output port. If you want to use data from two different sources, you have to join your data sets so that all the information for one iteration of the loop is available in a single row of your input table.
  3. You can’t close the loop with a ZipLoopEnd node; you have to use a Parallel Chunk End node instead. So you need to convert the processed data back into one or more KNIME tables (more than one requires the multi-port Parallel Chunk End node). Honestly, this part leaves me a little puzzled. In my example I simply used a Port to URI node to get back a KNIME table, but I guess your use case will need some additional steps.
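To make the data flow of points 1 to 3 a bit more concrete, here is a minimal Python sketch. It is not KNIME or OpenMS code, just an illustration under my own assumptions: two hypothetical lists of file paths stand in for the two ZipLoop inputs, `process_row` is a made-up placeholder for whatever OpenMS nodes run inside the loop, and a thread pool stands in for KNIME's parallel execution of chunks.

```python
from concurrent.futures import ThreadPoolExecutor


def zip_to_rows(spectra_files, id_files):
    """Join two inputs so each row holds everything one iteration needs,
    which is what the extra ZipLoopStart ports used to provide."""
    if len(spectra_files) != len(id_files):
        raise ValueError("both inputs must have the same number of rows")
    return [
        {"row_id": f"Row{i}", "spectra": s, "ids": d}
        for i, (s, d) in enumerate(zip(spectra_files, id_files))
    ]


def split_into_chunks(rows, n_chunks):
    """Split rows into at most n_chunks contiguous, non-empty chunks,
    roughly as Parallel Chunk Start does."""
    size, rest = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        end = start + size + (1 if k < rest else 0)
        if end > start:
            chunks.append(rows[start:end])
        start = end
    return chunks


def process_row(row):
    """Hypothetical stand-in for the OpenMS nodes in the loop body."""
    return {"row_id": row["row_id"], "result": f"{row['spectra']}+{row['ids']}"}


def process_chunk(chunk):
    return [process_row(row) for row in chunk]


def parallel_chunk_loop(rows, n_chunks=4):
    chunks = split_into_chunks(rows, n_chunks)
    with ThreadPoolExecutor(max_workers=max(len(chunks), 1)) as pool:
        # pool.map yields results in submission order, so row order is kept
        chunk_results = list(pool.map(process_chunk, chunks))
    # Like Parallel Chunk End: concatenate the chunk outputs into one table
    return [row for chunk in chunk_results for row in chunk]
```

For example, `parallel_chunk_loop(zip_to_rows(["a.mzML", "b.mzML", "c.mzML"], ["a.idXML", "b.idXML", "c.idXML"]), n_chunks=2)` processes the first two rows in one chunk and the third in another, then returns all three results in the original order. The file names are only examples.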

One minor obstacle is that the row IDs from your original data set get lost in the chunk loop when the data is converted back and forth between port types, but you can easily restore them with RowID and Column Appender nodes (see attached screenshot).
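The screenshot isn't reproduced here, but the idea can be sketched in plain Python. This is only an illustration of one way to do it, under my own assumptions: the table is simplified to a dict mapping row ID to column values, the original ID is carried through the loop in an ordinary column (here called `orig_row_id`, a made-up name), and `run_loop` is a hypothetical loop body that hands back fresh, generic row IDs.

```python
def add_id_column(table):
    """Before the loop: copy each row ID into an ordinary column.
    `table` maps row ID -> dict of column values (a simplified KNIME table)."""
    return [{"orig_row_id": row_id, **columns} for row_id, columns in table.items()]


def run_loop(rows):
    """Hypothetical loop body: the output gets fresh, generic row IDs,
    but the carried ID column survives the round trip."""
    return {f"Row{i}": {**row, "result": row["file"].upper()} for i, row in enumerate(rows)}


def restore_row_ids(loop_table):
    """After the loop: use the carried column as the row ID again."""
    restored = {}
    for columns in loop_table.values():
        columns = dict(columns)  # copy, so the loop output is not modified
        row_id = columns.pop("orig_row_id")
        if row_id in restored:
            raise ValueError(f"duplicate row ID {row_id!r}")
        restored[row_id] = columns
    return restored
```

Carrying the ID in a column does not rely on the loop preserving row order, whereas appending the output columns back onto the original table by position does.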

I hope this helps.
