Ensure one node does not start until another is done?

leslies · December 8, 2021, 3:30pm

Hi,

We have a workflow with a component and a metanode that we want to run in series. The component extracts data from a database and writes it to KNIME tables. The metanode reads from the KNIME tables, does some data cleaning steps, and outputs eight tables of data, which are used to generate Excel files that are emailed to end users. We also store data about the processing in a QA/QC database. Here is an image of the workflow:

We do not want the data processing metanode to run until the Data extraction component is finished. As suggested in this forum article, we added a flow variable connection between them to force the second node to wait until the first node finishes. This doesn’t seem to work correctly on our development KNIME Server. Oddly, it does seem to work correctly on our production server. We are enhancing the workflow, so the versions on the development and production servers are not the same, although we have compared the connector in both and do not see how they differ.

This scenario is a little different because we are not passing a table or other element between the two nodes except a flow variable.

Is there another way to force them to run in series? Or perhaps we are missing a configuration parameter for one or both of these nodes?

We are using:
KAP (client): version 4.3.2
KS (server): version 4.12.2
Executor: version 4.3.2

Any advice would be appreciated.

Thanks!

bruno29a · December 8, 2021, 4:30pm

Hi @leslies, I explained this graphically in another thread, but I can’t find it.

Basically, KNIME executes connected nodes from left to right: if two nodes are connected, it executes the upstream (left) node first and then the node to its right.

Now, there are cases where you “can’t” connect two nodes. The output port type of the left node may not be compatible with the input port type of the right node. The left node may have no output port at all, because nothing further is done with its result (for example, writers such as Excel Writer or CSV Writer, or Send Email). Or the right node may have no input port (Table Creator, etc.). In those cases, you would connect them via the Flow Variable port.

For example, if I have this workflow and I execute the workflow:

Node 1, 2 and 3 will all start at the same time.

However, if I link Node 4 to Node 2 like this:

Node 2 will execute only after Node 4 is done. So when this workflow is executed, only Node 1 and Node 3 will start at the same time. Node 4 will execute only after Node 1 is completed, and Node 2 will execute only after Node 4 is completed.

You can have different variations, for example:

Node 2 will execute only after Node 4 is completed, and Node 3 will execute only after Node 2 is completed.

Another variation:

Node 2 and Node 3 will start at the same time, but only after Node 4 is completed.
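To make the ordering rule concrete, here is a toy Python model of the first linked example. It is not KNIME code and uses no KNIME API: `execution_waves` is a made-up helper that groups nodes by when all of their upstream nodes are done. The Node 1 to Node 4 connection is assumed from the description above, and the Node 4 to Node 2 connection stands for the flow variable link. Real execution does not run in lockstep waves, so treat this only as a sketch of the dependency logic.

```python
from collections import defaultdict


def execution_waves(nodes, connections):
    """Group nodes into waves: a node joins a wave once all its upstream nodes are done."""
    upstream = defaultdict(set)
    for src, dst in connections:
        upstream[dst].add(src)

    done, waves = set(), []
    remaining = set(nodes)
    while remaining:
        # A node is ready when every node feeding into it has finished.
        ready = sorted(n for n in remaining if upstream[n] <= done)
        if not ready:
            raise ValueError("connections contain a cycle")
        waves.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return waves


nodes = ["Node 1", "Node 2", "Node 3", "Node 4"]
connections = [
    ("Node 1", "Node 4"),  # assumed regular connection
    ("Node 4", "Node 2"),  # flow variable connection
]
waves = execution_waves(nodes, connections)
print(waves)  # [['Node 1', 'Node 3'], ['Node 4'], ['Node 2']]
```

Node 3 has no incoming connection, so it lands in the first wave together with Node 1, while Node 2 has to wait for Node 4.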

EDIT: I went back and read more thoroughly what you wrote. So it looks like you already know about using the flow variable port for sequential execution. With a metanode, it can be a bit tricky, depending on what’s being done and what’s being linked inside the metanode. A metanode simply “summarizes” part of your workflow, so the nodes within a metanode are independent. The metanode is not one object, so some of its nodes can start as soon as their “left” (upstream) nodes are ready. With components, it’s a bit different. A component will not start until all of its input ports are ready.

Weirdly though, you mentioned that it’s working on your production server but not on your dev server. It could be pure coincidence: your production server might have more resources and finish the Data extraction so quickly that it is all done by the time the metanode is executed. That would not guarantee that the metanode starts only after the data extraction.

Without seeing what’s in the metanode, it’s hard to tell you whether it’s correctly structured or not.

leslies · December 10, 2021, 11:22pm

We found the issue. We had not connected the flow variable input inside the metanode to the first node in the metanode. Once we did this, it ran in series.

bruno29a · December 10, 2021, 11:28pm

Hi @leslies, it is as I said, and that’s where metanodes are tricky. Happy that you found the issue. The nodes are individual, so they act independently, just as they would if they were not in a metanode. Again, a metanode simply makes your workflow look “clean”: it just groups nodes together, and that’s it.
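As a rough illustration of why the missing inner connection mattered, here is a small Python sketch. It is a toy model, not KNIME code: `upstream_of` is a made-up helper, and the inner node names “Table Reader” and “Data Cleaning” are hypothetical stand-ins for whatever is inside the metanode. The metanode’s input port is treated as a plain pass-through point.

```python
def upstream_of(node, connections):
    """Return every node that must finish before `node` can start."""
    direct = {src for src, dst in connections if dst == node}
    result = set(direct)
    for parent in direct:
        result |= upstream_of(parent, connections)
    return result


# Hypothetical nodes inside the metanode.
inner = [("Table Reader", "Data Cleaning")]

# Case 1: the flow variable reaches the metanode's input port,
# but nothing inside the metanode is connected to that port.
unwired = inner + [("Data extraction", "metanode input port")]

# Case 2: the port is also connected to the first inner node.
wired = unwired + [("metanode input port", "Table Reader")]

print(upstream_of("Table Reader", unwired))  # set()
print("Data extraction" in upstream_of("Table Reader", wired))  # True
```

In the first case the first inner node has nothing upstream, so nothing makes it wait for the data extraction. In the second case it does.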
