Order of operations & parallel execution

kevinnay · May 29, 2024, 4:27pm

I’ve got about ten db sql executors running ETL queries in tandem, as there are no dependencies between them. However, I have a loop node that will grant access across all the tables created, but I’m trying to find a logical means of forcing it as a last step.

Right now I’m guessing at the longest running node and using that flow variable output to force the grants to wait until that has completed. However, if I’m wrong or it happens to run short of another at any point in time then it will attempt to grant to tables that have yet to complete.

If there was some way to tie all the nodes to the grant node then it would have to wait for all, but most nodes only accept so many inputs so I’m wondering if I’m faced with having to create a node that operates as a traffic cop or if there is a node like that that already exists.

takbb · May 29, 2024, 11:17pm

Hi @kevinnay, as you have said, you can use flow variable connections to control the flow. The answer to the problem of a large number of flows, is the Merge Variables node. You can dynamically add ports to it.

Classic UI

Modern UI

This can then be linked to your remaining node to ensure that the final GRANT only occurs after everything is completed.

You might event end up with some “modern KNIME art” like below!

Some thoughts though… Do you have multiple database connections? If so, great. If not, read on…

Parallel or “appearance of parallel” execution?

2024-05-3000-45-46-ezgif.com-video-to-gif-converter

In the above mock-up I would doubt that my DB SQL Executors would actually run in parallel since they are all on the same JDBC connection, and JDBC drivers to my knowledge are normally “single threaded”. Therefore whilst they may all appear to be processing in parallel, (and KNIME will help with that illusion by producing an animated status bar on each of the nodes), each would be likely be blocking the thread and so they would actually be running serially with each DB SQL Executor waiting its turn.

I haven’t looked at all databases and drivers, so it is possible there are exceptions, but I always assume that a single db connection can process only a single statement at a time.

Multiple DB Connections

This may be the kind of thing you are doing already. The following would probably come closer to parallel execution. As I have 5 connections used by 10 DB SQL Executors, I’d expect there to be 5 statements now able to process concurrently:

2024-05-3000-57-04-ezgif.com-crop

So, the part where I attempt to answer the question you actually asked… the Merge Variables receives the flow from each of the “parallel” DB SQL Executors. Only once they have all processed, does it pass flow to the final DB SQL Executor, and then once that has completed, flow is passed to the DB Connection Closer which closes all of the database connections and keeps the DBA happy.

I’ve uploaded the above (runnable) demo workflows to the hub. They’re actually quite therapeutic to watch in modern UI

kevinnay · May 30, 2024, 8:38pm

You know I’ve used that, but not for this purpose, but I can see how overloading the ports would be a viable solution in my case. Thanks for the input and the examples.

system · August 28, 2024, 8:39pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.