What determines how far into a workflow I can start running?

I’ve been trying to get a handle on how this is determined and I haven’t been able to find an answer yet. If I build a linear workflow in KNIME with 40 nodes, I may or may not be able to start it from the last node. I may find that I need to start at node 20, wait for it to finish, then find I can only go as far as node 27 and wait again. What criteria determine this? I presume it has to do with what types of operations each node performs on the data, but it is not clear to me how to predict it.

As an example, I currently have a workflow with ~200 nodes (including several metanodes). I would like to be able to hand it off to a collaborator and tell them to specify the input and start the workflow from node 200, but that does not currently work. Right now, to get through it, I need to execute node 27, then 48, then 112, and so on. I can’t find any relationship among the nodes that cause this.

In a previous, similar workflow I noticed that R snippets were often bottlenecks, so I eliminated those from this version. This had the beneficial side effect of speeding up the processing somewhat (while adding quite a few nodes), but I still have other similar sticking points in my workflow.

Hi

Sorry this took some time. I tried to recreate this.

In theory, you should always be able to execute all nodes in a workflow, or use the execute option on red nodes as well. If this is not the case, it would be great if you could send me a mini workflow or a short description of how I can recreate it, so we can look into this.

Best wishes, Iris

I can only confirm what Iparsons42 wrote. This is a common issue that you get accustomed to over time, but when you are already frustrated after long trial and error it can be very annoying :slight_smile: . “Execute all” indeed does not always work or is not always available, and executing red nodes is very often not possible. I suspect it has to do with the table spec: nodes whose output table spec is not known before execution tend to lead to this behavior. This means all scripting nodes (Python/R) or nodes like the correlation filter or low variance filter.
I never bothered to search for the cause, and it usually only happens with a somewhat complex workflow, so coming up with an example to provide here is an issue in itself.
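To illustrate why such nodes block downstream execution, here is a minimal sketch of a Python scripting node body whose output columns depend on the data itself, so KNIME cannot know the output table spec until the node has actually run. This assumes the newer `knime.scripting.io` API of the Python Script node; the `category` column and the pivoting logic are purely hypothetical.

```python
# Sketch of a KNIME Python Script node (knime.scripting.io API assumed)
# whose output columns depend on the input data, so the output table spec
# cannot be determined at configure time -- only after execution.
import knime.scripting.io as knio
import pandas as pd

df = knio.input_tables[0].to_pandas()

# Hypothetical example: create one indicator column per distinct value
# found in a "category" column. The set of output columns is only known
# once the actual data has been read.
out = pd.DataFrame(index=df.index)
for cat in df["category"].unique():
    out[f"is_{cat}"] = (df["category"] == cat).astype(int)

knio.output_tables[0] = knio.Table.from_pandas(out)
```

Downstream nodes that need to know the incoming columns at configure time (for example a column filter configured by name) would then stay red until this node has actually produced its output.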

Iris,

Thank you for getting back to me on this; I was not aware that this had not been reported before (it is a rather difficult issue to search for). My current workflow that shows this has several hundred nodes and takes input files that are hundreds of MB in size; I’ll spend some time trimming out the stickiest parts into a separate workflow that I can try to send soon.

Thank you,
Lee

The problem is that I was aware of it, but I could never reconstruct it in a smaller scenario either. And to fix the bug, we need a way to recreate it.

I now have one workflow that shows this problem; it uses metanodes, loops, and column filters, but it has only 10 nodes, which is quite nice. However, if you remove the metanode or change it into a wrapped one, the problem is gone. So for now we suspect the problem only occurs when metanodes are used in the pipeline. If you find another example, that would be great!

Thank you! Iris

I don’t know if it is significant here, but I am currently using a number of OpenMS nodes in my workflows. I’m pretty sure I can pull out a workflow without any OpenMS nodes that still shows this, though. I have also been making extensive use of metanodes lately, although they do not consistently show this behavior of not being immediately executable (or of preventing the immediate execution of non-metanodes immediately downstream of them).