Integrated Deployment: PortObject Reference Reader executed within the context of the active workflow instead of the workflow read?

Hello,

When capturing and writing a workflow, e.g. in a learning workflow locally, and then reading this workflow in a separate workflow, e.g. a production workflow, also locally, everything works just fine, that is, until the written workflow has to use a PortObject Reference Reader, e.g. to restore model objects and the like. The issue is that the latter is executed within the context (working directory, data area) of the active workflow instead of the context of the workflow that has actually been read. As a consequence, the PortObject Reference Reader produces an error according to which the relevant files cannot be found.
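
To make the diagnosis concrete, here is a minimal sketch of the path mismatch (all folder and file names are hypothetical, and the data area is assumed to be a data folder inside the workflow directory):

```python
from pathlib import Path

# Hypothetical workflow locations; all names here are made up for illustration.
active_workflow = Path("/workspace/production_workflow")
read_workflow = Path("/workspace/written_prediction_workflow")

# File referenced by the PortObject Reference Reader, relative to a data area.
reference = Path("naive_bayes_model.portobject")

# What appears to happen: the reference is resolved against the data area
# of the ACTIVE workflow (the one containing the Workflow Executor) ...
resolved_wrong = active_workflow / "data" / reference

# ... instead of the data area of the workflow that was actually read,
# which is where the file really lives:
resolved_right = read_workflow / "data" / reference

print(resolved_wrong)
print(resolved_right)
```

Since the file only exists under the read workflow's data area, resolving against the active workflow's data area yields a "file not found" error.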

In the KNIME Integrated Deployment Guide, it is indeed mentioned that:

The Workflow Executor node can only execute a workflow that is captured from the same workflow the Workflow Executor node is part of.

Does this mean that the Workflow Executor cannot work properly under the above conditions and will only work properly if you capture, write, read and execute a workflow within the same workflow?
Under this assumption, would it make sense to provide the Workflow Executor with the capability to refer to the data area of the workflow read by the Workflow Reader, instead of the data area of the workflow it is being executed in, thus enabling the PortObject Reference Reader to point to the correct path?

Kind regards
Geo

Hi Geo,
I think your diagnosis is spot-on and your assumptions are correct. However, you may be able to execute the workflow using the Call Workflow Service node instead of the combination of Workflow Reader + Workflow Executor. Would that be an option for you?
I think your proposed solution is hard to implement because the way the Workflow Executor works is that it loads the workflow into a hidden metanode (you can see it if you enable the debug option in the Workflow Executor’s configuration) and then executes it. The metanode is part of the workflow with the Workflow Executor and changing the context of the metanode is not possible at the moment.
Kind regards,
Alexander

Dear @AlexanderFillbrunn

Following further analysis on my side, I wonder whether there is a bug in the Workflow Executor.

Using the official example on KNIME Hub → 01_Integrated_Deployment_Example, I have tried executing the written workflow from a workflow other than the one in which it was written. And it works. This appears to contradict my assumptions above.

This has led me to further testing on my side regarding my own use case, which is as follows: apply a Naive Bayes model to a document vector. The prediction workflow, once written via Integrated Deployment, looks as follows:

  • two PortObject Reference Reader nodes as input, i.e. one with a Naive Bayes model, another with a document vector;
  • one table input.

Under this configuration, the Workflow Executor fails on both PortObject Reference Reader nodes because it does not seem to find the source workflow’s data area.

However, if I modify the prediction workflow such that I only need the Naive Bayes model input (inside the prediction workflow I then tediously import the document vector using a Table Reader and Cell to Model), the written workflow looks as follows:

  • one port object as input, for the Naive Bayes model;
  • one table input.

Interestingly, in this case, that is, with one port object, the Workflow Executor has no trouble at all accessing the data area of the source workflow.

What exactly does this mean?

Kind regards
Geo

Hi,
Thanks for your effort! Would it be possible for you to provide screenshots so it is easier for me to understand the differences? And is the Workflow Executor your only option, or is the Call Workflow Service node a feasible alternative?
Kind regards,
Alex

Hi,

Ok, I will try to provide screenshots later today.

As for your suggestion:

  • Call Workflow Service yields a workflow similar to the Model Factory, which is quite burdensome to set up: sending and receiving JSON configs, saving temporary files, packing and unpacking the model objects …

  • Call Workflow (Table Based) does not appear to be compatible with the Document type; otherwise, that would still yield a decent alternative workflow. See my ticket here: Call Workflow (Table Based) does not appear to support Document type

I currently see two workarounds for my use case:

  • changing the workflow as specified in my reply above, that is, to a single port object, even though it is more fiddly because it still involves packing and unpacking the document vector. However, it is less burdensome than Call Workflow Service;
  • certainly terrible, but it works: manually copying the source workflow’s data area objects to the executing workflow’s data area and then using the Workflow Executor as advertised.
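
The second workaround can at least be scripted rather than done by hand each time. Here is a minimal sketch (all paths are hypothetical, and the data area is assumed to live in a data folder inside the workflow directory):

```python
import shutil
from pathlib import Path

def copy_data_area(source_workflow: Path, target_workflow: Path) -> list[Path]:
    """Copy every file from the source workflow's data area into the
    target workflow's data area, preserving the folder structure.

    Assumes the data area is the 'data' subfolder of a workflow directory.
    """
    src = source_workflow / "data"
    dst = target_workflow / "data"
    copied = []
    for f in src.rglob("*"):
        if f.is_file():
            out = dst / f.relative_to(src)
            out.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, out)  # copy2 also preserves timestamps
            copied.append(out)
    return copied
```

Run something like this before executing the written workflow, so the PortObject Reference Reader finds the files in the (wrong but actually consulted) data area of the executing workflow.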

Kind regards
Geo

Hi Geo,
The Call Workflow Service node also supports passing flow variables and other port types to the called workflow. Would that not alleviate the issues with the JSON configs? You could even pass a table with Binary Object columns to send over files and models (Files to Binary Objects, Model to Binary Object nodes).
Kind regards,
Alexander

Hi @AlexanderFillbrunn,

Thank you for all those suggestions. I know about the Call Workflow Service node’s ability to pass flow variables via the JSON object. Beyond this, despite having been able to adapt the Model Factory and run it successfully for the last 4 years, I still have only basic knowledge of the Call Workflow Service node itself.

Nevertheless, it all feels far more complex to set up than with Integrated Deployment. With the latter, the whole process simplifies very nicely. I will post some screenshots later; maybe that will clear things up a bit regarding my usage.

Kind regards
Geo

Hi @AlexanderFillbrunn,

Here are the screenshots (what you see is the debug view of the Workflow Executor):

Scenario 1 - does not work

The prediction workflow, once written via Integrated Deployment, looks as follows:

  • two PortObject Reference Reader nodes as input, i.e. one with a Naive Bayes model, another with a document vector;
  • one table input.

Under this configuration, the Workflow Executor fails on both PortObject Reference Reader nodes because it does not seem to find the source workflow’s data area.

[screenshot]

[screenshot]

More specifically, the error message of the PortObject Reference Reader(s) is: Execute failed: <path to the data area of the consuming workflow instead of the written workflow> (The system cannot find the file specified)

Scenario 2 - works

However, if I modify the prediction workflow such that I only need the Naive Bayes model input (inside the prediction workflow I then tediously import the document vector using a Table Reader and Cell to Model), the written workflow looks as follows:

  • one port object as input, for the Naive Bayes model;
  • one table input.

Interestingly, in this case, that is, with one port object, the Workflow Executor has no trouble at all accessing the data area of the source workflow.

[screenshot]

The obvious drawback of scenario 2 is that I have to write the trained document vector separately (instead of storing it with the Workflow Writer), connect that Table Writer node via a flow variable to the Capture Workflow Start node to ensure that the timing is right, and then manage the path to the document vector in the consuming workflow.

Scenario 2 tells me that accessing the written workflow’s data area via the Workflow Executor is not the source of the problem, because it obviously works as it should. Quite puzzling 🙂

Kind regards
Geo

Hi Geo,
Sorry for the long wait! I just tested it and cannot seem to confirm your results. Please see my attached screenshot. As far as I understood, this should not work, right? I also imported the generated workflow into a completely separate workflow and could execute it there as well. Do you see a difference in my approach compared to yours?
Kind regards,
Alexander

Hi @AlexanderFillbrunn

Thank you for taking the time to investigate the issue I have been encountering. I can confirm that this does not work for me on my KNIME client, version 4.6.4 (Nov 23 build).

I have visually identified the following differences:

  • Document Vector does not create bit vectors in my case (the relevant option is unchecked) but one column per feature. I don’t think this should matter, but you never know;

  • the sub-flow Doc Vector --> Category to Class --> Naive Bayes Learner is embedded in a metanode, and the sub-flow Doc Vector Applier --> Naive Bayes Predictor is embedded in another metanode:

[screenshot]
(never mind the first output of the Learn metanode; it does not flow into the Apply metanode and is therefore not captured, as intended)

[screenshot]

  • the workflow that contains the Workflow Writer is not the same as the one that has both the Workflow Reader and the Workflow Executor.

One other note: the workflow actually written by the Workflow Writer is located in a different folder than the workflows that read or write it, i.e. something like ../../subfolder. This is to avoid accidentally modifying the automatically written workflows.

Nevertheless, I am happy to see that it appears to work in your case. I can check whether further simplifying the internals of my workflow would yield the same result for me.

Kind regards
Geo

Hi Geo,
I tried the following things:

  1. Disabled “Bitvector” in the Document Vector node
  2. Added the Vector Applier and Naive Bayes Predictor to a metanode
  3. Tried executing the generated workflow in another workflow

None of these had any influence on the execution. But I would be interested in your results if you investigate further!
Kind regards,
Alexander

Hi @AlexanderFillbrunn

Thank you for the feedback. Have you checked this with the currently available KNIME version?
I will keep you posted on the progress.

Kind regards
Geo

Hi,
Yes, I checked with KNIME AP 4.6.3. I don’t think any changes were made between that and 4.6.4.
Kind regards,
Alexander

Hi @AlexanderFillbrunn

Ok, I will try to replicate your workflow to see if it also works on my machine. So far, I have not managed to get it to work with my own workflow, despite the simplifications. I will also try to fiddle more with the configuration settings of the capture nodes.

Kind regards
Geo