Integrated Deployment: PortObject Reference Reader executed in the context of the active workflow vs. the workflow read?

Hello,

When capturing and writing a workflow locally, e.g. in a learning workflow, and then reading that workflow in a separate workflow, e.g. a production workflow, also locally, everything works just fine, that is, until the written workflow has to use a PortObject Reference Reader, e.g. to restore model objects and the like. The issue is that the latter is executed within the context (working directory, data area) of the active workflow instead of the context of the workflow that was actually read. As a consequence, the PortObject Reference Reader produces an error stating that the relevant files cannot be found.
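
To make the suspected behaviour concrete, here is a minimal sketch in plain Python (not KNIME API code; the directories and the helper function are hypothetical) of a workflow-relative reference being resolved against the wrong base directory:

```python
from pathlib import Path

def resolve_reference(reference: str, workflow_dir: Path) -> Path:
    """Resolve a workflow-relative reference against a workflow's directory."""
    return workflow_dir / reference

# Hypothetical locations of the two workflows involved.
written_workflow_dir = Path("/workspace/prediction_workflow")
executing_workflow_dir = Path("/workspace/production_workflow")

# What appears to happen: the reference stored by PortObject Reference Reader
# is resolved against the data area of the *executing* workflow ...
wrong = resolve_reference("data/model.zip", executing_workflow_dir)

# ... instead of against the data area of the workflow that was actually read:
right = resolve_reference("data/model.zip", written_workflow_dir)

print(wrong)  # /workspace/production_workflow/data/model.zip -> file not found
print(right)  # /workspace/prediction_workflow/data/model.zip -> file exists
```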

In the KNIME Integrated Deployment Guide, it is indeed mentioned that:

The Workflow Executor node can only execute a workflow that is captured from the same workflow the Workflow Executor node is part of.

Does this mean that Workflow Executor cannot work properly under the above conditions and will only work properly if you capture, write, read and execute a workflow within the same workflow?
Under this assumption, I wonder whether it would make sense to give Workflow Executor the capability to refer to the data area of the workflow read by Workflow Reader instead of the data area of the workflow it is being executed in, thus enabling PortObject Reference Reader to point to the correct path.

Kind regards
Geo

Hi Geo,
I think your diagnosis is spot-on and your assumptions are correct. However, you may be able to execute the workflow using the Call Workflow Service node instead of the combination of Workflow Reader + Workflow Executor. Would that be an option for you?
I think your proposed solution is hard to implement because the way the Workflow Executor works is that it loads the workflow into a hidden metanode (you can see it if you enable the debug option in the Workflow Executor’s configuration) and then executes it. The metanode is part of the workflow with the Workflow Executor and changing the context of the metanode is not possible at the moment.
Kind regards,
Alexander


Dear @AlexanderFillbrunn

Following further analysis on my side, I wonder whether or not there is a bug in Workflow Executor.

Using the official example on KNIME Hub → 01_Integrated_Deployment_Example, I have tried executing the written workflow from a workflow other than the one in which it was written, and it works. This appears to contradict my assumptions above.

This has led me to further testing of my own use case, which is as follows: apply a Naive Bayes model to a document vector. The prediction workflow, once written via Integrated Deployment, looks as follows:

  • two PortObject Reference Reader nodes as input, i.e. one with a Naive Bayes model, another with a document vector;
  • one table input.

Under this configuration, Workflow Executor fails on both PortObject Reference Reader nodes because it does not seem to find the source workflow’s data area.

However, if I modify the prediction workflow such that I only need the Naive Bayes model input (and inside the prediction workflow I then tediously import the document vector using a Table Reader and Cell to Model), the written workflow looks as follows:

  • one port object as input, for the Naive Bayes model;
  • one table input.

Interestingly, in this case, i.e. with a single port object, Workflow Executor has no trouble at all accessing the data area of the source workflow.

What exactly does this mean?

Kind regards
Geo

Hi,
Thanks for your effort! Would it be possible for you to provide screenshots so it is easier for me to understand the differences? And is the Workflow Executor your only option, or is the Call Workflow Service node a feasible alternative?
Kind regards,
Alex

Hi,

Ok, I will try to provide screenshots later today.

As for your suggestion:

  • Call Workflow Service yields a workflow similar to the Model Factory, which is quite burdensome to set up: sending and receiving JSON configs, saving temporary files, packing and unpacking the model objects … (see the sketch after this list);

  • Call Workflow (Table Based) does not appear to be compatible with the Document type, otherwise that would still yield a decent alternative workflow. See my ticket here: Call Workflow (Table Based) does not appear to support Document type
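
To give an idea of the kind of packing and unpacking involved, here is a rough sketch in plain Python (file names are hypothetical) of shipping a model file inside a JSON config, as the Model Factory pattern requires:

```python
import base64
import json
from pathlib import Path

# Sender side: pack the model file into a JSON config by base64-encoding it.
model_bytes = Path("model.zip").read_bytes()  # hypothetical model file
config = {"model": base64.b64encode(model_bytes).decode("ascii")}
Path("config.json").write_text(json.dumps(config))

# Receiver side: unpack the JSON config and restore the model file.
received = json.loads(Path("config.json").read_text())
Path("model_restored.zip").write_bytes(base64.b64decode(received["model"]))
```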

I currently see two workarounds for my use case:

  • changing the workflow as specified in my reply above, i.e. a single port, even though it is more fiddly because it still involves packing and unpacking the document vector. However, it is less burdensome than Call Workflow Service;
  • certainly terrible, but it works: manually copying the source workflow’s data area objects into the executing workflow’s data area and then using Workflow Executor as advertised (a sketch of this follows below).
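
For the second workaround, something along the following lines outside of KNIME does the trick; this is only a sketch with hypothetical paths, to be adapted to your local workspace layout:

```python
import shutil
from pathlib import Path

# Hypothetical data areas of the source and the executing workflow.
source_data_area = Path("/workspace/learning_workflow/data")
target_data_area = Path("/workspace/production_workflow/data")

# Copy every object from the source workflow's data area into the executing
# workflow's data area so that PortObject Reference Reader can find the files.
shutil.copytree(source_data_area, target_data_area, dirs_exist_ok=True)
```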

Kind regards
Geo

Hi Geo,
The Call Workflow Service node also supports passing flow variables and other port types to the called workflow. Would that not alleviate the issues with the JSON configs? You could even pass a table with Binary Object columns to send over files and models (Files to Binary Objects, Model to Binary Object nodes).
Kind regards,
Alexander

Hi @AlexanderFillbrunn,

Thank you for all those suggestions. I know about CWS’s ability to pass flow variables via the JSON object. Beyond this, despite having been able to adapt the model factory and running it successfully for the last 4 years, I still only have basic knowledge about the Call Workflow Service node itself.

Nevertheless, it all feels far more complex to set up than with integrated deployment. With the latter, the whole process simplifies very nicely. I will post some screenshots later. Maybe that will clear things up a bit regarding my usage.

Kind regards
Geo

Hi @AlexanderFillbrunn,

Here are the screenshots (what you see is the debugging view of Workflow Executor):

Scenario 1 - does not work

The prediction workflow, once written via Integrated Deployment, looks as follows:

  • two PortObject Reference Reader nodes as input, i.e. one with a Naive Bayes model, another with a document vector;
  • one table input.

Under this configuration, Workflow Executor fails on both PortObject Reference Reader nodes because it does not seem to find the source workflow’s data area.

[screenshot]

[screenshot]

More specifically, the error message of the PortObject Reference Reader(s) is: Execute failed: <path to the data area of the consuming workflow instead of the written workflow> (The system cannot find the file specified)

Scenario 2 - works

However, if I modify the prediction workflow such that I only need the Naive Bayes model input (and inside the prediction workflow I then tediously import the document vector using a Table Reader and Cell to Model), the written workflow looks as follows:

  • one port object as input, for the Naive Bayes model;
  • one table input.

Interestingly, in this case, i.e. with a single port object, Workflow Executor has no trouble at all accessing the data area of the source workflow.

[screenshot]

The obvious drawback of scenario 2 is that I have to write the trained document vector separately (instead of storing it with Workflow Writer), connect that Table Writer node via a flow variable to the Capture Workflow Start node to ensure that the timing is right, and then manage the path to the document vector in the consuming workflow.

Scenario 2 tells me that accessing the written workflow’s data area via Workflow Executor is not the source of the problem, because it obviously works as it should. Quite puzzling 🙂

Kind regards
Geo

Hi Geo,
Sorry for the long wait! I just tested it and cannot seem to confirm your results. Please see my attached screenshot. As far as I understood, this should not work, right? I also imported the generated workflow into a completely separate workflow and could execute it there as well. Do you see a difference in my approach compared to yours?
Kind regards,
Alexander


Hi @AlexanderFillbrunn

Thank you for taking the time to investigate the issue that I have been encountering. I can confirm that this does not work for me on my KNIME client, version 4.6.4 (Nov 23 build).

I have visually identified the following differences:

  • Document Vector does not create bit vectors in my case (the relevant option is unchecked) but one column per feature. I don’t think this should matter, but you never know;

  • the sub-flow Doc Vector --> Category to Class --> Naive Bayes Learner is embedded into a metanode, and the sub-flow Doc Vector Applier --> Naive Bayes Predictor is embedded into another metanode:

[screenshot]
(never mind the first output of the Learn metanode; it does not flow into the Apply metanode and thus is not captured, as intended)

[screenshot]

  • the workflow which contains Workflow Writer is not the same as the one that has both Workflow Reader and Workflow Executor.

One other note: the workflow actually written by Workflow Writer is located in a different folder from the workflows that read or write it, something like ../../subfolder. This is to avoid accidentally modifying the automatically written workflows.

Nevertheless, I am happy to see that it appears to work in your case. I can check whether further simplifying the internals of my workflow would yield the same result for me.

Kind regards
Geo

Hi Geo,
I tried the following things:

  1. Disabled “Bitvector” in the Document Vector node
  2. Added Vector Applier and Naive Bayes Predictor into a metanode
  3. Tried executing the generated workflow in another workflow

And none of that had any influence on the execution. But I would be interested in your results if you investigate further!
Kind regards,
Alexander

Hi @AlexanderFillbrunn

Thank you for the feedback. Have you checked this with the currently available KNIME version?
I will keep you posted on the progress.

Kind regards
Geo

Hi,
Yes, I checked with KNIME AP 4.6.3. I don’t think any changes were made between that and 4.6.4.
Kind regards,
Alexander

Hi @AlexanderFillbrunn

Ok, I will try to replicate your workflow to see if this also works on my machine. So far, I have not managed to get it to work with my workflow, despite the simplifications. I will also try to fiddle more with the configuration settings of the capture nodes.

Kind regards
Geo

Hello @Geo,

I found the issue you described here very interesting, so I did my own experiments.

I believe you can still use Workflow Reader and Workflow Executor in a separate workflow to run the captured/deployed workflow. However, you should change the architecture: instead of using Port Objects, you can always replace them with Workflow Service Inputs. These special nodes also support all kinds of inputs: tables, flow variables, models, database connections, etc.

The only difference is that in the executor workflow you should read everything you need with Reader nodes and connect them to the Workflow Executor node. The drawback is that you need to read the inputs explicitly, although this gives you the flexibility to execute the deployed workflow in any environment you wish.

Recently I created a test workflow using the Integrated Deployment nodes and managed to copy the part from “Workflow in production” into a separate workflow. I then explicitly read all the files I needed and connected to the databases, and in this way was able to run the deployed workflow with the Workflow Executor node.

I hope this example might be useful for you.


Dear @AlexanderFillbrunn, sorry for the late update. I have finally been able to test the behaviour with your workflow based on the example 03_Sentiment_Classification. Long story short: I have managed to reproduce the error.

Find here below my findings, which should be interesting for you to test on your side.

Environment

KNIME version 4.7

I have two workflows: one to learn the model and deploy the workflow (03_Sentiment_Classification), and another one to execute the deployed workflow (03b_Sentiment_Classification_Apply).

! Important ! The deployed workflows and the dataset to be predicted are both located in a subfolder ../test relative to the aforementioned two workflows.

Learner Workflow: 03_Sentiment_Classification in its original form

with the Metanode Apply looking like this inside:

Executor workflow: 03b_Sentiment_Classification_Apply

When does the Executor workflow fail?

It fails when the Learner workflow is reorganised like this:

with the Metanode Generate Doc Vector and Learn Model looking like this inside:

The error is as follows:

and more precisely:

Observations

The error really only occurs when Document Vector is inside the above metanode together with Naive Bayes Learner.

If you keep Document Vector outside of the metanode which contains Naive Bayes Learner, the consuming workflow will execute successfully. If you put each of them into their own separate metanodes, the consuming workflow will also execute successfully.

I hope this helps.

Kind regards
Geo


Hi @Geo,

thanks a lot for further investigating the issue! It’s fixed and will be released with 4.7.1.

