Batch Mode & Call Local Workflow - where is nested workflow output?

batch
#1

I have a Main Workflow that orchestrates three other workflows using Call Local Workflow (Row Based) nodes.

If I run a workflow from the console in batch mode, I can tell KNIME where to save the executed workflow using the -destDir parameter. After the workflow finishes, I can import the output into the KNIME IDE and investigate possible errors. But in the case described above, the only output I see is that of the Main Workflow. No outputs of the workflows the Main Workflow calls are produced.

Is there a way to make KNIME produce the outputs of workflows invoked from other workflows via Call Workflow nodes?

0 Likes

#2

Hi Jan!

I've never used the -destDir option. How does it work?

Br,
Ivan

0 Likes

#3

Hi Ivan,

As stated in the output of knime -nosplash -application org.knime.product.KNIME_BATCH_APPLICATION:

-destDir=… => directory where the executed workflow is saved to
if omitted the workflow is only saved in place

It generates a directory structure containing your workflow in the same state you would find it in your workspace after running it from the IDE. It is great for subsequent investigation: you just import the directory into your KNIME IDE to see exactly what happened to your workflow while it was running.

Sadly, this does not apply to workflows that another workflow calls via Call Workflow nodes. You only get the resources (folders, files) of the outer workflow; you can't investigate nested workflows that way.

1 Like

#4

Hi Jan,

I see. What does your batch call look like? It seems I'm doing something wrong…

Br,
Ivan

0 Likes

#5

Hi Ivan,

Here you are:

#!/bin/bash

workflow_name=main-workflow-daily
run_identifier=knime_wf_${workflow_name}_run_$(date +%Y-%m-%d-%H-%M-%S)

current_dir="$PWD"
nohup /home/ders/knime_3.7.1/knime \
	-nosplash \
	-application org.knime.product.KNIME_BATCH_APPLICATION \
	-workflowDir="/home/ders/knime/workflow/ADS/PoC/pipelines/main_workflow" \
	-workflow.variable=months_to_past_to_load,9,int \
	-workflow.variable=log_directory,"/home/ders/knime/log",String \
	-workflow.variable=data_mart,fis,String \
	-workflow.variable=source_table_name_pattern,".*",String \
	-reset \
	-preferences=/home/ders/knime/knime-preferences.epf \
	-consoleLog \
	-destDir="$current_dir/out/${run_identifier}-out/" \
	-vmargs -Dorg.knime.core.maxThreads=2 -Xmx4g > "${run_identifier}.log" 2>&1 &

Regards,
Jan

1 Like

#6

Hi Jan,

Thanks, now it works. I had the -nosave parameter, so obviously it didn't work :slight_smile:

As you already said, this option only works on the Main workflow. To get output from the other workflows after execution, maybe you can write some logic into your script…
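Such post-run logic could look roughly like this. This is only a minimal sketch: it assumes the called workflows are saved in place in the workspace after execution (which depends on the Call Workflow node's settings), and all paths and workflow names below are hypothetical examples, not from the script above:

```shell
#!/bin/bash
set -eu
# Sketch: after the batch run, copy each called workflow's directory next
# to the main -destDir output so everything can be imported together.
# KNIME_WORKSPACE / OUT_DIR and the workflow names are made-up examples.
workspace="${KNIME_WORKSPACE:-/home/ders/knime/workflow}"
out_dir="${OUT_DIR:-$PWD/out/my-run-out}"

mkdir -p "$out_dir/called"
for wf in sub-workflow-a sub-workflow-b sub-workflow-c; do
    if [ -d "$workspace/$wf" ]; then
        cp -r "$workspace/$wf" "$out_dir/called/$wf"
        echo "collected $wf"
    fi
done
```

Whether this helps at all depends on whether the Call Workflow nodes actually persist the callee's state after execution; if they discard it, there is nothing on disk to copy.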

Br,
Ivan

0 Likes

#7

Hi Ivan,

Thank you for your support.

Well, I do have my own logging written. But even in combination with the logs KNIME produces, it doesn't give me a picture of what happened to the workflow in the detail I get by looking at a workflow in KNIME Analytics Platform.

I considered introducing Call Workflow-driven orchestration into my project, but not being able to investigate a failed workflow by importing its output into KNIME Analytics Platform is not acceptable to me.

I've already thought about how big an improvement it would be if KNIME provided an extension point you could use to intercept node inputs and outputs, the way the JEE Interceptor or AOP concepts work. Imagine you could register your own handler that would be notified of a node's type, name, and ID, and of the data (model, variables, QuickForm inputs) the node receives and produces.

0 Likes

#8

Hi Jan,

I'm not aware of the JEE Interceptor or AOP concepts either, so I can't see how that would work for KNIME. In my KNIME project I used to log what I thought was important to a database. It wasn't perfect and troubleshooting was a pain, but sometimes it was a pain regardless of running in batch or GUI mode. But that is down to my architecture and workflow design :smiley:

Anyway, I know there were some feature requests regarding easier troubleshooting, so we'll have to wait and see what comes in newer versions :slight_smile:

Br,
Ivan

0 Likes

#9

Hi Ivan,

thank you for your reply.

Well, I developed logging support on my own: a set of Wrapped Metanode Templates that create log entries and write them to the filesystem. Why the filesystem? I used to save entries to the DB the moment they were created, but I stopped because of the latency of Database Writer nodes (KNIME serializes DB operations, as discussed elsewhere); that approach had a real impact on workflow performance. So I reworked the logging: my logging-related Wrapped Metanodes write every log entry as a small CSV file to a directory that belongs to the particular workflow run, and the last node of my workflow is a Wrapped Metanode that saves the content of those CSV files to the database.

The purpose of this is performance monitoring: I wanted to know how long particular parts of my workflow took, in order to reveal possible bottlenecks in my design. If I wanted to produce a log suitable for investigating problems, I would have to scatter my logging Wrapped Metanodes all over the workflow, with a huge impact on how long the workflow takes to open, start, and save, and on how the workflow looks. Eventually, I'd double the number of nodes in my workflow if I wanted to log every single step.
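Outside KNIME, the idea behind that final aggregation step can be sketched in shell. Everything here (paths, file names, the CSV layout) is a made-up illustration of the pattern, not Jan's actual metanodes:

```shell
#!/bin/bash
set -eu
# Illustration of the per-run CSV logging idea: during the run, each log
# entry is written as one small CSV file into the run's own directory;
# a single final step merges them for one bulk database load.
run_dir="${RUN_DIR:-$PWD/log/run-demo}"
mkdir -p "$run_dir"

# simulate two log entries created during the run
echo "2019-05-01 10:00:00,load_tables,start" > "$run_dir/0001.csv"
echo "2019-05-01 10:03:12,load_tables,end"   > "$run_dir/0002.csv"

# final step: merge all entries into one file, in file-name order
cat "$run_dir"/*.csv > "$run_dir/run.log"
wc -l < "$run_dir/run.log"   # prints the entry count (2)
```

The point of the pattern is that each entry is an append-only, contention-free filesystem write, and the expensive database round trip happens exactly once per run.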

I'm afraid we've drifted from the topic a bit. I plan to start a topic about my opinion on how KNIME logs and how I think it should. Would you mind pointing me to the feature requests you mentioned?

Regards,
Jan

1 Like

#10

Hi Jan,

Nice approach. I like it.

Yes, we did move a bit off topic, but thanks for the detailed explanation. I believe someone might find it pretty useful in their work/design.

A couple of points/comments:

  • There is a (Global) Timer Info node which can help find possible bottlenecks - for example, you can build a report after execution is done and even visualize duration over time
  • New DB Integration nodes are coming out - the plan is for them to be production-ready in the summer release (they are already in Labs for testing). I believe this is a major one and should bring a lot of feature and performance upgrades
  • Feel free to share your opinions/ideas on logging in a separate topic. Additionally, the KNIME 3.8 version brings less verbose logging - changelog
  • Regarding feature requests for troubleshooting, here are two similar topics:
    How Can I print a message to Knime Console from Java Snippet?
    Search a Node in a Knime Workflow by Id (Goto & focus a Node by ID)

Br,
Ivan

0 Likes

#11

Hi Ivan,

please don't mention the Timer Info node in this context. I tried it months ago. In my opinion, it's only suitable for very, very simple workflows. The moment you introduce loops (or even nested loops), Wrapped Metanodes, or multiple nodes of the same type into your workflow, it becomes useless.

Say I process a number of tables in my workflow in a loop: I read a table, perform some magic on its content, and write it to its destination. Imagine the tables vary in size from a couple of rows to millions of rows. If you want to know how long it took to process each table, Timer Info can't tell you. The only option is to parse the console log. First you set its level to DEBUG, then you run your workflow. Now you have to find the first and the last node of each iteration, and then bind each iteration to the name of the current table, probably by parsing the SQL expressions the Database Reader or Writer nodes output. That's not easy if you use an automatic log-collecting tool like Logstash, especially because the SQL is likely line-wrapped in the log.
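To make the parsing idea concrete, here is a tiny sketch over a fabricated log. The line format below is an invented approximation, not KNIME's real DEBUG console output, so the patterns would need adjusting against a real log (and line-wrapped SQL would break this naive one-line-per-match approach):

```shell
#!/bin/bash
set -eu
# Sketch: pair each loop iteration's end line with the table name taken
# from the preceding SQL statement. The log content is fabricated.
log="${LOG_FILE:-/tmp/knime-demo.log}"
cat > "$log" <<'EOF'
DEBUG Table Reader : Start execute (iteration 0)
DEBUG Database Reader : SELECT * FROM customers
DEBUG Loop End : End execute (iteration 0, 00:01:12)
DEBUG Table Reader : Start execute (iteration 1)
DEBUG Database Reader : SELECT * FROM orders
DEBUG Loop End : End execute (iteration 1, 00:14:55)
EOF

# print one "<table> <duration>" line per iteration
awk '
/SELECT \* FROM/ { table = $NF }                     # remember table name
/End execute/    { gsub(/\)/, "", $NF); print table, $NF }
' "$log"
```

Running this prints `customers 00:01:12` and `orders 00:14:55`; the real work in practice is exactly what the post describes: finding reliable start/end markers and a way to bind each iteration to its table.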

But if you had the option to register a custom handler that gets notified every time any node is entered or left, this would be easy.

0 Likes

#12

Hi Jan!

I agree with you! The Timer Info node needs an upgrade :smiley:

Br,
Ivan

0 Likes

#13

Hi Jan,

just to catch up on this one. The Vernalis extension under the Testing category has some nodes for timing, so maybe they can help you :wink:

The Timer Info node won't see any changes for now :slight_smile:

Br,
Ivan

1 Like

#14

Hi Ivan,

Thank you for the hint. I’ll check it.

0 Likes