Using KNIME Workflows in Jupyter Notebooks

duydu · August 31, 2019, 1:41pm

Hi,

I have a question regarding this post https://www.knime.com/blog/knime-and-jupyter

As far as I understood, wf.data_table_outputs contains two output tables:
wf.data_table_outputs[0] = summary data
wf.data_table_outputs[1] = details data

My question is how do I define the order of output tables. For example, if I want wf.data_table_outputs[1] = summary data, what should I do?

Thank you!

Best regards,
Du

HansS · September 1, 2019, 8:30pm

Hi @duydu

I suggest within your python environment you make a copy of wf.data_table_outputs[1] and give it a different name e.g.:
summary_data = wf.data_table_outputs[1].copy()

gr. Hans

duydu · September 2, 2019, 7:39am

Hi @HansS,

Thanks for the answer. But my problem is not to assign wf.data_table_outputs[1] to summary_data. I think it is from KNIME side, not within python environment. I am sorry that I did not make the question clear.

According to the KNIME workflow (in the post) there are two output tables: one contains the summary of the data (Table 1) and and the other contains the details of the data (Table 2).

Now if I call the workflow from a jupyter notebook:

As expected, wf.data_table_outputs will have two tables, and the first table has the summary level of the data. In other words, the first table is Table 1.

But my question is: How do I define the first table to contain the summary of the data? I guess this should be done in the Container Output (table) or somewhere else in the KNIME workflow, but I just do not know how.

Best regards,
Du

HansS · September 2, 2019, 7:17pm

Hi @duydu

I think you have a clear question. But I don’t the answer to this question. That is why I suggest “the workaround/solution” to rename/copy your output tables afterwards in your python environment. So I am curious why you want to assign the name within the KNIME environment. This may help to find a solution.

gr. Hans

duydu · September 2, 2019, 8:37pm

Hi @HansS,

Suppose the KNIME workflow has three output tables namely Table A, B, and C. If I call it from a jupyter notebook, I should know which table data_table_outputs[0], data_table_outputs[1] and data_table_outputs[2] correspond to.

The only way I am aware of now is to look at each table in data_table_outputs and compare with all tables in the KNIME workflow. If Table A contains user information and when I run data_table_outputs[1].head(), I also see user information, then I know that data_table_outputs[1] corresponds to Table A. It is ok but really inconvenient. I hope that there is a better solution.

I have just found out that somebody already asked a similar question in github (quite long ago), but unfortunately, nobody has answered yet.

gist.github.com

https://gist.github.com/greglandrum/d721822e8973ff8438cf2af283c9271e

KNIME_from_Jupyter_demo.ipynb

{
  "cells": [
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "# Using KNIME Analytics Platform from Python\n\nThe new [knime module for Python](https://pypi.org/project/knime/), released at the same time as [v3.7 of KNIME Analytics Platform](https://www.knime.com/knime-analytics-platform-37-highlights), allows KNIME workflows to be called from Python and integrates nicely with the Jupyter notebook.\n\nHere we'll demonstrate using a KNIME workflow that performs a complicated database query (the query is against a PostgreSQL instance of [ChEMBL24](https://www.ebi.ac.uk/chembl/) that we have running in AWS) that is parameterized on user-supplied information. \n\nThe results of the query are summarized in a couple of different ways and made available in two separate output tables."
    },
    {
      "metadata": {},
      "cell_type": "markdown",

This file has been truncated. show original

Thank you!

Best regards,
Du

potts · September 5, 2019, 7:00am

Hi @duydu –

Regarding your question about the order of the outputs in data_table_outputs: there is no convenient way of knowing what the order of the output tables will be in the current implementation. To describe in more detail, the knime.Workflow class’s code currently determines the order of the output tables (the order observed in wf.data_table_outputs in your example) based upon the order pathlib.Path.glob discovers settings.xml files for each node in the workflow. (I am ultimately referring to the current implementation of knime.find_service_table_node_dirnames().)

A couple of things occur to me:

I definitely agree that it would be convenient to know which output is which beforehand when accessing wf.data_table_outputs on workflows containing more than out Container Table (Output) node. I have added this as an enhancement request to the knimepy project on GitHub (see https://github.com/knime/knimepy/issues/17).
It is perhaps worth pointing out that a user can currently take full control of the order of the input and output tables by using the functional apis (e.g. use run_workflow_using_multiple_service_tables()) instead of the object-oriented api that knime.Workflow conveniently provides.
Though not a universal solution, if it is possible and appropriate to combine the output from both portions of a workflow into input for a single Container Table (Output) node, then you can effectively control exactly how this data will be passed back from KNIME into your Python code.

Also, regarding the similar question asked on github, the question asked by mbezy at that link you provided is about inputs, not outputs. Identification of which input is which is already possible via the knime.Workflow.data_table_inputs_names and knime.Workflow.data_table_inputs_parameter_names attributes. Of those two, the data_table_inputs_parameter_names is probably the far more helpful one to use because that label can be easily changed in the node’s configuration dialog in KNIME.

Davin

duydu · September 12, 2019, 12:31pm

Hi Davin,

Thank you very much for your detailed answer. I know that the question asked on github is about inputs, not outputs. I just thought that they were similar, therefore I mentioned it. But anyway, it is clear to me now.

Thanks again!

Best regards,
Du Nguyen

system · March 13, 2020, 12:31am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.