Python Script node and Jupyter notebook integration error

fbagirov · January 12, 2021, 7:21pm

I am trying to include a Jupyter notebook I developed in my KNIME workflow. My problem is that I cannot pass the data from the CSV Reader into the notebook.

This is the flow:

This is the code I have inside the Python Script node, which is supposed to assign input_table_1 to the df variable and then pass it to the Jupyter notebook:

The jupyter notebook is inside the KNIME directory and this is how it looks:

It seems that the Python Script node assigns input_table_1 to the df variable but does not pass it to Jupyter.

What might be the issue?

MarcelW · January 13, 2021, 1:13pm

Hi @fbagirov,

I am afraid this will not work the way you would like it to, given how our Jupyter integration is currently designed.
knime_jupyter.load_notebook loads the Jupyter notebook into the node’s “main” Python script as a separate Python module (comparable to what Python’s import statement does). In Python, the namespaces/scopes of the variables of two different modules are isolated from one another, that is, code inside the notebook cannot simply reference variables declared in the node’s main Python script. So some mechanism will be necessary to explicitly pass the required variables from the main script to the notebook. Right now, this is only possible by means of function calls (like the one in line 30 in your screenshot, where input_table is explicitly being passed to my_notebook.sum_each_row). So the only solution that comes to mind would be to remodel your notebook such that all cells that require external input become functions:

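from shapely import wkt  # assumption: wkt is shapely's WKT parser
def calculate_centroid(df):
  df['the_geom'] = df['the_geom'].apply(wkt.loads)
  return df  # hand the result back to the calling script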
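
To illustrate the isolation described above, here is a minimal sketch in plain Python that does not use the KNIME API. It simulates "loading the notebook as a module" with types.ModuleType and exec, which is an assumption made purely for illustration and not a description of how knime_jupyter works internally. The names notebook_source, fake_notebook, and uses_global are hypothetical, and the body of sum_each_row is made up for the example.

```python
import types

# Hypothetical stand-in for the notebook's code cells.
notebook_source = '''
def uses_global():
    # Tries to read a variable that only exists in the main script.
    return df

def sum_each_row(table):
    # Receives its input explicitly as an argument.
    return [sum(row) for row in table]
'''

# Simulate "loading the notebook as a module" (illustration only).
fake_notebook = types.ModuleType("fake_notebook")
exec(notebook_source, fake_notebook.__dict__)

# Variable declared in the main script; the module cannot see it.
df = [[1, 2, 3], [4, 5, 6]]

try:
    fake_notebook.uses_global()
    saw_main_variable = True
except NameError:
    saw_main_variable = False

# Passing the data as a function argument works.
row_sums = fake_notebook.sum_each_row(df)
print(saw_main_variable, row_sums)
```

The function that relies on a global df fails with a NameError, while the function that takes the table as a parameter works, which is why turning the notebook cells into functions is the route that works today.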

Come to think of it, it would be really cool to be able to do something like this:

notebook = knime_jupyter.load_notebook(notebook_directory, notebook_name, vars={'df' : df})

Would that be interesting for you? Then I would create a feature request for that (or some comparable mechanism).
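
As a rough idea of what such a mechanism could amount to, here is a sketch in plain Python. It is not part of knime_jupyter: load_with_vars is a hypothetical helper that pre-populates a module's namespace before the module code runs, and the notebook source is a made-up one-liner that expects df to exist already.

```python
import types

def load_with_vars(source, module_name, variables=None):
    """Hypothetical helper: run `source` as a module whose namespace
    is pre-populated with `variables` before any code executes."""
    module = types.ModuleType(module_name)
    module.__dict__.update(variables or {})
    exec(source, module.__dict__)
    return module

# Hypothetical notebook code that expects `df` to exist already.
notebook_source = '''
result = [value * 2 for value in df]
'''

notebook = load_with_vars(notebook_source, "my_notebook", {"df": [1, 2, 3]})
print(notebook.result)
```

With the namespace seeded this way, the notebook code can reference df directly, and the calling script can read results back as attributes of the returned module.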

Marcel

fbagirov · January 13, 2021, 6:27pm

What I am looking for is an ability to pass the data (input table) into my notebook, so the notebook could pick it up and process without the need to import the dataset again within the notebook.

I am not clear on how to pass the input table from the node to the notebook, or how to pass the output dataframe back to the node so that it can be written to the node's output.

If vars={'df': df} could serve that purpose, then yes.

MarcelW · January 13, 2021, 6:56pm

Yes, that probably comes closest to what you are looking for without having to change the notebook. I am going to add a feature request for that.

fbagirov · January 13, 2021, 9:05pm

Thanks. When should we expect this feature to be added? Next release?

mlauber71 · January 13, 2021, 10:16pm

One way (maybe not the most elegant one) could be to store the data in a Parquet file, read it into the Jupyter notebook, and later read the results back into KNIME. Or you could store your data in a local database like SQLite.
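
A minimal sketch of the SQLite variant, using only Python's standard library. The file name exchange.sqlite, the table name input_table, and the example rows are hypothetical; plain tuples stand in for the dataframe. With pandas, DataFrame.to_sql and pandas.read_sql_query (or DataFrame.to_parquet and pandas.read_parquet for the Parquet route, which needs a Parquet engine installed) should serve the same purpose.

```python
import os
import sqlite3
import tempfile

# Hypothetical file location shared by the KNIME node and the notebook.
db_path = os.path.join(tempfile.mkdtemp(), "exchange.sqlite")

# Side 1 (e.g. the Python Script node): write the input rows.
rows = [(1, "a"), (2, "b")]
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE input_table (id INTEGER, label TEXT)")
con.executemany("INSERT INTO input_table VALUES (?, ?)", rows)
con.commit()
con.close()

# Side 2 (e.g. the notebook): read the rows back from the same file.
con = sqlite3.connect(db_path)
loaded = con.execute("SELECT id, label FROM input_table ORDER BY id").fetchall()
con.close()
print(loaded)
```

The same pattern works in the other direction: the notebook writes its results to a second table, and the node reads them back afterwards.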

MarcelW · January 15, 2021, 11:59am

I cannot promise anything. But I will update this forum thread as soon as we have a more detailed timeline.
