Running Jupyter in KNIME best practices

I recently found out that Jupyter Notebooks can be run in KNIME, which opens up many more opportunities for KNIME.

Having spent some time playing around, I am wondering what the best practices are going forward.

My Jupyter notebook lives in an Anaconda virtual environment, so I use a Conda Environment Propagation node so that the Python Source or Python Script nodes run in that environment.

The Python Source/Script node will run the Jupyter Notebook. So how should I structure the Notebook? For example:

  1. One function in the NB contains many features, and one Python node runs that one NB function. The output is the final result.

NB:
def A(a):
    b = a + 3
    c = b + 5
    return c

So the Python script only runs function A with a to get c.

  2. Multiple functions/features in the NB, and the Python node runs each function. But I don’t know how to pass data between functions in the NB. E.g.,

NB:
def A(a):
    b = a + 3
    return b
def B(A):
    c = b + 5
    return c

How should I ultimately run function B to get the answer c by passing a to function A?

  3. One function/feature per NB, with multiple Python nodes each running one NB function, e.g.,

NB ONE:
def A(a):
    b = a + 3
    return b

NB TWO:
def B(c):
    c = b + 5
    return c

So the first Python node runs NB ONE and returns b, then passes b to the second Python node, which runs NB TWO and returns c.

Any idea what the best practice is?

Hi @anguslou

When it comes to best practices, it is worth keeping in mind that the authors of Jupyter themselves have tried to make it clear that the Notebook is not intended and should not be used as a substitute for Python modules, Python’s go-to mechanism for delivering blocks of reusable Python code.

The counter-argument roughly goes, “but Notebooks are just so much fun I wanna use them for all the things!” and it is true that Notebooks are super fun and rewarding to use as the awesome interactive tools that they are. The lack of clarity or even borderline confusion around how to rationally use them in the contexts you describe serves as a strong clue that there is a bit of a mismatch going on or otherwise something isn’t quite right.

So, when should we use Notebooks through KNIME (thinking about KNIME in particular now)? When we are using the KNIME AP GUI to explore data and play out concepts, building a workflow incrementally as our ideas come together, opportunities may arise where we think, “oh, I know what to do for this next step, I already have it worked out in this Jupyter Notebook.” In that case we have a quick way to access a Python function from a Notebook and continue on our path of discovery and creation. We can even coordinate our continued exploration interactively via the Notebook alongside our interactions with KNIME. Life is good, this is productive and fun!

As our workflows mature out of this rapid-discovery-exploration mode, we start to replace the shortcuts in our workflows with something more robust and easier to share with and explain to others. This also tends to prompt thinking about where the Python functions, classes, etc. should live and be organized for the longer term. This is sometimes the point where we stop leveraging Notebooks opportunistically as a source of reusable blocks of Python code.

You are asking about best practices which also means you are thinking about the longer term, not only the quick, short-term wins. My read of your example scenarios is: if you consider organizing your example functions into reusable Python modules (the purpose-built, right tool for the job), you are able to import and access any of those functions in any of the Python Scripting nodes you add to your workflow, without complication.
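To make that concrete, here is a minimal sketch of how the example functions from the question could look once they sit side by side in one module. A and B come from the question (with B taking b as its argument); the `pipeline` helper is a hypothetical name added for illustration only.

```python
# Sketch of the question's functions as plain, reusable Python.
# A and B follow the original post; pipeline() is a hypothetical helper.

def A(a):
    """First step: add 3 to the input."""
    b = a + 3
    return b


def B(b):
    """Second step: add 5 to the result of the first step."""
    c = b + 5
    return c


def pipeline(a):
    """Pass data between the steps by feeding A's return value into B."""
    return B(A(a))


print(pipeline(1))
```

Written this way, all three scenarios reduce to ordinary function calls: one node can call `pipeline(a)`, or two separate nodes can call `A` and `B` and hand the intermediate value over through the workflow.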

Not everyone has experience creating Python modules, so let me offer a quick recipe and a neat KNIME feature to combine with it. Suppose you have a useful function named “magic_formula()” that you would like to use 1) in the regular Python interactive shell, 2) in a KNIME Python Scripting node, and 3) in a Jupyter Notebook. Create a new file named “my_cool_stuff.py” in your current working directory (more on that in a moment) and copy-paste your function definition into that text file. That is the full recipe: you now have a viable, reusable Python module. To use your “magic_formula()” function in any of those 3 places, first type “import my_cool_stuff”, followed on the next line by “my_cool_stuff.magic_formula()”, and it will run your function.

A neat feature in KNIME ensures your new custom Python module goes wherever your KNIME workflow goes: if you place the “my_cool_stuff.py” file inside the directory for your KNIME workflow, it will be available to all your Python Scripting nodes, and it will be included whenever you export your workflow to send to someone else or to upload to a KNIME Server.
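The recipe can be tried end to end with the rough sketch below. It writes “my_cool_stuff.py” into a temporary directory (standing in for the working directory or the workflow directory), makes that directory importable, and calls the function. The body of `magic_formula` is a placeholder invented for this example, not anything from a real workflow.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Placeholder module contents; the formula itself is an assumption.
MODULE_SOURCE = '''
def magic_formula(a=1):
    """Hypothetical logic: add 3, then add 5."""
    return a + 3 + 5
'''

# Stand-in for "your current working directory" from the recipe.
workdir = Path(tempfile.mkdtemp())
(workdir / "my_cool_stuff.py").write_text(MODULE_SOURCE)

# Make the directory importable, as the working directory normally is.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import my_cool_stuff  # the same two lines work in a shell, a node, or a Notebook

result = my_cool_stuff.magic_formula(2)
print(result)
```

In everyday use only the last three lines are needed; the temporary directory and the `sys.path` line exist solely to keep the sketch self-contained.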

I hope the above helps and I hope it is clear that I am both a big fan and a heavy user of Jupyter Notebooks. What I wrote should not be interpreted as discouraging the use of Notebooks but I am a strong advocate of using the right tool for the job. I also hope my pitch for employing reusable Python modules and including your custom Python modules in your workflow’s directory proves useful.

Davin

I am not sure about best practices, but here is a basic example of how to integrate KNIME with Jupyter and Python modules.

Please note: you can import only certain cells from the notebook and activate them for the KNIME Python node.
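To illustrate the idea of importing only certain cells, here is a rough sketch that uses nothing but the standard library. KNIME's Python integration has its own notebook-loading helper, which this sketch deliberately does not use or reproduce. The in-memory notebook, the "knime" tag, and `load_tagged_cells` are all hypothetical names chosen for this example.

```python
import json

# A tiny notebook in the nbformat v4 JSON layout (hypothetical content).
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["knime"]},
         "source": ["def A(a):\n", "    return a + 3\n"],
         "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {},
         "source": ["scratch = 'exploratory cell, not needed in KNIME'\n"],
         "outputs": [], "execution_count": None},
        {"cell_type": "markdown", "metadata": {}, "source": ["Notes"]},
    ],
})


def load_tagged_cells(notebook_text, tag):
    """Execute only the code cells carrying `tag`; return their namespace."""
    notebook = json.loads(notebook_text)
    namespace = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # skip markdown and raw cells
        if tag not in cell.get("metadata", {}).get("tags", []):
            continue  # skip untagged exploratory cells
        source = cell["source"]
        if isinstance(source, list):  # source may be a list of lines
            source = "".join(source)
        exec(source, namespace)
    return namespace


ns = load_tagged_cells(NOTEBOOK_JSON, "knime")
print(ns["A"](1))
```

Tagging the cells meant for reuse keeps plotting and scratch cells out of the workflow, although once that set of cells stabilises it is usually simpler to move them into a module.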

Hi Davin,

A very insightful and practical idea. I think this is the best we can do with KNIME, Python scripts, and notebooks.

Angus
