Conda Environment Propagation Node isolation

kalimist · March 29, 2023, 7:10am

Folks,
I am a software engineer and I need to provide access to an external API from a KNIME workflow. I need a number of Python libraries to do this (boto etc.), so I have created a component that can be added to any workflow where this external API call is required. I believe this will let me choose the Python version and library versions I need within the component, without the risk of library or Python version clashes with the rest of the workflow. Can someone explain to me how the Conda Environment Propagation node achieves this Python isolation? I am not overly familiar with Anaconda, and Python is not my primary language. I need to be able to explain how this isolation works.

alinebessa · April 4, 2023, 7:03pm

Hi @kalimist,

Conda is just a system for package and environment management, so if you use the Conda Environment Propagation node properly, it ensures that the library versions (and Python version!) you specify are installed on the machine and used in the workflow.

To properly use this node, follow the instructions outlined here.

Please let me know if this answers your question, otherwise I’d be happy to further help you.

DiaAzul · April 4, 2023, 10:32pm

@kalimist

Just to add a little more context as you mention that you are not familiar with Python (my apologies if this is all known to you).

Managing packages and dependencies is a problem that exists in many software environments. Way back, when a computer might have one copy of Python and all the packages were installed into that copy, the risk of dependency conflicts was high. Over time, tools have evolved that allow the creation of multiple virtual environments for specific use cases - effectively, a copy of Python and all its dependencies within a single folder.
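To make the "environment is just a folder" idea concrete, here is a minimal sketch. It uses Python's built-in venv module rather than Conda (an assumption made so it runs without Conda installed), and the folder name demo_env is hypothetical. Unlike Conda, venv reuses the Python version that creates it, but the folder layout illustrates the same principle.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_demo_environment(parent: Path) -> Path:
    """Create a bare virtual environment under `parent` and return its folder."""
    env_dir = parent / "demo_env"  # hypothetical name, for illustration only
    # with_pip=False keeps the example offline and quick.
    venv.create(env_dir, with_pip=False)
    return env_dir


def interpreter_path(env_dir: Path) -> Path:
    """Return where the environment's own Python executable lives."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


# Build the environment in a throwaway directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_demo_environment(Path(tmp))
    print("environment folder:", env_dir)
    print("has its own config:", (env_dir / "pyvenv.cfg").exists())
    print("has its own interpreter:", interpreter_path(env_dir).exists())
```

Anything installed into that folder is visible only to the interpreter inside it, which is why two environments can hold different versions of the same library without clashing.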

One of those package management tools is Conda. This tool was developed to create virtual environments where (a) the version of Python could be specified (prior package tools didn’t permit this) and (b) because many Python packages are bindings to compiled libraries, those libraries could be installed as binaries built for the machine where the packages are being installed. It gained a lot of traction in the Data Science community as many mathematics packages wrap code compiled from other languages (e.g. Fortran), and Conda provided a way to install the Python bindings together with the compiled libraries they require.

Conda permits the creation of named virtual environments into which both the required version of Python and the packages are installed. It also manages dependencies to ensure that only compatible versions can be installed together (in theory). Creating a new environment with a copy of Python installed is as simple as:

conda create --name my_virtual_environment python=3.9

To activate this environment:

conda activate my_virtual_environment

Then to install packages:

conda install notebook

There are multiple tutorials around the internet to explain how to use Conda.

There are a few other points that need explaining:

Anaconda is a commercial organisation that provides tools for managing conda environments and also a managed repository of vetted (for security) packages. In most situations this is not free and a license fee should be paid (check their website to determine whether you can use it for free or not).
The conda application itself is free, as is access to the conda-forge repository. When installing packages using conda, use conda install -c conda-forge notebook to pull packages from the conda-forge repository. Remember, this is the internet, so if you are accessing the Anaconda repository without a license they will be logging your IP address (whether they choose to pursue you is up to them).
The conda application is written in Python (as is the Anaconda GUI front end) and is incredibly slow. So slow that I stopped using conda for quite some time. Fortunately, a group of developers has rewritten conda in C++. This application is called mamba, and if you are using conda a lot you should consider switching to mamba.
Both conda and mamba require a base Python installation, as some (or all) of their code is in Python. For applications using Python within a Docker container this is a pain: first a version of Python needs to be installed, then a virtual environment created and another version of Python installed with the required packages. For containers that are set up and destroyed within a short time period this is unnecessarily fiddly. Therefore, there is another package, micromamba, which can create a virtual environment without a base Python installation and install the required packages, though its functionality is limited compared with mamba.
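Since conda, mamba and micromamba accept the same basic create arguments, the points above can be sketched as a small helper that assembles the command for whichever tool you use and always names the conda-forge channel explicitly. The function name build_create_command is hypothetical and the helper only builds the command, it does not run it.

```python
from typing import List, Sequence

SUPPORTED_TOOLS = {"conda", "mamba", "micromamba"}


def build_create_command(
    env_name: str,
    python_version: str,
    packages: Sequence[str] = (),
    tool: str = "conda",
    channel: str = "conda-forge",
) -> List[str]:
    """Assemble an environment-creation command as an argument list."""
    if tool not in SUPPORTED_TOOLS:
        raise ValueError(f"unsupported tool: {tool!r}")
    command = [
        tool, "create",
        "--name", env_name,        # the named environment to create
        "--channel", channel,      # pull packages from this repository
        "--yes",                   # do not stop to ask for confirmation
        f"python={python_version}",
    ]
    command.extend(packages)
    return command


command = build_create_command("my_virtual_environment", "3.9", ["notebook"])
print(" ".join(command))
# prints: conda create --name my_virtual_environment --channel conda-forge --yes python=3.9 notebook
```

To actually create the environment you would pass the list to subprocess.run(command, check=True) on a machine where the chosen tool is installed.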

This was going to be a short post, but ended up being longer than I was expecting (sorry). The final piece of the jigsaw:

The Conda Environment Propagation node includes a drop-down to select the virtual environment created by conda/mamba. This is separate from the Python environment installed by default by KNIME and allows developers/users to create isolated environments specific to their own Python script. You will still need to install the packages KNIME requires into your environment so that your code can interoperate with the KNIME application. The details for this are in Aline’s post.

Apologies if this is repeating stuff that you know, but I thought it might be useful to put it somewhere on the forum.


kalimist · April 4, 2023, 11:19pm

Thanks for all the info. Suppose I create a component that includes the Conda Environment Propagation node, and that component is used in a workflow that already relies on a certain version of Python and specific versions of various Python libraries. Would the code in my component be isolated from the Python setup of the containing workflow?

DiaAzul · April 4, 2023, 11:30pm

@kalimist

If the Conda virtual environment named in the propagation node is unique, the environment will be isolated. You can share environments across nodes and workflows by using the same environment name. Don’t forget to select Conda Environment Variable in the Execution tab of the Python script node. If you do not, you will be using the default Python environment and not the one defined in the Conda Environment Propagation node.
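One way to check which environment a script is really running in is to have it report its own interpreter. The following is only a sketch using the Python standard library; describe_interpreter is a hypothetical helper, and boto3 is an assumed example package name standing in for whatever libraries the component needs.

```python
import sys
from importlib import metadata


def describe_interpreter(packages=()):
    """Report which Python is running and which package versions it sees."""
    report = {
        "executable": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,          # root folder of its environment
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# boto3 is an assumed example; substitute the packages you care about.
for key, value in describe_interpreter(["boto3"]).items():
    print(f"{key}: {value}")
```

If the reported prefix is the folder of the environment named in the propagation node, the script is using the isolated environment. If it points at the default Python environment instead, the environment variable has not been selected in the script node.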
