KNIME 4.3: Conda Environment Propagation installing all necessary packages except for one

Kaegan · December 15, 2020, 11:48pm

Hello KNIME Community,
I am absolutely obsessed with the new Conda Environment Propagation node in KNIME 4.3. I have tested its functionality with coworkers, and it correctly installs all of the necessary packages except for one in a custom component I built. The component uses pandas, numpy, scipy, matplotlib, and a probability distribution fitting library called fitter. All of the packages are installed correctly on brand new conda environments except for fitter, and I’m not sure why. Here is a screenshot of the configuration window for the Conda Environment Propagation node:

So you can see that fitter is selected, and I am also including pip of course. Here is a screenshot of the error when the component tries to run on an environment without fitter installed:

I know the obvious workaround is installing fitter manually, and that works, but I’m hoping there is something I’m missing or some sort of workaround that would allow me to avoid that! I also wanted to point it out to the community because this might not be the only package this new node is struggling with!

Thank you in advance for any help!

MarcelW · December 16, 2020, 1:10pm

Hi @Kaegan,

Glad to hear that you find the node useful, and sorry for the problems you are experiencing!

I have just tried to reproduce the problem on my machine but so far have not managed to do so – fitter seems to be installed correctly and I can import it in downstream Python nodes.
Could you please provide me with your knime.log file? At the moment, the visible error output of the Conda Environment Propagation node is not really helpful (“Conda process terminated with error code 1.”), but the node should have recorded more details in the log.
(We have already fixed that behavior in the meantime. Future versions of the node will provide more helpful diagnostics in case something goes wrong. Some other usability improvements are also on the way, so any feedback on the current state of the node is highly welcome!)

Marcel

Kaegan · December 16, 2020, 6:08pm

Thank you for the response!

I’m not sure what is going on with “Conda process terminated with error code 1.” because that error only showed up on my machine when I tested uninstalling the fitter package and executing the component. My co-workers were not getting that error on a freshly installed Anaconda; they were only receiving the “No module named fitter” errors. This morning I enabled per-workflow logging, uninstalled the fitter package from my conda environment, and executed the component again. Now I am no longer getting “Conda process terminated with error code 1.” but am getting the “No module named fitter” errors. Here is the log file: knime.log (6.5 KB). If I can reproduce the “Conda process terminated with error code 1.” error today, I will upload another log file. Here is a screenshot from today:

Thank you very much for the help! It gives me hope that it’s installing fine for you!

MarcelW · December 17, 2020, 1:44pm

Hi @Kaegan,

Could you please double-check whether the failing Python Script (2⇒2) node inside the component actually uses the propagated environment? Maybe something has gone wrong there.
You can do that by opening the configuration dialogue of the node. If a message reading “‘python2Command’ and ‘python3Command’ are controlled by variables” is displayed at the very bottom of the dialogue window, then this is the case. Otherwise, please go to the Flow Variables tab of the dialogue and select conda.environment from the respective drop-down lists of the python2Command and/or python3Command entries. Then please confirm the dialogue and retry.

Marcel

Kaegan · December 17, 2020, 4:30pm

Hey @MarcelW,

That was the first thing I thought it might be as well. Here is a screenshot of the inside of the component when it fails:

and here are screenshots of the flow variables controlling the python environments for both of the nodes that are failing:

python 2 => 2:

python view:

Thanks for the continued support on this! I’m talking to my bosses about releasing this and a few other components to the community, so if I check with them first I might be able to send you the component, if that’s what it takes to get it working. It’s just not completely done yet.

MarcelW · December 17, 2020, 5:37pm

That would be great! Your node configurations look good to me, so I am really starting to run out of further suggestions without taking a look at the actual workflow.

Marcel

Kaegan · December 18, 2020, 10:01pm

Hey @MarcelW,
I got the OK from my bosses to post the component; it’s just procedural, so there is no proprietary information contained within it. Here is a workflow that contains the component:

continuous_distribution_fitting_WF.knwf (111.5 KB)

I have also included a small dataset of sample sales data I found somewhere online so you can test everything out properly.

Thank you so so so much for your help with all of this! Excited to hear back from you!

MarcelW · December 21, 2020, 10:15pm

Hey @Kaegan,

Thanks for sharing! This was really useful to pin down the problem. In fact, I do not think I would have been able to do so without it:

It seems you found a (really weird) bug caused by an edge case we did not consider when developing and testing the node. In this edge case, the propagated Conda environment is not respected by downstream Python nodes outside their configuration dialogues at all. Instead, the nodes keep using the default environment configured in the KNIME Preferences. So from what I can tell, it is not the fitter package that causes problems. fitter is merely the only package required by the failing Python nodes that is missing from the default environment. (Can you confirm?)
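One way to check which environment a Python node actually runs in is to print the interpreter path and test whether the required modules can be found. The snippet below is only a minimal sketch using the standard library; the function name `environment_report` is made up for illustration and is not part of KNIME.

```python
import importlib.util
import sys


def environment_report(modules):
    """Return the running interpreter and the modules it cannot find."""
    # find_spec returns None for a top-level module that is not installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return {
        "executable": sys.executable,  # path of the interpreter in use
        "prefix": sys.prefix,          # root folder of the active environment
        "missing": missing,
    }


# Packages used by the component discussed in this thread.
report = environment_report(["pandas", "numpy", "scipy", "matplotlib", "fitter"])
print(report)
```

If the printed paths point to the default environment from the Preferences rather than the propagated one, the node is not using the flow variable.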

I believe this is what it takes to run into this edge case:

- A Python scripting node whose configuration dialogue has never been confirmed (“OK”, “Apply”) in v4.3 of KNIME Analytics Platform (so either a node configured in a previous version of KNIME but not touched in v4.3 (presumably your case), or a node only added to a workflow but never configured)
- The Conda Environment Propagation node, obviously. Its configuration does not matter as long as it does not result in overwriting the default environment configured in the Preferences

Then:

1. Connecting the Conda Environment Propagation node to the Python scripting node
2. Wrapping the nodes in a component (if not already) and sharing the component

Sharing the component replaces it by a linked, read-only instance in the workflow. In the linked instance:

Opening the configuration dialogue of any Python node will result in exactly what is shown in your screenshot, a seemingly correctly configured node.

But here is the thing: in this case, the configuration dialogue does not show its actual configuration, but the configuration the node would have after confirming the dialogue (via “OK” or “Apply”), which is not possible in linked components (I completely missed the grayed-out buttons when looking at your screenshot for the first time).
Thus, the node only looks like it would use the flow variable when in fact it does not. This is clearly a bug on our side.

tl;dr: you should be able to fix the component by:

1. Opening the shared component (or alternatively, unlinking the component in your workflow and opening it)
2. Opening and confirming the configuration dialogues of all contained Python scripting nodes. This should properly apply the flow-variable settings
3. Saving the shared component (or alternatively, saving and sharing the modified component in your workflow)

We are already working on fixing this bug (and in general, making the actual configuration of the node more obvious). In the meantime, please make sure to open and confirm the configuration dialogues of all Python scripting nodes connected to a Conda Environment Propagation node before sharing their enclosing component.

Hope this helps!

Marcel

Kaegan · January 4, 2021, 4:14pm

Hello @MarcelW,
Thank you so much for your detailed reply! I’m sorry it has taken me so long to get back to you as I was not working through Christmas and the New Year.

Your response looked very promising, but I’m afraid it has not fixed the problem for me. I haven’t had a chance to test sending the component to co-workers, but I tried what you said: I disconnected the component link and then applied the configuration settings for each of the Python nodes. I then uninstalled the fitter library from my conda environment and tried to execute the component, and I’m still getting the ‘No module named fitter’ error. I recorded a .gif of me going through that process so you can check that I was following your steps correctly.

[Edit] I can’t get the gif to show up correctly but I do have an .mp4 video file of me that is 2 min long that I can send to you somehow, unfortunately I can’t upload that here [Edit]

Again, I haven’t had a chance to try this out with a coworker so there might be something strange going on when I uninstall the library from my environment then try to execute the component but I was assuming that the propagation node should still work in that instance.

Thank you for the continued support with this!
Kaegan

MarcelW · January 4, 2021, 6:29pm

Hi @Kaegan,

No problem at all, I hope you enjoyed your holidays.

I just noticed that the “Environment validation” option of the Conda Environment Propagation node in the workflow you shared is set to “Check name only” instead of “Check name and packages” as in the screenshots.
Could you check if that is also true for your local version of the node, switch to “Check name and packages” if necessary, and see if the problem persists? If it has already been set correctly, I will come back to you with a way to share the video, but maybe this (hopefully) already solves it.
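The difference between the two validation options can be pictured with a small sketch. This is not the node's actual implementation, only an illustration under the assumption that “Check name only” accepts any existing environment with the right name, while “Check name and packages” also requires every listed package to be present. The function `environment_is_valid` and the `existing` data are hypothetical.

```python
def environment_is_valid(name, required_packages, existing_envs, check_packages=True):
    """Return True if the environment passes validation and need not be (re)created.

    existing_envs maps environment names to the set of package names installed
    in them (hypothetical data; a real check would have to query Conda).
    """
    if name not in existing_envs:
        return False
    if not check_packages:
        # "Check name only": the environment exists, so nothing is reinstalled.
        return True
    # "Check name and packages": every required package must be present.
    return set(required_packages) <= existing_envs[name]


existing = {"base": {"pandas", "numpy", "scipy", "matplotlib"}}
required = ["pandas", "numpy", "scipy", "matplotlib", "fitter"]

print(environment_is_valid("base", required, existing, check_packages=False))  # True
print(environment_is_valid("base", required, existing, check_packages=True))   # False
```

Under this assumption, an environment that lacks fitter would pass the name-only check and never get fitter installed.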

Marcel

Kaegan · January 4, 2021, 7:01pm

Hey @MarcelW,

So I dragged in a new component, disconnected the link, switched the propagation node to “Check name and packages”, and then confirmed the configuration windows of the Python nodes. After this I uninstalled fitter again and tried to run the component, and I still got the “No module named fitter” error. Just messing around, I then reset the component and tried to run it again, and this time I got the “Conda process terminated with error code 1” error again. Since I have per-workflow logging set up now, I was able to capture the logged message that came along with this termination error, in case it is related to the issue I’m having.

Here is the workflow that I did not reset before exporting in case you need that:
continuous_distribution_fitting_WF2.knwf (333.1 KB)

Here is the log file, the error shows up at the very bottom:
knime.log (57.3 KB)

And this is what that part of the log looks like for quick reference:

Happy to send over a video of this if that will help you to confirm if I’m doing all of the steps correctly on my end.

Thank you!
Kaegan

MarcelW · January 4, 2021, 7:30pm

Hi @Kaegan,

Alright, I think we are really getting closer now. It seems like Conda does not allow overwriting its base environment at all (which makes sense given that it serves as the base of the Conda tool itself). Could you try to clone the base environment and configure the Conda Environment Propagation node such that it propagates the cloned environment instead? You can clone the base environment like this:

conda create --name my_cloned_environment --clone base

I will create a ticket to get this fixed on our end (i.e., exclude the base environment from selection in the Conda node).
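For anyone who prefers to script the cloning step, a minimal Python sketch follows. It assumes the conda executable can be found on the PATH; the helper names `clone_command` and `clone_environment` are made up for illustration, and the command they build is the same one shown above.

```python
import shutil
import subprocess


def clone_command(source, target, conda="conda"):
    """Build the argument list for cloning a Conda environment."""
    return [conda, "create", "--name", target, "--clone", source]


def clone_environment(source, target):
    """Run the clone command; raises if conda is missing or the command fails."""
    conda = shutil.which("conda")
    if conda is None:
        raise RuntimeError("conda executable not found on PATH (assumed setup)")
    subprocess.run(clone_command(source, target, conda=conda), check=True)


# Show the command without running it.
print(" ".join(clone_command("base", "my_cloned_environment")))
```

Calling `clone_environment("base", "my_cloned_environment")` would then run the command and raise an exception if Conda reports an error.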

Marcel

Kaegan · January 4, 2021, 9:08pm

Hey @MarcelW,

I believe this has worked!!! I will need to test out sending it to a coworker later this week to confirm. I went about it a little bit differently because I got an error message trying to run

conda create --name my_cloned_environment --clone base

So instead, in KNIME I went to File > Preferences > KNIME > Python and created a new virtual environment from there, so it would be more of a bare environment with only the packages necessary for the KNIME integration. Then I tried two different things that both worked. First, I tried leaving the conda propagation node set to ‘base’ to see if it would install the base environment in my newly created one (py3_knime), which worked. Then I switched the environment propagation node to point to this new environment and was able to successfully run the component as well!

This did bring up a few questions that I’m curious about, though. For the Conda Environment Propagation node, I am assuming it would be OK to select the base environment that contains the selected packages in the node itself, and as long as the environment you are pointing KNIME to in the Python preferences (the one that will be propagated) is different from the base environment, it should still work, correct? So I could develop using my base environment or any other environment, and then if I were to distribute the component to anyone, they would just have to make sure that the environment they are pointing KNIME to in the preferences is not the base environment? Like what I have shown in the picture above?

Thank you so much for all of your help! I’m so excited to be able to use this more!!
Kaegan

I’ll post a follow-up this week once I try distributing it to someone!

MarcelW · January 13, 2021, 1:39pm

Hi @Kaegan,

Sorry for the late reply!

Great to hear! Let’s hope it will work for them, too.

The Conda Environment Propagation node is fully independent of the Python Preferences. That is, it will only ever validate/create/overwrite the environment that is set in its own configuration and propagate it downstream. So the scenario you described (selecting “base” in the node while it installs into the different environment set in the Preferences) should actually not be possible. If “base” is selected in the node, the node will only try to (re)create “base” (though in the future, it may be a nice addition to allow the names of the “source” and “destination” environments to differ, or to give the receiver of a shared component more control over the environment (re)creation process). So distributed components will always try to (re)create the environment that you have used to develop them (including the environment name).

One reason behind this independence of the Preferences is that a single workflow may already contain multiple Conda Environment Propagation nodes (let alone multiple possibly concurrently running workflows). If all of these nodes would modify the environment configured in the Preferences, things could get messy pretty quickly. So the aim of the node is actually the contrary: allowing different Python environments to coexist and be used within a single workflow or a set of workflows without interference.

Marcel
