I am trying to recreate a workflow described here:
I asked the author to upload the workflow to the KNIME Hub, but to no avail so far, hence I am trying to recreate it from the written material.
I have installed Reinvent 3.2 according to GitHub - MolecularAI/Reinvent using miniconda. I can activate the environment packages using the Conda Environment Propagation node, and see all packages installed in the Reinvent.v3.2 environment. In the Python Source node I enter the code from the blog:
from pandas import DataFrame
# Create empty table
output_table = DataFrame()
from rdkit import Chem
import reinvent_models.reinvent_core.models.model as reinvent
batchsize = 124
# model path where user would like to use
modelpath = '/tmp/evehom/random.prior.new'
ERROR Python Source 3:2 Execute failed: No module named 'rdkit'
Traceback (most recent call last):
  File "<string>", line 4, in <module>
ModuleNotFoundError: No module named 'rdkit'
Yet rdkit is listed in the environment package list. One thing I noticed is that KNIME reports rdkit version 2020.03.3.0, while if I activate the same environment on the command line and check the version, it says 2022.03.2.
Either way I don’t understand why the rdkit error message comes up since the package is installed.
From what you provided, two things seem to be mixed up in the KNIME Analytics Platform: the global Python environment (see Preferences -> KNIME -> Python (or Python (Legacy))) versus the environment you created with your Conda Environment Propagation node. The latter does not include RDKit and will thus fail.
First: you can delete the link between the two nodes in your workflow and try it with your global Python environment.
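To see which of the two environments a script actually runs in, a short diagnostic along these lines could be pasted at the top of the Python Source node. This is only a sketch: the helper name describe_environment is made up for illustration, and it assumes the printed output shows up in the KNIME console. It uses only the Python standard library, so it should run even when rdkit is missing.

```python
import importlib.util
import sys


def describe_environment(module_name="rdkit"):
    """Report which interpreter is running and whether a module is importable.

    Hypothetical helper for illustration; not part of KNIME or Reinvent.
    """
    # find_spec returns None for a top-level module that cannot be found.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # path of the running Python interpreter
        "prefix": sys.prefix,          # root folder of the active environment
        "module": module_name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


info = describe_environment("rdkit")
for key, value in info.items():
    print(f"{key}: {value}")
```

If the printed prefix is not the folder of the reinvent.v3.2 environment, the node is running in a different environment than the one checked on the command line.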
#1: after deleting the link and setting the global Python environment to reinvent.v3.2 it (sort of) works. At least rdkit gets loaded (but I get another error further down the code, which is not due to missing packages)
#2: my interpretation of the Conda Environment Propagation node was that you can use it to ‘bypass’ the global Python preferences in a workflow-specific manner but this is not the case?
#3: see screenshot from the Conda Environment Propagation node configuration, where it says rdkit 2020.03.3.0. Note that this environment was created outside KNIME according to the Reinvent GitHub instructions. On the command line the rdkit version in this environment is 2022.03.2
Seems to be an issue with your Python environment reinvent.v3.2, let me know if there are further issues installing missing packages.
Yes you can, but that requires a working Python environment. I suggest using a global Python environment first, completing your script, and then making the Conda Environment Propagation node use that Python environment. If it works, you can use it to bypass the global preferences. The node should then also use the current Python version and not refer to some old version anymore.
It seems like the workflow you shared had another Conda environment configured, thus my confusion. I get the following environment displayed:
@evert.homan_scilifelab.se a few remarks: I would not use dots in the name of my conda environment. Try to limit the special characters to just underscores.
Then you might want to take a look at my article about KNIME and conda environments and maybe create a YAML file that contains all your necessary libraries and try to install via this configuration:
Then follow the official guide on how to make sure your Python Script node knows which environment to use. I would use the new ones: I think your screenshot shows a deprecated one.
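The naming advice above (no dots, only underscores as special characters) can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the rule "letters, digits and underscores only" is an assumption drawn from that advice rather than an official conda restriction.

```python
import re

# Assumption: a "safe" name contains only letters, digits, and underscores.
SAFE_ENV_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_safe_env_name(name):
    """Return True if the whole name uses only letters, digits, and underscores."""
    return SAFE_ENV_NAME.fullmatch(name) is not None


def suggest_env_name(name):
    """Replace every other character (for example dots) with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


print(is_safe_env_name("reinvent.v3.2"))   # False
print(suggest_env_name("reinvent.v3.2"))   # reinvent_v3_2
```

Renaming an existing environment usually means creating a new one under the suggested name, which fits well with installing from a YAML file.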