Python not loading rdkit

Hi,

I am trying to recreate a workflow described here:

I asked the author to upload the workflow to the KNIME Hub, but without avail so far, so I am trying to recreate it from the written material.

I have installed Reinvent 3.2 with Miniconda, following the instructions in GitHub - MolecularAI/Reinvent. With the Conda Environment Propagation node I can select the environment and see all packages installed in the Reinvent.v3.2 environment. In the Python Source node I enter the code from the blog:

```python
from pandas import DataFrame
# Create empty table
output_table = DataFrame()
from rdkit import Chem
import sys
print(sys.path)
import reinvent_models.reinvent_core.models.model as reinvent
batchsize = 124
# Path to the model file to sample from
modelpath = '/tmp/evehom/random.prior.new'

model = reinvent.Model.load_from_file(modelpath, sampling_mode=True)
sampled_smi, likelyhood = model.sample_smiles(batchsize)
output_table['sampled_smi'] = sampled_smi
output_table['likelyhood'] = likelyhood
output_table['ROMol'] = output_table['sampled_smi'].apply(Chem.MolFromSmiles)
```

Running this gives:

```text
ERROR Python Source 3:2 Execute failed: No module named 'rdkit'
Traceback (most recent call last):
  File "<string>", line 4, in <module>
ModuleNotFoundError: No module named 'rdkit'
```

Yet rdkit is listed in the environment's package list. One thing I noticed: KNIME shows RDKit version 2020.03.3.0, whereas if I activate the same environment on the command line and check the version, it reports 2022.03.2.
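One way to compare the two versions is to ask the interpreter itself. The sketch below (the helper name `module_version` is hypothetical) reports the RDKit version visible to whichever Python runs it, or `None` if RDKit cannot be found there. Running it once in the Python Source node and once in a terminal with the environment activated shows whether both see the same installation.

```python
import importlib
import importlib.util
from typing import Optional


def module_version(name: str) -> Optional[str]:
    """Return the version of a top-level module as seen by the running interpreter.

    Returns None if the module cannot be found, and "unknown" if it is importable
    but does not define __version__.
    """
    if importlib.util.find_spec(name) is None:
        return None  # not installed in (or not visible to) this interpreter
    module = importlib.import_module(name)
    return getattr(module, "__version__", "unknown")


print("rdkit:", module_version("rdkit"))
```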

Either way I don’t understand why the rdkit error message comes up since the package is installed.

Thanks for any pointers for a Python noob.

Reinvent.knwf (13.0 KB)

Hi,

from what you describe, two environments are being mixed up in KNIME Analytics Platform: the global Python environment (see Preferences -> KNIME -> Python (or Python (Legacy))) and the environment selected in your Conda Environment Propagation node. The latter does not include RDKit, so the import fails.
First: delete the connection between the two nodes in your workflow and try running the script with your global Python environment.
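To see which environment a node actually runs in, a small diagnostic like the following can be pasted into the Python Source node. It is a sketch: `describe_environment` is a hypothetical helper, and `CONDA_DEFAULT_ENV` is only set when an environment was activated via `conda activate`, so it may be absent when KNIME starts the interpreter itself. If `executable` does not point into the environment you expect, the node is using a different environment than the one you inspected.

```python
import os
import sys


def describe_environment() -> dict:
    """Collect details identifying which Python environment is executing this code."""
    return {
        "executable": sys.executable,  # path to the running python binary
        "prefix": sys.prefix,  # root directory of the active environment
        # For conda environments this is usually the environment's folder name.
        "env_name": os.path.basename(os.path.normpath(sys.prefix)),
        # Set by `conda activate`; None if the interpreter was started another way.
        "conda_default_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python_version": sys.version.split()[0],
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```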

Second: see this link for the usage of the Conda Environment Propagation node and this link on how to use the latest Python integration for further details.

Third: I am wondering where in KNIME you see version 2020.03.3.0, because it cannot be coming from within the node.

Fourth: if none of that helps, please provide the list of packages installed in the Python environment you are using as a .txt file (see Second on how to check which Python environment is currently used).

Fifth: please format code in forum posts :slight_smile:
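For the installed-packages list mentioned in the Fourth point, one option is `conda list --export > packages.txt` in a terminal with the environment activated. If you would rather produce the list from inside the node, the following sketch (hypothetical helper `write_package_list`, standard library only) writes `name==version` lines for every distribution the interpreter can see. It only lists packages that ship Python packaging metadata, so some conda-only packages may be missing.

```python
from importlib import metadata
from pathlib import Path


def write_package_list(path: str) -> int:
    """Write sorted 'name==version' lines for all visible distributions; return the count."""
    entries = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            entries.add(f"{name}=={dist.version}")
    lines = sorted(entries, key=str.lower)
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

Calling `write_package_list("installed_packages.txt")` in the node writes the file relative to the interpreter's working directory, so an absolute path is the safer choice.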

Does this answer your question? Don’t hesitate to ask further questions!

Best regards
Steffen


Hi,

#1: after deleting the link and setting the global Python environment to reinvent.v3.2 it (sort of) works: at least RDKit now loads. I get another error further down the code, but it is not due to missing packages.

#2: my understanding of the Conda Environment Propagation node was that you can use it to 'bypass' the global Python preferences in a workflow-specific manner. Is this not the case?

#3: see the screenshot from the Conda Environment Propagation node configuration, where it says rdkit 2020.03.3.0. Note that this environment was created outside KNIME according to the Reinvent GitHub instructions. On the command line, the RDKit version in this environment is 2022.03.2 (py39h89e00b9_0).

#4: since it seems to work after setting the global Python environment to reinvent.v3.2, this is no longer necessary.

#5: not sure how to do this

Hi,

  1. That seems to be an issue with your Python environment reinvent.v3.2; let me know if you run into further problems installing missing packages.

  2. Yes, you can, but that requires a working Python environment. I suggest first using the global Python environment to complete your script, and then configuring the Conda Environment Propagation node to use that same environment. Once that works, you can use the node to bypass the global preferences. The node should then also show the current versions and no longer refer to an old one.

  3. It seems like the workflow you shared had another Conda environment configured, thus my confusion. I get the following environment displayed:


    I also suggest including only explicitly installed packages, which makes the propagated environment portable across operating systems. This subsection has a short walkthrough of the node's usage.

  4. :+1:

  5. The (perhaps not intuitive) toolbar symbols in the post editor provide formatting options: select the whole script and click Preformatted text.
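Regarding point 1: before chasing import errors one at a time, it can help to check in one go which of a script's imports the current interpreter cannot find. A minimal sketch; the helper name and the module list are hypothetical, and these are import names, which can differ from conda package names:

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that the running interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list based on the imports in the Reinvent script; adjust as needed.
print(missing_modules(["pandas", "rdkit", "reinvent_models"]))
```

An empty list means every module can at least be located; it does not guarantee that importing them succeeds.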

If you have further questions, don’t hesitate to ask!

Best regards
Steffen


@evert.homan_scilifelab.se a few remarks: I would not use dots in the name of a conda environment. Try to limit special characters to underscores.
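If you want to derive such a name mechanically, a single regex substitution is enough. This is a sketch of the convention suggested here (ASCII letters, digits, underscores), not a rule enforced by conda; `safe_env_name` is a hypothetical helper.

```python
import re


def safe_env_name(name: str) -> str:
    """Replace every character other than ASCII letters, digits and '_' with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


print(safe_env_name("reinvent.v3.2"))  # reinvent_v3_2
```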

You might also want to take a look at my article about KNIME and conda environments, and maybe create a YAML file that contains all the libraries you need and install the environment via this configuration:
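A conda environment file is a small YAML document with `name`, `channels` and `dependencies` keys. The sketch below builds one from Python; the helper name and the package list are hypothetical placeholders (Reinvent has its own, longer requirements), and the saved file can then be used with `conda env create -f environment.yml`.

```python
from typing import Sequence


def render_environment_yml(name: str, channels: Sequence[str], dependencies: Sequence[str]) -> str:
    """Build the text of a minimal conda environment.yml file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


# Hypothetical contents: list only the packages you install explicitly.
text = render_environment_yml("reinvent_v3_2", ["conda-forge"], ["python=3.9", "rdkit", "pandas"])
print(text, end="")
```

Listing only explicitly requested packages, rather than a full export, keeps the file closer to the cross-platform setup recommended earlier in the thread.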

Then follow the official guide on how to make sure your Python Script node knows which environment to use. I would use the new nodes: I think your screenshot shows a deprecated one.
