How to access TDC Python code through KNIME

evert.homan_scilifelab.se · October 31, 2023, 10:30am

Hi, I am investigating whether it is possible to run Python code from the Therapeutic Data Commons (TDC) using the KNIME Python Integration. For example, this code calculates the drug-likeness, expressed as QED, of molecules given in SMILES format:

https://tdcommons.ai/functions/oracles/#quantitative-estimate-of-drug-likeness-qed

from tdc import Oracle
oracle = Oracle(name = 'QED')
oracle(['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1',
'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1',
'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O'])

[0.7369335974098526, 0.7965866720151891, 0.9026967965647689]

Is it possible to do this in KNIME? Ideally by feeding the Python Script node with a KNIME table containing smiles strings, with the output being the QED values, one for each row.

Any pointers on if/how to get this to work appreciated.

Thanks/Evert

steffen_KNIME · October 31, 2023, 1:26pm

Dear Evert,

yes, I think this is possible in KNIME.

Basically, I see the following steps:

1. Make the Python module of TDC available in a local Python environment to be used in your KNIME installation
2. Write and test the code within a Python Script node
3. Make the local Python environment exportable via the Conda Environment Propagation node to share the whole workflow with others

The first point can be tackled by creating a Python environment. One approach is via the command line (terminal or Anaconda prompt). Create an environment for the Python Script node as described in this section about metapackages. You might have to look at the prerequisites section at the beginning of the chapter. As pytdc is available via pip (up to version 0.4.1) or via Conda (up to version 0.3.8), I describe the approach via Conda:

conda create -n my_environment -c knime -c conda-forge knime-python-scripting=5.1 python=3.11 pytdc

If 0.3.8 is not sufficient, you can install a more recent version via pip (easy, google it).
Make sure that in Preferences -> KNIME -> Python you choose Conda instead of Bundled or Manual and select the environment created above (named my_environment here).
Now you should have a working Python environment.
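A quick way to confirm that the selected environment can actually load TDC is an import smoke test, run either in the Python Script node or in a terminal with the environment activated. The sketch below is only an illustration: try_import is a hypothetical helper name, not part of KNIME or TDC, and it reports the import error as text instead of stopping the script.

```python
import importlib


def try_import(module_name):
    """Return (True, "") if the module imports, else (False, error text)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package and a broken package are both reported here.
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


ok, message = try_import("tdc")
print("tdc import OK" if ok else f"tdc import failed: {message}")
```

If the import fails, the printed message is the same text that would otherwise appear as the node's error.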

For point two I suggest using a template of the Python Script node and adjusting it.

Point three then follows this part of the documentation.

Does that help already?

Best regards
Steffen

evert.homan_scilifelab.se · October 31, 2023, 3:19pm

Dear Steffen,

Thanks for your swift reply. I got quite far but in the end it comes down to writing the correct Python code under 2).

First off I used your code to create the Conda environment for PyTDC, and then tried to run the Oracle QED code directly on the command line. This did not work initially but I got it to work by doing

pip install PyTDC

after which it still did not work. I then did

pip install PyTDC --upgrade

and then the code worked, so I had a working environment for KNIME. I specified this environment under the Python preferences in KNIME.

Now it basically comes down to writing the correct Python code in the Python Script node. I feed this node with a table containing the 3 SMILES from the Oracle QED example and enter the following code:

# This example script simply outputs the node's input table.
# knio.output_tables[0] = knio.input_tables[0]
import tdc
from tdc import Oracle
oracle = Oracle(name = 'QED')
knio.output_tables[0] = oracle(['knio.input_tables[0]'])

This gives

KnimeUserError: Output table '0' must be of type knime.api.Table or knime.api.BatchOutputTable, but got <class 'list'>

I am sure it’s a simple thing, but I am a total NOOB when it comes to writing code (which is why I love KNIME).

Much appreciated/Evert

steffen_KNIME · October 31, 2023, 9:36pm

Dear @evert.homan_scilifelab.se,

so I tried to do the following:

pip install PyTDC
pip install pyTDC --upgrade

And I got the error ImportError: Please install networkx by 'pip install networkx'!
So I did

pip install networkx

Then I played around and created the following workflow:
ForEvert.knwf (7.7 KB)

I think that should serve you as a starting point. Everything else is now Python development I would say.

Does that help already?

Best regards
Steffen

PS: the results look as follows where column1 was the input from the Table Creator and column2 is what you wanted to compute

…
and here a short version of the code from the workflow:

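import knime.scripting.io as knio
import pandas as pd
from tdc import Oracle
# Load the QED oracle once, then score every SMILES in the input table.
oracle = Oracle(name = 'QED')
df = knio.input_tables[0].to_pandas()
# Named "smiles" rather than "list" to avoid shadowing the Python built-in.
smiles = df['column1'].tolist()
df['column2'] = oracle(smiles)
# The output must be a KNIME table, not a plain Python list.
knio.output_tables[0] = knio.Table.from_pandas(df)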
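Real input tables often contain empty or missing SMILES cells, and how the oracle reacts to those is not covered in this thread. A cautious option is to score only the non-empty strings and leave the other rows empty. The sketch below assumes only that the oracle is a callable taking a list of SMILES and returning one score per entry, as in the code above; score_valid_rows and fake_oracle are hypothetical names, and fake_oracle is a stand-in so the example runs without TDC.

```python
def score_valid_rows(smiles_values, oracle):
    """Score non-empty SMILES with one oracle call; keep None elsewhere."""
    positions = [
        i for i, s in enumerate(smiles_values)
        if isinstance(s, str) and s.strip()
    ]
    scores = [None] * len(smiles_values)
    if positions:
        results = oracle([smiles_values[i].strip() for i in positions])
        for i, value in zip(positions, results):
            scores[i] = value
    return scores


def fake_oracle(batch):
    """Stand-in for illustration only: returns the length of each string."""
    return [float(len(s)) for s in batch]


print(score_valid_rows(["CCO", None, "  ", "c1ccccc1"], fake_oracle))
# prints [3.0, None, None, 8.0]
```

In the node this could replace the direct call, for example df['column2'] = score_valid_rows(df['column1'].tolist(), oracle).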

steffen_KNIME · October 31, 2023, 9:41pm

PPS: I do not think that the pip part is necessary for the usage within the Python Script node if the conda command for the environment creation is executed as I suggested.

evert.homan_scilifelab.se · November 1, 2023, 8:52am

Brilliant, I could reproduce this:

[image: pytdc_qed]

It is a bit odd, though, that you get a value of 0 (zero) for the second row with the same SMILES.

Anyway, this is great as it gives access to a lot of interesting cheminformatics tools from the Therapeutic Data Commons via KNIME, which should be of interest to quite a few people in the life science field.

Many thanks,

Evert

evert.homan_scilifelab.se · November 1, 2023, 8:57am

Without installing PyTDC via pip the environment does not work for me. I tried to reproduce it using your installation command:

conda create -n my_environment -c knime -c conda-forge knime-python-scripting=5.1 python=3.11 pytdc

If I then activate my_environment in KNIME and rerun the Python script, I get this error in KNIME:

ERROR Python Script 3:2 Execute failed: ImportError: cannot import name 'rmsd' from 'tdc.chem_utils.oracle.oracle' (/home/evehom/miniconda3/envs/my_environment/lib/python3.11/site-packages/tdc/chem_utils/oracle/oracle.py)

So clearly pytdc does not get installed on my system this way.

On the command line in the same environment it looks like this, if this is of any use:

python3
Python 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tdc import Oracle
>>> oracle = Oracle(name = 'QED')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/evehom/miniconda3/envs/my_environment/lib/python3.11/site-packages/tdc/oracles.py", line 58, in __init__
    self.assign_evaluator()
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/evehom/miniconda3/envs/my_environment/lib/python3.11/site-packages/tdc/oracles.py", line 74, in assign_evaluator
    from .chem_utils import qed
  File "/home/evehom/miniconda3/envs/my_environment/lib/python3.11/site-packages/tdc/chem_utils/__init__.py", line 3, in <module>
    from .oracle.oracle import PyScreener_meta, Vina_3d, Score_3d, Vina_smiles, molecule_one_retro, ibm_rxn,
ImportError: cannot import name 'rmsd' from 'tdc.chem_utils.oracle.oracle' (/home/evehom/miniconda3/envs/my_environment/lib/python3.11/site-packages/tdc/chem_utils/oracle/oracle.py)

evert.homan_scilifelab.se · November 1, 2023, 10:27am

One more thing: if I use the Conda Environment Propagation node to set the working PyTDC environment for Python scripting, I get the same error message as when running the QED script on the command line:

ERROR Python Script 3:2 ImportError: cannot import name 'rmsd' from 'tdc.chem_utils.oracle.oracle' (/home/evehom/miniconda3/envs/my_environment/lib/python3.11/site-packages/tdc/chem_utils/oracle/oracle.py)

Maybe you can shed some light on why it works when setting the environment through the preferences, but not when using the Conda Environment Propagation node.

BW/Evert

steffen_KNIME · November 2, 2023, 11:05am

Dear @evert.homan_scilifelab.se,

very nice that it works in general!

The different values are interesting; maybe TDC has different implementations on macOS Intel (which I have) vs. your OS?
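Before attributing the difference to the operating system, it may be worth ruling out that a SMILES string was altered on its way into the table, for example when square brackets are lost while copying from a rendered web page. That cause is only an assumption here and is not confirmed anywhere in this thread. The sketch below is a cheap syntax-level check, not a SMILES parser: smiles_warnings is a hypothetical helper, and it relies only on the fact that chirality marks (@) belong inside square brackets and that SMILES contains no whitespace.

```python
def smiles_warnings(smiles):
    """Return cheap, syntax-level warnings for one SMILES string."""
    found = []
    if any(ch.isspace() for ch in smiles):
        found.append("contains whitespace")
    in_bracket = False
    balanced = True
    stray_at = False
    for ch in smiles:
        if ch == "[":
            if in_bracket:
                balanced = False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                balanced = False
            in_bracket = False
        elif ch == "@" and not in_bracket:
            stray_at = True
    if in_bracket:
        balanced = False
    if not balanced:
        found.append("unbalanced square brackets")
    if stray_at:
        found.append("'@' outside square brackets")
    return found


good = "C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O"
# Deliberately damaged copy for illustration: all brackets stripped.
damaged = good.replace("[", "").replace("]", "")
print(smiles_warnings(good))     # prints []
print(smiles_warnings(damaged))  # prints ["'@' outside square brackets"]
```

An empty list does not prove that a string is valid SMILES; it only means none of these specific problems were found.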

About the whole pip thing. I understand (and thanks for all the details, that was helpful).
One correction: clearly pytdc does get installed on your system with conda-forge. However, as stated in my first answer, it installs only version 0.3.8, whereas pip installs the newer version 0.4.1. The error you (and I) get when using the conda-forge version is fixed in the newer 0.4.1. I opened a ticket for you at their issue portal and gave them a very clear picture: Conda-forge contains old version · Issue #213 · mims-harvard/TDC · GitHub

Your Conda Env Prop node also seems to use a wrong version (i.e. 0.3.8). Can you configure it with a working Python environment and verify that it contains the correct (i.e. 0.4.1) version of PyTDC, which can be seen in the configuration dialog? If you have trouble, please provide a workflow with the Conda Env Prop node in this thread.
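To see which interpreter a Python Script node is really using and which PyTDC version it finds, something like the following can be run inside the node. This is a minimal sketch using only the standard library; installed_version is a hypothetical helper, and it assumes the distribution is registered under the name PyTDC, as on PyPI.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The interpreter path shows which environment the script runs in.
print("Interpreter:", sys.executable)
print("PyTDC version:", installed_version("PyTDC"))
```

If the interpreter path points to a different environment than expected, the node is not using the propagated environment.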

Best regards
Steffen

evert.homan_scilifelab.se · November 2, 2023, 12:14pm

Hi,

To be honest, I don’t really understand how all this works with Python. I have a working environment (pytdc) that works on the command line, and also in KNIME if I choose it under Preferences. But it does not work when I select the bundled Python under Preferences and then propagate my working pytdc environment with the Conda Env node. The contents of this node look different before and after running the workflow. What do all the yellow lines mean?

BTW, I am calculating SAscore and QED in parallel but could this be combined in just one node? Again both work fine if I set the Python env through the preferences.

evert.homan_scilifelab.se · November 2, 2023, 12:21pm

BTW, pytdc 0.4.1 is available through Conda Environment Propagation, yet I get the error that it is missing when running the workflow. Does it matter if it comes from PyPI or conda-forge?

steffen_KNIME · November 2, 2023, 12:36pm

Hi @evert.homan_scilifelab.se,

that seems to be a misconfiguration of the Conda Env Prop node. Please use the radio button “Check name and packages” near the bottom of the config dialog. If that does not enforce the correct (0.4.1) version, then the “Always overwrite existing…” option should help. I also suggest using the “Include only explicitly installed” button to make the workflow usable on other operating systems. If everything is included, KNIME will also try to install packages that were downloaded as dependencies specific to your OS on a computer with a different OS.

And yes, SAscore and QED can be combined in one node.

Best regards
Steffen

evert.homan_scilifelab.se · November 2, 2023, 12:48pm

This is what I get after pressing ‘Include only explicitly installed’.

steffen_KNIME · November 2, 2023, 12:59pm

Good. Now please also check the box for the package pip as stated in the warning in the console below.

evert.homan_scilifelab.se · November 2, 2023, 1:35pm

Not working I am afraid…pytdc is ticked yet tdc not found?

steffen_KNIME · November 2, 2023, 1:39pm

Did you also try overwriting the existing environment as suggested here?

mlauber71 · November 3, 2023, 4:40am

If you want to read up about KNIME and python setup I can offer this article.

Maybe best to create a YAML file with the necessary packages first, then an environment per operating system that you can then distill into a Conda Environment Propagation node.

evert.homan_scilifelab.se · November 7, 2023, 11:20am

Thanks, I will have a look. I am mostly confused by Python itself (using conda vs pip etc.), not so much by KNIME.

Best wishes/Evert

mlauber71 · November 7, 2023, 5:17pm

This is where I hope the article can shed some light. The easiest way to use KNIME and Python is to use the integrated Python version in the extension.

Then: pip installs the packages you ask for together with their declared Python dependencies, but it does not take into account what Conda has already installed. Conda tries to manage the environment as a whole. So you might first want to use Conda (via Miniforge) and, if that does not work, add the remaining packages with pip. That is what the configuration (YAML) files suggest.
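The Conda-first, pip-second layout can be written as an environment file. The sketch below renders such a file as text from the package names mentioned in this thread; render_environment_yaml is a hypothetical helper, and the exact pins (PyTDC 0.4.1, networkx via pip) are assumptions taken from the discussion above, not a tested recipe.

```python
def render_environment_yaml(name, channels, conda_packages, pip_packages):
    """Render a Conda environment file with an optional pip section."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}" for package in conda_packages]
    if pip_packages:
        # pip itself is installed by Conda; its packages go in a nested list.
        lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"    - {package}" for package in pip_packages]
    return "\n".join(lines) + "\n"


text = render_environment_yaml(
    name="my_environment",
    channels=["knime", "conda-forge"],
    conda_packages=["knime-python-scripting=5.1", "python=3.11"],
    pip_packages=["PyTDC==0.4.1", "networkx"],
)
print(text)
```

Saved as environment.yml, the output could then be used with conda env create -f environment.yml.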

evert.homan_scilifelab.se · November 9, 2023, 5:18pm

I saw the light indeed after reading your article: I had forgotten to set ‘Use the CONDA flow variable’ under the Executable Selection tab.

I also managed to get both calculations working in one Python Scripting node:

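import knime.scripting.io as knio
import pandas as pd
from tdc import Oracle
# One oracle per score; both are applied to the same SMILES column.
sa = Oracle(name = 'SA')
qed = Oracle(name = 'QED')
df = knio.input_tables[0].to_pandas()
# Named "smiles" rather than "list" to avoid shadowing the Python built-in.
smiles = df['Smiles'].tolist()
df['SAscore'] = sa(smiles)
df['QED'] = qed(smiles)
knio.output_tables[0] = knio.Table.from_pandas(df)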
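If more scores are added later, the same pattern can be written as a loop over named oracles. The sketch below assumes only that each oracle is a callable taking a list of SMILES and returning one value per entry; score_columns is a hypothetical helper, and the two stand-in functions replace the TDC oracles so the example runs without TDC.

```python
def score_columns(smiles, oracles):
    """Return {column name: list of scores} for each named oracle."""
    return {column: list(oracle(smiles)) for column, oracle in oracles.items()}


def fake_length(batch):
    """Stand-in for illustration only: string length per SMILES."""
    return [len(s) for s in batch]


def fake_carbons(batch):
    """Stand-in for illustration only: count of 'C' characters."""
    return [s.count("C") for s in batch]


columns = score_columns(
    ["CCO", "CC(C)O"],
    {"Length": fake_length, "Carbons": fake_carbons},
)
print(columns)  # prints {'Length': [3, 6], 'Carbons': [2, 3]}
```

In the node, the dictionary would hold the real oracles, for example {'SAscore': Oracle(name = 'SA'), 'QED': Oracle(name = 'QED')}, and each returned list would be assigned to df[column].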

Thanks for your patience, I am still learning.

BW/Evert