Python Script: Extension Data type RDKit (performance)

With the new Python Script node not yet supporting extension data types, I wanted to mention that I actually avoid using the RDKit extension type, because loading the data into Python was extremely slow. It was so slow that converting to SMILES and then back to RDKit molecules inside the Python node was faster.

If it is possible to implement this for the new scripting nodes, performance should be taken into consideration. It should be at least as fast as the SMILES conversion.

Hi kienerj,

Thanks for the question! I am not sure I understood it correctly, but the improvement introduced in the Python Script (Labs) node is exactly that: fast data transfer to Python.
Is this what you meant in your post?

Maybe it fixes it; I can’t say yet. My point was that with the current (non-Labs) Python Script node, the Java → Python transfer of RDKit molecules is so slow that it is faster to run RDKit To Molecules → Python Script and recreate the RDKit molecules from SMILES inside the script.

So when implementing this for the new Python nodes, it should be checked that the transfer is actually faster than converting to and from SMILES. It should be, but it is good to check.
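
For reference, the Python side of the workaround could look roughly like the sketch below. It is only an illustration, not the node's actual implementation: `rebuild_molecules` and `timed` are hypothetical helper names, and the only RDKit call used is `Chem.MolFromSmiles`. The timing helper measures just the rebuild inside Python, so the time spent in the SMILES conversion node still has to be added when comparing against a direct transfer.

```python
import time


def rebuild_molecules(smiles_values, parse=None):
    """Recreate molecule objects from SMILES strings.

    Missing or empty values become None. With the default parser,
    unparsable SMILES also become None, because Chem.MolFromSmiles
    returns None when parsing fails.
    """
    if parse is None:
        # Imported lazily so the helper can be loaded without RDKit installed.
        from rdkit import Chem
        parse = Chem.MolFromSmiles
    return [parse(s) if isinstance(s, str) and s else None for s in smiles_values]


def timed(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

Inside the script one would pass the SMILES column of the input table, e.g. `mols, seconds = timed(rebuild_molecules, smiles_column)`, where `smiles_column` stands for whatever column holds the SMILES strings; the name of the input table variable depends on the node version.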

Hi kienerj,
Thanks for pointing out that the performance of transferring RDKit data types to Python can be improved.

As nsas pointed out, the general data transfer to Python has been improved drastically in the Python Script (Labs) node. However, as you already noted, the RDKit extension type is not available yet for the new node. We will work on that soon and will pay attention to the performance!
Best,
Carsten
