Memory issue with RDKit Molecule column

Hi all,

I have implemented a workflow in which a column of SMILES strings is canonicalized using a Python Script node. The file has about 8,000 rows. To do the canonicalization, a column of RDKit Mol objects has to be created. This works fine.
Rather than regenerate this Mol column further down the workflow, I keep it in the table and pass it through. In the Modern UI, the table contents take 20 to 30 seconds to display. Once I have three or so Python Script components that output a table containing the Mol column, the Mac warns that it has run out of application memory and KNIME is using ~56 GB, forcing me to kill the program.
I do not know whether this issue is unique to the Modern UI, is related to the RDKit Mol object itself, or comes from the table view beneath the workflow trying to render each Mol entry in the column.
As a workaround, I drop the Mol column at the end of the Python Script and regenerate it from the SMILES column when needed.
There is clearly a memory issue, as 56 GB is not a reasonable amount of memory to hold 8,000 molecules.
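The workaround can be sketched in plain RDKit. This is a minimal illustration, not the poster's actual script, and it assumes RDKit is installed in the Python environment that KNIME uses. The function names (`canonicalize_smiles`, `canonicalize_column`, `mols_from_smiles`) are hypothetical, and the code that reads and writes the KNIME tables is left out. The idea is that Mol objects exist only briefly inside the function, so only plain SMILES strings leave the node. Mol objects are rebuilt later, and only where they are actually needed. This sidesteps the problem but does not explain the memory use reported above.

```python
from typing import List, Optional

from rdkit import Chem


def canonicalize_smiles(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES for one input, or None if it cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)  # returns None for unparsable input
    if mol is None:
        return None
    # MolToSmiles emits canonical SMILES by default (canonical=True).
    return Chem.MolToSmiles(mol)


def canonicalize_column(smiles_column: List[str]) -> List[Optional[str]]:
    """Canonicalize a whole column and return only strings.

    The intermediate Mol objects are local to canonicalize_smiles, so no
    Mol column is carried forward in the output table.
    """
    return [canonicalize_smiles(s) for s in smiles_column]


def mols_from_smiles(smiles_column: List[Optional[str]]) -> list:
    """Regenerate Mol objects on demand, e.g. in a later Python Script node."""
    return [Chem.MolFromSmiles(s) if s is not None else None for s in smiles_column]


# Example: two different spellings of ethanol canonicalize to the same string.
canonical = canonicalize_column(["OCC", "CCO"])
print(canonical)
```

Inside the node, `canonicalize_column` would be applied to the SMILES column before the table is written back, so the output table carries strings only.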

I am running KNIME 5.1.1 on a 16" MacBook Pro with 16 GB of memory.
While I have a workaround, I thought I would bring this issue to your attention.

– Scott

Hi @Scott_Snyder -

Would it be possible for you to share your workflow, or is the data business-confidential? I'd like to share it internally to see if we can reproduce the issue on macOS and address it.
