I first thought “it’s me”, but after trying several versions of KNIME (older 3.7.x versions, which I used to run in parallel with 4.x) on multiple Ubuntu releases (18 and 20, on two different machines), I found that RDKit in KNIME is less stable on Linux than on Windows (also two different machines).
It often crashes “silently”: KNIME simply closes without warning.
This happens in combination with missing or faulty structures, either during the calculation phase (e.g. descriptor calculations, but others as well) or, if it did manage to calculate everything, when one opens the table view of the node output. The console doesn’t show any error messages during the calculations; KNIME just crashes (the same data sets run through other chemistry nodes don’t make KNIME crash).
On Windows, this doesn’t happen. Error messages appear in the console during calculations and KNIME doesn’t silently say good-bye.
It’s also independent of memory allocation and data-set size (100 compounds or 1 000 000).
So I am going to be a bit of an amateur here and say that RDKit in KNIME on Linux doesn’t behave the same as on Windows, independent of version number, when it comes to handling faulty/missing structures.
Hi Greg,
so I am not sure how to do that, since I noticed that it seems to depend on the size of the data/workflow.
I went through some of my workflows to find a suitable culprit to share, one containing 900k UPTSO compounds (from the precurated data set that is available to download).
At some point in the workflow I filtered out 1000 compounds for testing to feed into an RDKit node and - silent crash.
I saved those to a table and made a new workflow with only those 1000, and then it doesn’t crash but gives an error message as expected.
I am certain I had datasets that were smaller, but now that I am looking for something specific I of course can’t find what I need.
That doesn’t help much, I assume. I will keep watching until I find something reproducible (I am currently doing other kinds of modelling that aren’t KNIME-based).
Yeah, sorry, this is a pain, but the intermittent nature of the problem and the fact that I haven’t managed to reproduce it on my own pretty much make it impossible to fix.
Hi @greglandrum
after having worked more in Python this issue was dormant, but now it is back when working in KNIME:
I enclose a set of molecules here that seems to reproduce the issue.
On Linux, the KNIME log shows nothing. KNIME just crashes.
On Windows, you can see the following in the KNIME log (and in the console):
WARN : KNIME-Worker-16-RDKit From Molecule 0:2703:0:2709:2695 : : Node : RDKit From Molecule : 0:2703:0:2709:2695 : Failed to process data due to SDF Parsing Error (MolSanitizeException) - Generating empty result cells. [52 of 14813 rows]
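For anyone comparing logs between machines, the failure count can be pulled out of such a warning with a few lines of Python. This is only a rough sketch: it assumes the `[N of M rows]` suffix looks exactly like in the line above, and `failed_rows` is a made-up helper name, not part of KNIME or RDKit.

```python
import re

# Matches the "[52 of 14813 rows]" suffix seen in the warning above
# (assumption: other KNIME/RDKit warnings use the same suffix format).
ROW_COUNT = re.compile(r"\[(\d+) of (\d+) rows\]")


def failed_rows(log_line):
    """Return (failed, total) from a warning line, or None if no count is present."""
    match = ROW_COUNT.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    "WARN : KNIME-Worker-16-RDKit From Molecule 0:2703:0:2709:2695 : : Node : "
    "RDKit From Molecule : 0:2703:0:2709:2695 : Failed to process data due to "
    "SDF Parsing Error (MolSanitizeException) - Generating empty result cells. "
    "[52 of 14813 rows]"
)
print(failed_rows(line))
```

On Linux this of course only helps if KNIME lives long enough to write the line at all.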
CrashingMols.knwf (54.2 KB)
Note0: tested on KNIME 4.2.3 (according to a colleague it also happens in 4.3).
Note1: the molecules stem from Indigo2 conversions; these nodes aren’t downloaded automatically.
Note2: if I do an intermediate conversion of these structures using OpenBabel (SDF to SDF) and then use RDKit, it works even on Linux: no crash, and output to log/console.
A small addition: dataset size seems to have some impact? A colleague of mine used the OpenBabel “trick”, and RDKit to molecule doesn’t crash, so he can continue working. But if he opens the output table of the RDKit node - silent crash. At a “regular” node later in the workflow, though, he can view the table.
(on Linux)
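In case it helps to narrow the set down to the individual structures that trigger the crash: one option is to export the molecules as an SD file, cut it into small chunks, and feed each chunk to the RDKit node separately. Below is a rough sketch of the splitting step in plain Python. It assumes a standard SD file in which every record ends with a `$$$$` line; `split_sdf_records` and `chunk` are made-up helper names, and the sample text is a placeholder, not real molecule data.

```python
def split_sdf_records(text):
    """Split SDF text into records; each record keeps its closing '$$$$' line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    # Keep a trailing record that lacks the '$$$$' terminator, if it has content.
    if any(part.strip() for part in current):
        records.append("".join(current))
    return records


def chunk(records, size):
    """Group records into lists of at most `size` records each."""
    return [records[i:i + size] for i in range(0, len(records), size)]


# Placeholder text standing in for a real SD file (not valid molecules).
sample = "mol A\n$$$$\nmol B\n$$$$\nmol C\n$$$$\n"
records = split_sdf_records(sample)
chunks = chunk(records, 2)
print(len(records), [len(c) for c in chunks])
```

Each chunk can then be written to its own file with `"".join(chunk_records)`. Given that the crash seems to depend on context, a chunk that crashes inside the big workflow may still behave in a fresh one, so this is no guarantee of a reproducible case.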