I'm on KNIME 2.7 and I'm trying to convert ~800K molecules to RDKit and then to InChIKeys. It works on 10K molecules, but on the large set it keeps crashing. I even tried chunking it into batches of 10K, but after a few batches it still crashes. The error report I'm getting from Mac OS is pasted in below.
Remark: Removed error report since it caused time out when loading this page.
Some additional info from the Xterm:
Invalid access of stack red zone 0x12c802ec8 rip=0x126513ad1
Bus error: 10
Also, I didn't see this problem in KNIME 2.6 with the same dataset.
That's probably related to changes in the RDKit binaries. Would it be possible for you to isolate the problematic structures? It's probably not trivial since it crashes, but your loop could help: take 10K rows at a time and write them to a CSV file (use the overwrite option). After the CSV Writer finishes, run the rows through the RDKit node. Make sure to connect the CSV Writer and the RDKit node with a variable connection so that the CSV Writer finishes before the RDKit node executes. When the crash happens, the CSV file will contain the batch with the molecule(s) that cause it. You can then repeat this on that subset with a batch size of 100 until you isolate the problematic structure.
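If it helps, the narrowing-down can also be scripted outside KNIME. The following is only a rough sketch of the idea, not something the RDKit nodes provide: `isolate_failures` and `batch_crashes` are hypothetical names, and it halves a failing batch repeatedly instead of going from 10K to 100 as described above. Because a bus error kills the whole process, `batch_crashes` is assumed to run the conversion in a separate process and report whether that process died.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def isolate_failures(
    rows: Sequence[T],
    batch_crashes: Callable[[Sequence[T]], bool],
    batch_size: int = 10_000,
) -> List[T]:
    """Return the rows that crash the conversion when processed alone.

    `batch_crashes` is a hypothetical callback: it should return True if
    processing the given batch crashes (for example, by running the
    conversion in a child process and checking its exit status).
    """
    culprits: List[T] = []
    # Start with fixed-size batches, as in the looped workflow.
    pending = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    while pending:
        batch = pending.pop()
        if not batch_crashes(batch):
            continue  # this batch is clean, nothing to isolate
        if len(batch) == 1:
            culprits.append(batch[0])  # narrowed down to a single row
            continue
        # Split the failing batch in half and inspect both halves.
        mid = len(batch) // 2
        pending.append(batch[:mid])
        pending.append(batch[mid:])
    return culprits
```

Note that this only finds rows that fail on their own. If a crash only shows up when several molecules are processed together, neither half may crash and nothing will be reported for that batch.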
Which update site of the RDKit extension are you using (nightly or stable)?
Some good news. After finding the 1 molecule that was causing a problem (I sent you the sdf via email), the rest of the molecules were processed just fine.
And yes, I'm running the nightly updates.
Just to close this out here in case anyone else sees this: the problem Natasja was encountering is due to the JVM running out of stack space while processing a large molecule.
The fix is to edit the knime.ini file and increase the available stack space by adding, for example, "-Xss2048k".
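As a rough illustration, the end of knime.ini might look like the fragment below after the change. The other lines are placeholders for whatever your installation already contains (they are assumptions, not taken from Natasja's setup); the only addition is the -Xss line. JVM options have to appear after the -vmargs line, one option per line.

```
-vmargs
-Xmx1024m
-Xss2048k
```

KNIME needs to be restarted for the new setting to take effect.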