Parallelize RDKit Molecule Fragmenter

rguha · April 19, 2019, 7:49pm

Is there a way to use the Molecule Fragmenter node with the Parallel Chunk Start/End nodes? Right now, it appears that the Start node does use multiple cores to parallelize the fragmenter node, but the End node complains about non-unique row IDs.

greglandrum · May 29, 2019, 9:15am

Hi @rguha,

Part of the trick is to enable the “Add Chunk Index to RowID” option in the Parallel Chunk End node. That will at least get the loop to execute.

Then you’re going to have the problem that the fragment indices used in the tables are specific to the individual chunks, i.e. each chunk will have a fragment 0, a fragment 1, etc. You can probably update these inside the loop itself (since the chunk ID is available as a flow variable), but that’s probably going to require some use of the Java Snippet node.
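
As a rough sketch of what that remapping could look like, here is plain Java that combines a chunk index with a chunk-local fragment index to produce an ID that is unique across chunks. The class name, method names, and the `chunkN_fragM` format are all made up for illustration; in an actual Java Snippet node you would take the chunk index from the loop's flow variable and the local index from the fragment index column, whatever those are called in your workflow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of KNIME or the RDKit nodes.
public class FragmentIndexRemap {

    // Build a globally unique fragment ID from the chunk index and the
    // chunk-local fragment index (both assumed to be non-negative ints).
    static String globalFragmentId(int chunkIndex, int localFragmentIndex) {
        return "chunk" + chunkIndex + "_frag" + localFragmentIndex;
    }

    // Apply the remapping to every local index seen in one chunk.
    static List<String> remapChunk(int chunkIndex, int[] localIndices) {
        List<String> out = new ArrayList<>();
        for (int idx : localIndices) {
            out.add(globalFragmentId(chunkIndex, idx));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two chunks that both contain local fragments 0 and 1.
        System.out.println(remapChunk(0, new int[] {0, 1}));
        System.out.println(remapChunk(1, new int[] {0, 1}));
    }
}
```

A string ID is used here because it does not require knowing in advance how many fragments a chunk can produce. The same remapping would have to be applied consistently to every table that refers to the fragment indices.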

I hope this helps at least a little bit,
-greg
