Parallelize RDKit Molecule Fragmenter

#1

Is there a way to use the Molecule Fragmenter node with the Parallel Chunk Start/End nodes? Right now, it appears that the Start node does use multiple cores to run the fragmenter in parallel, but the End node complains about non-unique row IDs.

#2

Hi @rguha,

Part of the trick is to enable the “Add Chunk Index to RowID” option in the Parallel Chunk End node.
That will at least get the loop to execute.

Then you’re going to have the problem that the fragment indices used in the tables are specific to the individual chunks, i.e. each chunk will have a fragment 0, a fragment 1, etc. You can probably update these inside the loop itself (since the chunk ID is available as a flow variable), but that’s probably going to require some use of the Java Snippet node.
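
As a rough sketch of the renaming idea (not tested inside KNIME), the logic boils down to combining the chunk index with the chunk-local fragment index so that no two chunks produce the same label. The class name, method name, and label format below are made up for illustration; in a Java Snippet node you would read the chunk index from whatever flow variable the loop exposes and the fragment index from the input column, and only the string-building line would go into the snippet body.

```java
public class FragmentIds {
    // Hypothetical helper: builds a label that is unique across chunks by
    // prefixing the chunk-local fragment index with the chunk index.
    // Assumption: both values are available as plain integers.
    static String globalFragmentId(int chunkIndex, int localFragmentIndex) {
        return "chunk" + chunkIndex + "_frag" + localFragmentIndex;
    }

    public static void main(String[] args) {
        // Fragment 0 from two different chunks no longer collides.
        System.out.println(globalFragmentId(0, 0));
        System.out.println(globalFragmentId(1, 0));
    }
}
```

Note that this only makes the labels unique. If the same fragment structure turns up in two chunks, it will still get two different labels, so merging identical fragments across chunks would need an extra step after the loop.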

I hope this helps at least a little bit,
-greg
