RDKit molecule extractor returns empty field if only one structure present

Hi Everyone,

My name is Peter and I'm relatively new to KNIME. I've been putting together a simple library enumeration protocol using the RDKit Two Component Reaction node, and it all worked really nicely.

As my input I have two SDF files. They are not tabulated files but sketches, with multiple structures on one canvas. I have used the Molecule Extractor node to split the whole set of molecules into individual structures, but I noticed this only works if the SDF contains two or more structures. If there's only one structure in the SDF file, the Molecule Extractor generates a blank table instead of copying the structure across, as the node description says it should. I also tried converting the molecules first with the RDKit From Molecule node, so that the Extractor node gets RDKit molecules as its input.

I was wondering if anybody else has noticed this and whether there's a quick fix for it? Perhaps there's a similar, non-RDKit node of this type available?

I also noticed that the structures do not need to be in individual rows for the Library Enumeration to work (so in theory there is no need to "extract" the molecules), but I would like to have this functionality for other purposes: the ability to use the Extractor as a pre-filter regardless of how many molecules I have in my SDF file.

I hope this makes sense!

Thanks,

Peter

 

Peter,

I've not tried the node you mention, but one possibility might be to use the ChemAxon 'MolConverter' node to convert your SDF to SMILES and split the SMILES on '.' (using the 'Cell Splitter' node, with '.' as the delimiter, and output set to a single collection column). Then use an 'Ungroup' node to get each component into a separate row, and finally a 'Molecule Type Cast' node to turn the resulting column back into SMILES.
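For anyone who wants to see the splitting step outside of KNIME, here is a minimal Python sketch of the same idea. It is only an illustration, not what the nodes do internally: the function name split_components is made up for this example, and it assumes the converter writes each disconnected structure as its own '.'-separated part with no ring-closure numbers shared across the dots.

```python
from typing import List


def split_components(smiles: str) -> List[str]:
    """Split a multi-component SMILES string into its '.'-separated parts.

    Hypothetical helper for illustration only. Assumes every disconnected
    structure is written as its own '.'-separated fragment.
    """
    # Drop empty fragments, e.g. from a stray leading or trailing '.'.
    return [part for part in smiles.strip().split(".") if part]


if __name__ == "__main__":
    # Three structures drawn on one canvas end up as one dotted SMILES.
    print(split_components("CCO.c1ccccc1.CC(=O)O"))
    # A single structure comes back as a one-element list, not an empty one.
    print(split_components("CCO"))
```

Note that a SMILES with only one structure still yields one row here, which is the behaviour Peter was expecting from the Extractor node.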

Steve

Hi Steve,

That worked very well indeed. The structures in the "canvas"-type SDF files are separated into individual rows correctly. I didn't realize that a period '.' separates the individual components in a SMILES string. The Molecule Type Cast node is also something I didn't know existed.

Thanks for your help.

Peter

 

Dear Peter,

Thanks for reaching out. I can confirm that this is indeed incorrect behavior that slipped through when the node was tested. I created a GitHub ticket (https://github.com/rdkit/knime-rdkit/issues/18) and I will fix this soon, so that the fix will be available in the nightly build of the RDKit nodes.

It is currently not clear when we will have the next official release, but you would be able to use the nightly build in the meantime, which you can find at the following update site: http://update.knime.org/community-contributions/trunk

I will let you know when the fix is done.

Kind regards,
Manuel

Dear Peter,

The fix has been available in the nightly build of the RDKit nodes since the end of April 2017. Please try it out if you get a chance, and let me know whether it resolves your problem.

Again, thanks for reporting this bug!

Kind regards,
Manuel

Hi Manuel,

I've updated the KNIME libraries and can confirm that the RDKit Molecule Extractor node now works fine with SDF files containing either multiple structures or a single one. Thanks for putting together a fix so quickly. I'm impressed.

 

Kind Regards,

Peter