Optical recognition of a molecule in a .jpg file

Good morning,

Is anyone aware of a node or workflow that can read an .xls file containing molecule images in one column, so that the images can then be processed to obtain the chemical structures in SMILES notation?

Thank you very much in advance for the help,

regards

Alfredo

This old workflow isn’t exactly what you want, but you could start with it and modify it to suit your needs:

https://www.myexperiment.org/workflows/3573.html

Also, are you able to tell us where the .xls + images come from? Several commercial products hide the structure file* for each image in the associated ‘comment’ field for the cell. A macro that copies the comment contents to an actual cell might be all that you need.

(the other) Simon

* Some products base64-encode the structure file before copying it to the comment field.
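
If the comments do turn out to hold base64-encoded structure files, decoding them could look roughly like the sketch below. This is only an illustration under assumptions: the function name `decode_comment` is made up, the comment text is assumed to have already been copied out of the spreadsheet as a string, and the payload is assumed to be UTF-8 text (e.g. a molfile). The check is a heuristic, so a short plain-text comment that happens to be valid base64 could be mis-decoded.

```python
import base64
import binascii


def decode_comment(comment: str) -> str:
    """Return the structure text held in a cell comment (hypothetical helper).

    Tries strict base64 decoding first. If the comment is not valid
    base64, or does not decode to UTF-8 text, it is returned unchanged
    on the assumption that it already holds the plain structure file.
    """
    # Remove whitespace and line wraps that spreadsheets may insert.
    stripped = "".join(comment.split())
    try:
        raw = base64.b64decode(stripped, validate=True)
        return raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return comment
```

The decoded text could then be written to a regular cell or column and read into KNIME as a structure string.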

Thank you, Simon, for the comment. The .xls file was prepared by a colleague, who manually pasted the .jpg structures into a column, so I think it will need to be loaded into KNIME. Am I right?
I will try to compile OSRA and adapt the workflow you pointed to for my needs,
regards
Alfredo

So for those of you interested, there’s a Kaggle competition going on to address some of the issues here:

Hopefully the solutions are released to the wider community.

(the other) Simon