(molecule) images to string - using OSRA/external tool node

For a while now I've been trying to set up a process that reads a list of images showing molecules and then converts them with OSRA to SMILES or another format.

In an older thread

http://tech.knime.org/forum/indigo/image-ocr-feature-request

the user okmijn suggested a workflow based on PDF extraction. I tried it, but to be honest I don't understand all of its functions, and when I import his example all the paths are "wrong" for my system, which produces a bunch of errors I don't know how to fix.

Also, some functions aren't really clear to me (such as the Unix-based replacement command).

Anyway, that example is a bit overkill since I already have a directory full of images.

Does anyone have suggestions or other working examples? OSRA works on my system (via the command line), so I'd rather not switch to a different solution if possible.

Hi Docminus,

honestly, we don't have any OSRA integration yet. We have a Tess4J integration, which you can use to read text from images, but we don't support reading molecules from images. That said, you are the second person in a short time to ask for this feature, so I forwarded your request to the KNIME core developer team.

If you need any assistance reading text from images just let me know. I'm happy to help.

Christian

OK, sorry, I didn't see the other post you mention.

A text recognition feature would help as well, though (as a different workflow, not as a replacement for the image-based one). Do you have any ready-made workflows you can point me to?

Also, is there a way to PM a member?

Hi Docminus,

I think you can send messages via E-mail using the forum.

Concerning Tess4J: There is an example workflow available right here: http://tech.knime.org/book/knime-image-processing-tesseract-ocr-extension.

Hope this helps,

Christian

 

Cheers

Hi Dominicus,

I'm not sure what you mean by 'unix based replacement command'.

If you have OSRA working on the command line, you should be able to replace each field in the External Tool node with the paths to the files and executables on your system. The Input Data File field can point to any blank text file: the node requires that field to be filled, but I couldn't get the node to pass it on to OSRA, so put the path of the file you want OSRA to read into the Commandline Arguments field instead.
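For reference, the call the External Tool node ends up making can be reproduced outside KNIME. The Python sketch below is an illustration only, not part of okmijn's workflow: it passes the image path directly as a command-line argument (as suggested above) and captures what OSRA prints to standard output, which with OSRA's default settings should be the recognised SMILES. The executable name `osra` and the helper names `build_osra_command` and `run_osra` are assumptions; substitute the full path to the OSRA binary on your system.

```python
import subprocess
from pathlib import Path


def build_osra_command(osra_exe: str, image_path: str) -> list[str]:
    """Return the argument list for a single OSRA call on one image.

    The image path is passed as a plain argument, mirroring the advice to
    put it into the Commandline Arguments field of the External Tool node.
    """
    return [osra_exe, str(Path(image_path))]


def run_osra(osra_exe: str, image_path: str) -> str:
    """Run OSRA on one image and return its standard output, stripped."""
    result = subprocess.run(
        build_osra_command(osra_exe, image_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as text rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Calling `run_osra("osra", "molecule.png")` (hypothetical file name) and comparing the result with a manual command-line run is a quick way to check the paths before putting them into the node.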

What do the error messages look like?

As far as input files go, the OSRA web site indicates that OSRA can process image files, so that shouldn't be a problem.

The current version of the workflow only processes the first file it finds. I never had time to implement a loop over the entire file list of a directory.
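One way to cover a whole directory is to list the image files and call OSRA once per file. The Python sketch below shows that loop outside KNIME; inside KNIME the equivalent would presumably be a loop around the External Tool node. The set of file extensions, the default executable name `osra` and the function names `find_images` and `osra_all` are assumptions for illustration.

```python
import subprocess
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually have.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".bmp"}


def find_images(directory: str) -> list[Path]:
    """Return the image files directly inside `directory`, sorted by name."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def osra_all(directory: str, osra_exe: str = "osra") -> dict[str, str]:
    """Run OSRA once per image and map each file name to OSRA's stdout."""
    results = {}
    for image in find_images(directory):
        proc = subprocess.run(
            [osra_exe, str(image)], capture_output=True, text=True
        )
        # No check=True here, so one failing image does not stop the loop;
        # an empty string means OSRA printed nothing for that file.
        results[image.name] = proc.stdout.strip()
    return results
```

Keeping the loop tolerant of individual failures means a single unreadable image leaves an empty entry instead of aborting the whole batch.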

(the other) Simon

Sorry, okmijn, I didn't see your reply.

I think I now see what you did; maybe that is where my confusion lay. Playing around with this example also taught me more about KNIME in general.

In the end, though, I found a workaround: the workflow writes a batch file to disk, and the External Tool node calls that batch file instead of calling OSRA directly. The batch file is created in a loop and contains one OSRA call per line. I did this under Windows 7 (i.e. a DOS batch file).

Not pretty, but it works and gives a bit more flexibility.
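The batch-file workaround can be sketched as follows. This Python version is a hypothetical illustration, not Docminus's actual workflow: it writes one OSRA call per line, and redirecting each call's output into a `.smi` file next to the image is an added assumption so that the results land somewhere a workflow can read them back. The function name `write_osra_batch` and the default executable name `osra` are likewise assumptions. CRLF line endings are used because that is the usual convention for Windows batch files.

```python
from pathlib import Path


def write_osra_batch(image_paths, batch_path, osra_exe="osra", out_suffix=".smi"):
    """Write a Windows batch file containing one OSRA call per image.

    Each line redirects OSRA's stdout into a file next to the image, e.g.
        "osra" "mol1.png" > "mol1.smi"
    Returns the list of lines that were written.
    """
    lines = ["@echo off"]  # suppress echoing of each command in the console
    for image in image_paths:
        image = Path(image)
        out_file = image.with_suffix(out_suffix)
        lines.append(f'"{osra_exe}" "{image}" > "{out_file}"')
    # newline="" disables newline translation, so the explicit CRLF endings
    # that Windows batch files conventionally use are written unchanged.
    with open(batch_path, "w", newline="") as fh:
        fh.write("\r\n".join(lines) + "\r\n")
    return lines
```

Running the generated file (for example `run_osra.bat`, a hypothetical name) from the External Tool node then processes all listed images in one go.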