Tess4J OCR returns only empty strings

Dear IMG-KNIMErs, :)

In my setup on Windows (KNIME 2.9.2, using CommunityContributions_trunk_201405011346.zip as source), Tess4J OCR returns only empty strings.

It's the published demo workflow I'm using, and it had issues parsing the images directly as well (Image Reader returning blank cells / throwing errors). So I went to "Read PNG images" combined with "PNG Image to ImgPlus" instead, which works for the demo GIFs as well. Not sure what's going on...

Any ideas about what might cause this? Stack trace for the Tess4J node below.

Thanks,
E

DEBUG ExecuteAction Creating execution job for 1 node(s)...
DEBUG NodeContainer Tess4J 0:4 has new state: CONFIGURED_MARKEDFOREXEC
DEBUG NodeContainer Tess4J 0:4 has new state: CONFIGURED_QUEUED
DEBUG NodeContainer KNIME_ocr_example 0 has new state: EXECUTING
DEBUG NodeContainer ROOT has new state: EXECUTING
DEBUG WorkflowManager Tess4J 0:4 doBeforePreExecution
DEBUG NodeContainer Tess4J 0:4 has new state: PREEXECUTE
DEBUG WorkflowManager Tess4J 0:4 doBeforeExecution
DEBUG NodeContainer Tess4J 0:4 has new state: EXECUTING
DEBUG LocalNodeExecutionJob Tess4J 0:4 Start execute
DEBUG WorkflowFileStoreHandlerRepository Adding handler 325a43d2-d494-414d-b740-5e7defa2223f (Tess4J 0:4: <no directory>) - 6 in total
ERROR Tess4JNodeModel Execute failed: Exception was thrown.
ERROR Tess4JNodeModel Execute failed: Exception was thrown.
DEBUG MemoryObjectTracker Adding org.knime.core.data.container.Buffer$BufferMemoryReleasable (7 in total)
INFO LocalNodeExecutionJob Tess4J 0:4 End execute (0 secs)
DEBUG WorkflowManager Tess4J 0:4 doBeforePostExecution
DEBUG NodeContainer Tess4J 0:4 has new state: POSTEXECUTE
DEBUG WorkflowManager Tess4J 0:4 doAfterExecute - success
DEBUG NodeContainer Tess4J 0:4 has new state: EXECUTED
DEBUG Table Cell Viewer Configure succeeded. (Table Cell Viewer)
DEBUG NodeContainer KNIME_ocr_example 0 has new state: IDLE
DEBUG NodeContainer ROOT has new state: IDLE

Hi,

unfortunately I don't know what's causing the problems, but let's try to find out. Can you reproduce these problems with our "Stable Release" and/or our current nightly build?

If so, which images are you trying to read in? Can you upload one example file? Then I would first like to check why the Image Reader fails and then go on with the Tess4J issue :-)

Thanks for your help!

Christian

Hi Christian,

I'll try a stable release tomorrow and report back. For the imagery I'm simply using the GIFs supplied with the OCR demo workflow to start with. If I manage to work it out with these, I'll surely get more ambitious. :-)

Thanks so far, TTYL!
E

Christian,

For what it's worth, on my home machine (2.9.4, Windows) the Tesseract node also throws an exception - stack trace below. The rest of the demo workflow works out of the box, though.

Cheers
E

INFO      KNIMECorePlugin                    Setting console view log level to DEBUG
DEBUG     ExecuteAction                      Creating execution job for 1 node(s)...
DEBUG     NodeContainer                      Tess4J 3:4 has new state: CONFIGURED_MARKEDFOREXEC
DEBUG     NodeContainer                      Tess4J 3:4 has new state: CONFIGURED_QUEUED
DEBUG     NodeContainer                      KNIME_ocr_example 3 has new state: EXECUTING
DEBUG     WorkflowManager                    Tess4J 3:4 doBeforePreExecution
DEBUG     NodeContainer                      ROOT  has new state: EXECUTING
DEBUG     NodeContainer                      Tess4J 3:4 has new state: PREEXECUTE
DEBUG     WorkflowManager                    Tess4J 3:4 doBeforeExecution
DEBUG     NodeContainer                      Tess4J 3:4 has new state: EXECUTING
DEBUG     LocalNodeExecutionJob              Tess4J 3:4 Start execute
DEBUG     WorkflowFileStoreHandlerRepository     Adding handler dae641e6-b143-4ad4-b8de-88d47882743b (Tess4J 3:4: <no directory>) - 5 in total
ERROR     Tess4JNodeModel                    Execute failed: Exception was thrown.
ERROR     Tess4JNodeModel                    Execute failed: Exception was thrown.
DEBUG     MemoryObjectTracker                Adding org.knime.core.data.container.Buffer$BufferMemoryReleasable (5 in total)
INFO      LocalNodeExecutionJob              Tess4J 3:4 End execute (0 secs)
DEBUG     WorkflowManager                    Tess4J 3:4 doBeforePostExecution
DEBUG     NodeContainer                      Tess4J 3:4 has new state: POSTEXECUTE
DEBUG     WorkflowManager                    Tess4J 3:4 doAfterExecute - success
DEBUG     NodeContainer                      Tess4J 3:4 has new state: EXECUTED
DEBUG     Table Cell Viewer                  Configure succeeded. (Table Cell Viewer)
DEBUG     NodeContainer                      KNIME_ocr_example 3 has new state: CONFIGURED
DEBUG     NodeContainer                      ROOT  has new state: IDLE

Answer stuck in moderation queue after fixing a typo... :(

The short of it: unfortunately still happens on my home PC, same stack trace contents.

Thanks
E

Hi,

What is the configuration of your home PC?

Windows 64bit + Stable release?

As soon as I know the configuration, I will try to reproduce the problem!

Thanks!

Christian

Christian,

Win 8 64bit, KNIME 2.9.4 64bit, yesterday's stable Community Contributions. Demo workflow executes fine (only warnings in the image reader) up to the Tess node, which throws a console error but gives no UI-based warning. Result: empty cells.

Thanks!
E

I can confirm the same issue at work with the stable release. The only differences: Win 7/64 and still on KNIME 2.9.2/64.

TY
E

Great, thanks. I will try to reproduce and let you know as soon as I know what's going on.

Thanks again for your help,

Christian

Cool, thanks.

While you're at it, any chance of adding "PDF to Image" parsing to KNIME's capabilities? Kilian's PDF parser node uses PDFbox, which is perfectly capable of saving PDF pages to images for OCR... that'd be quite powerful. :-)

Cheers,
E

Hi,

I just tried it. My setup: Windows 64-bit, KNIME 2.9.4, KNIME Image Processing 1.1.2, Tess4J 0.8.1, and it seems to work.

Can you try with 2.9.4?

Sorry for the inconvenience.

About your feature request: This makes sense and it's on our list now :-)

Christian,

I'm beginning to wonder if it's the combo between the "trusted" KNIP and the "stable" Tess... happens on both of my machines, but both are set up that mixed-up way as well. I'll experiment a bit further and let you know.

Also, thanks for the feature consideration. :)

-E

Stable vs. Trusted shouldn't be a problem. Tess4J is not trusted yet because we still run into problems like yours: it relies on native libraries, and those can be tricky. Anyway, I see this in your log:

ERROR Tess4JNodeModel Execute failed: Exception was thrown.

Is there a more detailed message in your knime log file?
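
A minimal sketch for pulling the error entries and any stack-trace lines out of a KNIME log file, in case the console view hides the details. The class name LogErrorFilter and its matching rules are hypothetical, and the log location (assumed here to be .metadata/knime/knime.log inside the workspace) is an assumption, so the path is passed in as an argument rather than hard-coded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of KNIME or Tess4J.
class LogErrorFilter {

    // Keeps ERROR entries plus lines that look like part of a Java stack trace.
    static List<String> errorLines(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            boolean isError = t.startsWith("ERROR") || t.contains(" ERROR ");
            boolean isTrace = t.startsWith("at ")
                    || t.startsWith("Caused by:")
                    || t.contains("Exception:");
            if (isError || isTrace) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LogErrorFilter.java <path-to-knime.log>");
            return;
        }
        // Assumes the log is UTF-8 encoded.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String line : errorLines(lines)) {
            System.out.println(line);
        }
    }
}
```

If the output contains only the bare "Execute failed" lines and no exception text, the node itself is not logging the cause, and a build with more verbose error reporting is needed.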

I see, I won't experiment with stable/trusted then. Unfortunately, the log I posted is as detailed as it got... :-( I can only assume that special flags in my knime.ini play a role; that's the only non-standard setting I use:

-Djava.library.path=C:\R-3.1.0\library\rJava\jri\x64 (for R 3.x interactive nodes)

I used to have one related to forcing Eclipse into UTF-8, but it's not on the PC I'm currently checking, so that can't be it. Realistic to assume this single setting might impact other third-party libraries?
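
A minimal sketch for inspecting what a JVM actually sees as java.library.path, assuming it is started with the same -D flag as in knime.ini. Passing -Djava.library.path replaces the JVM's default value rather than appending to it, so listing the entries shows exactly which directories remain. The class name LibraryPathCheck is hypothetical, and whether this setting matters for Tess4J depends on how it locates its native libraries, which this thread does not establish.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of KNIME or Tess4J.
class LibraryPathCheck {

    // Splits a raw library path into its non-empty, trimmed entries.
    static List<String> splitLibraryPath(String raw, String separator) {
        List<String> entries = new ArrayList<>();
        if (raw == null) {
            return entries;
        }
        for (String part : raw.split(Pattern.quote(separator))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String raw = System.getProperty("java.library.path");
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        for (String entry : splitLibraryPath(raw, File.pathSeparator)) {
            boolean exists = new File(entry).isDirectory();
            System.out.println(entry + (exists ? "" : "  (directory not found)"));
        }
    }
}
```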

TY,
E

Actually, I don't see how your library setting would influence Tess4J. But I have another idea to find out more: I will add more detailed error messages to our Tess4J node in the nightly build. Can you update KNIP & Tess4J to our nightly build in approx. 2 hours (see http://tech.knime.org/community) and tell me the error message?

We will get this running!!! ;-)

Hi,

I just updated our nightly build to output more detailed error messages. Can you update yours to our nightly build (KNIP & Tess4J) and (a) see if the error still occurs and (b) if it does, post the message here? There should be one now ;-)

Thank you,

Christian

Hi Christian,

OK, it'll have to be tonight though - firewall/proxy woes.

Thanks
E

Christian,

This looks like a major Homer Simpson D'oh! moment... it seems I haven't installed all image processing packages, and it seems you didn't anticipate anyone doing this. :-) At least that's what I read into the stack trace below:

ERROR     Tess4JNodeModel                    Execute failed: Exception was thrown.
DEBUG     Tess4JNodeModel                    Execute failed: Exception was thrown.
net.sourceforge.tess4j.TesseractException: java.lang.RuntimeException: Need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    at org.knime.knip.tess4j.base.node.Tess4JNodeModel.compute(Tess4JNodeModel.java:130)
    at org.knime.knip.tess4j.base.node.Tess4JNodeModel.compute(Tess4JNodeModel.java:1)
    at org.knime.knip.base.node.ValueToCellNodeModel$1.getCells(ValueToCellNodeModel.java:377)
    at org.knime.core.data.container.RearrangeColumnsTable.calcNewCellsForRow(RearrangeColumnsTable.java:496)
    at org.knime.core.data.container.RearrangeColumnsTable.calcNewColsSynchronously(RearrangeColumnsTable.java:416)
    at org.knime.core.data.container.RearrangeColumnsTable.create(RearrangeColumnsTable.java:350)
    at org.knime.core.node.ExecutionContext.createColumnRearrangeTable(ExecutionContext.java:372)
    at org.knime.knip.base.node.ValueToCellNodeModel.execute(ValueToCellNodeModel.java:506)
    at org.knime.core.node.NodeModel.executeModel(NodeModel.java:556)
    at org.knime.core.node.Node.invokeNodeModelExecute(Node.java:1069)
    at org.knime.core.node.Node.execute(Node.java:924)
    at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:418)
    at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:98)
    at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:182)
    at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:113)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
    at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:238)
Caused by: java.lang.RuntimeException: Need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
    at net.sourceforge.vietocr.ImageIOHelper.getImageByteBuffer(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.setImage(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
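
The root cause in the trace is the image helper reporting that the JAI Image I/O package is missing. A minimal sketch for checking which image formats javax.imageio can read and write in a given JVM follows. The class name ImageIoPluginCheck is hypothetical, and the guess that the helper is looking for a TIFF plugin is an assumption, not something the trace states. Plugin discovery inside KNIME's Eclipse-based runtime may also differ from a standalone JVM, so a standalone result is only a hint.

```java
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of KNIME or Tess4J.
class ImageIoPluginCheck {

    // True if at least one registered ImageIO reader claims the format name.
    static boolean hasReader(String format) {
        return ImageIO.getImageReadersByFormatName(format).hasNext();
    }

    // True if at least one registered ImageIO writer claims the format name.
    static boolean hasWriter(String format) {
        return ImageIO.getImageWritersByFormatName(format).hasNext();
    }

    public static void main(String[] args) {
        // "tiff" is included on the assumption that the helper needs a TIFF plugin.
        for (String format : new String[] {"png", "gif", "tiff"}) {
            System.out.println(format
                    + ": reader=" + hasReader(format)
                    + ", writer=" + hasWriter(format));
        }
    }
}
```

A format that reports false for both reader and writer has no plugin registered in that JVM, which would be consistent with the "Need to install JAI Image I/O package" message.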

I'll try and install the rest of the packages and report back.

Cheers
E

Christian,

Installing everything from trunk gives me the following:

KNIME_ocr_example 0 loaded with errors
  KNIME_ocr_example 0
    Unable to load node with ID suffix 2 into workflow, skipping it: Version mismatch: jar version is '24', native library version is '23'
    Unable to load node with ID suffix 3 into workflow, skipping it: Could not initialize class org.knime.knip.view3d.render.LWJGLVTKInteractiveCanvas
    Unable to load node with ID suffix 4 into workflow, skipping it: Could not initialize class org.knime.knip.view3d.render.LWJGLVTKInteractiveCanvas
    Unable to load node with ID suffix 5 into workflow, skipping it: Could not initialize class org.knime.knip.view3d.render.LWJGLVTKInteractiveCanvas
    State has changed from CONFIGURED to EXECUTED

That translates to all nodes being gone from the workflow (except for "image reader" and "list files"). I'll poke/try around a little more and let you know. Note that the "version mismatch" is thrown by my modified workflow only, where the KNIME base distro's "PNG reader" filled in for the (previously defunct) "image reader". The final result is the same in the unmodified example workflow, though.

-E