Dear IMG-KNIMErs, :)
In my setup on Windows (KNIME 2.9.2, using CommunityContributions_trunk_201405011346.zip as source), Tess4J OCR returns only empty strings.
It's the published demo workflow I'm using, and it had issues parsing the images directly as well (Image Reader returning blank cells / throwing errors). So I went to "Read PNG images" combined with "PNG Image to ImgPlus" instead, which works for the demo GIFs as well. Not sure what's going on...
Any ideas about what might cause this? Stack trace for the Tess4J node below.
Thanks,
E
DEBUG ExecuteAction Creating execution job for 1 node(s)...
DEBUG NodeContainer Tess4J 0:4 has new state: CONFIGURED_MARKEDFOREXEC
DEBUG NodeContainer Tess4J 0:4 has new state: CONFIGURED_QUEUED
DEBUG NodeContainer KNIME_ocr_example 0 has new state: EXECUTING
DEBUG NodeContainer ROOT has new state: EXECUTING
DEBUG WorkflowManager Tess4J 0:4 doBeforePreExecution
DEBUG NodeContainer Tess4J 0:4 has new state: PREEXECUTE
DEBUG WorkflowManager Tess4J 0:4 doBeforeExecution
DEBUG NodeContainer Tess4J 0:4 has new state: EXECUTING
DEBUG LocalNodeExecutionJob Tess4J 0:4 Start execute
DEBUG WorkflowFileStoreHandlerRepository Adding handler 325a43d2-d494-414d-b740-5e7defa2223f (Tess4J 0:4: <no directory>) - 6 in total
ERROR Tess4JNodeModel Execute failed: Exception was thrown.
ERROR Tess4JNodeModel Execute failed: Exception was thrown.
DEBUG MemoryObjectTracker Adding org.knime.core.data.container.Buffer$BufferMemoryReleasable (7 in total)
INFO LocalNodeExecutionJob Tess4J 0:4 End execute (0 secs)
DEBUG WorkflowManager Tess4J 0:4 doBeforePostExecution
DEBUG NodeContainer Tess4J 0:4 has new state: POSTEXECUTE
DEBUG WorkflowManager Tess4J 0:4 doAfterExecute - success
DEBUG NodeContainer Tess4J 0:4 has new state: EXECUTED
DEBUG Table Cell Viewer Configure succeeded. (Table Cell Viewer)
DEBUG NodeContainer KNIME_ocr_example 0 has new state: IDLE
DEBUG NodeContainer ROOT has new state: IDLE
Hi,
Unfortunately I don't know what's causing the problems, but let's try to find out. Can you reproduce these problems with our "Stable Release" and/or our current nightly build?
If so, which images are you trying to read in? Can you upload one example file? Then I would first like to check why the Image Reader fails and then go on to the Tess4J issue :-)
Thanks for your help!
Christian
Hi Christian,
I'll try a stable release tomorrow and report back. For the imagery I'm simply using the GIFs supplied with the OCR demo workflow to start with. If I manage to work it out with these, I'll surely get more ambitious. :-)
Thanks so far, TTYL!
E
Christian,
For what it's worth, on my home machine (2.9.4, Windows) the Tesseract node also throws an exception - stack trace below. The rest of the demo workflow works out of the box, though.
Cheers
E
INFO KNIMECorePlugin Setting console view log level to DEBUG
DEBUG ExecuteAction Creating execution job for 1 node(s)...
DEBUG NodeContainer Tess4J 3:4 has new state: CONFIGURED_MARKEDFOREXEC
DEBUG NodeContainer Tess4J 3:4 has new state: CONFIGURED_QUEUED
DEBUG NodeContainer KNIME_ocr_example 3 has new state: EXECUTING
DEBUG WorkflowManager Tess4J 3:4 doBeforePreExecution
DEBUG NodeContainer ROOT has new state: EXECUTING
DEBUG NodeContainer Tess4J 3:4 has new state: PREEXECUTE
DEBUG WorkflowManager Tess4J 3:4 doBeforeExecution
DEBUG NodeContainer Tess4J 3:4 has new state: EXECUTING
DEBUG LocalNodeExecutionJob Tess4J 3:4 Start execute
DEBUG WorkflowFileStoreHandlerRepository Adding handler dae641e6-b143-4ad4-b8de-88d47882743b (Tess4J 3:4: <no directory>) - 5 in total
ERROR Tess4JNodeModel Execute failed: Exception was thrown.
ERROR Tess4JNodeModel Execute failed: Exception was thrown.
DEBUG MemoryObjectTracker Adding org.knime.core.data.container.Buffer$BufferMemoryReleasable (5 in total)
INFO LocalNodeExecutionJob Tess4J 3:4 End execute (0 secs)
DEBUG WorkflowManager Tess4J 3:4 doBeforePostExecution
DEBUG NodeContainer Tess4J 3:4 has new state: POSTEXECUTE
DEBUG WorkflowManager Tess4J 3:4 doAfterExecute - success
DEBUG NodeContainer Tess4J 3:4 has new state: EXECUTED
DEBUG Table Cell Viewer Configure succeeded. (Table Cell Viewer)
DEBUG NodeContainer KNIME_ocr_example 3 has new state: CONFIGURED
DEBUG NodeContainer ROOT has new state: IDLE
Answer stuck in moderation queue after fixing a typo... :(
The short of it: unfortunately still happens on my home PC, same stack trace contents.
Thanks
E
Hi,
what is the configuration of your home PC?
Windows 64bit + Stable release?
As soon as I know the configuration, I will try to reproduce the problem!
Thanks!
Christian
Christian,
Win 8 64bit, KNIME 2.9.4 64bit, yesterday's stable Community Contributions. Demo workflow executes fine (only warnings in the image reader) up to the Tess node, which throws a console error but gives no UI-based warning. Result: empty cells.
Thanks!
E
Confirm same issue at work with stable release, only differences: Win 7/64 and still on KNIME 2.9.2/64.
TY
E
Great, thanks. I will try to reproduce it and let you know as soon as I know what's going on.
Thanks again for your help,
Christian
Cool, thanks.
While you're at it, any chance of adding "PDF to Image" parsing to KNIME's capabilities? Kilian's PDF parser node uses PDFbox, which is perfectly capable of saving PDF pages to images for OCR... that'd be quite powerful. :-)
Cheers,
E
Hi,
I just tried it. Setting: Windows 64, KNIME 2.9.4, KNIME Image Processing 1.1.2, Tess4J 0.8.1 and it seems to work.
Can you try with 2.9.4?
Sorry for the inconvenience.
About your feature request: This makes sense and it's on our list now :-)
Christian,
I'm beginning to wonder if it's the combo between the "trusted" KNIP and the "stable" Tess... happens on both of my machines, but both are set up that mixed-up way as well. I'll experiment a bit further and let you know.
Also, thanks for the feature consideration. :)
-E
Stable vs. Trusted shouldn't be a problem. Tess4J is not trusted yet, as we still face problems like yours. The reason is that we use native libraries, and this is sometimes a bit tricky. Anyway, I see
ERROR Tess4JNodeModel Execute failed: Exception was thrown.
Is there a more detailed message in your knime log file?
I see, I won't experiment with stable/trusted then. Unfortunately, the log I posted is as detailed as it got... :-( I can only assume that special flags in my knime.ini play a role; that's the only non-standard setting I use.
-Djava.library.path=C:\R-3.1.0\library\rJava\jri\x64 (for R 3.x interactive nodes)
I used to have one related to forcing Eclipse into UTF-8, but it's not on the PC I'm currently checking, so that can't be it. Is it realistic to assume this single setting might impact other third-party libraries?
TY,
E
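For anyone wanting to check what a flag like the one above actually does to the JVM, here is a minimal sketch that lists the entries of java.library.path and marks those that do not exist on disk. The class and method names are made up for illustration. Passing -Djava.library.path on the command line replaces the JVM's default value rather than adding to it, so the list can be shorter than expected. Whether Tess4J's native loading consults this property at all is not established in this thread, so treat this as a diagnostic aid only.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LibraryPathCheck {

    /** Splits a library path string into its non-empty entries (hypothetical helper). */
    static List<String> splitLibraryPath(String path, String separator) {
        List<String> entries = new ArrayList<>();
        if (path == null || path.isEmpty()) {
            return entries;
        }
        // Quote the separator so it is treated literally, not as a regex.
        for (String entry : path.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // The value the running JVM uses for System.loadLibrary lookups.
        String path = System.getProperty("java.library.path");
        for (String entry : splitLibraryPath(path, File.pathSeparator)) {
            boolean exists = new File(entry).isDirectory();
            System.out.println(entry + (exists ? "" : "  (directory not found)"));
        }
    }
}
```

Run it with the same -D flag as in knime.ini to see the resulting search path for that JVM.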
Actually, I don't see how your lib should influence Tess4J. But I have another idea to find out more: I will add more detailed error messages to our Tess4J in the Nightly-Build. Can you update KNIP & Tess4J in approx 2h to our nightly build (see http://tech.knime.org/community) and tell me the error message?
We will get this running!!! ;-)
Hi,
I just updated our nightly build to output more detailed error messages. Can you update yours to our nightly build (KNIP & Tess4J) and a) see if the error still occurs and b) if it does, post the message here? There should be one now ;-)
Thank you,
Christian
Hi Christian,
OK, it'll have to be tonight though - firewall/proxy woes.
Thanks
E
Christian,
This looks like a major Homer Simpson D'oh! moment... it seems I haven't installed all image processing packages, and it looks like you didn't anticipate anyone doing this. :-) At least that's what I read into the stack trace below:
ERROR Tess4JNodeModel Execute failed: Exception was thrown.
DEBUG Tess4JNodeModel Execute failed: Exception was thrown.
net.sourceforge.tess4j.TesseractException: java.lang.RuntimeException: Need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at org.knime.knip.tess4j.base.node.Tess4JNodeModel.compute(Tess4JNodeModel.java:130)
at org.knime.knip.tess4j.base.node.Tess4JNodeModel.compute(Tess4JNodeModel.java:1)
at org.knime.knip.base.node.ValueToCellNodeModel$1.getCells(ValueToCellNodeModel.java:377)
at org.knime.core.data.container.RearrangeColumnsTable.calcNewCellsForRow(RearrangeColumnsTable.java:496)
at org.knime.core.data.container.RearrangeColumnsTable.calcNewColsSynchronously(RearrangeColumnsTable.java:416)
at org.knime.core.data.container.RearrangeColumnsTable.create(RearrangeColumnsTable.java:350)
at org.knime.core.node.ExecutionContext.createColumnRearrangeTable(ExecutionContext.java:372)
at org.knime.knip.base.node.ValueToCellNodeModel.execute(ValueToCellNodeModel.java:506)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:556)
at org.knime.core.node.Node.invokeNodeModelExecute(Node.java:1069)
at org.knime.core.node.Node.execute(Node.java:924)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:418)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:98)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:182)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:113)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:238)
Caused by: java.lang.RuntimeException: Need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
at net.sourceforge.vietocr.ImageIOHelper.getImageByteBuffer(Unknown Source)
at net.sourceforge.tess4j.Tesseract.setImage(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
I'll try and install the rest of the packages and report back.
Cheers
E
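A quick way to see whether a JVM can satisfy the requirement named in the trace above is to ask javax.imageio which writers are registered. This is a minimal sketch, on the assumption that the "Need to install JAI Image I/O package" message is raised when no TIFF image writer can be found; the thread does not confirm that, and the class and method names here are made up for illustration. Note also that inside KNIME the plugins visible to a node depend on its class loader, so a standalone run may not match what the Tess4J node sees.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriter;

public class TiffWriterCheck {

    /** Returns true if at least one ImageIO writer is registered for the format. */
    static boolean hasWriterFor(String formatName) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName(formatName);
        return writers.hasNext();
    }

    public static void main(String[] args) {
        // Pick up any ImageIO plugins that are on the class path.
        ImageIO.scanForPlugins();
        // Assumption: TIFF is the format the OCR helper needs to write.
        System.out.println("TIFF writer available: " + hasWriterFor("tiff"));
        // PNG ships with the standard JDK, so this serves as a sanity check.
        System.out.println("PNG writer available:  " + hasWriterFor("png"));
    }
}
```

If the TIFF line reports false while PNG reports true, that would be consistent with a missing JAI Image I/O plugin rather than a broken ImageIO setup.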
Christian,
Installing everything from trunk gives me the following:
KNIME_ocr_example 0 loaded with errors
KNIME_ocr_example 0
Unable to load node with ID suffix 2 into workflow, skipping it: Version mismatch: jar version is '24', native library version is '23'
Unable to load node with ID suffix 3 into workflow, skipping it: Could not initialize class org.knime.knip.view3d.render.LWJGLVTKInteractiveCanvas
Unable to load node with ID suffix 4 into workflow, skipping it: Could not initialize class org.knime.knip.view3d.render.LWJGLVTKInteractiveCanvas
Unable to load node with ID suffix 5 into workflow, skipping it: Could not initialize class org.knime.knip.view3d.render.LWJGLVTKInteractiveCanvas
State has changed from CONFIGURED to EXECUTED
That translates to all nodes being gone from the workflow (except for "image reader" and "list files"). I'll poke/try around a little more and let you know. Note that the "version mismatch" is thrown by my modified workflow only, where the KNIME base distro's "PNG reader" filled in for the (previously defunct) "image reader". The final result is the same in the unmodified example workflow, though.
-E