Tess4J for Chinese fails with "Execute failed: Invalid memory access"

Hi
I’m using Tess4J to OCR an image containing Chinese text. With the default settings for English characters, it works fine. However, when I import external training data and try to OCR the image in Chinese, the following error appears: Execute failed: Invalid memory access.

If anyone has run into the same problem and found a solution, please let me know. Thank you so much.

Best

Tao

Hi Tao,

The image upload seems to have failed. Can you re-upload it?

Thank you! Iris

Hi @thu,

If possible, could you provide a workflow with some example data to reproduce the issue? It would also be very helpful if you could provide the log file (see https://www.knime.com/node/20488 for more information), which should contain a more detailed error message. With that, we should be able to trace the source of the error message!

Best,
Stefan

Hi Iris

Thank you for your reply. I found that the problem may be that the training data for Chinese text is not right.

Could you give me any suggestions on how to use Tess4J to OCR Chinese characters? Thanks.

Best,

Tao

Hi Stefan

Thank you for your reply. I found that the problem may be that the training data for Chinese text is not right.

Could you give me any suggestions on how to use Tess4J to OCR Chinese characters? Thanks.

Best,

Tao

Hi Tao,

I haven’t verified if this works, but it is definitely worth a try:

  1. Download either the Chinese Traditional or Chinese Simplified archive
  2. Extract the archive to any folder and enter the folder
  3. Open your installation folder of KNIME Analytics Platform in a second window
  4. You should find a plugins/org.knime.knip.tess4j.base_1.3.3.v201906051307/tessdata folder in there
  5. Copy chi_tra.traineddata or chi_sim.traineddata from the extracted archive into the tessdata folder in your KNIME installation
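Before restarting KNIME, it can help to confirm that the copied file actually landed where Tesseract looks for it: a file named after the language code with the .traineddata extension inside the tessdata folder. The following is a minimal sketch using only the Java standard library; it is not part of Tess4J or KNIME, and the class name TessdataCheck and its methods are hypothetical. Pass it the tessdata folder you found in step 4. Note that a file being present and non-empty does not prove it matches the Tesseract version the extension was built against, so a version mismatch would still have to be ruled out separately.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks that <lang>.traineddata exists in a tessdata folder.
public class TessdataCheck {

    /** Path where Tesseract expects the training data for the given language code. */
    static Path trainedDataPath(Path tessdataDir, String lang) {
        return tessdataDir.resolve(lang + ".traineddata");
    }

    /** True if the training data file exists, is a regular file, and is not empty. */
    static boolean hasTrainedData(Path tessdataDir, String lang) throws IOException {
        Path file = trainedDataPath(tessdataDir, lang);
        return Files.isRegularFile(file) && Files.size(file) > 0;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the tessdata folder of the KNIME installation;
        // falls back to a folder named "tessdata" in the current directory.
        Path tessdata = Paths.get(args.length > 0 ? args[0] : "tessdata");
        for (String lang : new String[] {"eng", "chi_sim", "chi_tra"}) {
            String status = hasTrainedData(tessdata, lang) ? "found" : "missing";
            System.out.println(lang + ": " + status);
        }
    }
}
```

Run it with the tessdata path as its only argument; it prints one line each for eng, chi_sim and chi_tra, followed by found or missing.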

Once you have restarted KNIME, it should pick up the new training data and should be able to recognize Chinese. If it still doesn’t work, please follow the instructions of my previous post and provide additional information.

Best,
Stefan

Hi Stelfrich,

Could you please re-post the two archive files? I can no longer download them. Thanks!

Best regards,
CY

Hi @christine_ywl,

You should be able to get them from

Best,
Stefan

Hi Stefan,

Thanks! I tried to use chi_sim to recognize a PDF file but encountered the following error message:

ERROR Tess4J 3:137 Execute failed: Invalid memory access

Could you please advise what I should do to get this resolved?

Best regards,
CY

I linked to the data files for Tesseract 4.x, but the KNIME extension uses version 3.x. Could you try the two files from the tesseract-ocr/tessdata repository on GitHub at tag 3.04.00?

Best,
Stefan

I replaced the files, but the node still fails to execute. These are the error messages:

ERROR Tess4J 3:137 Error initializing Tesseract.
ERROR Tess4J 3:137 Execute failed: Invalid memory access