Image PDF to text

#1

Could you please show an example of converting an image PDF to text?

0 Likes

#2

Please be more specific. Do you need to recognize text in a PDF? If so, then use


or

1 Like

#3

Thank you for the quick reply. Can I use this workflow for Optical Character Recognition (OCR) on images in a PDF file that contain text?

0 Likes

#4

For OCR, look here:
https://www.knime.com/book/knime-image-processing-tesseract-ocr-extension

1 Like

#5

Thank you for the quick response.

1 Like

#6

Hi colleagues, I am using Tess4J for OCR. According to your instructions, I can only use PNG or SVG files for that. I can convert the PDF to PNG using the Tika Parser. Unfortunately, it gives me the inline images as TIF files instead of PNG files. Please see the fragment of the scan:


For other PDFs it sometimes gives me PNG files.
The Tika Parser help does not explain the option "Extract inline images from PDFs".
What should I do? Thank you in advance.
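
Since the extracted inline images come out as TIF for some PDFs and PNG for others, one way to see what was actually produced is to check the leading bytes of each file rather than its extension. The following is only a minimal sketch in plain Python, outside KNIME; the function names and the output folder are hypothetical, and it only detects the format. Converting the TIF files to PNG would still need an image library or an image-processing node.

```python
from __future__ import annotations

from pathlib import Path

# Leading bytes ("magic numbers") that identify each format.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")  # little-endian and big-endian TIFF


def sniff_image_format(header: bytes) -> str:
    """Return 'png', 'tiff', or 'unknown' based on the first bytes of a file."""
    if header.startswith(PNG_MAGIC):
        return "png"
    if header.startswith(TIFF_MAGICS):
        return "tiff"
    return "unknown"


def group_by_format(folder: str) -> dict[str, list[str]]:
    """Group the files in `folder` (a hypothetical extraction directory) by detected format."""
    groups: dict[str, list[str]] = {"png": [], "tiff": [], "unknown": []}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with path.open("rb") as fh:
                # Eight bytes are enough to cover both signatures.
                groups[sniff_image_format(fh.read(8))].append(path.name)
    return groups
```

Calling `group_by_format` on the folder where the inline images were written would list which files are really TIF and therefore need conversion before being passed to Tess4J.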

0 Likes

#7
1 Like

#8

Dear colleagues, please help with the Tess4J component. Instead of text, I received the set of symbols shown below. I used PNG files and kept the settings from your example OCR_meets_SemanticWeb.
image

0 Likes