Could you please show an example of converting an image-based PDF to text?
Please be more specific. Do you need to recognize text in a PDF? If yes, then use
Thank you for the quick reply. Can I use this workflow for Optical Character Recognition (OCR) on images in a PDF file that contain text?
Thank you for the quick response.
Hi colleagues, I am using Tess4J for OCR. According to your instructions, I need to use only PNG or SVG files for that. I can convert the PDF to PNG using Tika Parser. Unfortunately, it gives me TIF inline image files instead of PNG ones. Please see the fragment of the scan.
For other PDFs it sometimes gives me PNG files.
The Tika Parser help does not explain the option "Extract inline images from PDFs".
Please let me know what I should do. Thank you in advance.
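One possible workaround for the TIF output, offered as a sketch and not as a confirmed fix for the Tika Parser behaviour: convert the extracted TIF files to PNG before passing them to Tess4J. The example below assumes Java 9 or later, where the standard javax.imageio package can read TIFF. The class name TiffToPng and the file paths are hypothetical. Note that ImageIO.read returns only the first image of a multi-page TIFF, and some TIFF variants may not be readable this way.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: re-encodes an extracted TIF image as PNG.
class TiffToPng {

    // Reads the input with whatever ImageIO reader matches it and writes a PNG.
    static void convertToPng(File input, File output) throws IOException {
        // ImageIO.read returns null when no registered reader understands the file.
        BufferedImage image = ImageIO.read(input);
        if (image == null) {
            throw new IOException("No ImageIO reader for: " + input);
        }
        // ImageIO.write returns false when no writer accepts the image.
        if (!ImageIO.write(image, "png", output)) {
            throw new IOException("Could not write PNG: " + output);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: java TiffToPng <input.tif> <output.png>");
            return;
        }
        convertToPng(new File(args[0]), new File(args[1]));
    }
}
```

The resulting PNG files can then be used in the same way as the PNG files that Tika Parser produces for the other PDFs.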
Dear colleagues, please help with the Tess4J component. Instead of text, I received a set of symbols like the one below. I used PNG files and kept the settings from your example OCR_meets_SemanticWeb.