Extract Data from PDF


I need to extract the text and photos from a PDF and format them to look like the file Result.xlsx. The pictures also need to be placed in separate cells of the Excel file, the way they appear in the PDF. I’m new to KNIME, so any help would be greatly appreciated. I don’t know how to upload a PDF file here, so I’m attaching the same PDF in DOCX format.


This is an image-based PDF, so you will need to use the Tika Parser node to extract the images, and then use the Tess4J node to run OCR on those images.
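An image-based PDF is one whose pages are stored as pictures with no real text layer, so a plain text extractor has nothing to read and OCR is needed. If you want to check this for your own file before building the workflow, here is a rough, standard-library-only Python sketch (the function name `looks_image_based` is hypothetical and not part of KNIME, Tika or Tess4J). It is a heuristic, not a PDF parser: it assumes image XObject dictionaries appear as plain bytes in the file (streams cannot be stored inside compressed object streams), and that text is drawn with the `Tj`/`TJ` operators in zlib-compressed or uncompressed content streams. A scanned PDF that already has an invisible OCR text layer will be reported as having text.

```python
import re
import zlib

# Image XObjects are stream objects; streams are never placed inside
# compressed object streams, so their dictionaries are visible as raw bytes.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
# Captures the data between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.S)
# Text-showing operators used in page content streams.
TEXT_OP_RE = re.compile(rb"\bT[jJ]\b")


def looks_image_based(pdf: bytes) -> bool:
    """Heuristic: True if the PDF contains images but no text-showing operators."""
    has_images = IMAGE_RE.search(pdf) is not None
    has_text = False
    for match in STREAM_RE.finditer(pdf):
        body = match.group(1)
        try:
            # Try FlateDecode (zlib); trailing bytes after the compressed
            # data are kept in unused_data and ignored here.
            data = zlib.decompressobj().decompress(body)
        except zlib.error:
            # Uncompressed stream, or a filter (e.g. JPEG) this sketch ignores.
            data = body
        if TEXT_OP_RE.search(data):
            has_text = True
            break
    return has_images and not has_text
```

Usage, with a hypothetical file name: `with open("input.pdf", "rb") as f: print(looks_image_based(f.read()))`. A `True` result suggests the OCR route (Tika Parser for the images, then Tess4J) is the one to take.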

Here is a resource to help you understand this problem in greater detail:

