Extract Data from PDF

valerii_0192 · August 25, 2022, 12:14pm

Hello,

I need to extract the text and photos from a PDF and lay them out as in the file Result.xlsx. The pictures should look as they do in the PDF, and each one needs to be written to a separate cell of the Excel file. I’m new to KNIME, so any help would be greatly appreciated. I don’t know how to upload a PDF file here, so I’m attaching the same document in DOCX format.

victor_palacios · August 25, 2022, 2:47pm

Hello,

This is an image-based PDF, so you will need to use the Tika Parser node to extract the images, then the Tess4J node to run OCR on those images.
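
If it helps to see the same two-step idea outside of KNIME, here is a rough Python sketch: take the images that were extracted from the PDF, run OCR on each one, and build one row per image that could later be written to a spreadsheet. This is only an illustration, not what the KNIME nodes do internally. `PageImage` and `ocr_images` are names made up for this example, and the `tesseract_ocr` backend assumes the third-party packages pytesseract and Pillow (plus the Tesseract engine itself) are installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class PageImage:
    """One image pulled out of the PDF (hypothetical container for this sketch)."""
    page: int    # 1-based page number the image came from
    name: str    # file name given to the extracted image
    data: bytes  # raw image bytes (PNG, JPEG, ...)


def ocr_images(images: Iterable[PageImage],
               ocr: Callable[[bytes], str]) -> List[Dict[str, object]]:
    """Run the given OCR function on every image and return one row per image."""
    rows = []
    for img in images:
        text = ocr(img.data).strip()  # drop leading/trailing whitespace from the OCR result
        rows.append({"page": img.page, "image": img.name, "text": text})
    return rows


def tesseract_ocr(data: bytes) -> str:
    """OCR backend based on pytesseract and Pillow (assumed to be installed)."""
    import io
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(io.BytesIO(data)))
```

The OCR step is passed in as a function so it can be swapped out or tried with a stand-in before Tesseract is set up; with real images you would call `ocr_images(images, tesseract_ocr)`.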

Here is a resource to help you understand this problem in greater detail:
