Extract Data from PDF

Hello,

I need to extract the text and photos from a PDF and format them to match the layout of the file Result.xlsx. The pictures also need to be placed as they appear in the PDF, with each one written into a separate cell of the Excel file. I'm new to KNIME, so any help would be greatly appreciated. I don't know how to upload a PDF file here, so I'm attaching the same PDF in DOCX format.

Hello,

This is an image-based PDF, so you will need to use the Tika Parser node to extract the images, and then use the Tess4J node to run OCR on those images.

Here is a resource that explains this problem in more detail:

