Unable to read Non-OCR PDFs

victor_palacios · July 12, 2022, 3:43pm

Hi, I’m the PDF guy on the forum. I’ve never heard of a non-OCR PDF. What is that exactly?

We recently held a PDF extraction event via Data Connect. The slides can be found here.

For PDFs, you may also find that the Tika Parser works better for extraction, though that depends on what you want to extract and how.

We also covered PDF extraction in a Just KNIME It challenge:

KNIME Hub

Extracting a Table from a PDF – alinebessa

Given a text-based PDF document with a table, can you partially extract the table into a KNIME data table for further analysis? For this challenge we will extr…

See the community solutions as well.

Please look at those examples and then post your workflow so we can diagnose the issue and provide the best possible answers. Thank you!