PDF Parser / Tika Parser - Page numbers

Shivakumar · November 29, 2021, 1:20pm

Dear All, I am new to the KNIME Analytics Platform. I would like to know whether the PDF Parser or Tika Parser node can provide page numbers in its output. I am searching for a few codes in PDFs, and once the PDF Parser or Tika Parser identifies a code, it should report the page number along with the output.

Kathrin · November 30, 2021, 5:18pm

Dear @Shivakumar,

Welcome to the KNIME Community!

The PDF Parser and Tika Parser nodes don’t have an option to extract the page number automatically, but you can add the page number with a couple of nodes. I built a little example for you.

The idea I implemented assumes that, after splitting the parsed PDF into individual lines, each page number is isolated in its own row and no other line in the PDF consists only of a number.
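For readers who want to see the logic outside of KNIME, here is a minimal Python sketch of the same idea. It is not the workflow itself, and it relies on the same assumptions: the parsed text has each page number alone on its own line, and no other line consists only of a number. The function names, the `number_at_end` switch (whether the number is printed at the bottom or the top of a page), and the code pattern in the usage example are all hypothetical and only there for illustration.

```python
import re

# A line that holds nothing but digits is treated as a page-number marker.
PAGE_MARKER = re.compile(r"^\s*(\d+)\s*$")


def tag_lines_with_pages(text, number_at_end=True):
    """Return (page, line) pairs for every non-empty, non-marker line.

    number_at_end=True assumes the page number is printed at the bottom of
    each page, so it applies to the lines that came before it. With False,
    the number is assumed to be at the top and applies to the lines after it.
    Lines whose page cannot be determined are tagged with None.
    """
    tagged = []
    pending = []    # lines waiting for the next marker (number at end)
    current = None  # last marker seen (number at top)
    for line in text.splitlines():
        marker = PAGE_MARKER.match(line)
        if marker:
            page = int(marker.group(1))
            if number_at_end:
                tagged.extend((page, waiting) for waiting in pending)
                pending = []
            else:
                current = page
            continue
        if not line.strip():
            continue  # skip blank lines
        if number_at_end:
            pending.append(line)
        else:
            tagged.append((current, line))
    # Lines after the last marker have no known page when the number is at the end.
    tagged.extend((None, waiting) for waiting in pending)
    return tagged


def find_codes(text, pattern, number_at_end=True):
    """Return (code, page) pairs for every match of pattern in the text."""
    code_rx = re.compile(pattern)
    return [
        (match.group(0), page)
        for page, line in tag_lines_with_pages(text, number_at_end)
        for match in code_rx.finditer(line)
    ]


# Made-up sample text: two pages, page number at the bottom of each page.
sample = "Invoice ABC-123\nsome text\n1\nSee XYZ-999 and ABC-555\n2\n"
hits = find_codes(sample, r"[A-Z]{3}-\d{3}")
```

With the made-up sample above, `hits` pairs `ABC-123` with page 1 and both `XYZ-999` and `ABC-555` with page 2. If the document contains other number-only lines (for example a table cell holding just a quantity), they would be mistaken for page numbers, which is the same limitation as in the workflow approach.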

Please have a look at the workflow and let me know in case you have any questions.
Cheers
Kathrin
