Dear all, I am new to the KNIME Analytics Platform. I would like to know whether the PDF Parser or Tika Parser node can provide page numbers in its output. I am searching for a few codes in PDFs, and once the PDF Parser or Tika Parser finds a code, it should report the page number along with the match.
Welcome to the KNIME Community!
The PDF Parser and Tika Parser nodes don’t have an option to extract the page number automatically, but you can add the page number with a couple of extra nodes. I built a small example workflow for you.
The idea I implemented assumes that, after splitting the parsed PDF into separate lines, each page number sits alone in its own row and no other line in the PDF consists only of a number.
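In case it helps to see the logic outside of KNIME, here is a rough Python sketch of the same idea. It is not the workflow itself, just an illustration under a few assumptions of mine: the parsed text is available as one string, each page ends with a line holding only its page number (if your PDFs print the number at the top of the page, the assignment would need to be flipped), and the function name `find_codes_with_page` and the sample code pattern are made up for the example.

```python
import re

def find_codes_with_page(text, code_pattern):
    """Return (page, line) pairs for lines that match code_pattern.

    Assumptions (not guaranteed for every PDF):
    - each page ends with a line that holds only its page number
    - no other line consists solely of digits
    """
    pending = []  # lines seen since the last page-number line
    hits = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if re.fullmatch(r"[0-9]+", line):
            # A digits-only line is treated as the page number
            # for all lines collected since the previous marker.
            page = int(line)
            hits.extend((page, l) for l in pending if re.search(code_pattern, l))
            pending = []
        else:
            pending.append(line)
    # Lines after the last page marker have no known page number.
    hits.extend((None, l) for l in pending if re.search(code_pattern, l))
    return hits

# Hypothetical sample text and code format, for illustration only.
sample = (
    "Intro text\n"
    "Code AB-123 appears here\n"
    "1\n"
    "More text\n"
    "Another code CD-456\n"
    "2\n"
)
print(find_codes_with_page(sample, r"[A-Z]{2}-\d{3}"))
```

If a PDF contains other digits-only lines (for example a table cell with just a number), this approach will mislabel pages, which is the same limitation as in the workflow.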
Please have a look at the workflow and let me know in case you have any questions.