Is it possible to read the contents of a PDF file for a specified page range?
I have a PDF in which every two pages contain information about one person, so I need to iterate over it two pages at a time and analyze the information in each pair.
I am trying to do this with the Tika parser, but it returns the contents of the whole PDF in a single row.
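If the Tika output can be requested as XHTML instead of plain text, the page boundaries may still be recoverable: Tika's PDF parser typically wraps each page in a `<div class="page">` element. The sketch below assumes that markup. `split_pages` and `group_pages` are hypothetical helper names, and the tag stripping is a rough regex approach, not a full HTML parser.

```python
import re
from html import unescape
from typing import List

# Assumption: each PDF page in Tika's XHTML output starts with <div class="page">.
PAGE_START = re.compile(r'<div\s+class="page"\s*>')
TAG = re.compile(r"<[^>]+>")


def split_pages(xhtml: str) -> List[str]:
    """Return the plain text of each page, in document order."""
    # Everything before the first page marker (<head>, metadata) is dropped.
    chunks = PAGE_START.split(xhtml)[1:]
    pages = []
    for chunk in chunks:
        # Replace tags with spaces, decode entities, then collapse whitespace.
        text = unescape(TAG.sub(" ", chunk))
        pages.append(" ".join(text.split()))
    return pages


def group_pages(pages: List[str], pages_per_person: int = 2) -> List[str]:
    """Join consecutive pages so that each string covers one person."""
    return [
        "\n".join(pages[i:i + pages_per_person])
        for i in range(0, len(pages), pages_per_person)
    ]
```

Assuming the tika-python package, `parser.from_file("people.pdf", xmlContent=True)["content"]` should return this kind of XHTML, which could then be passed to `split_pages` ("people.pdf" is a placeholder). Each string from `group_pages` would then be one person's text for the existing string manipulation.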
Does anyone know how I can configure it to achieve this?
Or is there a way to do it without the Tika parser?
The PDF has a standard format, so I know how to extract the information with string manipulation, but on each iteration I need to get only two pages out of the full PDF, since every two pages there is a new person to parse.
Yes, I plan to use a loop, with the number of pages to take per iteration as a variable. I know how to do that part, but I don't know how to change the code: right now it only takes one page, and I need it to take a range of pages (which I would later turn into a variable).
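The loop itself only needs the first and last page of each range. A minimal sketch, with `page_ranges` and `pages_per_person` as hypothetical names: it yields 1-based, inclusive (first, last) pairs, so a 50-page PDF with 2 pages per person gives 25 ranges, from (1, 2) to (49, 50).

```python
from typing import Iterator, Tuple


def page_ranges(total_pages: int, pages_per_person: int = 2) -> Iterator[Tuple[int, int]]:
    """Yield (first_page, last_page) pairs, 1-based and inclusive."""
    if pages_per_person < 1:
        raise ValueError("pages_per_person must be at least 1")
    for first in range(1, total_pages + 1, pages_per_person):
        # The last range is shortened if the page count is not an exact multiple.
        last = min(first + pages_per_person - 1, total_pages)
        yield first, last


for first, last in page_ranges(50, pages_per_person=2):
    pass  # extract and parse pages first..last here
```

These numbers could then be handed to any extractor that accepts a page range. For example, Apache PDFBox's `PDFTextStripper` (which Tika uses internally for PDFs) has `setStartPage` and `setEndPage`, and with the pypdf package one could join `reader.pages[p - 1].extract_text()` for `p` in `range(first, last + 1)`, since pypdf indexes pages from 0. Neither has been tried against this particular PDF.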
I can't share an example PDF because it contains real personal data, but its structure doesn't matter here. The only thing I'm missing is the change to the code mentioned above, so that it takes a range of pages on each iteration. After that, as you say, I see the information as output from the KNIME variable rather than as a table, and I then use string manipulation to extract what I want.
Do you know what change I need to make to the code so that it reads a range of PDF pages instead of a single page?
P.S. The PDF only has 50 pages, so it is not very large.