How to read multiple lines from PDF File

mlauber71 · August 30, 2022, 4:02pm

@Subramanyam you could try using the R package pdftools to extract the text. It will also extract the content of tables, but there might be better ways to deal with those, as we have already discussed (Solutions to "Just KNIME It!" Challenge 15; Extract Table from PDF with the help of R "tabulizer" and KNIME – KNIME Hub).

Each page will end up in its own table row, so you can take it from there and start splitting the text and manipulating the header to get the information you need.
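To illustrate the splitting step, here is a minimal sketch in Python rather than R. It assumes you already have one text string per page (pdftools' `pdf_text()` returns a character vector with one element per page). The sample pages, the two-line header and the "Key: value" header format are all hypothetical, so adapt `header_lines` and the separator to your actual PDF.

```python
# Hypothetical sample data: one string per PDF page, i.e. one table row per page.
pages = [
    "Invoice No: 12345\nDate: 2022-08-30\nItem  Qty  Price\nWidget  2  9.99\n",
    "Invoice No: 12346\nDate: 2022-08-31\nItem  Qty  Price\nGadget  1  19.50\n",
]


def split_page(page_text, header_lines=2):
    """Split one page into non-empty lines and separate an assumed header block."""
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    return lines[:header_lines], lines[header_lines:]


def parse_header(header):
    """Turn 'Key: value' header lines into a dict (assumes a colon separator)."""
    fields = {}
    for line in header:
        key, sep, value = line.partition(":")
        if sep:  # skip header lines without a colon
            fields[key.strip()] = value.strip()
    return fields


for page in pages:
    header, body = split_page(page)
    print(parse_header(header), body)
```

The body lines still contain the table content as plain text. As mentioned above, a dedicated table extraction approach may handle that part better than manual splitting.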

kn_example_r_pdf_read_text.knwf (95.6 KB)

Beyond that, I agree with what @ScottF said: it is difficult to follow what you want to do and at which point the community might be able to help. By the way, you can include your data within the workflow (preferably in the subfolder /data/) so that you have a complete example that other people can run.