Hi, I want to extract tables from multiple PDF files and write them to Excel. In most of the PDFs the first page contains some text and the table starts on the 2nd page, while in some PDFs the table starts on page one itself. The issue is that not all records in the table are extracted, e.g. from 6000 rows only 4000 rows are extracted and the rest are omitted. What can be done to get all the records? Can anyone guide me with this? The column order can differ between PDFs, and there can also be extra columns. I can't seem to find a standard solution for this.
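To make the column-order part of the problem concrete, here is a minimal Python sketch of one possible way to line up tables once the rows have been pulled out of the PDFs. It does not do the PDF extraction itself: it assumes some extractor has already produced, per PDF, a list of page tables, each a list of rows of cell strings. The function names `table_pages_to_records` and `combine` and the `source_file` column are made up for illustration, and the sketch assumes the first non-empty row of each PDF's table is the header. Counting the records per file and comparing against the expected row count may also help narrow down where rows go missing.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One table = rows of cells; cells may be None when the extractor found nothing.
Table = Sequence[Sequence[Optional[str]]]


def clean(cell: Optional[str]) -> str:
    """Collapse whitespace and line breaks inside a cell; None becomes ''."""
    return " ".join((cell or "").split())


def table_pages_to_records(pages: Sequence[Table]) -> List[Dict[str, str]]:
    """Turn the page-wise tables of one PDF into dicts keyed by column name.

    Assumption: the first non-empty row is the header, and later pages
    may or may not repeat it.
    """
    header: List[str] = []
    records: List[Dict[str, str]] = []
    for table in pages:
        for row in table:
            cells = [clean(c) for c in row]
            if not any(cells):
                continue  # skip completely empty rows
            if not header:
                header = cells  # first real row names the columns
                continue
            if cells == header:
                continue  # header repeated on a later page
            records.append(dict(zip(header, cells)))
    return records


def combine(
    files: Dict[str, List[Dict[str, str]]]
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Merge records from several PDFs by column name, not by position.

    Columns are collected in first-seen order; a column missing from a
    given PDF is filled with an empty string.
    """
    columns: List[str] = ["source_file"]
    rows: List[Dict[str, str]] = []
    for name, records in files.items():
        for rec in records:
            for col in rec:
                if col not in columns:
                    columns.append(col)
            rows.append({"source_file": name, **rec})
    filled = [{c: r.get(c, "") for c in columns} for r in rows]
    return columns, filled


if __name__ == "__main__":
    # Hypothetical extractor output for two PDFs with different layouts.
    pdf_a = [
        [["ID", "Name", "Amount"], ["1", "Ann", "10"]],
        [["ID", "Name", "Amount"], ["2", "Bob", "20"]],
    ]
    pdf_b = [[["Name", "ID", "Amount", "Region"], ["Cy", "3", "30", "EU"]]]

    per_file = {
        "a.pdf": table_pages_to_records(pdf_a),
        "b.pdf": table_pages_to_records(pdf_b),
    }
    for name, recs in per_file.items():
        print(name, len(recs), "rows")  # compare with the expected row count
    columns, rows = combine(per_file)
    print(columns)
```

The combined `columns` and `rows` could then be handed to whatever spreadsheet writer is in use. Matching by column name instead of position is what lets the column order differ between PDFs; it only works if the header text itself is extracted consistently.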

PDF parsing and Extracting Data

mlauber71 March 21, 2022, 12:21pm 6

@shivani_soni there was an example and a discussion about doing this with just KNIME nodes. But I think I was not able to use that with the example I have cited. Maybe you could give it a try or provide us with an example: