Extract data from PDF invoice

@lavvenkatesh welcome to the KNIME forum. This is possible with R and the package “docxtractr”, as in this workflow built on your example (two other examples: 1 | 2):

The content from your example is split into separate tables and then exported to KNIME tables or Excel sheets:

You will probably have to work out the further handling of the data yourself. Separate columns are added to indicate where each table came from. If your invoices always have the same structure, you could always use the 3rd table, or you could identify the tables by their column names. All .DOCX files that you place in the /data/_docx/ folder will be scanned and imported.
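To make the logic easier to follow outside the workflow, here is a rough sketch of the same idea in Python. It is not the R/docxtractr code the workflow uses: it reads the .docx file (a zip archive containing word/document.xml) directly with the standard library. The function names, the record keys (source_file, table_number, header, rows) and the assumption that the first row of each table is its header are all mine, chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tables_from_document_xml(xml_bytes):
    """Return every table as a list of rows, each row a list of cell texts.

    Simplification: nested tables and rows wrapped in content controls
    are not treated specially.
    """
    root = ET.fromstring(xml_bytes)
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # A cell's text is spread over one or more <w:t> runs
                text = "".join(t.text or "" for t in tc.iter(W + "t"))
                cells.append(text.strip())
            rows.append(cells)
        tables.append(rows)
    return tables


def scan_folder(folder):
    """Extract all tables from every .docx file in `folder`.

    Each record notes which file and which table it came from.
    Assumption: the first row of a table is its header.
    """
    records = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() != ".docx":  # accept .docx and .DOCX
            continue
        with zipfile.ZipFile(path) as zf:
            xml_bytes = zf.read("word/document.xml")
        tables = tables_from_document_xml(xml_bytes)
        for number, rows in enumerate(tables, start=1):
            records.append({
                "source_file": path.name,
                "table_number": number,
                "header": rows[0] if rows else [],
                "rows": rows[1:],
            })
    return records


def find_by_columns(records, wanted):
    """Keep only the tables whose header contains all `wanted` column names."""
    return [r for r in records if set(wanted) <= set(r["header"])]
```

With invoices of a fixed layout you could filter the result of scan_folder("data/_docx") for table_number == 3, or otherwise call find_by_columns with the column names you expect in the invoice table.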

Admittedly this approach might not be totally intuitive, but once you have familiarised yourself with some R and KNIME you gain a very powerful tool.
