Hello all, I have multiple invoices that were converted from Word to PDF. I am trying to extract the information in tabular form using KNIME. I tried the PDF Parser and Tika Parser nodes, but neither gives usable output. I have attached a sample document. Any suggestions on how to go about this would be appreciated.
@lavvenkatesh welcome to the KNIME forum. This is possible with the help of R and the package “docxtractr”, which reads tables from the original Word (.docx) files rather than from the PDFs, as in this workflow using your example (two others are 1 | 2):
You might have to work out the further handling of the data yourself. Separate columns are added to indicate where each table came from. If your invoices always have the same structure, you could always use the 3rd table, or you could identify the tables by their column names. All .DOCX files that you place in the /data/_docx/ folder will be scanned and imported.
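For anyone who wants to see the idea outside the workflow, here is a rough sketch in Python using only the standard library. It is not the R/docxtractr code from the workflow, just an illustration of the same steps: scan a folder for .docx files, pull out every table, tag each row with where it came from, and find tables by their column names. The function names and the `source_file`, `table_index` and `row_index` fields are made up for this example. It assumes plain top-level tables and would miss tables wrapped in content controls.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def cell_text(tc):
    """Return the text of one table cell, one line per paragraph."""
    paragraphs = []
    for p in tc.iter(W + "p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return "\n".join(paragraphs).strip()


def extract_tables(docx_path):
    """Return the top-level tables of a .docx file as lists of rows of cell strings."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    body = root.find(W + "body")
    tables = []
    for tbl in body.findall(W + "tbl"):  # direct children of the body only
        rows = [
            [cell_text(tc) for tc in tr.findall(W + "tc")]
            for tr in tbl.findall(W + "tr")
        ]
        tables.append(rows)
    return tables


def scan_folder(folder):
    """Collect every table row from every .docx in a folder, tagged with its origin."""
    records = []
    for path in sorted(Path(folder).glob("*.docx")):
        if path.name.startswith("~$"):  # skip Word lock files
            continue
        for t_idx, rows in enumerate(extract_tables(path), start=1):
            for r_idx, cells in enumerate(rows, start=1):
                records.append({
                    "source_file": path.name,   # hypothetical field names
                    "table_index": t_idx,
                    "row_index": r_idx,
                    "cells": cells,
                })
    return records


def tables_with_header(records, wanted):
    """Return (source_file, table_index) pairs whose first row contains all wanted names."""
    wanted = {name.lower() for name in wanted}
    hits = []
    for rec in records:
        if rec["row_index"] == 1:
            header = {cell.lower() for cell in rec["cells"]}
            if wanted <= header:
                hits.append((rec["source_file"], rec["table_index"]))
    return hits
```

If the layout is fixed, filtering the records on `table_index == 3` corresponds to "always use the 3rd table". Otherwise `tables_with_header(records, [...])` with your own column names picks the matching tables per file. Note that the `*.docx` pattern is case-sensitive on Linux, so files ending in `.DOCX` would need a second pattern.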