Any process to extract particular details from a PDF file?

Hi Team,

I have designed a workflow that works fine for extracting details from a PDF and creating the output. But when I use the same workflow on another PDF file, the output is not as expected: the positions of the details extracted from the PDF file by the Tika Parser are different.

Could you please suggest another way of extracting the Delivery Date, Quantity and Amount values from the first line in the PDF file?

I am attaching the workflow, the PDF file with which it works fine, and the PDF file for which it does not produce the expected output.

PO 4502617106.pdf.doc (59.0 KB) PDF file for which the workflow does not produce the expected output.

Automation-IDOC–Singapore—Main Workflow.knwf (380.7 KB)

SG 4503089325.pdf.docx (63.2 KB) PDF file for which the workflow generates the expected output.

Note: The PDF files contain dummy data.

Thanks in advance,
Subramanyam Kinthada.

Hi @Subramanyam -

You have again posted a rather lengthy workflow, which is going to be almost impossible for someone else to troubleshoot unless they are familiar with your data. You provide two files, one that generates unexpected output and one that generates perfect output, but we have no context from you on WHY the output is good or bad, merely that it is different.

To get help here on the forum, I am afraid you will have to break your workflow down into smaller chunks to isolate where things are going wrong. And you must be SPECIFIC about what is going wrong: which characters are out of place, which fields are not generated, etc.

It’s possible you may benefit from a KNIME consulting partner for this level of detail. If you are interested in that, I can get some recommendations for you. You get what you pay for and all that. :slight_smile:


@Subramanyam I would still recommend trying the previously mentioned R tools to extract content, text and tables from PDF files.
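
If the extracted text differs between files only in where the values sit on the line, one option is to find them by pattern rather than by position. Below is a minimal sketch in Python, not a tested solution for your files. It assumes, without having checked your PDFs, that the first item line contains a date such as 15.03.2021, followed by the quantity as the first number and the amount as the last number on that line. The function name, the patterns and the sample text are all made up for illustration.

```python
import re

# Assumed formats (hypothetical, adjust to the real PO layout):
# dates like 15.03.2021, 15/03/2021 or 15-03-2021
DATE = re.compile(r"\d{2}[./-]\d{2}[./-]\d{4}")
# numbers like 10, 2.50 or 1,000.00
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+(?:\.\d+)?")


def first_line_item(text):
    """Return date, quantity and amount from the first line that looks like an item line.

    A line qualifies if it has a date token followed by at least two
    numeric tokens. Column positions and spacing are ignored.
    """
    for line in text.splitlines():
        tokens = line.split()
        # index of the first token that is entirely a date, if any
        date_idx = next(
            (i for i, tok in enumerate(tokens) if DATE.fullmatch(tok)), None
        )
        if date_idx is None:
            continue
        # numeric tokens after the date; assumed order: quantity ... amount
        numbers = [t for t in tokens[date_idx + 1:] if NUMBER.fullmatch(t)]
        if len(numbers) < 2:
            continue
        return {
            "delivery_date": tokens[date_idx],
            "quantity": numbers[0],
            "amount": numbers[-1],
        }
    return None


# Made-up sample text standing in for the parser output.
sample = """Purchase Order 12345
Item  Material  Description  Delivery Date  Qty  Unit  Price  Amount
00010 MAT-001   Widget       15.03.2021     10   EA    2.50   25.00
00020 MAT-002   Gadget       20.03.2021     5    EA    4.00   20.00
"""

print(first_line_item(sample))
```

Because the line is split on whitespace, extra or missing spaces between columns do not change the result. If your item lines are laid out differently (for example the amount is not the last number), the two patterns and the choice of `numbers[0]` and `numbers[-1]` are the parts to adapt.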

