PDF Data capturing

Hi KNIME Members,

I have a PDF file, and I need the data highlighted in yellow as the end result. I have attached a sample PDF file for reference.

I tried the PDF Parser node, but was not successful. I was able to read the data and write it to either a text or CSV file using R, then read that file back and process it with multiple nodes as shown in the workflow below. However, I still have some blank columns to remove before I get the end result.

Also, I am not able to integrate the R script in KNIME. A screenshot of the error is provided below.

Could someone help me with this, please?

PDF Data.knwf (162.6 KB)
test.zip (123.2 KB)


Hi @pawanmtm -

I would suggest two things.

  1. Instead of the PDF Parser node, try the Tika Parser node. I was able to get that to work.

  2. For your R code, you need to coerce the data variable from a string to a data frame. You can do this using the as.data.frame() function. Then once you pass the data frame to knime.out, it works.
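As a rough illustration of the second suggestion, here is a minimal sketch of the coercion. The sample string, the line-splitting step, and the column name `line` are assumptions made for illustration only; they are not taken from the attached workflow, and the real script may need to split or parse the text differently.

```r
# Hypothetical stand-in for the text extracted from the PDF; in the real
# script this would be whatever string the PDF-reading code produced.
data <- "first line\nsecond line\nthird line"

# Assumption: one table row per line of text, so split on newlines first.
lines <- strsplit(data, "\n", fixed = TRUE)[[1]]

# Coerce the character vector to a data frame.
result <- as.data.frame(lines, stringsAsFactors = FALSE)
names(result) <- "line"  # illustrative column name

# In a KNIME R Snippet node, the table handed to the next node is assigned
# to knime.out, which should be a data frame rather than a bare string.
knime.out <- result
```

The point is only that `knime.out` receives a data frame rather than a string; how the text is broken into rows and columns depends on the layout of the PDF.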

Hope this helps!

Hi Scott,

I tried the Tika Parser node and was able to read the data, but the entire content is read as one row (a single string). I will now use manipulation nodes to get the required output.

I have not tried the second point, since I am able to read the data using KNIME, so I don't think the R script is required now.

Thanks a ton for your assistance. KNIME is getting popular because of contributions from people like you.