PDF Data capturing

pawanmtm · November 15, 2018, 5:52pm

Hi KNIME Members,

I have a PDF file, and I need the data highlighted in yellow as the end result. I have attached a sample PDF file for reference.

I tried the PDF Parser node, but it did not work. I was able to read the data with R, write it to a text or CSV file, and then read that file back and process it with multiple nodes, as shown in the workflow below. However, I still have some blank columns to remove to reach the end result.

Also, I am not able to integrate the R script in KNIME. A screenshot of the error is provided below.

Could someone please help me with this?

PDF Data.knwf (162.6 KB)
test.zip (123.2 KB)

Regards,
Pavan.

ScottF · November 15, 2018, 7:31pm

Hi @pawanmtm -

Two things I would suggest to you.

1. Instead of the PDF Parser node, try the Tika Parser node. I was able to get that to work.
2. For your R code, you need to coerce the data variable from a string to a data frame. You can do this using the as.data.frame() function. Then once you pass the data frame to knime.out, it works.
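
A minimal sketch of the coercion step in R. It assumes the extracted text is held in a character variable; the name `pdf_text` and its placeholder contents are hypothetical and not taken from the attached workflow. `knime.out` is the variable a KNIME R node returns as its output table.

```r
# Placeholder for the text read from the PDF (hypothetical contents,
# not taken from the attached sample file).
pdf_text <- c("first line of extracted text", "second line of extracted text")

# A bare character vector is not a table, so wrap it in a data frame.
# stringsAsFactors = FALSE keeps the column as plain text, not factors.
pdf_table <- as.data.frame(pdf_text, stringsAsFactors = FALSE)

# Hand the data frame to KNIME as the node's output table.
knime.out <- pdf_table
```

Each element of the character vector becomes one row, so text that was read as a single string ends up as a one-row table and may still need splitting afterwards.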

Hope this helps!

pawanmtm · November 17, 2018, 4:57am

Hi Scott,

I tried the Tika Parser node and was able to read the data, but the entire content is read as one row (a single string). I will now use manipulation nodes to get the required output.

I have not tried the second point, as I am able to read the data using KNIME, so I don’t think the R script is required now.

Thanks a ton for your assistance. KNIME is getting popular because of contributions from people like you.

Regards,
Pavan.