PDF to Image conversion

Hi,

 

I am working on a KNIME workflow that needs to read scanned image files saved as PDFs. I was wondering whether it is possible in KNIME to extract an image from a PDF file or to save a PDF file as an image file. Any help would be much appreciated.

Thanks,

Nina

Hi Nina,

The Tika Parser node can extract inline images from PDFs. You can enable that option in the node dialog. The images will be extracted to a directory you specify.

Cheers, Kilian

Hi Kilian,

Thank you so much for your response. I have tried the Tika Parser node and it works great for extracting images from a PDF file. However, my problem is that I need to extract a scanned image that is itself saved as a PDF, and the Tika Parser node does not extract that scanned image. Is there anything I am missing here? Thanks again for your help.

 

Regards,

Nina

Hi Nina,

So you are trying to read your current "myimage.pdf" as an image into KNIME, is that correct? Could you provide an example file? Maybe we can come up with a solution for you!

Best,

Christian

Hi Christian,

Thank you for your reply. Please find attached a sample PDF with a scanned image file. I need to convert this file into an image file or extract the image. In the Tika Parser node I do not see any option for extracting images. Any help would be much appreciated.

Regards,

Nina

Hi Nina,

While there is no built-in tool in KNIME to extract images from PDF files, you can use the pdfimages tool to do so. I attached a workflow that demonstrates this. The PDF you submitted actually contains a series of pictures that need to be combined into one, which the workflow also does using KNIME nodes.
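Outside KNIME, the combining step can be sketched in plain Python. This is only an illustration and not what the attached workflow does internally: it assumes the strips were written as binary PPM (P6) files with 8-bit samples, that they all have the same width, and that they stack top to bottom in file-name order. The function names are made up for this example.

```python
def read_ppm(data: bytes):
    """Parse a binary PPM (P6) image; return (width, height, maxval, pixels)."""
    tokens = []
    pos = 0
    while len(tokens) < 4:
        # Skip whitespace between header tokens.
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":
            # Header comments run to the end of the line.
            pos = data.index(b"\n", pos) + 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    if tokens[0] != b"P6":
        raise ValueError("only binary PPM (P6) is handled in this sketch")
    width, height, maxval = (int(t) for t in tokens[1:])
    if maxval > 255:
        raise ValueError("only 8-bit samples are handled in this sketch")
    # Exactly one whitespace byte separates the header from the pixel data.
    pixels = data[pos + 1:]
    if len(pixels) < width * height * 3:
        raise ValueError("pixel data is shorter than the header announces")
    return width, height, maxval, pixels


def stack_vertically(images):
    """Stack parsed strips top to bottom and return one P6 image as bytes."""
    width, _, maxval, _ = images[0]
    total_height = 0
    body = bytearray()
    for w, h, m, px in images:
        if w != width or m != maxval:
            raise ValueError("strips must share width and maxval")
        # 3 bytes per pixel (RGB, 8 bit each); ignore any trailing bytes.
        body += px[: w * h * 3]
        total_height += h
    header = f"P6\n{width} {total_height}\n{maxval}\n".encode("ascii")
    return header + bytes(body)
```

A possible usage, with a hypothetical folder and file prefix: read every `extracted/page-*.ppm` in sorted order with `Path.read_bytes()`, pass each through `read_ppm`, and write the result of `stack_vertically` to a new file such as `combined.ppm`.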

On Linux, pdfimages is part of Poppler. On Windows you need to install Xpdf. In any case, I hope this helps you with your question.
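To make the pdfimages call itself concrete, here is a minimal sketch of running it from Python rather than from a KNIME node. It relies only on the basic `pdfimages <PDF-file> <image-root>` form of the command. The helper names, the `page` prefix, and the output folder are assumptions made for this example, and the tool must already be installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, output_root, executable="pdfimages"):
    """Return the argument list for: pdfimages <PDF-file> <image-root>."""
    return [executable, str(pdf_path), str(output_root)]


def extract_images(pdf_path, output_dir, prefix="page"):
    """Run pdfimages and return the files it wrote, sorted by name."""
    executable = shutil.which("pdfimages")
    if executable is None:
        raise FileNotFoundError(
            "pdfimages not found on PATH; "
            "install Poppler (Linux) or Xpdf (Windows) first"
        )
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # pdfimages uses the image root as a file-name prefix for its output.
    command = build_pdfimages_command(pdf_path, out_dir / prefix, executable)
    subprocess.run(command, check=True)
    return sorted(out_dir.glob(f"{prefix}-*"))
```

Calling `extract_images("myimage.pdf", "extracted")` should leave the extracted pictures in the `extracted` folder. If the returned list is empty, the tool ran but found no embedded images in that file.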

best,

Gabriel


Hi Gabriel,

Thank you so much for your detailed reply. I will give it a try. Thanks again for your time.

Regards,

Nina

Hi Gabriel,

I have tried to run the workflow you attached, but it generates an empty table with no images. I have installed Xpdf and I am using the same PDF file that I attached earlier. I am not sure what I am missing.

Regards,

Nina

Hi Nina,

If you open the Wrapped Metanode in the workflow (CTRL + click it), you will be able to run the workflow step by step and see at which step the empty table occurs. If that doesn't help, write me an email at gabriel.einsdorf@uni-konstanz.de and we can set up a Skype call to look into it.

best,

Gabriel
