I am working on a KNIME workflow that needs to read scanned images saved as PDF files. I was just wondering if it is possible in KNIME to extract an image from a PDF file, or to save a PDF file as an image file? Any help would be much appreciated.
The Tika Parser node can extract inline images from PDFs. You can enable this option in the node dialog; the images will then be extracted to a specified directory.
Thank you so much for your response. I have tried the Tika Parser node and it works great for extracting images from a PDF file. However, my problem is that I need to extract a scanned image that is saved as a PDF, and the Tika Parser node does not extract it. Is there anything I am missing here? Thanks again for your help.
So you would rather read your current "myimage.pdf" into KNIME as an image, is this correct? Could you provide an example file? Maybe we can come up with a solution for you!
Thank you for your reply. Please find attached a sample PDF containing a scanned image. I need to convert this file into an image file or extract the image from it. In the Tika Parser node I do not see any option for extracting images. Any help would be much appreciated.
While there is no built-in tool in KNIME to extract images from PDF files, you can use the pdfimages tool to do so. I attached a workflow that demonstrates this. The PDF you submitted actually contains a series of pictures that need to be combined into one; the workflow does this with KNIME nodes as well.
On Linux, pdfimages is part of poppler. On Windows you need to install Xpdf. Either way, I hope this helps with your question.
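For anyone who wants to try the same idea outside KNIME, here is a minimal Python sketch of the two steps: run pdfimages, then stack the extracted pictures into one image. It is not the attached workflow. It assumes pdfimages (poppler or Xpdf) is on the PATH, that it writes binary netpbm files (PBM/PGM/PPM) without header comments when no format option is given, and that the pictures belong on top of each other in file order. That stacking direction is a guess, so check it against your scan. The function names are hypothetical.

```python
"""Sketch: extract the images pdfimages finds in a PDF and stack them vertically.

Assumptions (not taken from the KNIME workflow in this thread): pdfimages from
poppler or Xpdf is on the PATH, its output files are binary netpbm without
header comments, and the pictures should be stacked top to bottom in file order.
"""
import shutil
import subprocess
import tempfile
from pathlib import Path


def extract_images(pdf_path, out_dir):
    """Run pdfimages and return the files it wrote, sorted by their numbering."""
    if shutil.which("pdfimages") is None:
        raise FileNotFoundError("pdfimages not found on PATH (install poppler or Xpdf)")
    prefix = Path(out_dir) / "img"
    # With no format option, pdfimages writes netpbm files named img-NNN.*
    subprocess.run(["pdfimages", str(pdf_path), str(prefix)], check=True)
    return sorted(Path(out_dir).glob("img-*"))


def parse_netpbm(data):
    """Split binary PBM (P4), PGM (P5) or PPM (P6) bytes into header fields and pixels."""
    magic = data[:2]
    if magic not in (b"P4", b"P5", b"P6"):
        raise ValueError(f"unsupported format {magic!r}")
    n_fields = 3 if magic == b"P4" else 4  # P4 has no maxval field
    fields, pos = [], 0
    while len(fields) < n_fields:
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        fields.append(data[start:pos])
    # Exactly one whitespace byte separates the header from the pixel data.
    return fields, data[pos + 1:]


def stack_vertically(images):
    """Concatenate same-format, same-width netpbm images top to bottom."""
    if not images:
        raise ValueError("no images to combine")
    headers, pixels = zip(*(parse_netpbm(img) for img in images))
    magic, width, extra = headers[0][0], headers[0][1], headers[0][3:]
    for h in headers:
        if h[0] != magic or h[1] != width or h[3:] != extra:
            raise ValueError("images must share format, width and maxval")
    height = sum(int(h[2]) for h in headers)
    header = b" ".join([magic, width, str(height).encode(), *extra])
    return header + b"\n" + b"".join(pixels)


def pdf_to_single_image(pdf_path, out_path):
    """Extract all images from pdf_path and write them as one stacked image."""
    with tempfile.TemporaryDirectory() as tmp:
        files = extract_images(pdf_path, tmp)
        combined = stack_vertically([f.read_bytes() for f in files])
    Path(out_path).write_bytes(combined)
```

Usage would be something like `pdf_to_single_image("myimage.pdf", "myimage.ppm")`. Stacking raw netpbm rows avoids any imaging library, because rows of equal width can simply be concatenated. If the pictures in your PDF are side by side instead, or have different widths, an image library is the better tool for the combining step.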
I have tried to run the workflow you attached, and it generates an empty table with no images. I have installed Xpdf and I am using the same PDF file I attached earlier. I am not sure what I am missing.
If you open the Wrapped Metanode in the workflow (CTRL + click it), you will be able to run the workflow step by step and see at which step the empty table occurs. If that doesn't help, write me an email at gabriel.einsdorf@uni-konstanz.de and we can set up a Skype call to look into it.