I searched the forum and elsewhere and found a promising suggestion: using an R script to do this. I’ve never used R before… so here goes!
Firstly, you’ll need to have R installed and configured for use with KNIME, as per the instructions found here:
https://docs.knime.com/latest/r_installation_guide/index.html
Once you’ve done everything there for your environment, you will need to run R on its own, outside of KNIME, just to install the required packages. (You will have installed R as part of those instructions.)
So, having opened R outside of KNIME, type the following commands, one at a time:
install.packages('Rserve')
install.packages('pdftools')
install.packages('qpdf')
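If you would rather not reinstall packages that are already there, something like the following should do it. This is only a sketch: the helper names `missing_packages` and `install_missing` are made up for illustration, and the install step needs an internet connection and a CRAN mirror.

```r
# Hypothetical helper: returns the names in `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: installs only the packages that are missing
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

At the R prompt you would then call `install_missing(c("Rserve", "pdftools", "qpdf"))`.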
I think you will already have installed the first one, Rserve, as part of the KNIME R installation instructions linked above. As far as I can tell, it doesn’t matter if you install it twice.
When you install packages, you may be asked if you want to use your personal library (if the default R folder cannot be written to). Say yes to this if prompted.
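If you are curious where the packages ended up, the following base R calls list the library folders R knows about and whether each one is writable. It is just a quick check, and what it prints depends entirely on your own installation.

```r
# Library folders R searches, in order
lib_paths <- .libPaths()
print(lib_paths)

# Location R uses for the personal library (may be empty on some setups)
print(Sys.getenv("R_LIBS_USER"))

# file.access(..., mode = 2) returns 0 when the folder is writable
writable <- file.access(lib_paths, mode = 2) == 0
print(data.frame(library = lib_paths, writable = writable))
```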
Once that’s done, I think there’s nothing more to do configuration-wise outside of KNIME, but we’re going to need a couple of pdf test files.
So find yourself two pdf files (anything you like, but I suggest nothing too large for this initial test) and copy them to a new folder. For these instructions I’m going to use c:\temp\pdf as the folder; if you use a different folder, change the following instructions accordingly. If you are able to create and use c:\temp\pdf for this test, it will be easier, as you shouldn’t have to edit anything.
Copy your two sample pdf files to c:\temp\pdf
Rename the files to document1.pdf and document2.pdf
So you now have two pdf files in this c:\temp\pdf folder.
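If you want to confirm that R can see the two files before opening the workflow, a quick check along these lines should work. The folder and file names are the ones used in this post, so change them if yours differ.

```r
# Folder and file names assumed from this post; change them for your setup
pdf_dir <- "C:/temp/pdf"
pdf_files <- file.path(pdf_dir, c("document1.pdf", "document2.pdf"))

# file.exists() returns one TRUE/FALSE per path
found <- file.exists(pdf_files)
print(data.frame(file = pdf_files, found = found))
```

Both rows should show TRUE in the `found` column before you run the workflow.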
Now open the attached workflow.
KNIME_join_pdf.knwf (8.3 KB)
If your two pdf files are anywhere other than c:\temp\pdf, or if they aren’t called document1.pdf and document2.pdf, you will need to open and configure the R Snippet node. The node contains just the following piece of R:
knime.out <- knime.in
# Combine two pdf documents
qpdf::pdf_combine(c("C:/temp/pdf/document1.pdf",
                    "C:/temp/pdf/document2.pdf"),
                  output = "C:/temp/pdf/joined_document.pdf")
Edit the file names to suit your test files. As you can see, the Windows backslashes \ in the folder names are written as forward slashes / in the R script.
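The reason for the forward slashes is that a backslash inside an R string starts an escape sequence, so a literal backslash has to be typed twice. If you have paths copied from Windows Explorer, a small helper like this one can convert them. The name `to_forward_slashes` is made up for this sketch.

```r
# A literal backslash in an R string is typed as "\\"
windows_path <- "C:\\temp\\pdf\\document1.pdf"

# Hypothetical helper: swap every backslash for a forward slash
to_forward_slashes <- function(path) {
  # fixed = TRUE treats the pattern as plain text, not a regular expression
  gsub("\\", "/", path, fixed = TRUE)
}

r_path <- to_forward_slashes(windows_path)
print(r_path)
```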
pdf_combine is a function in the qpdf package that you installed earlier. It should combine the two files and create a new pdf file called joined_document.pdf.
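One way to sanity-check the result, once the workflow has run, is to compare page counts: the joined file should have as many pages as the two inputs together. This sketch uses `pdf_length` from the same qpdf package. The helper name `check_joined_pages` is my own, and it returns NA if any of the files cannot be found.

```r
# Hypothetical helper: TRUE if the joined file has as many pages as the inputs combined,
# NA if a file is missing or qpdf is not installed
check_joined_pages <- function(inputs, joined) {
  if (!all(file.exists(c(inputs, joined)))) return(NA)
  if (!requireNamespace("qpdf", quietly = TRUE)) return(NA)
  pages_in  <- sum(sapply(inputs, qpdf::pdf_length))
  pages_out <- qpdf::pdf_length(joined)
  pages_in == pages_out
}

# Paths assumed from this post; change them for your setup
result <- check_joined_pages(
  c("C:/temp/pdf/document1.pdf", "C:/temp/pdf/document2.pdf"),
  "C:/temp/pdf/joined_document.pdf"
)
print(result)
```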
When you are happy with any changes to the script, click Apply, then OK to close the configuration dialog for the R Snippet node.
You can run this workflow and see if it works.
If it does … great… (it worked for me, so fingers crossed!). All we need then is to modify this workflow so that it takes the path/file names from column or variable data, and we have a generic pdf combiner node written with R.
So… one to hand over to the R experts on the forum, who will doubtless have the answer faster than I can find it, but in the meantime I’ll continue researching, and I’ll post back if and when I have anything more.
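As a starting point for that, here is a rough and untested-in-KNIME sketch of how the snippet might read the file paths from a column of the input table instead of hard-coding them. The helper name `combine_pdfs_from_table` and the column name `pdf_path` are assumptions for illustration, not something the attached workflow contains.

```r
# Hypothetical helper: combine the PDFs listed in one column of a data frame.
# `path_column` and `output` are names chosen for this sketch.
combine_pdfs_from_table <- function(df, path_column, output) {
  paths <- as.character(df[[path_column]])
  if (length(paths) == 0) {
    stop("Column '", path_column, "' is missing or empty")
  }
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("File(s) not found: ", paste(not_found, collapse = ", "))
  }
  # pdf_combine() joins the inputs in the order given and writes `output`
  qpdf::pdf_combine(input = paths, output = output)
}

# Inside a KNIME R Snippet node, knime.in is the input table.
# The column name "pdf_path" is an assumption about your data.
if (exists("knime.in")) {
  combine_pdfs_from_table(knime.in, "pdf_path", "C:/temp/pdf/joined_document.pdf")
  knime.out <- knime.in
}
```

The files are joined in row order, so sorting the table upstream controls the page order. I believe flow variables are available in the snippet as `knime.flow.in`, so the output path could come from one of those rather than being typed in, but I have not tried that here.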
(and if the R community have any shortcuts for my “installation guide” above, or there are better ways of doing things, please let me know, as I’m an R-total-newbie!)