Merge multiple PDFs

I searched the forum and elsewhere and found a promising suggestion: use an R script to do this. I’ve never used R before… so here goes! 🙂

First, you’ll need to have R installed and configured for KNIME, following the instructions here:

https://docs.knime.com/latest/r_installation_guide/index.html

Once you’ve done everything there for your environment, you will need to run R itself (which you will have installed as part of those instructions) outside of KNIME, just to install the required packages.

With R open outside of KNIME, type the following commands, one at a time:

install.packages('Rserve')

install.packages('pdftools')

install.packages('qpdf')

I think you should already have installed the first one, Rserve, as part of the KNIME R installation instructions linked above. As far as I can tell, it doesn’t matter if you install it twice.

When you install packages, you may be asked if you want to use your personal library (if the default R folder cannot be written to). Say yes to this if prompted.
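
If you’d rather not reinstall things you already have, here is a minimal sketch that installs only what is missing. The helper name missing_packages is my own invention (it is not part of R or KNIME), and I’m assuming you run this in an interactive R session outside of KNIME:

```r
# Hypothetical helper (not part of R or KNIME): returns the packages
# from `pkgs` that cannot be found in the current R library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_packages(c("Rserve", "pdftools", "qpdf"))

# Only install when something is missing, and only in an interactive session.
if (length(needed) > 0 && interactive()) {
  install.packages(needed)
}
```

Running it a second time should do nothing, because by then all three packages can be found.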

Once that’s done, I think there’s nothing more to configure outside of KNIME, but we’re going to need a couple of test PDF files.

So find yourself two PDF files (anything you like, but I suggest nothing too large for this initial test) and copy them to a new folder. For these instructions I’m going to use c:\temp\pdf as the folder, but for your environment choose your own folder and change the following instructions accordingly. If you are able to create and use c:\temp\pdf for this test, it will be easier, as you shouldn’t have to edit anything.

Copy your two sample PDF files to c:\temp\pdf
Rename the files to document1.pdf and document2.pdf

So you now have two PDF files in the c:\temp\pdf folder.
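
If you want to double-check the files are where R expects them before going any further, something like this should do it. It’s only a sketch: the folder and file names are the ones assumed in these instructions, and missing_files is a made-up helper name:

```r
# Assumed test folder and file names from the steps above; edit to match yours.
pdf_dir <- "C:/temp/pdf"
pdf_files <- file.path(pdf_dir, c("document1.pdf", "document2.pdf"))

# Hypothetical helper: returns the paths that do not exist on disk.
missing_files <- function(paths) {
  paths[!file.exists(paths)]
}

not_found <- missing_files(pdf_files)
if (length(not_found) > 0) {
  message("Not found: ", paste(not_found, collapse = ", "))
} else {
  message("Both test files are in place.")
}
```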

Now open the attached workflow.

KNIME_join_pdf.knwf (8.3 KB)

If your two PDF files are anywhere other than c:\temp\pdf, or if your files aren’t called document1.pdf and document2.pdf, you will need to open and configure the R Snippet node. The node contains just the following piece of R:

knime.out <- knime.in

# Combine two pdf documents
qpdf::pdf_combine(c("C:/temp/pdf/document1.pdf",
                    "C:/temp/pdf/document2.pdf"),
                  output = "C:/temp/pdf/joined_document.pdf")

Edit the file names to suit your test files. As you can see, Windows backslashes \ in the folder names are written as forward slashes / in the R script.
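
If you are pasting paths copied from Windows Explorer, a small helper can do the slash swap for you. This is just a sketch and to_forward_slashes is a name I made up; I believe base R’s normalizePath() with winslash = "/" is another way to get there:

```r
# Hypothetical helper: swap Windows backslashes for forward slashes.
to_forward_slashes <- function(path) {
  gsub("\\", "/", path, fixed = TRUE)
}

# In R source code a single literal backslash has to be typed as "\\".
to_forward_slashes("C:\\temp\\pdf\\document1.pdf")
```

For the path above this should give C:/temp/pdf/document1.pdf, which is the form used in the R Snippet node.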

pdf_combine is a function in the qpdf package that you installed earlier. It should combine the two files and create a new PDF file called joined_document.pdf in the same folder.

When you are happy with any changes to the script, click Apply, then OK to close the configuration dialog for the R Snippet node.

You can run this workflow and see if it works.

If it does… great! (It worked for me, so fingers crossed!) All that’s needed then is to modify this workflow so that it takes the path/file names from column or variable data, and we have a generic PDF combiner node written with R.
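
As a starting point for that generic version, here is an untested sketch of what the R Snippet might look like. Everything named here is an assumption on my part: a column called "path" in the input table holding the PDF paths, a flow variable called "output_pdf" holding the output file name, and the helper collect_pdf_paths. I’m also assuming flow variables are available in the snippet as knime.flow.in, so please check that against your own node:

```r
# Hypothetical helper: pull PDF paths out of one column of a data frame,
# dropping missing values and empty strings.
collect_pdf_paths <- function(df, column) {
  paths <- as.character(df[[column]])
  paths[!is.na(paths) & nzchar(paths)]
}

# knime.in only exists inside a KNIME R Snippet node, so guard on it.
if (exists("knime.in")) {
  inputs <- collect_pdf_paths(knime.in, "path")   # "path" is an assumed column name
  not_found <- inputs[!file.exists(inputs)]
  if (length(not_found) > 0) {
    stop("Missing PDF file(s): ", paste(not_found, collapse = ", "))
  }

  # "output_pdf" is an assumed flow variable name.
  qpdf::pdf_combine(input = inputs, output = knime.flow.in[["output_pdf"]])

  knime.out <- knime.in
}
```

The files would be joined in row order, so sort the table upstream if the page order matters.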

So… one to hand over to the R experts on the forum, who will doubtless have the answer to this faster than I can, but in the meantime I’ll continue researching, and I’ll post back if and when I have anything more. 🙂

(and if the R community have any shortcuts for my “installation guide” above, or there are better ways of doing things, please let me know, as I’m an R-total-newbie!)
