Merge multiple PDFs

Hi,

I am fairly new to KNIME and was wondering whether there is a way to merge multiple PDFs into one PDF. For example, if pdf1 and pdf2 were the input, the output would be pdf3, containing both PDFs.

I have the same question. Is there a node that can do this?

Hi there -

We periodically get requests for this, but as of now we don’t have a node that directly supports combining PDFs. To do this in KNIME, you could call a third-party program from the command line using the External Tool or Bash nodes.
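As an illustration of the command-line route, qpdf is one such third-party program (an assumption on my part, the reply above doesn't name a specific tool). Its merge command is `qpdf --empty --pages in1.pdf in2.pdf -- out.pdf`, which an External Tool or Bash node could run directly. Written as an R call for consistency with the rest of this thread, a rough sketch looks like this; the helper names are made up:

```r
# Sketch: merge PDFs by running the qpdf command-line tool from R.
# Assumption: the standalone qpdf executable is installed and on the PATH
# (this is separate from the qpdf R package used later in the thread).

# Builds the arguments for: qpdf --empty --pages in1.pdf in2.pdf ... -- out.pdf
qpdf_merge_args <- function(inputs, output) {
  c("--empty", "--pages", inputs, "--", output)
}

merge_with_qpdf_cli <- function(inputs, output) {
  # shQuote protects paths that contain spaces
  status <- system2("qpdf", shQuote(qpdf_merge_args(inputs, output)))
  # A non-zero exit status means qpdf reported a problem
  if (status != 0) stop("qpdf exited with status ", status)
  invisible(output)
}

# Example (placeholder paths):
# merge_with_qpdf_cli(c("C:/temp/pdf/document1.pdf", "C:/temp/pdf/document2.pdf"),
#                     "C:/temp/pdf/joined_document.pdf")
```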

I searched the forum and elsewhere and found a promising suggestion: use an R script to do this. I’ve never used R before… so here goes! :slight_smile:

First, you’ll need R installed and configured for KNIME, following the instructions here:

https://docs.knime.com/latest/r_installation_guide/index.html

Once you’ve completed those steps for your environment, open R (which you installed as part of those instructions) outside of KNIME, just to install the required packages.

So, having opened R outside of KNIME, type the following commands one at a time:

install.packages('Rserve')

install.packages('pdftools')

install.packages('qpdf')

You should already have installed the first one (Rserve) while following the KNIME R installation guide. As far as I can tell, installing it twice does no harm.
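If you'd rather not worry about what is already installed, a small helper can install only the missing packages. This is a sketch; `missing_pkgs` and `install_missing` are made-up names, while `installed.packages()` and `install.packages()` are standard R functions:

```r
# Return the packages from `pkgs` that are not yet installed.
missing_pkgs <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Install only what is missing, so running this twice is harmless.
install_missing <- function(pkgs) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) > 0) install.packages(todo)
  invisible(todo)
}

# Usage in the R console:
# install_missing(c("Rserve", "pdftools", "qpdf"))
```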

When you install packages, you may be asked if you want to use your personal library (if the default R folder cannot be written to). Say yes to this if prompted.

Once that’s done, I think there’s no further configuration needed outside KNIME, but we will need a couple of test PDF files.

So find yourself two PDF files (anything you like, but I suggest nothing too large for this initial test) and copy them to a new folder. For these instructions I’m using c:\temp\pdf as the folder; in your environment, choose your own folder and adjust the following instructions accordingly. If you are able to create and use c:\temp\pdf for this test, it will be easier for you, as you shouldn’t have to edit anything.

Copy your two sample PDF files to c:\temp\pdf
Rename the files to document1.pdf and document2.pdf

So you now have two pdf files in this c:\temp\pdf folder.
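If you don't have two suitable PDFs to hand, R can create throwaway ones with its built-in `pdf()` graphics device. A sketch, assuming a temporary folder by default (`make_sample_pdfs` is a made-up name; pass "C:/temp/pdf" to match the folder used in this post):

```r
# Create a folder containing two small one-page PDFs for testing.
make_sample_pdfs <- function(folder = file.path(tempdir(), "pdf")) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  paths <- file.path(folder, c("document1.pdf", "document2.pdf"))
  for (i in seq_along(paths)) {
    grDevices::pdf(paths[i])   # open a PDF graphics device writing to this file
    graphics::plot.new()       # start a single blank page
    graphics::text(0.5, 0.5, paste("Sample document", i))
    grDevices::dev.off()       # close the device, which finishes the file
  }
  paths
}

paths <- make_sample_pdfs()
print(paths)
```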

Now open the attached workflow.

KNIME_join_pdf.knwf (8.3 KB)

If your two PDF files are anywhere other than c:\temp\pdf, or if your files aren’t called document1.pdf and document2.pdf, you will need to open and configure the R Snippet node. The node contains just the following R code:

knime.out <- knime.in

# Combine two pdf documents
qpdf::pdf_combine(c("C:/temp/pdf/document1.pdf",
                    "C:/temp/pdf/document2.pdf"),
                  output = "C:/temp/pdf/joined_document.pdf")

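Edit the file names to suit your test files. Note that the Windows backslashes (\) in the folder names are written as forward slashes (/): inside an R string a single backslash starts an escape sequence, so you would otherwise have to double each one (\\).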
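If you are copying a folder path straight out of Windows Explorer, a one-line conversion saves retyping it. A sketch (`to_r_path` is a made-up name; `gsub` and `file.path` are base R):

```r
# Replace every backslash with a forward slash (fixed = TRUE: no regex needed).
to_r_path <- function(path) gsub("\\", "/", path, fixed = TRUE)

p <- to_r_path("C:\\temp\\pdf")   # in R source, "\\" is one backslash
file.path(p, "document1.pdf")     # "C:/temp/pdf/document1.pdf"
```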

pdf_combine is a function in the qpdf package you installed earlier. It should combine the two files into a new PDF file called joined_document.pdf.
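A quick way to confirm the merge worked is to compare page counts: the output should have as many pages as the inputs together. This sketch uses `qpdf::pdf_combine` and `qpdf::pdf_length` from the qpdf package; `combine_and_check` is a made-up helper name:

```r
# Combine PDFs, then check the page count of the result.
combine_and_check <- function(inputs, output) {
  qpdf::pdf_combine(input = inputs, output = output)
  expected <- sum(vapply(inputs, qpdf::pdf_length, numeric(1)))
  actual <- qpdf::pdf_length(output)
  if (actual != expected) {
    stop("expected ", expected, " pages but got ", actual)
  }
  output
}

# Usage with the files from this post:
# combine_and_check(c("C:/temp/pdf/document1.pdf", "C:/temp/pdf/document2.pdf"),
#                   "C:/temp/pdf/joined_document.pdf")
```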

When you are happy with any changes to the script, click Apply and then OK to close the R Snippet node’s configuration.

You can run this workflow and see if it works.

If it does… great! (It worked for me, so fingers crossed.) All that’s left then is to modify the workflow so that it takes the paths/file names from column or flow variable data, and we’d have a generic PDF combiner node written in R.

So… this is one to hand over to the R experts on the forum, who can doubtless come up with the answer faster than I can, but in the meantime I’ll keep researching and post back if and when I have anything more. :slight_smile:

(and if the R community have any shortcuts for my “installation guide” above, or there are better ways of doing things, please let me know, as I’m an R-total-newbie!)
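For the generic version described above, one possible approach (a sketch, not tested inside KNIME) is to read the paths from a column of the R Snippet's input table, which KNIME passes in as the data frame `knime.in`. The column name "path" and the function name `combine_pdf_column` are hypothetical:

```r
# Merge the PDFs listed in one column of a table, in row order.
# Assumption: the column holds full file paths, one per row.
combine_pdf_column <- function(tbl, column = "path", output) {
  inputs <- as.character(tbl[[column]])
  inputs <- inputs[!is.na(inputs) & nzchar(inputs)]  # skip empty rows
  absent <- inputs[!file.exists(inputs)]
  if (length(absent) > 0) {
    stop("file(s) not found: ", paste(absent, collapse = ", "))
  }
  qpdf::pdf_combine(input = inputs, output = output)
}

# Inside the R Snippet node, where KNIME provides knime.in:
# knime.out <- knime.in
# combine_pdf_column(knime.in, "path", "C:/temp/pdf/joined_document.pdf")
```

Checking that every file exists first gives a clearer error message than letting qpdf fail part-way through.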

Moving this to the main AP forum for better visibility, and also so more folks see @takbb 's solution :slight_smile:

Thanks @ScottF, I’ve looked into how to pass flow variables as parameters to the R Snippet node, and have put this into a demonstration workflow.
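In the R Snippet node, flow variables are available through the list `knime.flow.in`. As a sketch of the idea, assuming flow variables named `pdf_folder`, `pdf_file1`, `pdf_file2` and `pdf_output` (hypothetical names; use whatever your workflow defines), and with `flow_paths` as a made-up helper:

```r
# Build input and output paths from a list of flow variables.
flow_paths <- function(flow) {
  folder <- flow[["pdf_folder"]]
  list(
    inputs = file.path(folder, c(flow[["pdf_file1"]], flow[["pdf_file2"]])),
    output = file.path(folder, flow[["pdf_output"]])
  )
}

# Inside the R Snippet node, KNIME supplies knime.flow.in:
# p <- flow_paths(knime.flow.in)
# qpdf::pdf_combine(input = p$inputs, output = p$output)
# knime.out <- knime.in
```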

Hello @takbb,

when linking documentation (unless it is specific to an exact version), you can use latest instead of yyyy-MM in the URL. This way the link will always lead to the latest guide.

And BTW, nice solution, but I’m even more impressed with your willingness and readiness to learn/tackle R :+1:

Br,
Ivan

Hi @ipazin, Thanks for the tip re documentation. I hadn’t realised there was a “latest” folder as I just found it via a google search (cos I’m lazy :wink: ). I’ll use that for my next “tutorial” :rofl:

Re willingness to learn/tackle R… well, all I can say is that my second programming language, after Sinclair BASIC on my ZX81 back in the eighties, was Forth on a Jupiter Ace (here’s a little self-plug of my misspent youth! Lunar Lander (Astrian Descent) by Brian Bates)… so R feels really verbose, lol!

(and thanks for updating the link in my post. I don’t think I could do it as my post was too old)

Just looooooving the sound! :sweat_smile:

haha… who needs mp3? :wink:

I think any language is verbose compared to Forth! I learned a little bit by building a ‘Fignition’ computer a few years back (www.fignition.com) and I was so impressed with its novel structure. I can see why it never caught on, but huge kudos to those that build things with it.

Well, I’ve always used third-party PDF editors for that. KNIME needs too many steps for such a simple job. It is much faster and easier to merge the pages online with https://pdfchef.com/merge-pdf.html.

I forgot to share the app I used: I told you about it but forgot to include the link.

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.