Rob Siegwart - Merging and Watermarking PDF Files with Python

There are many Python libraries available for working with PDF files. Here we will perform merging and watermarking operations with the Python library pdfrw. This package is bundled in the WinPython distribution.

It is assumed that your path has been set to find the correct Python interpreter such that pdfrw can be imported.
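
That assumption can be checked by asking the interpreter whether it can locate the package. The following is a minimal sketch using the standard library's importlib.util.find_spec. The helper name has_module is made up for this illustration and is not part of pdfrw.

```python
import sys
from importlib.util import find_spec

def has_module(name):
    """Return True if the current interpreter can locate the named top-level module."""
    return find_spec(name) is not None

if __name__ == '__main__':
    # Show which interpreter is running and whether it can see pdfrw
    print('Interpreter:', sys.executable)
    print('pdfrw importable:', has_module('pdfrw'))
```

If the second line prints False, the interpreter found on the path is probably not the one where pdfrw is installed.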

Merging

A merging script takes only a few lines of code. Let's create a script that merges all of the PDF files in a directory, using a command line argument to specify the output file name. Place the script in the same directory as the files. It collects every file with a .pdf extension and combines them, in sorted filename order (roughly the order an 'ls' listing shows), into a single output PDF file.

# merge.py

import sys
from glob import glob
from pdfrw import PdfReader, PdfWriter

def main():
    output_file = sys.argv[1]
    # glob() returns names in arbitrary order, so sort them explicitly.
    # Skip the output file in case it is left over from an earlier run.
    files = sorted(f for f in glob('*.pdf') if f != output_file)
    writer = PdfWriter()
    for each in files:
        pdf = PdfReader(each)
        writer.addpages(pdf.pages)
    writer.write(output_file)

if __name__ == '__main__':
    main()

This can then be run with:

python merge.py <output file>

in the terminal or command prompt.
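
One thing to watch with merge order is numbered file names: in plain string order, page10.pdf comes before page2.pdf. If that matters for your files, a "natural" sort key is one possible workaround. The sketch below is an illustration only. The function name natural_key and the sample file names are hypothetical.

```python
import re

def natural_key(filename):
    """Split a name into text and number chunks so 'page10' sorts after 'page2'."""
    # re.split with a capturing group alternates text, digits, text, ...
    parts = re.split(r'(\d+)', filename.lower())
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ['page10.pdf', 'page2.pdf', 'page1.pdf']
ordered = sorted(names, key=natural_key)
print(ordered)  # ['page1.pdf', 'page2.pdf', 'page10.pdf']
```

The same key could be passed to sorted() when ordering the list returned by glob('*.pdf').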

Watermarking

With watermarking, a separate PDF file (the watermark) is overlaid on top of another PDF file (the base file). For the watermark to appear at the correct location, the watermark page must be the same size as the pages of the base file. The script takes two arguments: the input file to apply the watermark to, and the PDF file to use as the watermark. Only the first page of the watermark file is used, and the result is written to a new file named after the input file with a _w suffix. Again, the script should reside in the same directory as the files.

# watermark.py

import sys
from pdfrw import PdfReader, PdfWriter, PageMerge

def main():
    input_file = sys.argv[1]
    watermark_file = sys.argv[2]

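    input_pdf = PdfReader(input_file)
    # Only the first page of the watermark file is used
    watermark = PdfReader(watermark_file).pages[0]

    # Overlay the watermark on every page of the input file
    for page in input_pdf.pages:
        merger = PageMerge(page)
        merger.add(watermark).render()

    # Write to '<name>_w.pdf', stripping only the final extension
    output_file = input_file.rsplit('.', 1)[0] + '_w.pdf'
    writer = PdfWriter()
    writer.write(output_file, input_pdf)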

if __name__ == '__main__':
    main()

To run, in a terminal or command prompt issue:

python watermark.py <input file> <watermark file>
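
Since the watermark only lines up when the page sizes match, it can be worth comparing them before merging. The following is a rough sketch, not part of the script above. The helper names box_size, same_size and media_box are made up for this example, the 0.5 point tolerance is an arbitrary choice, and page rotation is ignored. It assumes the page size is given by the MediaBox entry, read through pdfrw's inheritable attribute because the entry may be defined on a parent node.

```python
def box_size(box):
    """Return (width, height) in points from a [x0, y0, x1, y1] box."""
    x0, y0, x1, y1 = (float(v) for v in box)
    return abs(x1 - x0), abs(y1 - y0)

def same_size(box_a, box_b, tol=0.5):
    """True if two boxes have the same width and height within tol points."""
    (wa, ha), (wb, hb) = box_size(box_a), box_size(box_b)
    return abs(wa - wb) <= tol and abs(ha - hb) <= tol

def media_box(path, page_index=0):
    """Read the MediaBox of one page of a PDF file."""
    # Imported here so the two helpers above work without pdfrw installed
    from pdfrw import PdfReader
    page = PdfReader(path).pages[page_index]
    # MediaBox may be inherited from a parent node, hence .inheritable
    return page.inheritable.MediaBox
```

A call such as same_size(media_box('report.pdf'), media_box('stamp.pdf')), with placeholder file names, would then return False when a US Letter watermark is paired with an A4 document.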