Merge PDF files using Python

Posted by jason on Sept. 28, 2011, 6:16 a.m.
Tags: programming python

Let me start this post with a disclaimer: I know there are tons of tools that do what I've done here, but I love Python and look for any opportunity to write Python scripts to keep my skills sharp. The script is 27 lines long, it does what I need it to do, and it's flexible enough to be worth sharing.

Dependencies

I use the following libraries:

  • pyPdf
  • argparse (part of the standard library as of Python 2.7)

Program Details

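This script takes three parameters:

  • Files: A comma-separated list of base filenames, passed as a single quoted string, e.g., "file1.pdf, file2.pdf". Whitespace around each name is stripped.
  • Path: The directory where the PDF files reside, including the trailing slash (the script joins the path and each filename by plain concatenation).
  • Output: The full path and filename of the output PDF file.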

An example run would look like this:

[jason@jason ~]$ python2 mergepdfs.py --files "test1.pdf, test2.pdf, test3.pdf" --path /home/jason/pdfs/ --output /home/jason/merged.pdf
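
To make the handling of these arguments concrete, here is a minimal sketch of how the values from the example run above turn into the list of files to open. The helper name build_input_paths is hypothetical and exists only for this illustration; the script itself does the same work inline.

```python
def build_input_paths(files_arg, path):
    """Split a comma-separated --files value and prefix each name with the path.

    Hypothetical helper for illustration; it mirrors the script's
    split-and-strip handling followed by plain string concatenation.
    """
    names = [name.strip() for name in files_arg.split(",")]
    return [path.strip() + name for name in names]


paths = build_input_paths("test1.pdf, test2.pdf, test3.pdf", "/home/jason/pdfs/")
for p in paths:
    print(p)
```

Because the path and filename are simply concatenated, leaving off the trailing slash would produce a name like /home/jason/pdfstest1.pdf, which is why the path has to end with a slash.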

The Code

Here's the script. There's a download link at the bottom of this page.

import pyPdf
import argparse

# Parse the command-line arguments, and assign them to the appropriate variables.
parser = argparse.ArgumentParser(description='Blindly merge multiple PDF files together.')

parser.add_argument('-f', '--files', dest='files', default='', help='The base pdf filenames to merge, separated by commas.')
parser.add_argument('-p', '--path', dest='path', default='', help='The path to where the pdf files reside, with trailing slash.')
parser.add_argument('-o', '--output', dest='output', default='', help='The full path to the output pdf file.')

args = parser.parse_args()

filenames = args.files.split(',')
path = args.path.strip()
output_filename = args.output.strip()

output = pyPdf.PdfFileWriter()

for filename in filenames:
    reader = pyPdf.PdfFileReader(open(path + filename.strip(), "rb"))
    for page in reader.pages:
        output.addPage(page)


outputstream = open(output_filename, "wb")
output.write(outputstream)
outputstream.close()
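
One possible variation on the argument handling, sketched here as an assumption rather than as part of the script: let argparse collect the filenames as separate arguments with nargs='+', and build each path with os.path.join so the trailing slash no longer matters. The function name parse_merge_args is hypothetical, and making --files and --output required is my own choice for the sketch.

```python
import argparse
import os


def parse_merge_args(argv):
    """Parse merge options and return (input_paths, output_filename).

    Hypothetical alternative to the comma-separated --files string:
    filenames are given as separate arguments after --files.
    """
    parser = argparse.ArgumentParser(description="Blindly merge multiple PDF files together.")
    parser.add_argument("-f", "--files", nargs="+", required=True,
                        help="The base pdf filenames to merge, separated by spaces.")
    parser.add_argument("-p", "--path", default="",
                        help="The directory where the pdf files reside.")
    parser.add_argument("-o", "--output", required=True,
                        help="The full path to the output pdf file.")
    args = parser.parse_args(argv)

    # os.path.join inserts a separator only when one is needed.
    input_paths = [os.path.join(args.path, name) for name in args.files]
    return input_paths, args.output


input_paths, output_filename = parse_merge_args(
    ["--files", "test1.pdf", "test2.pdf",
     "--path", "/home/jason/pdfs",
     "--output", "/home/jason/merged.pdf"]
)
print(input_paths)
print(output_filename)
```

With this approach the shell does the splitting, so a run would look like --files test1.pdf test2.pdf test3.pdf, with no quotes or commas. The merge loop itself would stay the same, iterating over input_paths instead of building each path by concatenation.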
