Merge PDF files using Python
Posted by jason on Sept. 28, 2011, 6:16 a.m.
Tags: programming python
Let me start this post with a disclaimer: I know there are tons of tools that do what I've done here, but I love Python and look for any opportunity to write Python scripts to keep my skills sharp. The script is 27 lines long, it does what I need it to do, and it's flexible enough to be worth sharing.
Dependencies
I use the following libraries:
- pyPdf
- argparse
Program Details
This script takes three parameters:
- Files: A comma-separated list of file names inside a single pair of quotes, e.g., "file1.pdf, file2.pdf".
- Path: The directory where the PDF files reside, with a trailing slash (the script joins the path and each file name by plain concatenation).
- Output: The full path and filename of the output PDF file.
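As a rough sketch of how the file list and the directory fit together, the helper below turns the quoted list into full paths. The name resolve_inputs is hypothetical and not part of the script, and it uses os.path.join instead of the script's plain string concatenation, which makes the trailing slash optional.

```python
import os


def resolve_inputs(files, path):
    """Turn a comma-separated string of file names into full paths under `path`."""
    # Hypothetical helper for illustration; the original script does this inline.
    names = [name.strip() for name in files.split(",")]
    # Skip empty entries, such as the one left behind by a trailing comma.
    return [os.path.join(path, name) for name in names if name]


if __name__ == "__main__":
    for full_path in resolve_inputs("test1.pdf, test2.pdf, test3.pdf", "/home/jason/pdfs/"):
        print(full_path)
```

Stripping each name is what lets the list be written with a space after every comma.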
An example run would look like this:
[jason@jason ~]$ python2 mergepdfs.py --files "test1.pdf, test2.pdf, test3.pdf" --path /home/jason/pdfs/ --output /home/jason/merged.pdf
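For comparison, argparse can also collect the file names as separate arguments, which avoids the quoting and the comma splitting. This is only a sketch of an alternative interface, not what the script does; build_parser is a hypothetical name, and marking the files and output arguments as required is my own assumption.

```python
import argparse


def build_parser():
    """Build a parser that takes the file names as separate arguments."""
    parser = argparse.ArgumentParser(description="Merge multiple PDF files together.")
    # nargs="+" gathers one or more values into a list.
    parser.add_argument("-f", "--files", nargs="+", required=True,
                        help="The base pdf filenames to merge, separated by spaces.")
    parser.add_argument("-p", "--path", default="",
                        help="The path to where the pdf files reside.")
    parser.add_argument("-o", "--output", required=True,
                        help="The full path to the output pdf file.")
    return parser


if __name__ == "__main__":
    # An explicit argument list stands in for sys.argv in this demo.
    demo_args = build_parser().parse_args(
        ["--files", "test1.pdf", "test2.pdf", "test3.pdf",
         "--path", "/home/jason/pdfs/",
         "--output", "/home/jason/merged.pdf"])
    print(demo_args.files)
```

With an interface like this, the file part of the command would read --files test1.pdf test2.pdf test3.pdf, with no quotes or commas.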
The Code
Here's the script. There's a download link at the bottom of this page.
import pyPdf
import argparse

# Parse the command-line arguments and assign them to the appropriate variables.
parser = argparse.ArgumentParser(description='Blindly merge multiple PDF files together.')
parser.add_argument('-f', '--files', dest='files', default='', help='The base pdf filenames to merge, separated by commas.')
parser.add_argument('-p', '--path', dest='path', default='', help='The path to where the pdf files reside, with trailing slash.')
parser.add_argument('-o', '--output', dest='output', default='', help='The full path to the output pdf file.')
args = parser.parse_args()

# Split the comma-separated list, dropping surrounding whitespace and empty entries.
filenames = [name.strip() for name in args.files.split(',') if name.strip()]
path = args.path.strip()
output_filename = args.output.strip()

# Copy every page of every input file, in order, into a single writer.
output = pyPdf.PdfFileWriter()
for filename in filenames:
    # Leave each input file open: pyPdf still reads from it when output.write() runs.
    reader = pyPdf.PdfFileReader(open(path + filename, "rb"))
    for page in reader.pages:
        output.addPage(page)

# Write the merged document to disk.
outputstream = open(output_filename, "wb")
output.write(outputstream)
outputstream.close()