
In this article, we will examine how to resize a PNG image using Python, a versatile programming language renowned for its simplicity and power, along
Learn how to resize and compress PDF documents using Python. This guide covers simple methods with pdf2image
, Pillow
, and more advanced tools like pikepdf
and Ghostscript
.
When it comes to managing PDFs, one common task is resizing or compressing these documents to save storage space, improve load times, or meet size requirements for uploads. Fortunately, Python provides powerful libraries that allow us to automate this process efficiently. In this blog post, we’ll cover how to resize and compress PDF documents using Python. We’ll dive into a solution that works for most PDF documents, explaining both the concepts and the tools used.
Why Compress or Resize PDF Documents? |
Before jumping into the code, let’s quickly explore why you might want to compress or resize a PDF file:
|
|
|
Tools We’ll Use |
In this post, we’ll use the following Python libraries:
|
|
|
Let’s walk through how you can resize and compress PDFs using Python. In this guide, we’ll focus on compressing PDFs by converting each page into an image, resizing the images, and then recombining the images into a PDF.
1. Install Required Libraries |
First, you need to install the required Python3 libraries. Open your terminal or command prompt and run:
sudo dnf install -y python3-pip && pip3 install pdf2image Pillow
Defaulting to user installation because normal site-packages is not writeable
Collecting pdf2image
Downloading pdf2image-1.17.0-py3-none-any.whl (11 kB)
Collecting Pillow
Downloading pillow-11.1.0-cp39-cp39-manylinux_2_28_x86_64.whl (4.5 MB)
|████████████████████████████████| 4.5 MB 13.0 MB/s
Installing collected packages: Pillow, pdf2image
Successfully installed Pillow-11.1.0 pdf2image-1.17.0
You’ll also need to install Poppler for pdf2image
to work. On Ubuntu, you can do this with:
sudo apt-get install -y poppler-utils
On Red Hat Enterprise Linux (RHEL)/CentOS or RPM-based distributions, you can do this with:
sudo dnf install -y poppler-utils
Photo by admingeek from Infotechys
For macOS, use:
brew install poppler
2. Convert PDF to Images and Compress |
We’ll use pdf2image
to convert each PDF page into an image, then compress it using Pillow. The following script shows how to perform these tasks (using your preferred text editor, open a file called compress_pdf.py
).
sudo vim compress_pdf.py
Copy and paste the following content. Then, save and exit the file:
import sys
import io
from pdf2image import convert_from_path
from PIL import Image
def compress_pdf(input_pdf, output_pdf, image_quality=50):
try:
# Convert PDF pages to images
images = convert_from_path(input_pdf)
# Compress each image
compressed_images = []
for img in images:
# Compress the image by reducing its quality
img = img.convert("RGB")
compressed_image_io = io.BytesIO()
img.save(compressed_image_io, format="JPEG", quality=image_quality)
compressed_image = Image.open(compressed_image_io)
compressed_images.append(compressed_image)
# Save the compressed images as a PDF
compressed_images[0].save(output_pdf, save_all=True, append_images=compressed_images[1:])
print(f"Compressed PDF saved as {output_pdf}")
except Exception as e:
print(f"An error occurred: {e}")
if __name__ == "__main__":
if len(sys.argv) != 3:
print("Usage: python compress_pdf.py ")
else:
input_pdf = sys.argv[1]
output_pdf = sys.argv[2]
compress_pdf(input_pdf, output_pdf)
3. How the Script Works |
Here’s a breakdown of the script:
Convert PDF to Images |
We use pdf2image.convert_from_path(input_pdf)
to convert each page of the PDF into a Pillow image object.
Compress the Images |
We iterate through the list of images and use the Pillow
library to save each image as a JPEG with reduced quality. You can adjust the image_quality
(from 0 to 100) to control the degree of compression.
Recombine the Images into a PDF |
After compressing the images, we save them as a new PDF using Pillow's save()
function, where save_all=True
ensures all pages are included.
To run the script, use the following command:
python3 compress_pdf.py ~/path/to/original_doc.pdf ~/path/to/compressed_doc.pdf
This will take the original_doc.pdf
file, compress the images within it, and save the output as compressed_doc.pdf
.
5. Customizing Compression |
The image_quality
parameter in the script controls how much compression is applied to the images. The value can range from 0 (lowest quality) to 100 (highest quality). A lower value will result in a smaller file size but poorer image quality. For example, to apply more aggressive compression, use:
compress_pdf(input_pdf, output_pdf, image_quality=30)
6. Why This Method Works |
The image_quality
parameter in the script controls how much compression is applied to the images. The value can range from 0 (lowest quality) to 100 (highest quality). A lower value will result in a smaller file size but poorer image quality. For example, to apply more aggressive compression, use:
Handles All Types of PDFs |
By converting each PDF page to an image, we avoid the complexity of dealing with different types of content (text, vector images, etc.). This method works for any PDF document, including those with scanned pages or complex layouts.
Customizable Compression |
You can fine-tune the compression level based on your needs. Lower quality will result in smaller file sizes but might degrade the quality of the document.
Simple to Use |
The script is easy to run from the command line, making it a convenient solution for automating PDF compression tasks.
While converting a PDF to images and recombining them is a simple and effective method for compressing PDFs, there are other advanced techniques to reduce PDF size:
1. Remove Unused Objects with pikepdf |
If your PDF contains unnecessary metadata, unused fonts, or orphaned objects, you can use the pikepdf library to clean up the PDF structure. Here’s an example that removes unreferenced objects from the PDF:
import pikepdf
def optimize_pdf(input_pdf, output_pdf):
with pikepdf.open(input_pdf) as pdf:
pdf.remove_unreferenced()
pdf.save(output_pdf)
print(f"Optimized PDF saved as {output_pdf}")
# Example usage
optimize_pdf("input.pdf", "optimized_output.pdf")
2. Using Ghostscript for Advanced Compression |
For even more advanced compression options, Ghostscript is a popular tool that can be used to compress PDF files by adjusting image resolution and compression settings. You can run it via the command line:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output_compressed.pdf input_file.pdf
-dPDFSETTINGS=/screen
: This setting applies compression with a lower image resolution for smaller file sizes.
3. Use QPDF to Optimize Streams |
QPDF is another powerful tool for optimizing PDFs. It allows you to compress streams within the PDF:
qpdf input_file.pdf --compress-streams=y --output output_compressed.pdf
This command reduces the file size by compressing streams within the PDF.
In this post, we’ve shown how to resize and compress PDF documents using Python. By converting PDF pages to images, compressing those images, and then saving them back into a PDF, we can significantly reduce file sizes. Additionally, we’ve covered more advanced techniques like using pikepdf, Ghostscript, and QPDF to further optimize PDF files.
Did you find this article useful? Your feedback is invaluable to us! Please feel free to share this post–along with your thoughts in the comments section below.
In this article, we will examine how to resize a PNG image using Python, a versatile programming language renowned for its simplicity and power, along
In this tutorial, we’ll explore how to leverage Python to convert PNG images to WebP format, empowering you to enhance your website’s performance and boost
Have you ever wondered how search engines gather all that information from the internet? Learn how to build your own web crawler in Python and