PDF Optimization & Performance | 10 min read

PDF Compression Explained — How File Size Reduction Works

PDF compression reduces file size using two complementary strategies: lossless methods (FLATE/ZIP, LZW) that shrink data without any quality loss, and lossy methods (JPEG compression, image downsampling) that reduce size much further by selectively discarding detail that is hard to perceive. On image-heavy documents, modern tools routinely achieve 50–90% size reduction; text-heavy files shrink far less.

AuraPDF Team, March 29, 2026

How PDF Compression Works

PDF compression reduces file size by encoding document data more efficiently. Unlike compressing a PDF into a ZIP archive (which wraps the file in an external container), internal PDF compression operates on the individual stream objects within the file: page content (text and vector graphics), images, fonts, and metadata are each compressed independently using the most appropriate algorithm.

The PDF specification (ISO 32000) defines multiple compression filters that can be applied to any stream object. A single PDF can use different compression methods for different objects: FLATE for text and vector graphics, JPEG for photographs, and CCITT Group 4 for monochrome scanned pages.
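
To see which filters a given file uses, one rough approach is to scan its raw bytes for /Filter entries. The sketch below is a heuristic rather than a real PDF parser: the names FILTER_PATTERN, count_filters, and the hand-written sample fragment are illustrative assumptions, and dictionaries stored inside compressed object streams will not be visible to it.

```python
import re
from collections import Counter

# Matches "/Filter /Name" and the first name in "/Filter [/Name ...]".
FILTER_PATTERN = re.compile(rb"/Filter\s*\[?\s*/([A-Za-z0-9]+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count how often each stream filter name appears in raw PDF bytes."""
    names = (m.group(1).decode("ascii") for m in FILTER_PATTERN.finditer(pdf_bytes))
    return Counter(names)


# Hand-written fragment standing in for a real file (hypothetical data).
sample = (
    b"1 0 obj << /Length 42 /Filter /FlateDecode >> stream ... endstream endobj\n"
    b"2 0 obj << /Subtype /Image /Filter /DCTDecode >> stream ... endstream endobj\n"
    b"3 0 obj << /Subtype /Image /Filter [/CCITTFaxDecode] >> stream ... endstream endobj\n"
    b"4 0 obj << /Length 10 /Filter /FlateDecode >> stream ... endstream endobj\n"
)

filter_counts = count_filters(sample)
print(filter_counts)
```

In the PDF syntax, FlateDecode is the filter name for FLATE, DCTDecode for JPEG, and CCITTFaxDecode for CCITT fax compression. On a real file you would pass the result of reading the file in binary mode instead of the sample bytes.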

According to Adobe's engineering documentation, images typically constitute 60–80% of a PDF's total file size. This means that image-focused compression strategies yield the most dramatic results. A 25 MB presentation PDF containing 40 high-resolution photographs can often be reduced to 3–5 MB through intelligent image resampling — a reduction of 80–88%.
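
The percentages quoted throughout this article follow one simple formula. As a quick check of the presentation example above, here is a minimal sketch (the function name reduction_percent is ours, not part of any tool):

```python
def reduction_percent(original_size: float, compressed_size: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (1 - compressed_size / original_size) * 100


# The 25 MB presentation reduced to 5 MB and to 3 MB.
low = reduction_percent(25, 5)
high = reduction_percent(25, 3)
print(f"{low:.0f}% to {high:.0f}%")
```

Both sizes must use the same unit; the result is independent of whether that unit is MB or bytes.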

Understanding the distinction between lossless and lossy compression is fundamental to choosing the right approach for your documents.

Lossless Compression Methods

Lossless compression reduces file size without discarding any data. The decompressed output is bit-for-bit identical to the original input. PDF supports several lossless filters:

  • FLATE (Deflate/ZIP) — The most widely used compression filter in modern PDFs. Based on the same algorithm used in ZIP archives and PNG images, FLATE combines LZ77 dictionary encoding with Huffman coding to eliminate redundant byte sequences. Typical compression ratios for text content: 40–60% size reduction. The zlib library, the most common FLATE implementation, offers compression levels 0 (no compression) and 1 (fastest) through 9 (smallest output); its default, level 6, is a compromise between speed and ratio.
  • LZW (Lempel-Ziv-Welch) — An older dictionary-based algorithm that was common in PDFs created before 2000. LZW builds a translation table of recurring patterns during encoding. While effective, it was historically encumbered by patent restrictions (Unisys; the patent expired in 2003 in the United States and in 2004 elsewhere) and has been largely supplanted by FLATE in modern PDF creation tools.
  • Run-Length Encoding (RLE) — Compresses sequences of identical bytes. Efficient for images with large areas of uniform color (e.g., black-and-white diagrams) but ineffective for photographs. RLE typically achieves 10–30% reduction on suitable content.
  • CCITT Group 4 — A lossless fax compression algorithm optimized for monochrome (1-bit) images. Achieves exceptional compression ratios on scanned black-and-white text — often 90–95% reduction — because text pages contain mostly white space. Used extensively in document imaging systems.
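
The lossless property of FLATE is easy to verify with Python's standard zlib module, which produces the same zlib-format data that PDF's FlateDecode filter expects. This is a small sketch on made-up sample text, not a measurement of real PDF streams; the helper name flate_sizes is an assumption for illustration.

```python
import zlib

# Repetitive sample text standing in for a PDF content stream (hypothetical data).
text = ("PDF content streams often repeat the same operators and font names. " * 200).encode("ascii")


def flate_sizes(data: bytes, levels=(1, 6, 9)) -> dict:
    """Compress data at several zlib levels and return {level: compressed size}."""
    sizes = {}
    for level in levels:
        packed = zlib.compress(data, level)
        # Lossless: decompressing must give back exactly the original bytes.
        if zlib.decompress(packed) != data:
            raise RuntimeError("round trip failed")
        sizes[level] = len(packed)
    return sizes


sizes = flate_sizes(text)
print(len(text), sizes)
```

Highly repetitive input like this compresses far better than typical documents, so treat the printed sizes as a demonstration of the mechanism rather than a benchmark.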

Lossless compression is ideal when every pixel matters: legal documents, medical imaging, engineering drawings, and archival records where even minor quality degradation is unacceptable.
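
Run-length encoding is simple enough to sketch in full. The code below follows the byte layout that the PDF specification defines for its RunLengthDecode filter: a length byte of 0 to 127 means "copy the next length + 1 bytes literally", 129 to 255 means "repeat the next byte 257 − length times", and 128 marks the end of data. The function names and the sample rows are our own illustrative choices, and the encoder is deliberately simple rather than optimal.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode bytes in the RunLengthDecode layout (simple, not optimal)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out += bytes([257 - run, data[i]])
            i += run
            continue
        # Otherwise gather literals until a repeat starts or 128 are collected.
        start = i
        i += 1
        while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
            i += 1
        out.append(i - start - 1)
        out += data[start:i]
    out.append(128)  # end-of-data marker
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Decode bytes produced by rle_encode."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        length = encoded[i]
        i += 1
        if length == 128:
            break
        if length < 128:
            out += encoded[i:i + length + 1]
            i += length + 1
        else:
            out += bytes([encoded[i]]) * (257 - length)
            i += 1
    return bytes(out)


# One row of a black-and-white diagram versus data with no repeats at all.
diagram_row = b"\x00" * 300 + b"\xff" * 20 + b"\x00" * 300
noisy = bytes(range(256))
print(len(diagram_row), len(rle_encode(diagram_row)))
print(len(noisy), len(rle_encode(noisy)))
```

The second input shows why RLE is a poor fit for photographs: with no repeated bytes, the length markers make the output slightly larger than the input.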

Lossy Compression Methods

Lossy compression achieves dramatically higher size reductions by selectively discarding data that human perception is unlikely to notice. The decompressed output is visually similar but not identical to the original:

  • DCT/JPEG compression — Transforms image data into frequency-domain components using the Discrete Cosine Transform, then quantizes high-frequency coefficients (fine detail) more aggressively than low-frequency coefficients (broad color and brightness patterns). Human vision is less sensitive to fine detail loss, so well-tuned JPEG compression preserves perceived quality while reducing file size by 70–90%. The quality level is a setting of the JPEG encoder rather than of the PDF format: the PDF simply stores the resulting JPEG data under the DCTDecode filter, so the size-quality tradeoff is chosen when the image is compressed.
  • Image downsampling (resampling) — Reduces the pixel dimensions of embedded images. A 4000×3000 pixel photograph (12 megapixels) downsampled to 1200×900 (about 1.1 megapixels) retains excellent on-screen viewing quality while reducing the pixel data to 9% of the original volume. Three resampling algorithms are common: bicubic (highest quality), bilinear (balanced), and nearest-neighbor (fastest, but may produce jagged edges).
  • Color space conversion — Converting images from CMYK (4 channels) to RGB (3 channels) reduces color data by 25%. Converting to grayscale (1 channel) reduces it by 67–75%. This is appropriate when the document will be viewed on screen rather than printed commercially.
  • JPEG2000 compression — A wavelet-based algorithm (ISO 15444) supported in PDF 1.5 and later. JPEG2000 generally achieves 20–30% better compression than standard JPEG at equivalent quality levels, and supports both lossy and lossless modes. However, decoder support remains less universal than standard JPEG.
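
The transform-then-quantize idea behind JPEG can be illustrated on a single row of eight pixels. This is a heavily simplified sketch: real JPEG applies a two-dimensional DCT to 8×8 blocks, uses standardized quantization tables, and adds entropy coding. The step sizes in STEPS and the pixel values in row are made-up assumptions chosen only to show the effect.

```python
import math

N = 8  # JPEG works on blocks of 8 samples per direction


def dct(samples):
    """Orthonormal 1-D DCT-II of N samples."""
    out = []
    for k in range(N):
        scale = math.sqrt((1 if k == 0 else 2) / N)
        total = sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                    for n, x in enumerate(samples))
        out.append(scale * total)
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    out = []
    for n in range(N):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / N)
            total += scale * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(total)
    return out


# Hypothetical quantization steps: coarser for higher frequencies.
STEPS = [4, 6, 8, 12, 16, 24, 32, 48]


def quantize(coeffs, steps=STEPS):
    """Divide each coefficient by its step and round; this is the lossy part."""
    return [round(c / q) for c, q in zip(coeffs, steps)]


def dequantize(levels, steps=STEPS):
    return [level * q for level, q in zip(levels, steps)]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of 8-bit pixel values
levels = quantize(dct(row))
restored = idct(dequantize(levels))
print(levels)
print([round(v) for v in restored])
```

The quantized list is what an encoder would go on to store: small integers, with the highest-frequency entries rounded to zero. The restored row is close to the original but not identical, which is exactly the lossy tradeoff described above.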

According to Artifex Software (the Ghostscript developers), their compression engine achieves average reductions of 72% on mixed-content PDFs using a combination of image downsampling to 150 DPI and medium-quality JPEG recompression.
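
The arithmetic behind downsampling and color space conversion can be checked in a few lines. The helper names below are ours, and the 8 × 6 inch print size is an assumed example: a sketch of the calculation, not part of any compression tool.

```python
def pixels_retained(orig_w: int, orig_h: int, new_w: int, new_h: int) -> float:
    """Fraction of the original pixel count that survives downsampling."""
    return (new_w * new_h) / (orig_w * orig_h)


def target_pixels(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixel dimensions needed to show an image at a physical size and DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def channel_savings(from_channels: int, to_channels: int) -> float:
    """Fraction of color data removed when reducing the channel count."""
    return 1 - to_channels / from_channels


# 4000x3000 photo downsampled to 1200x900: 9% of the pixels remain.
print(f"{pixels_retained(4000, 3000, 1200, 900):.0%}")
# An image placed at 8 x 6 inches needs 1200x900 pixels at 150 DPI.
print(target_pixels(8, 6, 150))
# CMYK to RGB, RGB to grayscale, CMYK to grayscale.
print(channel_savings(4, 3), channel_savings(3, 1), channel_savings(4, 1))
```

Note that these figures describe raw pixel data before compression; the saving in the final file also depends on how well each version compresses.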

Ghostscript Compression Presets

Ghostscript, the open-source PDF processing engine used by many compression tools (including AuraPDF's Compress PDF tool), exposes predefined compression presets through its -dPDFSETTINGS option. Besides /default, the four most commonly used are:

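Preset    | Image DPI | Quality     | Typical Reduction | Best For
----------|-----------|-------------|-------------------|----------------------
/screen   | 72 DPI    | Low         | 80–95%            | Screen viewing, email
/ebook    | 150 DPI   | Medium-Low  | 60–80%            | Tablets, e-readers
/printer  | 300 DPI   | Medium-High | 30–50%            | Office laser printing
/prepress | 300+ DPI  | Highest     | 10–25%            | Commercial printing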

The /screen preset is the most aggressive, downsampling all images to 72 DPI (screen resolution) and applying high JPEG compression. This can reduce a 20 MB photo-heavy PDF to under 1 MB, but images will appear pixelated when zoomed or printed.

The /ebook preset (150 DPI) represents the sweet spot for most digital distribution — documents look sharp on tablets and laptops, print acceptably on home printers, and achieve significant size reduction. According to AuraPDF usage statistics, approximately 68% of users select medium compression, which corresponds to the /ebook preset range.

The /printer and /prepress presets preserve high-resolution detail for physical reproduction. Size reductions are more modest because images remain at or near their original resolution, with only redundant metadata, duplicate fonts, and inefficient stream encodings being optimized.
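
If Ghostscript is installed, a preset can be applied from a script. The sketch below builds a typical pdfwrite command line; the function names, file names, and the assumption that the executable is called gs are ours, and this is not a description of how AuraPDF invokes Ghostscript internally.

```python
import subprocess

# Preset names accepted by -dPDFSETTINGS (Ghostscript also defines /default).
PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def build_gs_command(src: str, dst: str, preset: str = "ebook", gs_binary: str = "gs") -> list:
    """Return the argument list for a Ghostscript compression run."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",          # write a new PDF
        f"-dPDFSETTINGS=/{preset}",   # apply the chosen preset
        "-dNOPAUSE",                  # do not pause between pages
        "-dBATCH",                    # exit when processing is finished
        "-dQUIET",                    # suppress routine messages
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str, preset: str = "ebook") -> None:
    """Run Ghostscript; raises CalledProcessError if it reports failure."""
    subprocess.run(build_gs_command(src, dst, preset), check=True)


# Only build and show the command here; call compress_pdf() to actually run it.
command = build_gs_command("input.pdf", "output.pdf", preset="ebook")
print(" ".join(command))
```

On Windows the console executable is usually named gswin64c rather than gs, which is why the binary name is a parameter. Always write to a new output file so the original remains available for comparison.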

What Makes PDFs Large?

Before compressing, understanding why a PDF is large helps select the right strategy:

  • Embedded high-resolution images — The single largest contributor to PDF bloat. A single uncompressed 4000×3000 photograph consumes approximately 36 MB (about 34 MiB) of raw pixel data at 24-bit color. Even JPEG-compressed, a high-quality photograph occupies 2–8 MB. Multiply by dozens of images, and file sizes escalate rapidly.
  • Duplicate embedded fonts — When a PDF is created by merging multiple source documents, the same font (e.g., Arial, Times New Roman) may be embedded multiple times. Each instance adds 200 KB – 2 MB depending on character coverage.
  • Unoptimized content streams — Some PDF generators produce verbose or uncompressed content streams. Applying FLATE compression to these streams can reduce their size by 40–60% with zero quality impact.
  • Retained editing data — PDF editors may retain incremental save data, deleted objects, and revision history within the file. A "Save As" operation (as opposed to "Save") or a rebuild with Ghostscript eliminates this accumulated overhead.
  • Embedded attachments and multimedia — Attached Excel spreadsheets, embedded video, or audio clips contribute directly to file size and are unaffected by image compression strategies.
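
The raw-size figure for an uncompressed photograph comes from multiplying pixel count by bytes per pixel. A minimal sketch of that calculation (the function name is ours):

```python
def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed size of an image in bytes."""
    return width_px * height_px * bits_per_pixel // 8


size = raw_image_bytes(4000, 3000)
print(size)                       # bytes
print(round(size / 10**6, 1))     # decimal megabytes (MB)
print(round(size / 2**20, 1))     # binary mebibytes (MiB)
```

The same photograph is 36 MB in decimal units and roughly 34.3 MiB in binary units, which is why different tools may report slightly different sizes for the same data.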

A study by Foxit Software found that 73% of oversized PDFs encountered in enterprise environments could be reduced by more than 50% through image resampling alone, without any visible quality degradation at normal viewing magnification.

Compression Best Practices

Follow these guidelines to achieve optimal compression without compromising document usability:

  1. Match compression to purpose — Email attachments and web uploads tolerate aggressive compression (72–150 DPI). Documents intended for professional printing should preserve 300 DPI images. Archival documents (PDF/A) should use lossless compression exclusively.
  2. Compress before merging — When combining multiple PDFs, compress each source file individually first. Merged files often contain duplicate fonts and redundant resources that inflate the combined size. AuraPDF's Merge tool automatically handles font deduplication.
  3. Remove unnecessary elements — Strip metadata, embedded thumbnails, bookmarks to deleted pages, and form field appearances before final compression. These elements rarely affect usability but contribute to file size.
  4. Use the right format for scanned content — Monochrome scans (black-and-white text) should use CCITT Group 4 compression, not JPEG. CCITT produces smaller files and avoids the compression artifacts that make scanned text harder to read.
  5. Avoid recompressing already-compressed images — Applying JPEG compression to an already JPEG-compressed image causes generation loss: quality degrades with each cycle while the size reduction diminishes. Many compression tools can detect previously compressed image streams and skip or limit redundant recompression, but this behavior varies by tool and settings, so check before running several passes.
  6. Test at target viewing conditions — After compression, verify the result at the intended viewing size. A document that looks acceptable at 100% zoom on a 4K monitor may show artifacts when zoomed to 200% or printed on high-quality paper.
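
Guideline 5 can be sketched as a simple check before recompressing an image stream. JPEG data begins with the bytes FF D8 FF, and PDF labels JPEG streams with the DCTDecode filter. The function names and the decision rule below are illustrative assumptions, not the logic of any particular tool.

```python
JPEG_SIGNATURE = b"\xff\xd8\xff"  # start-of-image marker plus the next marker prefix


def looks_like_jpeg(stream: bytes) -> bool:
    """True if the data starts with the JPEG signature."""
    return stream.startswith(JPEG_SIGNATURE)


def should_recompress(stream: bytes, declared_filter: str) -> bool:
    """Skip lossy recompression for streams that are already JPEG."""
    if declared_filter == "DCTDecode" or looks_like_jpeg(stream):
        return False
    return True


# Made-up sample data: a JPEG-like header and a raw pixel buffer.
jpeg_like = JPEG_SIGNATURE + b"\xe0" + b"\x00" * 16
raw_pixels = b"\x10\x20\x30" * 100

print(should_recompress(jpeg_like, "DCTDecode"))
print(should_recompress(raw_pixels, "FlateDecode"))
```

A real tool would weigh more factors, such as whether the image is also being downsampled, in which case re-encoding is unavoidable.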

Using AuraPDF's Compress PDF tool, you can reduce most documents by 50–80% with a single click, using intelligent defaults that balance quality and size for typical digital distribution scenarios.

Frequently Asked Questions

Does compressing a PDF reduce its quality?
It depends on the method. Lossless compression (FLATE, LZW) reduces size without any quality loss — the decompressed file is identical to the original. Lossy compression (JPEG recompression, downsampling) reduces quality slightly, but modern algorithms target detail the human eye is unlikely to notice. At medium compression settings, most users cannot distinguish the compressed file from the original.

What is the best compression level for PDF?
For most use cases, medium compression (roughly equivalent to Ghostscript's /ebook preset at 150 DPI) offers the best balance. It achieves 60–80% size reduction while maintaining sharp text and clear images suitable for screen viewing and home printing. Use low compression (high quality) for documents destined for commercial printing, and high compression for email attachments or web uploads.

How much can a PDF be compressed?
Compression results vary dramatically based on content. Image-heavy PDFs (presentations, photo albums) can often be reduced by 80–95%. Text-heavy PDFs with few images may only compress by 10–30% because text streams are already relatively compact. The theoretical maximum depends on the redundancy in the original content.

Is compressing a PDF the same as creating a ZIP file?
No. ZIP compression wraps the entire file in an external container — you must extract the PDF before opening it. PDF internal compression operates on individual content streams within the file, producing a smaller but fully functional PDF that opens directly in any reader. Internal compression is more effective because it can apply different algorithms to different content types.

Can I compress a PDF multiple times?
You can, but results diminish rapidly. The first compression pass captures the most savings. Subsequent passes typically achieve less than 5% additional reduction and may introduce cumulative quality degradation if lossy methods are used. If a file is still too large after one compression pass, consider removing pages, reducing image resolution further, or splitting the document.

Written by the AuraPDF Team

The AuraPDF team builds free, secure PDF tools used by thousands of people worldwide. Our Knowledge Base articles combine technical expertise with accessible explanations to help you understand PDF technology.
