PDF Metadata Explained — What Your Documents Reveal

PDF metadata is hidden information embedded in nearly every PDF file: author names, creation software, edit timestamps, and in some cases GPS coordinates carried over from source images. Understanding and managing this metadata is critical for privacy, compliance, and document management.

AuraPDF Team, March 29, 2026

What Is PDF Metadata?

PDF metadata is structured information about a document, stored separately from the visible page content. While the document's text and images are what users see, metadata operates behind the scenes — recording who created the file, when it was modified, which software produced it, and dozens of other properties.

Almost every PDF contains metadata whether the author intended it or not, because PDF creation tools populate metadata fields automatically during document generation. According to a 2023 analysis by Metadata Technology Solutions, 94% of PDF files analyzed across corporate environments contained at least one unintended metadata element, most commonly the original author's name in documents that had been "anonymized" by removing visible author credits.

The PDF specification (ISO 32000) supports two distinct metadata mechanisms: the legacy Document Information Dictionary and the modern XMP (Extensible Metadata Platform) format. Both can coexist in the same file. PDF 2.0 (ISO 32000-2) deprecates most Document Information Dictionary entries in favor of XMP, so readers should generally prefer XMP when both are present, and any cleanup has to cover both.

Understanding metadata is important for three reasons: privacy (metadata can reveal sensitive information), compliance (regulations like GDPR require awareness of stored personal data), and workflow management (metadata enables search, classification, and version tracking in document management systems).

Types of PDF Metadata

PDF files contain three categories of metadata, each stored differently within the file structure:

1. Document Information Dictionary (legacy)
The original metadata format, dating back to PDF 1.0. It is a dictionary object referenced from the `/Info` entry of the PDF trailer. Fields include:
  • Title: The document's title (often different from the filename)
  • Author: The person or organization that created the document
  • Subject: A description of the document's topic
  • Keywords: Searchable keywords for document indexing
  • Creator: The application that generated the original content (e.g., "Microsoft Word 2024")
  • Producer: The application or library that created the PDF (e.g., "Adobe PDF Library 21.0")
  • CreationDate: When the PDF was first generated
  • ModDate: When the PDF was last modified
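The CreationDate and ModDate values are strings in the PDF date syntax `D:YYYYMMDDHHmmSSOHH'mm'`, where everything after the year is optional. The following is a minimal Python sketch of converting such a string to a `datetime`. The function name `parse_pdf_date` is illustrative, and the sketch assumes the value has already been extracted from the file as text.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date syntax: D:YYYYMMDDHHmmSSOHH'mm' (all parts after the year are optional).
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string into a datetime (naive if no offset is given)."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")

    # Missing month/day default to 1, missing time parts default to 0.
    defaults = (0, 1, 1, 0, 0, 0)
    year, month, day, hour, minute, second = (
        int(group) if group else default
        for group, default in zip(match.groups()[:6], defaults)
    )

    sign, off_hours, off_minutes = match.group(7, 8, 9)
    if sign is None:
        tz = None  # no offset recorded: relationship to UTC is unknown
    elif sign in "Zz":
        tz = timezone.utc
    else:
        delta = timedelta(hours=int(off_hours or 0), minutes=int(off_minutes or 0))
        tz = timezone(delta if sign == "+" else -delta)

    return datetime(year, month, day, hour, minute, second, tzinfo=tz)
```

The UTC offset is worth noting during a privacy review, since it hints at the time zone of the machine that produced the file.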

2. XMP Metadata (modern)
Introduced in PDF 1.4, XMP (Extensible Metadata Platform) stores metadata as an RDF/XML packet in a metadata stream, using Dublin Core and Adobe-defined schemas among others. XMP is far more extensible than the Document Information Dictionary, supporting custom schemas, multilingual values, structured properties, and revision histories. According to the Dublin Core Metadata Initiative, XMP is the de facto standard for metadata interchange in digital publishing.
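To make the XMP structure concrete, here is a small Python sketch that reads a few common properties from an XMP packet with the standard library. The sample packet, its values, and the function name `summarize_xmp` are made up for illustration. The sketch assumes the packet has already been extracted from the PDF as text, and it only handles properties written as child elements, not the attribute shorthand that some producers use.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the RDF, Dublin Core, and XMP basic schemas.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Illustrative packet (hypothetical values, xpacket wrapper omitted).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <dc:creator><rdf:Seq><rdf:li>Jane Example</rdf:li></rdf:Seq></dc:creator>
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt></dc:title>
      <xmp:CreatorTool>Example Writer 1.0</xmp:CreatorTool>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def summarize_xmp(packet: str) -> dict:
    """Return creators, titles, and the creator tool found in an XMP packet."""
    root = ET.fromstring(packet)

    def list_items(tag: str) -> list:
        # Array-valued properties wrap their values in rdf:li elements.
        return [li.text or "" for li in root.findall(f".//{tag}//rdf:li", NS)]

    tool = root.find(".//xmp:CreatorTool", NS)
    return {
        "creators": list_items("dc:creator"),
        "titles": list_items("dc:title"),
        "creator_tool": tool.text if tool is not None else None,
    }
```

Note that `dc:creator` here carries the same kind of personal data as the Author field in the Document Information Dictionary, so clearing only one of the two leaves the name in the file.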

3. Image-Level Metadata (EXIF/IPTC)
When photographs are embedded in a PDF, they may carry their own metadata: EXIF data from digital cameras (including camera model, exposure settings, and GPS coordinates) and IPTC data (captions, copyright, photographer name). Whether it survives depends on the producing software. Tools that re-encode images usually drop it, while tools that embed the original JPEG data unchanged can carry it into the PDF unless it is explicitly stripped during creation.
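EXIF stores a GPS position as three rational numbers (degrees, minutes, seconds) plus a hemisphere reference letter. The sketch below, with the illustrative name `dms_to_decimal`, shows how those values become the decimal coordinates a mapping service accepts. It assumes the rationals have already been read from the image as (numerator, denominator) pairs.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds rationals to decimal degrees.

    dms: three (numerator, denominator) pairs for degrees, minutes, seconds.
    ref: hemisphere letter, one of "N", "S", "E", "W".
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)
```

For example, `dms_to_decimal([(51, 1), (30, 1), (0, 1)], "N")` evaluates to `51.5`. A residential address can often be resolved from coordinates at this precision.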

Privacy Risks of PDF Metadata

PDF metadata can inadvertently expose sensitive information that document authors never intended to share:

  • Author identity — The Author field often contains the full name of the person logged into the computer when the PDF was created. Documents intended to be anonymous (whistleblower reports, blind peer reviews, competitive bids) may reveal authorship through metadata.
  • Software identification — The Creator and Producer fields reveal the software stack used. In security-sensitive contexts, this information can be used to identify specific software versions with known vulnerabilities.
  • GPS location — Photographs taken on smartphones embed GPS coordinates in EXIF metadata. When these photos are placed into a PDF, the coordinates persist. According to the Electronic Frontier Foundation (EFF), location metadata in shared documents has been used to identify anonymous sources and track individuals' movements.
  • Edit history and comments — Some PDF editors retain revision history, deleted text, and hidden comments within the file structure. In 2023, a major U.S. law firm inadvertently disclosed privileged client information through metadata in a redacted court filing — the redacted text was still present in the PDF content streams.
  • Network paths and usernames — Documents created from templates or shared drives may embed file paths (e.g., `\\server\share\John.Smith\templates\`) that reveal internal network structure and usernames.
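Some of these leaks can be flagged automatically once the metadata values have been extracted. The following Python sketch checks text fields for UNC paths and user home directories. The patterns and the function name `find_path_leaks` are illustrative assumptions rather than an exhaustive detector.

```python
import re

# Heuristic patterns (assumptions for this sketch, not a complete list).
PATTERNS = (
    # \\server\share\... style network paths
    ("unc_path", re.compile(r"\\\\[\w.-]+\\[^\s<>|]+")),
    # C:\Users\<name> style Windows profile directories
    ("windows_user_dir", re.compile(r"[A-Za-z]:\\Users\\[^\\\s]+", re.IGNORECASE)),
    # /home/<name> or /Users/<name> on Linux and macOS
    ("unix_home", re.compile(r"/(?:home|Users)/[^/\s]+")),
)


def find_path_leaks(fields: dict) -> list:
    """Return (field name, pattern label, matched text) for each suspicious value."""
    findings = []
    for name, value in fields.items():
        for label, pattern in PATTERNS:
            for match in pattern.finditer(value):
                findings.append((name, label, match.group(0)))
    return findings
```

A check like this is best treated as a warning signal before sharing. A clean result does not prove the file is free of identifying paths.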

According to the U.K. Information Commissioner's Office (ICO), metadata exposure can constitute a reportable data breach under GDPR when it reveals personal data that the data subject did not consent to share. Under GDPR, the most serious violations can draw fines of up to €20 million or 4% of total worldwide annual turnover, whichever is higher.

How to View PDF Metadata

Several methods exist for inspecting the metadata stored in a PDF:

Desktop applications:
  • Adobe Acrobat: File → Properties displays the Document Information Dictionary. The "Additional Metadata" button shows fuller XMP data.
  • Adobe Acrobat's Preflight: Provides detailed technical metadata including font info, image properties, and color profiles.
  • Preview (macOS): Tools → Show Inspector displays basic metadata fields.

Online tools:
  • AuraPDF's PDF Health Checker analyzes document structure and metadata, reporting author, creator, producer, dates, and structural properties.

Command line:
  • ExifTool: The gold standard for metadata extraction. Running `exiftool document.pdf` displays all metadata including XMP, EXIF from embedded images, and Document Information Dictionary entries.
  • pdfinfo (Poppler): A lightweight command-line tool. `pdfinfo document.pdf` outputs title, author, dates, page count, and PDF version.

What to look for: When auditing a PDF before sharing, check the Author, Creator, Producer, and Subject fields. If the PDF contains embedded photographs, extract and inspect them separately for EXIF GPS data.
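As one way to script this audit, the Python sketch below runs `pdfinfo` and picks out the fields named above. It assumes Poppler's `pdfinfo` is installed and on the PATH, and that its output consists of `Key: value` lines. The helper names are illustrative.

```python
import subprocess

# Fields to review before sharing, as listed in the text above.
AUDIT_FIELDS = ("Author", "Creator", "Producer", "Subject")


def parse_pdfinfo(text: str) -> dict:
    """Turn pdfinfo's 'Key:   value' lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only, since dates contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def fields_to_review(info: dict) -> dict:
    """Keep only the non-empty fields that deserve a manual look."""
    return {key: info[key] for key in AUDIT_FIELDS if info.get(key)}


def audit_pdf(path: str) -> dict:
    """Run pdfinfo on a file and return the fields to review."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return fields_to_review(parse_pdfinfo(result.stdout))
```

This covers document-level fields only. EXIF data inside embedded photographs still has to be inspected separately.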

How to Remove Sensitive Metadata

Removing metadata before sharing documents is a fundamental privacy practice. Multiple approaches exist:

Using Adobe Acrobat Pro:
1. Open the PDF → File → Properties → Click "Additional Metadata"
2. Use the "Remove Properties and Personal Information" function
3. Alternatively, use the Sanitize Document feature (Protection → Remove Hidden Information), which removes metadata, comments, hidden layers, attached files, and embedded search indexes

Menu names and locations vary between Acrobat versions, so the exact path may differ in your installation.

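Using ExifTool (command line):
```
exiftool -all= document.pdf
```
This removes all writable metadata tags from the current view of the file. ExifTool keeps the original as `document.pdf_original`. Add `-overwrite_original` to modify the file in place without a backup. Note that ExifTool edits PDFs through an incremental update, so the old metadata remains in the file and can be recovered. To make the removal permanent, rewrite the file afterwards, for example with `qpdf --linearize`.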
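ExifTool's documentation notes that its PDF edits are reversible, because the old objects stay in the file. It is therefore worth checking the raw bytes after stripping. The Python sketch below is a rough check under stated assumptions: it only finds values stored as uncompressed literal strings (Latin-1 or UTF-16BE) and misses anything inside compressed streams or hex strings. The function names are illustrative.

```python
def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers; more than one can indicate appended updates.

    Linearized files also contain two markers, so treat this as a hint only.
    """
    return data.count(b"%%EOF")


def leftover_values(data: bytes, values) -> list:
    """Return the sensitive values that still appear in the raw file bytes."""
    found = []
    for value in values:
        candidates = (
            value.encode("latin-1", "ignore"),  # plain literal string
            value.encode("utf-16-be"),          # Unicode text string body
        )
        if any(candidate and candidate in data for candidate in candidates):
            found.append(value)
    return found
```

Typical use would be `leftover_values(open("document.pdf", "rb").read(), ["Jane Example"])`, where the file name and author name are placeholders. A non-empty result means the value is still recoverable from the file.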

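Using Ghostscript (batch processing): Rebuilding a PDF through Ghostscript rewrites the whole file and discards much of the data that is not needed to render the pages. Basic document information fields may still be carried over into the output, so verify the result rather than assuming it is clean. Organizations processing large volumes of documents often use Ghostscript in automated pipelines to sanitize metadata at scale.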

Best practices for metadata management:
  • Strip metadata from all documents before external sharing: Make this a standard procedure in your document workflow.
  • Use templates with clean metadata: Start from templates that have pre-set Author and Title fields appropriate for your organization.
  • Check embedded images separately: PDF-level metadata removal may not strip EXIF data from embedded photographs. Process images before embedding them.
  • Automate for consistency: Manual metadata removal is error-prone. Implement automated metadata sanitization as part of your document publishing pipeline.
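As a starting point for such automation, the Python sketch below chains two command-line tools: ExifTool to clear the metadata on a working copy, then qpdf to rewrite the file so the superseded objects are not carried along. It assumes both `exiftool` and `qpdf` are installed. The function names are illustrative, and the sketch does not touch EXIF data inside embedded images.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_commands(work_copy: Path, dst: Path) -> list:
    """Return the two commands to run, in order, as argument lists."""
    return [
        # Clear all writable metadata tags in the working copy.
        ["exiftool", "-all=", "-overwrite_original", str(work_copy)],
        # Rewrite the file so the old, now-unreferenced objects are dropped.
        ["qpdf", "--linearize", str(work_copy), str(dst)],
    ]


def sanitize(src: Path, dst: Path) -> None:
    """Write a metadata-stripped copy of src to dst, leaving src untouched."""
    with tempfile.TemporaryDirectory() as tmp:
        work_copy = Path(tmp) / src.name
        shutil.copy2(src, work_copy)
        for command in build_commands(work_copy, dst):
            # check=True stops the pipeline if either tool fails.
            subprocess.run(command, check=True)
```

Working on a temporary copy keeps the original file intact, which helps when the output has to be compared against the source during verification.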

According to a 2024 report by Proofpoint, 45% of accidental data leaks in enterprise environments involve document metadata — making metadata management one of the highest-impact privacy measures an organization can implement.

Frequently Asked Questions

What metadata is stored in PDF files?
PDFs store multiple categories of metadata: document properties (title, author, creation date, modification date), software information (creator application, PDF producer), and potentially image metadata (camera model, GPS coordinates from EXIF data). Some PDFs also contain revision history, comments, and internal file paths. All of this information is hidden from normal viewing but accessible through document properties or metadata tools.

Can I see who created a PDF?
Usually yes. The Author field in the Document Information Dictionary typically records the name of the user logged into the computer when the PDF was created. The Creator field shows which application generated the content (e.g., Microsoft Word), and the Producer field shows the PDF engine used. However, these fields can be edited or removed by the creator, so they are not guaranteed to be accurate.

How do I remove metadata from a PDF?
In Adobe Acrobat Pro, use Protection → Remove Hidden Information or the Sanitize Document feature. For command-line processing, ExifTool can strip all metadata with 'exiftool -all= document.pdf', but its PDF edits are reversible until the file is rewritten, for example with qpdf. AuraPDF's PDF Health Checker can help you identify what metadata is present before deciding what to remove. Always verify that metadata has been successfully removed after processing.

Can PDF files contain GPS location data?
Yes. When photographs taken on smartphones or GPS-enabled cameras are embedded in a PDF, the EXIF metadata, including GPS coordinates, can be preserved within the PDF. This location data is not visible on the page but can be extracted using metadata tools. Always strip EXIF data from images before embedding them in documents you plan to share publicly.

What is the difference between metadata and document content?
Document content is what appears on the pages: the text, images, and graphics that users see and read. Metadata is information about the document itself: who created it, when, with what software, and how the file is structured. Metadata is generally invisible during normal viewing but accessible through document properties, inspection tools, or by reading the raw file structure.
