PDFs explained: what’s inside, why they’re everywhere, and why they’re annoying

PDF is the cockroach of document formats. Not in a bad way. In a “this thing survives everything” way. You can email it, print it, archive it for years, open it on basically any device, and it still looks like the same document.

But the moment you try to edit one, you learn the other side of PDF: it’s not a Word file. It’s closer to a tiny self-contained printing program.

If you’re here because you need quick help (not a lecture), jump straight to our PDF tools: free, secure online tools to merge, split, compress, secure, and unlock PDFs, and to extract, remove, or rearrange pages.

What is a PDF, really?

PDF stands for Portable Document Format. The goal is simple: “this document should look the same everywhere.” Fonts, spacing, images, page breaks — all locked in.

A good mental model: a PDF is a stack of pages, and each page is instructions like “draw this text here”, “paint this image there”, “stroke this line”, plus extra stuff like links, form fields, and annotations.

PDF structure in plain English (with just enough nerd stuff)

Under the hood, a PDF is made of building blocks called objects. They’re numbered, and they reference each other — kind of like a mini database inside one file.

1) Header

The file usually starts with something like %PDF-1.7, which says “yep, I’m a PDF, here’s my version.”
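The header is easy to check by hand. Here is a minimal Python sketch; the helper name pdf_version and the 1 KiB search window are our own choices for illustration, not part of any library. It looks for the %PDF- marker near the start of the bytes and returns the version it finds.

```python
import re

def pdf_version(data: bytes):
    """Return the version string from a PDF header, or None if no header is found."""
    # Assumption: search the first 1 KiB rather than only offset 0,
    # since some files carry a few stray bytes before the header.
    match = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    return match.group(1).decode("ascii") if match else None

# A hand-written fragment that mimics the first lines of a PDF file.
sample = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(pdf_version(sample))            # 1.7
print(pdf_version(b"not a pdf"))      # None
```

With a real file you would pass in the first bytes read in binary mode, for example open(path, "rb").read(1024).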

2) Body (objects + streams)

The body is where the real content lives:

Dictionaries: key/value maps that describe things (pages, fonts, images).
Streams: big blobs of data (page drawing commands, embedded images, sometimes fonts). Streams are often compressed.
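Resources: fonts, color spaces, images, and anything else a page needs to render. These aren’t a third kind of object; they’re dictionaries and streams that a page points to.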
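To see how a dictionary and a stream fit together, here is a small Python sketch that builds one stream object by hand and reads it back. The object number, the drawing commands, and the helper read_flate_stream are made up for illustration. It also assumes the simplest case: a /Length written directly as a number and a single /FlateDecode filter (zlib compression), which real files don’t always follow.

```python
import re
import zlib

# Page drawing commands: "write Hello in font F1 at 12pt" (illustrative only).
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
compressed = zlib.compress(content)

# One object = a dictionary describing the data + the stream holding it.
obj = (
    b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)

def read_flate_stream(raw: bytes) -> bytes:
    """Decompress the stream of a single object (direct /Length, FlateDecode only)."""
    length = int(re.search(rb"/Length\s+(\d+)", raw).group(1))
    # The first "stream\n" is the keyword that follows the dictionary.
    start = raw.index(b"stream\n") + len(b"stream\n")
    return zlib.decompress(raw[start:start + length])

print(read_flate_stream(obj))  # b'BT /F1 12 Tf 72 720 Td (Hello) Tj ET'
```

The dictionary tells the reader how many bytes to take and how to decode them; the stream itself is just opaque bytes until that happens.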

3) Cross-reference table (xref)

PDFs keep an index of the byte offset of every object in the file, so a reader can jump directly to “object 42” without scanning the entire document.
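In its classic form, that index is plain text: the word xref, a line giving the first object number and how many entries follow, then one line per object with a 10-digit byte offset, a 5-digit generation number, and n (in use) or f (free). The Python sketch below parses that layout; parse_xref is a name we made up, and it does not handle the compressed xref streams that newer files may use instead.

```python
def parse_xref(section: bytes) -> dict:
    """Map object number -> byte offset for the in-use entries of a classic xref table."""
    lines = section.splitlines()
    if not lines or lines[0].strip() != b"xref":
        raise ValueError("not a classic xref table")
    offsets = {}
    i = 1
    while i < len(lines):
        # Subsection header: first object number, then entry count.
        first, count = (int(part) for part in lines[i].split())
        for k in range(count):
            offset, _generation, flag = lines[i + 1 + k].split()
            if flag == b"n":  # "n" = in use, "f" = free
                offsets[first + k] = int(offset)
        i += 1 + count
    return offsets

# A hand-written table with one free entry and two objects in use.
sample = (
    b"xref\n"
    b"0 3\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"0000000058 00000 n \n"
)
print(parse_xref(sample))  # {1: 9, 2: 58}
```

With that mapping in hand, “go to object 2” is a single seek to byte 58.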

4) Trailer

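The trailer names the document “root” (the catalog), which is the entry point to the whole structure. Right after it, at the very end of the file, a startxref line gives the byte offset of the xref, which is why readers start parsing a PDF from the end.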
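A reader typically begins at the end of the file: the startxref keyword is followed by the byte offset of the xref, and the trailer dictionary holds a /Root reference. Here is a Python sketch of that first step on a tiny PDF-like file assembled in memory. The helper name locate_xref_and_root and the 1 KiB tail window are assumptions for illustration, and the sketch only covers files with a classic plain-text trailer.

```python
import re

# Assemble a tiny PDF-like file so the offsets are computed, not guessed.
header = b"%PDF-1.4\n"
obj1 = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(header) + len(obj1)
xref = b"xref\n0 2\n0000000000 65535 f \n%010d 00000 n \n" % len(header)
trailer = (
    b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_offset
)
data = header + obj1 + xref + trailer

def locate_xref_and_root(pdf: bytes):
    """Return (xref offset, (root object number, generation)) or None."""
    tail = pdf[-1024:]  # assumption: the trailer fits in the last 1 KiB
    offsets = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    roots = re.findall(rb"/Root\s+(\d+)\s+(\d+)\s+R", tail)
    if not offsets or not roots:
        return None
    number, generation = roots[-1]  # the last match is the newest trailer
    return int(offsets[-1]), (int(number), int(generation))

found = locate_xref_and_root(data)
print(found)                            # (45, (1, 0))
print(data[found[0]:found[0] + 4])      # b'xref'
```

From there the reader loads the xref, looks up the root object, and walks down to the pages.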

Bonus: why PDFs can be “incremental”

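PDFs can append changes to the end of the file without rewriting everything. Each save adds the changed objects plus a new xref and trailer, and the old bytes stay exactly where they were. That’s handy for edits, comments, and digital signatures (the signed bytes are never touched)… and also why some PDFs bloat over time.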
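Each appended revision normally ends with its own %%EOF marker, so counting those markers gives a rough hint of how many times a file was saved this way. The Python sketch below uses schematic placeholder bytes rather than a real PDF, and count_eof_markers is a made-up helper. Treat the result as an estimate only: some files contain an extra marker for other reasons, and the byte sequence could in principle appear inside binary data.

```python
def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers as a rough estimate of saved revisions."""
    return data.count(b"%%EOF")

# Schematic stand-ins for a first save and one appended update.
original = b"%PDF-1.4\n(original objects, xref, trailer)\nstartxref\n45\n%%EOF\n"
update = b"(changed objects, new xref, trailer)\nstartxref\n130\n%%EOF\n"
edited = original + update

print(count_eof_markers(original), count_eof_markers(edited))  # 1 2
print(edited.startswith(original))                             # True
```

The second line of output is the point: the original bytes are still there, untouched, at the front of the edited file.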

Why PDFs are great (advantages)

Looks the same everywhere: the main selling point.
Perfect for sharing and printing: contracts, invoices, school docs, forms.
Can be searchable and accessible: if it contains real text (not just scanned images) and, for screen readers, structure tags.
Supports links, forms, and signatures: annotations, AcroForm fields, digital signing.
Good for archiving: with fonts embedded and no external dependencies (which is what the PDF/A standard requires), it’s built to last.

Why PDFs drive people nuts (disadvantages)

Editing is hard by design: the format is for final layout, not authoring.
Scans are “just images”: you can’t search or copy text unless OCR was applied.
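Security can be confusing: there’s a “password to open” (the user password) and a “password to edit” (the owner or permissions password), and restrictions like “no printing” only work if the viewer chooses to honor them, so different viewers behave differently.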
File size can get chunky: huge images, embedded fonts, repeated resources, incremental updates.

Common PDF jobs people actually want to do

Most of the time, nobody cares about xref tables. They just need the PDF to cooperate. That usually means one of these:

Merge PDF files into one document (scans + attachments + “final_final_v3.pdf”).
Split PDF into smaller parts for emailing or uploading.
Extract PDF pages you actually need and drop the rest.
Remove PDF pages (goodbye blank pages and duplicate scans).
Rearrange PDF pages when the scanner shuffled your life.
Compress PDF to reduce file size (especially for email limits).
Secure PDF by adding a password.
Unlock PDF when you know the password and want a copy without the protection.

That’s why we keep a dedicated hub for PDF tools. If you’re in “get it done” mode, open PDF Tools and pick what you need.

One practical tip (that saves a lot of pain)

If a PDF came from a scanner and looks like photos, it might be an image-only PDF. Tools that rearrange or merge pages will still work, but text won’t be selectable/searchable because there is no text layer. That’s normal — it’s not “broken”, it’s just how the file was created.

Keywords (because yes, people search like this): merge pdf, split pdf, extract pdf pages, remove pdf pages, rearrange pdf pages, compress pdf, secure pdf, unlock pdf, pdf tools, free secure online pdf tools.