PDF Explained – A lot more happens than meets the eye – #programming #bookreview

PDF Explained
By John Whitington
(O’Reilly, paperback, list price $19.99; Kindle edition, list price $9.99)

For many of us, a PDF is a PDF. And a file is just a file. As data goes by.

We give little thought to what actually happens when we download and read a document in Portable Document Format (PDF), or use word processing software to produce one. PDF is an International Organization for Standardization (ISO) standard for document exchange.

Yet as John Whitington, author of this informative and important new book, notes: “A typical PDF file contains many thousands of objects, multiple compression mechanisms, different font formats, and a mixture of vector and raster graphics together with a wide variety of metadata and ancillary content.”

Whitington’s clearly written and appropriately illustrated work is aimed at four specific groups of readers:

  1. “Adobe Acrobat users who want to understand the reasons behind the facilities it provides, rather than just how to use them. For example: encryption options, trim and crop boxes, and page labels.”
  2. “Power users who want to use command-line software to process PDF documents in batches by merging, splitting, and optimizing them.”
  3. “Programmers writing code to read, edit, or create PDF files.”
  4. “Industry professionals in search, electronic publishing, and printing who want to understand how to use PDF’s metadata and workflow features to build coherent systems.”

One of the first hands-on things you do in this book is build a small PDF document from scratch using a simple text editor and pdftk, a free, open-source command-line tool for Microsoft Windows, Mac OS X, and Unix. (Spoiler alert: The document will display the traditional “Hello, World!”)
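To give a feel for what “from scratch” means, here is a minimal sketch in Python. It is not the book's listing, and it does not use pdftk. It assumes you would rather have a short script work out the byte offsets that the cross-reference table needs. The function name build_hello_pdf, the Helvetica font, the page size, and the text position are all illustrative choices, and the text is assumed to contain no parentheses or backslashes.

```python
def build_hello_pdf(text="Hello, World!"):
    """Return the bytes of a one-page PDF that displays `text`.

    Illustrative sketch only: `text` must be plain ASCII with no
    parentheses or backslashes, since no escaping is done here.
    """
    # Content stream: select font F0 at 36 pt, move to (120, 700), draw text.
    content = f"BT /F0 36 Tf 120 700 Td ({text}) Tj ET".encode("ascii")

    # Bodies of the five indirect objects, in object-number order (1 to 5).
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F0 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer points at the catalog (object 1) and at the xref table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the returned bytes to a file opened in binary mode, say hello.pdf, should produce something a PDF viewer can open. Even this tiny example shows the pieces the book's later chapters cover: a header, numbered objects, a content stream, a cross-reference table, and a trailer.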

Following the introduction and the chapter on building a simple PDF from scratch, the remaining eight chapters explore:

  • File structure
  • Document structure
  • Graphics
  • Text and fonts
  • Document metadata and navigation
  • Encrypted documents
  • Working with pdftk
  • PDF software and documentation

Whitington has the right background and credentials for creating PDF Explained.

He is, according to the book’s biographical blurb, “the author of one of the few complete PDF implementations, CamlPDF, which implements the PDF file format from the bit level up. After graduating from the University of Cambridge, he founded Coherent Graphics Ltd, developers of command line PDF tools for Windows, Mac OS X, and Unix, and the Proview PDF Editor for Mac OS X.”

Si Dunn’s latest book is a detective novel, Erwin’s Law. His other published works include Jump, a novella, and a book of poetry, plus several short stories, including The 7th Mars Cavalry, all available on Kindle. He is a screenwriter, a freelance book reviewer, and a former technical writer and software/hardware QA test specialist.