Add document information to your PDF, even without using Acrobat.
Traditional metadata includes things such as your document's title, authors, and ISBN. But you can add anything you want, such as the document's revision number, category, internal ID, or expiration date. PDF can store this information in two different ways: using the PDF's Info dictionary [Hack #80] or using an embedded Extensible Metadata Platform (XMP) stream. When you change the PDF's title, authors, subject, or keywords using Acrobat, as shown in Figure 5-13, it updates both of these resources. Acrobat 6 also enables you to export or import PDF XMP datafiles. Visit http://www.adobe.com/products/xmp/ to learn about Adobe's XMP.
In Acrobat 6, view and update metadata by selecting File → Document Properties... → Description or Advanced → Document Metadata... In Acrobat 5, select File → Document Properties → Summary. Save your PDF after making changes to the metadata.
Our pdftk [Hack #79] currently reads and writes only the metadata in a PDF's Info dictionary. However, it does not restrict you to the title, authors, subject, and keywords: you can add custom metadata fields to a PDF as needed, which solves the basic problem of embedding information in a PDF document. pdftk is free software.
Xpdf's (http://www.foolabs.com/xpdf/) pdfinfo reports a PDF's Info dictionary contents, its XMP stream, and other document data. pdfinfo is free software.
To create a plain-text report of PDF metadata, use pdftk's dump_data operation. It will also report PDF bookmarks and page labels, among other things. The command looks like this:
pdftk mydoc.pdf dump_data output mydoc.data.txt
Metadata will be represented as key/value pairs, like so:
InfoKey: Creator
InfoValue: Acrobat PDFMaker 6.0 for Word
InfoKey: Title
InfoValue: Brian Eno: His Music and the Vertical Color of Sound
InfoKey: Author
InfoValue: Eric Tamm
InfoKey: Producer
InfoValue: Acrobat Distiller 6.0.1 (Windows)
InfoKey: ModDate
InfoValue: D:20040420234132-07'00'
InfoKey: CreationDate
InfoValue: D:20040420234045-07'00'
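If you want to use this report from a script, the key/value pairs are easy to collect. The following is a minimal Python sketch, not part of pdftk; the function name parse_dump_data is hypothetical, and it assumes each InfoKey and InfoValue sits on its own line of the report. Lines it does not recognize, such as bookmark or page-label entries, are ignored.

```python
def parse_dump_data(text):
    """Collect InfoKey/InfoValue pairs from a dump_data report into a dict."""
    info = {}
    pending_key = None
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        name = name.strip()
        value = value.strip()
        if name == "InfoKey":
            pending_key = value
        elif name == "InfoValue" and pending_key is not None:
            info[pending_key] = value
            pending_key = None
    return info


sample = """InfoKey: Creator
InfoValue: Acrobat PDFMaker 6.0 for Word
InfoKey: Title
InfoValue: Brian Eno: His Music and the Vertical Color of Sound
InfoKey: ModDate
InfoValue: D:20040420234132-07'00'
"""

info = parse_dump_data(sample)
print(info["Title"])
```

In practice you would read the text from the report file, for example mydoc.data.txt, instead of the inline sample.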
Another tool for reporting PDF metadata is pdfinfo, which is part of the Xpdf project (http://www.foolabs.com/xpdf/). In addition to metadata, it also reports page sizes, page count, and PDF permissions [Hack #52] . Running pdfinfo mydoc.pdf yields a report such as this:
Title:          Brian Eno: His Music and the Vertical Color of Sound
Author:         Eric Tamm
Creator:        Acrobat PDFMaker 6.0 for Word
Producer:       Acrobat Distiller 6.0.1 (Windows)
CreationDate:   04/20/04 23:40:45
ModDate:        04/22/04 14:39:30
Tagged:         no
Pages:          216
Encrypted:      no
Page size:      522 x 756 pts
File size:      1126904 bytes
Optimized:      yes
PDF version:    1.4
Use pdfinfo's options to fine-tune its behavior. Use its -meta option to view a PDF's XMP stream.
pdftk can take a plain-text file of these same key/value pairs and update a PDF's Info dictionary to match. Currently, it does not update the PDF's XMP stream. The command would look like this:
pdftk mydoc.pdf update_info new_info.txt output mydoc.updated.pdf
This will add or modify the Info keys given by new_info.txt. Note that the output PDF filename must be different from the input. To remove a key/value pair, simply pass in the key with an empty value, like so:
InfoKey: MyDataKey
InfoValue:
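When the metadata comes from a database or a build script, you might generate the update file rather than type it. This is a rough Python sketch under a few assumptions: the function name format_info and the Revision key are made up for illustration, each key and value goes on its own line, and a value of None or an empty string marks a key for removal, as described above.

```python
def format_info(fields):
    """Render a dict as InfoKey/InfoValue lines for pdftk's update_info.

    A value of None or "" produces an empty InfoValue, which asks
    pdftk to remove that key.
    """
    lines = []
    for key, value in fields.items():
        lines.append(f"InfoKey: {key}")
        if value is None or value == "":
            lines.append("InfoValue:")
        else:
            lines.append(f"InfoValue: {value}")
    return "\n".join(lines) + "\n"


# Revision is a hypothetical custom field; MyDataKey will be removed.
text = format_info({"Revision": 7, "MyDataKey": None})
print(text, end="")
```

Save the resulting text to a file such as new_info.txt and pass that file to update_info.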
The PDF specification defines several Info fields. Be careful to use these only as described in the specification. They are Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate, and Trapped.
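CreationDate and ModDate are a good example of fields with a prescribed form: the specification stores them as date strings such as D:20040420234132-07'00'. As an illustration only, here is a small Python sketch that converts such a string into a datetime. The name parse_pdf_date is hypothetical, and the sketch assumes the full form with seconds; the specification also permits shorter forms, which this code rejects.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional Z or a +HH'mm' / -HH'mm' offset.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:(Z)|([+-])(\d{2})'(\d{2})'?)?$"
)


def parse_pdf_date(value):
    """Convert a full-form PDF date string into a datetime."""
    m = _PDF_DATE.match(value)
    if m is None:
        raise ValueError(f"unsupported PDF date: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    tz = None  # no offset given: return a naive datetime
    if m.group(7):
        tz = timezone.utc
    elif m.group(8):
        offset = timedelta(hours=int(m.group(9)), minutes=int(m.group(10)))
        tz = timezone(-offset if m.group(8) == "-" else offset)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


print(parse_pdf_date("D:20040420234132-07'00'").isoformat())
# 2004-04-20T23:41:32-07:00
```

Going the other way, a script that sets ModDate through update_info should write the same D: form rather than a free-form date.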