Take control of PDF code by mastering its XREF table.
[Hack #80] revealed the hackable plain text behind PDF. Here we edit this PDF text and then use pdftk [Hack #79] to cover our tracks. pdftk can also compress the page streams when we're done.
First, uncompress your PDF's page streams [Hack #80]:
pdftk mydoc.pdf output mydoc.uncompressed.pdf uncompress
Then, open this new PDF in your text editor. Locate your page of interest by searching for the text /pdftk_pageNum N, where N is the number of your page (the first page is 1, not 0). This text was added to the page dictionaries by pdftk.
Find the /Contents key in your page's dictionary. It is probably mapped to an indirect object reference: m n R. Locate this indirect object by searching for the text m n obj. This will take you to a stream or to an array of streams. If it is an array, look up any of its referenced streams the same way.
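If you would rather script this lookup than search by hand, here is a minimal Python sketch of the same two steps: find the page dictionary by its /pdftk_pageNum entry, then follow /Contents to its streams. It assumes an uncompressed PDF from pdftk, one object per obj ... endobj pair, and no object streams. The helpers object_body and page_content_refs are hypothetical names, not part of pdftk.

```python
import re

# An indirect reference such as "4 0 R".
REF = re.compile(rb"(\d+)\s+(\d+)\s+R")


def object_body(pdf: bytes, num: int, gen: int) -> bytes:
    """Return the bytes between 'num gen obj' and the following 'endobj'."""
    header = re.search(rb"(?<![0-9])%d\s+%d\s+obj\b" % (num, gen), pdf)
    if header is None:
        raise KeyError(f"object {num} {gen} not found")
    end = pdf.index(b"endobj", header.end())
    return pdf[header.end():end]


def page_content_refs(pdf: bytes, page_num: int) -> list[tuple[int, int]]:
    """List the (object, generation) pairs of a page's content streams.

    page_num is 1-based, matching the /pdftk_pageNum entries pdftk adds.
    """
    mark = re.search(rb"/pdftk_pageNum\s+%d(?![0-9])" % page_num, pdf)
    if mark is None:
        raise KeyError(f"page {page_num} not found")
    # The page dictionary lies between its 'obj' header and the next 'endobj'.
    start = max(pdf.rfind(b"obj", 0, mark.start()), 0)
    end = pdf.index(b"endobj", mark.end())
    found = re.search(rb"/Contents\s*(\[[^\]]*\]|\d+\s+\d+\s+R)", pdf[start:end])
    if found is None:
        return []
    value = found.group(1)
    if not value.startswith(b"["):
        # A lone reference may name a stream or an array of streams.
        num, gen = (int(x) for x in REF.match(value).groups())
        body = object_body(pdf, num, gen).lstrip()
        if not body.startswith(b"["):
            return [(num, gen)]
        value = body
    return [(int(n), int(g)) for n, g in REF.findall(value)]
```

Each returned pair (m, n) is what you would otherwise find by searching for the text m n obj. Plain regular expressions are enough here only because pdftk's uncompressed output is readable text; a general PDF parser would be needed for arbitrary files.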
Now you should be looking at a stream of PDF drawing operations that describe your page. These operations and their interactions are best understood by studying the PDF Reference [Hack #98]. However, if your page has a lot of text on it, you can probably make it out. An example of a legal change in page text is changing [(gr)17.7(oup)] to [(grip)], or (storey) to (story). The 17.7 between the two strings is a small spacing adjustment, in thousandths of a text space unit, so dropping it is harmless. Anything inside parentheses like this is fair game. So, change something and save your work as mydoc.edited.pdf.
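The text edit above can be sketched in Python. A TJ array mixes literal strings with spacing numbers, so the visible word is the strings joined together. This sketch assumes literal strings in a readable, ASCII-like font encoding, with no unescaped nested parentheses and no "]" inside the strings. The helper names tj_text and replace_shown_text are hypothetical.

```python
import re

# A PDF literal string: "(" ... ")" with backslash escapes.
# Assumption: no unescaped nested parentheses, which PDF does allow.
LITERAL = re.compile(rb"\((?:\\.|[^\\()])*\)", re.S)
# A TJ array and its operator. Assumption: no "]" inside the strings.
TJ_ARRAY = re.compile(rb"\[([^\]]*)\](\s*TJ)")


def tj_text(array_body: bytes) -> bytes:
    """Join a TJ array's strings, dropping the spacing numbers between them.

    Escape sequences such as \\( are returned as raw bytes, not decoded.
    """
    return b"".join(m.group()[1:-1] for m in LITERAL.finditer(array_body))


def replace_shown_text(stream: bytes, old: bytes, new: bytes) -> bytes:
    """Swap the first TJ array that shows `old` for [(new)] TJ.

    `new` must not contain parentheses or backslashes; escape them first.
    """
    for m in TJ_ARRAY.finditer(stream):
        if tj_text(m.group(1)) == old:
            return stream[:m.start()] + b"[(" + new + b")]" + m.group(2) + stream[m.end():]
    raise ValueError(f"no TJ array shows {old!r}")
```

For instance, replace_shown_text(stream, b"group", b"grip") turns [(gr)17.7(oup)] TJ into [(grip)] TJ and leaves the rest of the stream untouched, mirroring the hand edit.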
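Editing PDF at the text level typically corrupts the XREF lookup table at the end of the file. The table records the byte offset of every object, so any edit that changes a string's length shifts all the objects that follow it. Repair your edited PDF using pdftk like so: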
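To see what the repair involves, here is a toy Python sketch that rebuilds a classic XREF table by recording where each object header actually starts. Each in-use entry is a 10-digit byte offset, a 5-digit generation number, and the letter n, padded to exactly 20 bytes. It assumes a single classic table (no cross-reference streams or object streams) and object headers at the start of a line. It does not fix the /Length of a stream you edited, which is also stale if you changed the stream's size. The name build_xref is hypothetical.

```python
import re

# An object header such as "12 0 obj" at the start of a line.
OBJ_HEADER = re.compile(rb"(?m)^(\d+)\s+(\d+)\s+obj\b")


def build_xref(pdf: bytes) -> bytes:
    """Rebuild a classic xref table from where each object actually starts.

    Toy sketch: assumes no cross-reference streams or object streams.
    If an object number appears twice, the later copy wins.
    """
    offsets = {}
    for m in OBJ_HEADER.finditer(pdf):
        offsets[int(m.group(1))] = (m.start(), int(m.group(2)))
    size = max(offsets, default=0) + 1
    out = [b"xref\n", b"0 %d\n" % size]
    for num in range(size):
        if num in offsets:
            offset, gen = offsets[num]
            # Each entry is 20 bytes: 10-digit offset, 5-digit generation, type, CRLF.
            out.append(b"%010d %05d n\r\n" % (offset, gen))
        else:
            # Object 0 and unused numbers are marked free; chaining the
            # free entries into a proper linked list is omitted here.
            out.append(b"0000000000 65535 f\r\n")
    return b"".join(out)
```

A complete file would continue with the trailer dictionary (its /Size equal to the number of table entries), the keyword startxref, the byte offset of the xref line, and %%EOF. Letting pdftk rewrite the whole file is the simpler route.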
pdftk mydoc.edited.pdf output mydoc.fixed.pdf
Or, if you want to compress the output and remove the /pdftk_pageNum entries, add compress to the end like so:
pdftk mydoc.edited.pdf output mydoc.fixed.pdf compress
Open your new PDF in Reader and view your page. Do you see the change you made? If it was in the middle of a paragraph, you might be surprised to find that the paragraph hasn't rewrapped to fit your altered word. Most PDFs have no concept of a paragraph, so how could it?
This procedure is an unlikely way to fix typos. We put it to better use in [Hack #82].