Split a PDF into pages and frame them in HTML, where the fun begins.
In general, HTML files are called pages, while PDF files are called documents. By splitting a PDF document into PDF pages, we shift it into HTML's paradigm, where we can now program the document like a web site. Let's start with a basic document skin, shown in Figure 5-17, which gives us a cool look and handy document navigation.
Our Classic skin has a number of nice built-in features:
Table of contents portal page based on PDF bookmarks
Navigation cluster for flipping through pages
Table of Contents navigation sidebar based on PDF bookmarks
A hyperlink to the full, unsplit PDF for download on each page
Convenient Email This Page link on each page
Test-drive our online version at http://www.pdfhacks.com/eno/. The HTML, JavaScript, and user interface icons are freely distributable under the GPL, so feel free to use them in your own templates.
First, install pdftk [Hack #79]. Next, visit http://www.pdfhacks.com/skins/ and download pdfskins-1.1.zip. Unzip it and move pdfskins.exe to a convenient location, such as C:\Windows\system32\. On other platforms, compile pdfskins from the included source code: just cd pdfskins-1.1 and run make.
Download a skin template from http://www.pdfhacks.com/skins/. The template pdfskins_classic_js uses client-side JavaScript to create the dynamic pieces. pdfskins_classic_php uses server-side PHP instead. Pick one and unzip it into a new directory:
unzip pdfskins_classic_js-1.1.zip
Copy your PDF document into this new directory and burst it into pages with pdftk. This also creates doc_data.txt, which reports on the document's title, metadata, and bookmarks:
pdftk full_doc.pdf burst
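To check what the burst step recorded before spinning skins, you could read the InfoKey/InfoValue pairs out of doc_data.txt. The following is a minimal Python sketch, not part of pdfskins: the function name parse_info and the sample text are made up for illustration, and it assumes each InfoKey line is followed by its InfoValue line. Other lines in the file, such as bookmark data, are simply skipped.

```python
def parse_info(text):
    """Collect InfoKey/InfoValue pairs from doc_data.txt text into a dict."""
    info = {}
    key = None
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        name, value = name.strip(), value.strip()
        if name == "InfoKey":
            key = value            # remember the key until its value arrives
        elif name == "InfoValue" and key is not None:
            info[key] = value
            key = None
    return info


# Hypothetical sample; a real file comes from "pdftk full_doc.pdf burst".
sample = """InfoKey: Title
InfoValue: Great American Novel
InfoKey: Color1
InfoValue: #336600
"""

if __name__ == "__main__":
    print(parse_info(sample))
```

On a real run you would pass in the contents of the doc_data.txt file that pdftk wrote to the working directory.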
Finally, in this same directory, spin skins using pdfskins. It reads doc_data.txt, created earlier, for the document title and other data. Pass the PDF filename as the first argument if you plan to make the full PDF document available for download. This first argument is used only for constructing the Download Full Document hyperlink, and it can be a full or relative URL. Omit this filename and the hyperlink will not be displayed.
pdfskins full_doc.pdf
Fire up your web browser and point it at index.html, located in the directory where you've been working. The portal should appear, showing the table of contents and graphic placeholders for your logo (logo.gif) and document cover thumbnail (thumb.gif). If you used the php or comments templates, the pages must be served to you by a PHP-enabled web server.
You can add or change data in the doc_data.txt file, or you can pass additional, overriding data to pdfskins on the command line. This is most useful for changing the default colors used in the Classic skin. For example:
pdfskins full_doc.pdf -title "Great American Novel" -color1 "#336600" -color2 white
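If you drive pdfskins from a script, building the argument list in code sidesteps shell quoting. This matters because in a Unix shell an unquoted # at the start of a word begins a comment. The sketch below is one way to do it in Python. The helper names pdfskins_args and run_pdfskins are hypothetical, and only the -title, -color1, and -color2 options shown above are handled.

```python
import subprocess


def pdfskins_args(pdf=None, title=None, color1=None, color2=None):
    """Build the pdfskins argument list; options left as None are skipped."""
    args = ["pdfskins"]
    if pdf:
        args.append(pdf)  # used only for the Download Full Document link
    for flag, value in (("-title", title),
                        ("-color1", color1),
                        ("-color2", color2)):
        if value is not None:
            args += [flag, value]
    return args


def run_pdfskins(**options):
    """Run pdfskins in the current directory (requires it on the PATH)."""
    # A list of arguments bypasses the shell, so "#336600" needs no quoting.
    subprocess.run(pdfskins_args(**options), check=True)


if __name__ == "__main__":
    print(pdfskins_args("full_doc.pdf", title="Great American Novel",
                        color1="#336600", color2="white"))
```

Calling run_pdfskins(pdf="full_doc.pdf", color1="#336600") from the directory that holds doc_data.txt would then be equivalent to typing the command by hand.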
In the Classic skin, color1 is the color of the header and color2 is the color around the upper-left logo. Alternatively, you can add or change these lines in doc_data.txt:
InfoKey: Color1
InfoValue: #336600
InfoKey: Color2
InfoValue: white
InfoKey: Title
InfoValue: Great American Novel
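Editing doc_data.txt by hand gets tedious when you skin many documents. As a rough sketch, the Python helper below changes an existing InfoKey/InfoValue pair or appends a new one. The name set_info is hypothetical, and the code assumes that each InfoKey line is immediately followed by its InfoValue line and that pdfskins does not care where in the file a new pair appears.

```python
def set_info(text, key, value):
    """Return doc_data.txt text with the given InfoKey set to value."""
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if (line.strip() == f"InfoKey: {key}"
                and lines[i + 1].startswith("InfoValue:")):
            lines[i + 1] = f"InfoValue: {value}"  # overwrite existing value
            break
    else:
        # Key not found: append a new pair at the end of the file.
        lines += [f"InfoKey: {key}", f"InfoValue: {value}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    data = "InfoKey: Title\nInfoValue: Untitled\n"
    data = set_info(data, "Title", "Great American Novel")
    data = set_info(data, "Color1", "#336600")
    print(data, end="")
```

To apply it for real, read doc_data.txt into a string, pass it through set_info, and write the result back before running pdfskins.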
By bursting your PDF into pages and then not making the full document available for download, you compel readers to return to your site when they desire your material. If this is your intent, you should also secure your pages against merging, so nobody can easily reassemble your pages into the original PDF document. Do this when bursting the document. For example:
pdftk full_doc.pdf burst encrypt_128bits owner_pw 23@#5dfa allow DegradedPrinting
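Rather than reusing a password typed into an example, you might generate a fresh owner password for each document. The Python sketch below assembles the same pdftk arguments used above. The helper name burst_command is hypothetical, and the keywords burst, encrypt_128bits, owner_pw, and allow are copied from the command above rather than checked against every pdftk version.

```python
import secrets


def burst_command(pdf, owner_pw=None, allow=("DegradedPrinting",)):
    """Build a pdftk burst command, optionally encrypting each page."""
    args = ["pdftk", pdf, "burst"]
    if owner_pw:
        args += ["encrypt_128bits", "owner_pw", owner_pw]
        if allow:
            args += ["allow", *allow]  # permissions granted to readers
    return args


if __name__ == "__main__":
    # Random owner password; store it somewhere safe if you need it later.
    owner_pw = secrets.token_urlsafe(16)
    print(burst_command("full_doc.pdf", owner_pw))
```

The resulting list can be handed to subprocess.run on a machine where pdftk is installed.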
See [Hack #52] for more details on how to secure documents with pdftk.
Now, you control the document. You can take it in any direction you choose. See [Hack #21] for some ideas on how to add full-text document search. See [Hack #72] to learn how to add online page commenting.