Hack 67 Support Online PDF Reading


Serve PDF pages on demand and spare readers a long download.

Sometimes readers want to download an entire document; other times they want to read just a few pages. A reader who wants a single page from your PDF shouldn't have to download the whole document first, and a long download may turn her away. The easiest solution is to configure your PDF and your web server to serve individual pages on request. Alternatively, use our PDF skins [Hack #71].

5.18.1 Prepare the PDF

To permit page-at-a-time delivery over the Web, a PDF must be linearized. Linearization reorganizes a PDF's internal structure so that a client can request just the byte ranges that hold the resources it needs. If the reader wants to see page 12, the client requests only the data it needs to display page 12.
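To make this concrete: a linearized PDF begins with a linearization parameter dictionary, which the PDF specification requires to fit within the first 1024 bytes of the file. Its entries tell a client up front the file length (/L), the page count (/N), the object number of the first page (/O), and the byte offset where the first page's data ends (/E), so a client that has fetched the first /E bytes can already display page 1. The Python sketch below reads those values from a file's header. It is an illustration only: it assumes a plain, uncompressed dictionary, it is not a validating PDF parser, and the function name read_linearization_params is hypothetical.

```python
import re

# Linearization dictionary keys that hold a single integer.
# (/H, the hint-stream location, is an array and is skipped in this sketch.)
_INT_KEYS = ("L", "N", "O", "E", "T")

def read_linearization_params(header: bytes) -> dict:
    """Return the integer linearization parameters found in `header`
    (the first 1024 bytes of a PDF), or {} if the file does not appear
    to be linearized. A heuristic sketch, not a validating parser."""
    head = header[:1024]
    start = head.find(b"/Linearized")
    if start == -1:
        return {}
    # The dictionary closes with ">>" somewhere after the /Linearized key.
    end = head.find(b">>", start)
    if end == -1:
        return {}
    body = head[start:end]
    params = {}
    for key in _INT_KEYS:
        # Require whitespace after the key so "/L" does not match "/Linearized".
        m = re.search(rb"/" + key.encode() + rb"\s+(\d+)", body)
        if m:
            params[key] = int(m.group(1))
    return params
```

To try it on a real file, pass it the first 1024 bytes, for example `read_linearization_params(open("mydoc.pdf", "rb").read(1024))`. An empty result means no linearization dictionary was found.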

To test whether a PDF is linearized, open it in Acrobat or Reader and view its document properties: File > Document Properties... > Description (Acrobat 6) or File > Document Properties > Summary (Acrobat 5). A linearized PDF shows Fast Web View: Yes.

The Xpdf project (http://www.foolabs.com/xpdf/) includes a command-line tool called pdfinfo that can tell you if a PDF is linearized. Pass your PDF to pdfinfo like so:

pdfinfo mydoc.pdf

pdfinfo prints a text report to the screen; if your PDF is linearized, the report includes the line Optimized: yes. pdfinfo is free software.
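To check many files at once, you can drive pdfinfo from a script. The Python sketch below runs pdfinfo (assumed to be installed and on your PATH) and looks for the Optimized line in its report. The helper names are hypothetical, and the parsing is split into its own function so it can be tested without pdfinfo.

```python
import subprocess

def report_says_optimized(report: str) -> bool:
    """Return True if a pdfinfo report contains an 'Optimized: yes' line."""
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Optimized":
            # Compare case-insensitively in case a build prints "Yes".
            return value.strip().lower() == "yes"
    return False

def is_linearized(path: str) -> bool:
    """Run pdfinfo on `path` (pdfinfo must be installed) and parse its report."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return report_says_optimized(result.stdout)
```

For example, looping over a list of filenames and printing `is_linearized(name)` for each gives a quick report of which documents still need to be linearized.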

To create a linearized PDF using Acrobat, first inspect your preferences. Select Edit > Preferences > General... and choose the General category (Acrobat 6) or the Options category (Acrobat 5). Place a checkmark next to Save As Optimizes for Fast Web View and click OK.

Open the PDF you want to linearize and use File > Save As... to save it under the same filename. In Acrobat 6, you can also change the PDF's compatibility level at the same time by selecting File > Reduce File Size instead of Save As. Then open the document properties to confirm that it worked.

If you later edit the PDF in Acrobat and then simply choose File > Save, your PDF will no longer be linearized. Use File > Save As... instead to keep it linearized.
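One way to see why: File > Save generally writes your changes as an incremental update appended to the end of the file, whereas Save As... rewrites the whole file. The PDF specification requires the /L entry in the linearization dictionary to equal the file's actual length, and tells readers to ignore the linearization data when it does not, so an appended update quietly disables Fast Web View. The Python sketch below applies that length test. The function name still_linearized is hypothetical, and the regex lookup is a simplification that assumes an uncompressed dictionary in the first 1024 bytes.

```python
import os
import re

def still_linearized(path: str) -> bool:
    """Heuristic: True if the file has a linearization dictionary whose
    /L (declared file length) matches the file's real size on disk."""
    with open(path, "rb") as f:
        head = f.read(1024)
    start = head.find(b"/Linearized")
    if start == -1:
        return False  # no linearization dictionary at all
    end = head.find(b">>", start)
    if end == -1:
        return False
    # "/L" followed by whitespace, so "/Linearized" itself is not matched.
    m = re.search(rb"/L\s+(\d+)", head[start:end])
    if m is None:
        return False
    declared = int(m.group(1))
    # An appended incremental update makes the file longer than /L says.
    return declared == os.path.getsize(path)
```

A False result on a file that shows a /Linearized entry is a hint that it was saved incrementally and should be run through Save As... again.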

Ghostscript [Hack #39] includes a command-line tool called pdfopt that can linearize PDFs. To create a linearized PDF using pdfopt, invoke it from the command line like so:


input.pdf output.linearized.pdf
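To linearize a whole directory, a short Python wrapper can call pdfopt once per file. This sketch assumes the pdfopt script from your Ghostscript installation is on your PATH and follows the input/output naming shown above; the function names are hypothetical.

```python
import subprocess
from pathlib import Path

def linearized_name(src: Path) -> Path:
    """mydoc.pdf -> mydoc.linearized.pdf, in the same directory."""
    return src.with_name(src.stem + ".linearized.pdf")

def linearize_dir(folder: str) -> None:
    """Run `pdfopt input.pdf output.linearized.pdf` for each PDF in folder."""
    for src in sorted(Path(folder).glob("*.pdf")):
        if src.name.endswith(".linearized.pdf"):
            continue  # skip files produced by an earlier run
        subprocess.run(["pdfopt", str(src), str(linearized_name(src))], check=True)
```

Writing to a new filename rather than overwriting the input keeps the original intact if pdfopt fails partway through.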

5.18.2 Prepare the Server

Apache (version 1.3.17 and later) and Microsoft IIS (version 3 and later) should both serve PDF pages on demand without additional configuration. The key to serving PDF pages on demand is the web server's support for byte ranges, which HTTP 1.1 describes (http://www.freesoft.org/CIE/RFC/2068/160.htm). With byte range support, the client can request a specific range of bytes from the web server, and the server sends just those bytes instead of the entire file.

The web server must indicate its support for byte ranges by sending the "Accept-Ranges: bytes" header in response to a PDF file request. Otherwise, Acrobat might not attempt page-at-a-time downloading. If you want to tell clients not to attempt page-at-a-time downloads from your server, send the "Accept-Ranges: none" header instead.
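On the client side, this decision comes down to inspecting one header. The Python sketch below, using only the standard library, sends a HEAD request and reports whether the server advertises byte ranges. The function names are hypothetical, and treating a missing header as "no support" mirrors the cautious behavior described above rather than any documented Acrobat algorithm.

```python
import urllib.request

def advertises_byte_ranges(accept_ranges) -> bool:
    """True only if the Accept-Ranges value lists the 'bytes' unit.
    'none', an empty value, or a missing header (None) all mean no."""
    if not accept_ranges:
        return False
    units = [u.strip().lower() for u in accept_ranges.split(",")]
    return "bytes" in units

def check_server(url: str) -> bool:
    """Send a HEAD request for the PDF at `url` and inspect Accept-Ranges."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return advertises_byte_ranges(resp.headers.get("Accept-Ranges"))
```

Pointing `check_server` at one of your own PDF URLs (for instance a hypothetical http://www.example.com/mydoc.pdf) is a quick way to confirm your server sends "Accept-Ranges: bytes" before you rely on page-at-a-time delivery.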