Hack 83 Modify PDF Hyperlinks at Serve-Time

Add live session data to your PDF on its way down the chute.

After you publish a PDF online, it can be hard to gauge its impact on readers. Get a clearer picture of reader response by modifying the PDF's hyperlinks so that they pass document information to your web server.

For example, if your July newsletter's PDF edition has hyperlinks to:

http://www.pdfhacks.com/index.html

you can append the newsletter's edition to the PDF hyperlinks using a question mark:

http://www.pdfhacks.com/index.html?edition=0407

When somebody reading your PDF newsletter follows this link into your site, your web logs record exactly which newsletter they were reading.
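
Appending the edition by hand is easy to get wrong when a URL already carries a query string. The following is a minimal sketch of the idea, not part of the original hack; add_query_param is a hypothetical helper name, and it assumes the URL has no #fragment.

```php
<?php
// Hypothetical helper (not from the original hack): append one query
// parameter to a URL. Assumes the URL contains no "#fragment" part.
function add_query_param(string $url, string $name, string $value): string
{
    // Use '&' if the URL already carries a query string, otherwise '?'.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . rawurlencode($name) . '=' . rawurlencode($value);
}

echo add_query_param('http://www.pdfhacks.com/index.html', 'edition', '0407'), "\n";
```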

Take this reader response idea a step further by adding data to PDF hyperlinks that identifies the user who originally downloaded the PDF. With a little preparation, this is easy to do as the PDF is being served.

6.11.1 Add Hyperlinks to Your PDF Using Links or Buttons

A PDF page can include hyperlinks to web content. You can create them using the Link tool, the Button tool (Acrobat 6), or the Form tool (Acrobat 5). Use the Link tool shown in Figure 6-11 if you want to add a hyperlink to existing text or graphics. Use the Button/Form tool if you want to add a hyperlink and add text/graphics to the page, as shown in Figure 6-12. For example, you would use the Button/Form tool to create a web-style navigation bar [Hack #65].

Figure 6-11. The Button tool in Acrobat 6 (bottom left) and the general-purpose Form tool in Acrobat 5 (right)

Figure 6-12. Adding a hyperlink (top) to existing page text, and using a button (bottom) to add a hyperlink and text/art to the page

To create a hyperlink button in Acrobat 6, select the Button tool (Tools → Advanced Editing → Forms → Button Tool). Click the PDF page and drag out a rectangle. Release the rectangle and a Field Properties dialog opens. Set the button's appearance using the General, Appearance, and Options tabs.

Open the Actions tab. Set the Trigger to Mouse Up, set the Action to Open a Web Link, and then click Add... to open a dialog where you can enter the hyperlink URL.

When creating hyperlink buttons, the button's Name is not important. However, it can't be left blank, either. Set it to any unique identifier. Acrobat 6 does this for you automatically.


To create a hyperlink button in Acrobat 5, select the Form tool. Click the PDF page and drag out a rectangle. Release the rectangle and a Field Properties dialog opens. Set the field type to Button and enter a unique Name. Set the button's appearance using the Appearance and Options tabs.

Open the Actions tab. Select Mouse Up and click Add... to add an action. Set the Action Type to World Wide Web Link. Click Edit URL... and enter the hyperlink URL.

6.11.2 Use Placeholders for Hyperlink URLs

When entering your link or button URL, use an identifying name, such as urlbeg_userhome, instead of the actual URL. Pad this placeholder with asterisks (*) so that it is at least as long as your longest possible URL, as shown in Figure 6-13. Use a constant prefix across all these names (e.g., urlbeg) so that they are easy to find later using grep.

Figure 6-13. Identifying placeholders for URLs, padded with asterisks so that they are long enough to fit your longest possible URL
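
If you have many links, you can generate the padded placeholder text rather than counting asterisks by hand. This is a rough sketch under assumptions: make_placeholder is a hypothetical helper, and the sample URL used to measure the longest length is only illustrative.

```php
<?php
// Hypothetical helper (not from the original hack): build a placeholder
// such as "urlbeg_userhome*****", padded with asterisks to $max_url_len.
function make_placeholder(string $name, int $max_url_len, string $prefix = 'urlbeg_'): string
{
    // str_pad() never truncates, so an over-long name is returned unpadded.
    return str_pad($prefix . $name, $max_url_len, '*');
}

// Assumption: this is the longest URL the server will ever substitute.
$longest = strlen('http://www.pdfhacks.com/newsletter_home.php?user=84572&edition=0307');

echo make_placeholder('userhome', $longest), "\n";
echo make_placeholder('newsletters', $longest), "\n";
```

Paste the generated text into the link or button's URL field in Acrobat.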

6.11.3 Format the PDF Code with pdftk

When your PDF is ready to distribute online, run it through pdftk [Hack #79]. This formats the PDF code to ensure that each URL is on its own line. Add the extension pdfsrc to the output filename instead of pdf:

pdftk mydocument.pdf output mydocument.pdfsrc

From this point on, you should not treat the file like a PDF, and this pdfsrc extension will remind you.

6.11.4 Add Placeholder Offsets to the PDF

Find the byte offsets to your URL placeholders with grep (Windows users visit http://gnuwin32.sf.net/packages/grep.htm or install MSYS [Hack #97] to get grep). grep will tell you the byte offset and display the specific placeholder located on that line in the PDF. For example:

ssteward@armand:~$ grep -ab urlbeg mydocument.pdfsrc
9202:<</URI (urlbeg_userhome*******************)
11793:<</URI (urlbeg_userhome*******************)
17046:<</URI (urlbeg_newsletters*******************)
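
If grep is not at hand, the same line offsets can be collected with a few lines of PHP. This is a sketch rather than part of the original hack: find_placeholder_offsets is a hypothetical helper, and it assumes the urlbeg_ prefix, placeholder names made of letters, digits, and underscores, and the newline-terminated lines that pdftk writes.

```php
<?php
// Hypothetical helper (not from the original hack): report the byte offset
// of each line that holds a placeholder, much as "grep -ab urlbeg" does.
// Returns a list of [line offset, placeholder name] pairs.
function find_placeholder_offsets(string $path, string $prefix = 'urlbeg_'): array
{
    $found = [];
    if (!is_readable($path)) {
        return $found;
    }
    $fp = fopen($path, 'rb');
    $pattern = '/' . preg_quote($prefix, '/') . '([A-Za-z0-9_]+)/';
    while (true) {
        $line_start = ftell($fp);   // byte offset where this line begins
        $line = fgets($fp);
        if ($line === false) {
            break;                  // end of file
        }
        if (preg_match($pattern, $line, $m)) {
            $found[] = [$line_start, $m[1]];
        }
    }
    fclose($fp);
    return $found;
}

// Print the pairs in the "#-name-offset" form that the pdfsrc header uses.
foreach (find_placeholder_offsets('mydocument.pdfsrc') as [$offset, $name]) {
    echo "#-{$name}-{$offset}\n";
}
```

Run it against the file before you add any header lines, so that the offsets are measured from the start of the PDF data.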

In your text editor [Hack #82], open your pdfsrc file and add one line for each offset to the beginning. Each line should look like this:

#-urlname-urloffset

For example, this is how the previous grep output would appear at the start of mydocument.pdfsrc:

#-userhome-9202
#-userhome-11793
#-newsletters-17046
%PDF-1.3...

After adding these lines, do not modify the PDF with pdftk, gVim, or Acrobat. The pdfsrc extension should remind you to not treat this file like a PDF. Altering the PDF could break these byte offsets.
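
Because a stale offset silently breaks a link, it can be worth checking the header before you upload. The sketch below is an illustration, not part of the original hack: add_offset_header and offsets_are_valid are hypothetical helpers, and the check assumes the urlbeg_ prefix and newline-terminated lines.

```php
<?php
// Hypothetical helper (not from the original hack): prefix raw PDF bytes
// with one "#-name-offset" line per [offset, name] pair.
function add_offset_header(string $pdf, array $offsets): string
{
    $header = '';
    foreach ($offsets as [$offset, $name]) {
        $header .= "#-{$name}-{$offset}\n";
    }
    return $header . $pdf;
}

// Hypothetical check: does every header offset still point at the start
// of a line that contains the matching placeholder?
function offsets_are_valid(string $pdfsrc, string $prefix = 'urlbeg_'): bool
{
    // Split the leading "#-" header lines from the PDF body after them.
    preg_match('/\A(?:#-[^\n]*\n)*/', $pdfsrc, $m);
    $body = substr($pdfsrc, strlen($m[0]));
    preg_match_all('/^#-([^-\n]+)-(\d+)$/m', $m[0], $rows, PREG_SET_ORDER);

    foreach ($rows as [, $name, $offset]) {
        $offset = (int)$offset;
        if ($offset >= strlen($body)) {
            return false;           // offset points past the end of the data
        }
        if ($offset > 0 && $body[$offset - 1] !== "\n") {
            return false;           // offset is not at the start of a line
        }
        $eol = strpos($body, "\n", $offset);
        $line = ($eol === false)
            ? substr($body, $offset)
            : substr($body, $offset, $eol - $offset);
        if (strpos($line, $prefix . $name) === false) {
            return false;           // placeholder is not on this line
        }
    }
    return true;
}
```

For example, offsets_are_valid(file_get_contents('mydocument.pdfsrc')) should return true right after you add the header lines, and false if the file has since been rewritten.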

6.11.5 The Code

This example PHP script, serve_newsletter.php, opens a pdfsrc file, reads the offset data we added, then serves the PDF. As it serves the PDF, it replaces the placeholders with hyperlinks. It uses the input GET query string's edition and user values to tailor the PDF hyperlinks.

For example, when invoked like this:

http://www.pdfhacks.com/serve_newsletter.php?edition=0307&user=84

it opens the PDF file newsletter.0307.pdfsrc and serves it, replacing all userhome hyperlink placeholders with:

http://www.pdfhacks.com/user_home.php?user=84

and replacing all newsletters placeholders with:

http://www.pdfhacks.com/newsletter_home.php?user=84&edition=0307

Tailor serve_newsletter.php to your purpose:

<?php
// serve_newsletter.php, version 1.0
// http://www.pdfhacks.com/dynamic_links/

// added safeguard: keep only safe characters in the edition so that
// it cannot alter the file path
$edition= preg_replace( '/[^0-9A-Za-z_]/', '',
                        (string)($_GET['edition'] ?? '') );
// added safeguard: URL-encode the user ID before it goes into a hyperlink
$user= urlencode( (string)($_GET['user'] ?? '') );

$fp= @fopen( "./newsletter.{$edition}.pdfsrc", 'r' );
if( $fp ) {

  if( !empty($_GET['debug']) ) {
    header('Content-Type: text/plain'); // debug
  }
  else {
    header('Content-Type: application/pdf');
  }

  $pdf_offset= 0;
  $url_offsets= array( );

  // iterate over first lines of pdfsrc file to load $url_offsets
  while( ($cc= fgets($fp, 1024))!== false ) {
    if( $cc[0]== '#' ) { // one of our comments
      list($comment, $name, $offset)=
        array_pad( explode( '-', $cc ), 3, '' );

      if( $name== 'userhome' ) {
        $url_offsets[(int)$offset]=
          'http://www.pdfhacks.com/user_home.php?user=' . $user;
      }
      else if( $name== 'newsletters' ) {
        $url_offsets[(int)$offset]=
          'http://www.pdfhacks.com/newsletter_home.php?user=' .
          $user . '&edition=' . $edition;
      }
      else { // default
        $url_offsets[(int)$offset]= 'http://www.pdfhacks.com';
      }
    }
    else { // finished with our comments
      echo $cc;
      // offsets are measured from the start of the PDF data;
      // $pdf_offset runs one ahead of the character being read
      $pdf_offset= strlen($cc)+ 1;

      break;
    }
  }

  // sort by increasing offsets
  ksort( $url_offsets, SORT_NUMERIC );
  reset( $url_offsets );

  $output_url_line_b= false;
  $output_url_b= false;
  $closed_string_b= false;

  // get first offset/URL pair; each( ) was removed in PHP 8,
  // so use key( ), current( ), and next( ) instead
  $offset= key( $url_offsets );
  $url= (string)current( $url_offsets );
  next( $url_offsets );
  $url_ii= 0;
  $url_len= strlen($url);

  // iterate over rest of file, one character at a time
  while( ($cc= fgetc($fp))!== false ) {

    if( $output_url_line_b && $cc== '(' ) {
      // we have reached the beginning of our URL
      $output_url_line_b= false;
      $output_url_b= true;

      echo '(';
    }
    else if( $output_url_b ) {
      if( $cc== ')' ) { // finished with this URL
        if( $closed_string_b ) {
          // string has already been capped; pad
          echo ' ';
        }
        else {
          echo ')';
        }

        // get next offset/URL pair
        $offset= key( $url_offsets );
        $url= (string)current( $url_offsets );
        next( $url_offsets );
        $url_ii= 0;
        $url_len= strlen($url);

        // reset
        $output_url_b= false;
        $closed_string_b= false;
      }
      else if( $url_ii< $url_len ) {
        // output one character of $url
        echo $url[$url_ii++];
      }
      else if( $url_ii== $url_len ) {
        // done with $url, so cap this string
        echo ')';
        $closed_string_b= true;
        $url_ii++;
      }
      else {
        echo ' '; // replace padding with space
      }
    }
    else {
      // output this character
      echo $cc;

      // $offset is null once every URL has been placed
      if( $offset=== $pdf_offset ) {
        // we have reached a line in pdfsrc where
        // our URL should be; begin a lookout for '('
        $output_url_line_b= true;
      }
    }

    ++$pdf_offset;
  }

  fclose( $fp );
}
else { // file open failure
  echo 'Error: failed to open: '."./newsletter.{$edition}.pdfsrc";
}
?>
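
The script keeps every byte offset in the PDF intact by writing the real URL over the placeholder, closing the string early, and padding the leftover bytes with spaces. The sketch below isolates that idea on a whole string instead of a character stream; fit_url is a hypothetical helper, not a function from the script.

```php
<?php
// Hypothetical helper (not from the original script): return the bytes
// that replace "(" + placeholder + ")" without changing the byte count.
function fit_url(string $url, int $placeholder_len): string
{
    if (strlen($url) >= $placeholder_len) {
        // No room to spare: the URL is cut to the placeholder's length,
        // which is why placeholders must cover your longest URL.
        return '(' . substr($url, 0, $placeholder_len) . ')';
    }
    // Close the PDF string right after the URL, then pad with spaces.
    return str_pad('(' . $url . ')', $placeholder_len + 2, ' ');
}

$placeholder = 'urlbeg_userhome*******************';
$replacement = fit_url('http://www.pdfhacks.com/user_home.php?user=84',
                       strlen($placeholder));

// Both values occupy the same number of bytes in the file.
var_dump(strlen($replacement) === strlen('(' . $placeholder . ')'));
```

A URL longer than the placeholder is truncated, so a broken link in the served PDF usually means the placeholder was padded too little.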

6.11.6 Running the Hack

Upload this script to your web server along with your pdfsrc file (for example, newsletter.0307.pdfsrc). Invoke the script with an information-packed URL, such as this one:

http://www.pdfhacks.com/serve_newsletter.php?edition=0307&user=84572
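
On pages that offer the newsletter for download, you would typically generate such a URL for each visitor instead of typing it. A minimal sketch, assuming the script is deployed as serve_newsletter.php and reads the edition and user values described in the code section; newsletter_link is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not from the original hack): build a per-user
// download link for one edition of the newsletter.
function newsletter_link(string $edition, string $user): string
{
    // http_build_query() URL-encodes the values; '&' is passed explicitly
    // so the result does not depend on the arg_separator.output setting.
    $query = http_build_query(
        array('edition' => $edition, 'user' => $user), '', '&'
    );
    return 'http://www.pdfhacks.com/serve_newsletter.php?' . $query;
}

echo newsletter_link('0307', '84572'), "\n";
```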