Install Perl and use it instead of Visual Basic to drive Acrobat.
Depending on your tastes or requirements, you might want to use the Perl scripting language instead of Visual Basic [Hack #94] to program Acrobat. Perl can access the same Acrobat OLE interface used by Visual Basic to manipulate PDFs. Perl is well documented, is widely supported, and has been extended with an impressive collection of modules. A Perl installer for Windows is freely available from ActiveState.
We'll describe how to install the ActivePerl package from ActiveState, and then we'll use an example to show how to access Acrobat's OLE interface using Perl.
The ActivePerl installer for Windows is freely available from http://www.ActiveState.com/Products/ActivePerl/. Download it and run the installer. ActivePerl comes with excellent documentation, which you can access by selecting Start → Programs → ActiveState ActivePerl 5.8 → Documentation.
ActivePerl also includes the OLE Browser, shown in Figure 7-8, which enables you to browse the OLE servers available on your machine (Start → Programs → ActiveState ActivePerl 5.8 → OLE-Browser). The OLE Browser is an HTML file that must be opened in Internet Explorer to work properly.
In this example, the Perl script uses Acrobat to read annotation data (e.g., sticky notes) from the currently open PDF. The script formats this data as HTML and writes it to stdout.
Copy the script in Example 7-2 into a file named SummarizeComments.pl. You can download this code from http://www.pdfhacks.com/summarize/.
# SummarizeComments.pl ver. 1.0
use strict;
use Win32::OLE;

my $app = Win32::OLE->new("AcroExch.App");

if( 0< $app->GetNumAVDocs ) { # a PDF is open in Acrobat

  # open the HTML document
  print "<html>\n<head>\n<title>PDF Comments Summary</title>\n</head>\n<body>\n";

  my $found_notes_b= 0;

  # get the active PDF and drill down to its PDDoc
  my $avdoc= $app->GetActiveDoc;
  my $pddoc= $avdoc->GetPDDoc;

  # iterate over pages
  my $num_pages= $pddoc->GetNumPages;
  for( my $ii= 0; $ii< $num_pages; ++$ii ) {

    my $pdpage= $pddoc->AcquirePage( $ii );
    if( $pdpage ) {

      # iterate over annotations (e.g., sticky notes)
      my $page_head_b= 0;
      my $num_annots= $pdpage->GetNumAnnots;
      for( my $jj= 0; $jj< $num_annots; ++$jj ) {

        my $annot= $pdpage->GetAnnot( $jj );

        # Pop-up annots give us duplicate contents
        if( $annot->GetContents ne '' and
            $annot->GetSubtype ne 'Popup' ) {

          if( !$page_head_b ) { # output the page number
            print "<h2>Page: " . ($ii+ 1) . "</h2>\n";
            $page_head_b= 1;
          }

          # output the annotation title and format it a little
          print "<p><i>" . $annot->GetTitle . "</i></p>\n";

          # output the note text; replace carriage returns
          # with paragraph breaks
          my $comment= $annot->GetContents;
          $comment =~ s/\r/<\/p>\n<p>/g;
          print "<p>" . $comment . "</p>\n";

          $found_notes_b= 1;
        }
      }
    }
  }

  if( !$found_notes_b ) {
    print "<h3>No Notes Found in PDF</h3>\n";
  }

  # close the HTML document
  print "</body>\n</html>\n";
}
Open a PDF in Acrobat, as shown in Figure 7-6, and then run this script from the command line by typing:
C:\> perl SummarizeComments.pl > comments.html
It will take a few seconds to complete. When it is done, you can open comments.html in your browser to see a summary of the PDF's comments, as shown in Figure 7-9.
As noted in [Hack #94], this example demonstrates the relationships between several fundamental PDF objects.
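SummarizeComments.pl writes annotation text into the HTML verbatim, so a note containing characters such as < or & could confuse the browser. The following is a minimal sketch of one way to guard against that, not part of the original script. The helper name comment_to_html is hypothetical, and treating \r\n as a single break (in addition to the bare \r the script handles) is an assumption about what note text might contain.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SummarizeComments.pl): escape the
# characters that are special in HTML, then turn each carriage return
# (optionally followed by a newline) into a paragraph break.
sub comment_to_html {
    my ($text) = @_;

    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
    );

    # escape first, so the tags added below are not escaped themselves
    $text =~ s/([&<>"])/$entity{$1}/g;

    # same idea as the script's s/\r/<\/p>\n<p>/g, also accepting \r\n
    $text =~ s/\r\n?/<\/p>\n<p>/g;

    return "<p>$text</p>\n";
}

# sample note text, made up for illustration
print comment_to_html("Check x < y & fix\rSecond line");
```

To try it in the script, the three lines that copy, rewrite, and print $comment could be replaced with print comment_to_html( $annot->GetContents ). The annotation title would need similar escaping.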