Hack 94 Script Acrobat Using Visual Basic on Windows


Drive Acrobat using VB or Microsoft Word's Visual Basic for Applications (VBA).

Adobe Acrobat's OLE interface enables you to access or manipulate PDFs from a freestanding Visual Basic script or from another application, such as Word. You can also use Acrobat's OLE interface to render a PDF inside your own program's window. The Acrobat SDK [Hack #98] comes with a number of Visual Basic examples under the InterAppCommunicationSupport directory. The SDK also includes OLE interface documentation. Look for IACOverview.pdf and IACReference.pdf. These OLE features do not work with the free Reader; you must own Acrobat.
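As a minimal sketch of accessing a PDF through OLE without displaying it, the following routine opens a file directly as a PDDoc and reports its page count. The routine name and the file path are hypothetical placeholders, and the sketch assumes that the full Acrobat product is installed so that the AcroExch.PDDoc object can be created.

```vba
' Sketch: count the pages of a PDF without showing it in Acrobat's window.
' The file path is a hypothetical placeholder; change it to a real PDF.
Sub CountPdfPages()
  Dim pddoc As Object
  Set pddoc = CreateObject("AcroExch.PDDoc")

  ' Open returns False if the file cannot be opened
  If pddoc.Open("C:\temp\example.pdf") Then
    MsgBox "Pages: " & pddoc.GetNumPages
    pddoc.Close
  Else
    MsgBox "Could not open the PDF"
  End If
End Sub
```

Working with a PDDoc directly suits batch jobs that need no viewer window. The Word macro later in this hack instead starts from the document the user already has open in Acrobat.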

Acrobat Distiller also has an OLE interface. It is documented in DistillerAPIReference.pdf, which comes with the full Acrobat SDK.


The following example shows how easily you can work with PDFs using Acrobat OLE. It is a Word macro that scans the currently open PDF document for readers' annotations (e.g., sticky notes). It creates a new Word document and then builds a summary of these annotation comments.

7.3.1 The Code

To add this macro to Word, select Tools → Macro → Macros..., type in the macro name SummarizeComments, and click Create. Word will open a text editor where you can enter the code shown in Example 7-1. Save, and then test. You can download this code from http://www.pdfhacks.com/summarize.

Example 7-1. VBA code for summarizing comments
Sub SummarizeComments()

Dim app As Object
Set app = CreateObject("AcroExch.App")

If (0 < app.GetNumAVDocs) Then
  ' a PDF is open in Acrobat

  ' create a new Word doc to hold the summary
  Dim NewDoc As Document
  Dim NewDocRange As Range
  Set NewDoc = Documents.Add(DocumentType:=wdNewBlankDocument)
  Set NewDocRange = NewDoc.Range

  Dim found_notes_b As Boolean
  found_notes_b = False

  ' get the active doc and drill down to its PDDoc
  ' (each variable needs its own type; "Dim a, b As Object" leaves a as a Variant)
  Dim avdoc As Object, pddoc As Object
  Set avdoc = app.GetActiveDoc
  Set pddoc = avdoc.GetPDDoc

  ' loop counters, declared so the macro also compiles under Option Explicit
  Dim ii As Long, jj As Long

  ' iterate over pages (page indexes are zero-based)
  Dim num_pages As Long
  num_pages = pddoc.GetNumPages
  For ii = 0 To num_pages - 1

    Dim pdpage As Object
    Set pdpage = pddoc.AcquirePage(ii)
    If (Not pdpage Is Nothing) Then

      ' iterate over annotations (e.g., sticky notes)
      Dim page_head_b As Boolean
      page_head_b = False
      Dim num_annots As Long
      num_annots = pdpage.GetNumAnnots
      For jj = 0 To num_annots - 1

        Dim annot As Object
        Set annot = pdpage.GetAnnot(jj)
        ' Popup annots give us duplicate contents
        If (annot.GetContents <> "" And _
            annot.GetSubtype <> "Popup") Then

          If (page_head_b = False) Then ' output the page number
            NewDocRange.Collapse wdCollapseEnd
            NewDocRange.Text = "Page: " & (ii + 1) & vbCr
            NewDocRange.Bold = True
            NewDocRange.ParagraphFormat.LineUnitBefore = 1
            page_head_b = True
          End If

          ' output the annotation title and format it a little
          NewDocRange.Collapse wdCollapseEnd
          NewDocRange.Text = annot.GetTitle & vbCr
          NewDocRange.Italic = True
          NewDocRange.Font.Size = NewDocRange.Font.Size - 1
          NewDocRange.ParagraphFormat.LineUnitBefore = 0.6

          ' output the note text and format it a little
          NewDocRange.Collapse wdCollapseEnd
          NewDocRange.Text = annot.GetContents & vbCr
          NewDocRange.Font.Size = NewDocRange.Font.Size - 2

          found_notes_b = True
        End If
      Next jj
    End If
  Next ii

  ' report explicitly when the PDF has no comments
  If (Not found_notes_b) Then
    NewDocRange.Collapse wdCollapseEnd
    NewDocRange.Text = "No Notes Found in PDF" & vbCr
    NewDocRange.Bold = True
  End If
End If

End Sub

7.3.2 Running the Code

Open a PDF in Acrobat, as shown in Figure 7-6. In Word, run the macro by selecting Tools → Macro → Macros..., choosing SummarizeComments, and then clicking Run. After a few seconds, a new Word document will appear, as shown in Figure 7-7. It will list all the comments that readers have added to each page of the currently visible PDF.

Figure 7-6. PDF Comments displayed in Acrobat


Figure 7-7. The PDF Comments in Word after extraction via SummarizeComments


7.3.3 Hacking the Hack

This script demonstrates the typical process of drilling down through layers of PDF objects to find desired information. Here is a simplified sketch of the layers:


app

The currently running Acrobat program. Use the app to alter the user interface or Acrobat's preferences.


avdoc

The PDF currently displayed in Acrobat. Use the avdoc to change how the PDF appears in the viewer or to print pages.


pddoc

Represents the underlying PDF document. Use the pddoc to access or manipulate the PDF's pages or metadata.


pdpage

Represents the underlying PDF page. Use the pdpage to access or manipulate a page's annotations, its rotation, or its cropping.

These OLE objects closely resemble the objects exposed by the Acrobat API [Hack #97]. The API gives you much more power, however.
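The drill-down through these layers can be sketched as a short routine that reads one piece of information at each level. This is an illustrative sketch rather than part of the original hack: the routine name ShowLayers is hypothetical, and the method names are the ones listed in the SDK's IACReference.pdf, so check them against your Acrobat version. Output goes to the Immediate window of the Visual Basic editor.

```vba
' Sketch: read one fact from each OLE layer of the PDF open in Acrobat.
Sub ShowLayers()
  ' app: the running Acrobat program
  Dim app As Object
  Set app = CreateObject("AcroExch.App")
  If app.GetNumAVDocs = 0 Then
    MsgBox "Open a PDF in Acrobat first"
    Exit Sub
  End If

  ' avdoc: the PDF as displayed in the viewer
  Dim avdoc As Object
  Set avdoc = app.GetActiveDoc
  Debug.Print "Window title: " & avdoc.GetTitle

  ' pddoc: the underlying PDF document
  Dim pddoc As Object
  Set pddoc = avdoc.GetPDDoc
  Debug.Print "Pages: " & pddoc.GetNumPages
  Debug.Print "Info Title: " & pddoc.GetInfo("Title")

  ' pdpage: a single page (page indexes are zero-based)
  Dim pdpage As Object
  Set pdpage = pddoc.AcquirePage(0)
  If Not pdpage Is Nothing Then
    Debug.Print "Rotation: " & pdpage.GetRotate
    Debug.Print "Annotations: " & pdpage.GetNumAnnots
  End If
End Sub
```

Each object is obtained from the layer above it, which is the same pattern SummarizeComments follows before it reaches the individual annotations.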