Drive Acrobat using VB or Microsoft Word's Visual Basic for Applications (VBA).
Adobe Acrobat's OLE interface enables you to access or manipulate PDFs from a freestanding Visual Basic script or from another application, such as Word. You can also use it to render a PDF inside your own program's window. The Acrobat SDK [Hack #98] comes with a number of Visual Basic examples under its InterAppCommunicationSupport directory, along with OLE interface documentation: look for IACOverview.pdf and IACReference.pdf. These OLE features do not work with the free Adobe Reader; you must own Acrobat.
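To show the OLE interface working without the Acrobat viewer, here is a minimal sketch that opens a PDF from disk through an AcroExch.PDDoc object and reports its page count. The names PdfPageCount and ReportPdfInfo and the path C:\temp\sample.pdf are hypothetical placeholders. The calls used (Open, GetNumPages, Close) are described in IACReference.pdf, but check them against your own Acrobat version before relying on this.

```vba
' Hypothetical helper: returns the page count of a PDF on disk,
' or -1 if Acrobat cannot open it. Requires full Acrobat, not Reader.
Function PdfPageCount(ByVal pdfPath As String) As Long
    Dim pddoc As Object
    Set pddoc = CreateObject("AcroExch.PDDoc")

    ' PDDoc.Open returns True on success and False on failure
    If pddoc.Open(pdfPath) Then
        PdfPageCount = pddoc.GetNumPages
        pddoc.Close
    Else
        PdfPageCount = -1
    End If
    Set pddoc = Nothing
End Function

' Usage: report the page count of a sample file.
' The path is a placeholder; substitute one of your own PDFs.
Sub ReportPdfInfo()
    Dim path As String
    path = "C:\temp\sample.pdf"
    MsgBox path & " has " & PdfPageCount(path) & " page(s)."
End Sub
```

Returning -1 instead of raising an error keeps the caller simple: a script that loops over many files can skip the ones Acrobat refuses to open.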
The following example shows how easily you can work with PDFs using Acrobat OLE. It is a Word macro that scans the PDF currently open in Acrobat for readers' annotations (e.g., sticky notes), creates a new Word document, and builds a summary of those comments in it.
To add this macro to Word, select Tools > Macro > Macros... (or press Alt+F8), type in the macro name SummarizeComments, and click Create. Word opens the Visual Basic Editor, where you can enter the code shown in Example 7-1. Save, and then test. You can download this code from http://www.pdfhacks.com/summarize.
    Sub SummarizeComments()
        Dim app As Object
        Set app = CreateObject("AcroExch.App")

        If (0 < app.GetNumAVDocs) Then ' a PDF is open in Acrobat

            ' create a new Word doc to hold the summary
            Dim NewDoc As Document
            Dim NewDocRange As Range
            Set NewDoc = Documents.Add(DocumentType:=wdNewBlankDocument)
            Set NewDocRange = NewDoc.Range

            Dim found_notes_b As Boolean
            found_notes_b = False

            ' get the active doc and drill down to its PDDoc
            Dim avdoc As Object, pddoc As Object
            Set avdoc = app.GetActiveDoc
            Set pddoc = avdoc.GetPDDoc

            ' iterate over pages (page numbers are zero-based)
            Dim num_pages As Long, ii As Long
            num_pages = pddoc.GetNumPages
            For ii = 0 To num_pages - 1
                Dim pdpage As Object
                Set pdpage = pddoc.AcquirePage(ii)
                If (Not pdpage Is Nothing) Then

                    ' iterate over annotations (e.g., sticky notes)
                    Dim page_head_b As Boolean
                    page_head_b = False
                    Dim num_annots As Long, jj As Long
                    num_annots = pdpage.GetNumAnnots
                    For jj = 0 To num_annots - 1
                        Dim annot As Object
                        Set annot = pdpage.GetAnnot(jj)

                        ' Popup annots give us duplicate contents
                        If (annot.GetContents <> "" And _
                            annot.GetSubtype <> "Popup") Then

                            If (page_head_b = False) Then
                                ' output the page number
                                NewDocRange.Collapse wdCollapseEnd
                                NewDocRange.Text = "Page: " & (ii + 1) & vbCr
                                NewDocRange.Bold = True
                                NewDocRange.ParagraphFormat.LineUnitBefore = 1
                                page_head_b = True
                            End If

                            ' output the annotation title and format it a little
                            NewDocRange.Collapse wdCollapseEnd
                            NewDocRange.Text = annot.GetTitle & vbCr
                            NewDocRange.Italic = True
                            NewDocRange.Font.Size = NewDocRange.Font.Size - 1
                            NewDocRange.ParagraphFormat.LineUnitBefore = 0.6

                            ' output the note text and format it a little
                            NewDocRange.Collapse wdCollapseEnd
                            NewDocRange.Text = annot.GetContents & vbCr
                            NewDocRange.Font.Size = NewDocRange.Font.Size - 2

                            found_notes_b = True
                        End If
                    Next jj
                End If
            Next ii

            If (Not found_notes_b) Then
                NewDocRange.Collapse wdCollapseEnd
                NewDocRange.Text = "No Notes Found in PDF" & vbCr
                NewDocRange.Bold = True
            End If
        Else
            MsgBox "No PDF is open in Acrobat."
        End If
    End Sub
Open a PDF in Acrobat, as shown in Figure 7-6. In Word, run the macro by selecting Tools > Macro > Macros..., choosing SummarizeComments, and clicking Run. After a few seconds, a new Word document appears, as shown in Figure 7-7. It lists all the comments that readers have added to each page of the PDF currently displayed in Acrobat.
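SummarizeComments decides which annotations to report with a single test: the annotation must have nonempty contents, and its subtype must not be Popup (the macro's own comment notes that Popup annotations duplicate their parent's contents). If you want to adjust that rule, for example to report only certain subtypes, it can help to pull it out into its own function. The name ShouldSummarize is hypothetical; this is a sketch of a refactoring, not part of the original macro.

```vba
' Hypothetical helper: the macro's annotation filter as a standalone function.
' Keeps an annotation only if it has text and is not a Popup, because Popup
' annotations repeat the contents of the note they belong to.
Function ShouldSummarize(ByVal contents As String, ByVal subtype As String) As Boolean
    ShouldSummarize = (contents <> "") And (subtype <> "Popup")
End Function
```

Inside the macro, the test on annot would then read: If ShouldSummarize(annot.GetContents, annot.GetSubtype) Then. Because the function takes plain strings, you can check the rule in the Immediate window without Acrobat running.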
This script demonstrates the typical process of drilling down through layers of PDF objects to find desired information. Here is a simplified sketch of the layers:
app: The currently running Acrobat program. Use the app to alter the user interface or Acrobat's preferences.
avdoc: The PDF currently displayed in Acrobat. Use the avdoc to change how the PDF appears in the viewer or to print pages.
pddoc: Represents the underlying PDF document. Use the pddoc to access or manipulate the PDF's pages or metadata.
pdpage: Represents a single page of the underlying PDF. Use the pdpage to access or manipulate a page's annotations, its rotation, or its cropping.
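The following sketch walks down the four layers once, asking each one a question only it can answer: the avdoc supplies the viewer's window title, the pddoc the page count, and the pdpage the page's rotation. It assumes full Acrobat is running with at least one PDF open; the names DescribeActivePdf and ShowActivePdf are hypothetical, and GetTitle and GetRotate are taken from IACReference.pdf, so verify them against your SDK.

```vba
' Hypothetical example: visit each layer once and describe what it sees.
' Assumes full Acrobat is running with at least one PDF open.
Function DescribeActivePdf() As String
    Dim app As Object, avdoc As Object, pddoc As Object, pdpage As Object
    Set app = CreateObject("AcroExch.App")   ' app: the running Acrobat program
    If app.GetNumAVDocs = 0 Then
        DescribeActivePdf = "No PDF is open in Acrobat."
        Exit Function
    End If

    Set avdoc = app.GetActiveDoc        ' avdoc: the PDF as displayed in the viewer
    Set pddoc = avdoc.GetPDDoc          ' pddoc: the underlying document
    Set pdpage = pddoc.AcquirePage(0)   ' pdpage: first page (pages are zero-based)

    DescribeActivePdf = "Window title: " & avdoc.GetTitle & vbCr & _
                        "Pages: " & pddoc.GetNumPages & vbCr & _
                        "Page 1 rotation: " & pdpage.GetRotate & " degrees"
End Function

' Usage: display the description in a message box.
Sub ShowActivePdf()
    MsgBox DescribeActivePdf()
End Sub
```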
These OLE objects closely resemble the objects exposed by the Acrobat API [Hack #97]. The API gives you much more power, however.