Automatically scrape clipboard data into a new Word document.
In general, PDFs aren't as smart as they appear. Unless they are tagged [Hack #34], they have no concept of paragraph, table, or column. This becomes a problem only when you must create a new document using material from an old document. Ideally, you would use the old document's source file, or maybe even its HTML edition. This isn't always possible, however. Sometimes you have only a PDF to work with.
Adobe Acrobat 6 enables you to convert your PDF to many different formats with the Save As... dialog. These filters work best when the PDF is tagged. Try one to see if it suits your requirements. Adobe Reader enables you to convert your PDF to text by selecting File → Save As Text....
If your PDF is not tagged, Acrobat uses an inference engine to assemble the letters into words and the words into paragraphs. It tries to detect and create tables. It works best on documents with very simple formatting. Tables and formatted pages generally don't survive.
Fully automatic conversion of PDF to a structured format such as Word's DOC is not generally possible because the problem is too big. One workaround is to break the problem down to the point where automation has a chance of succeeding. The TAPS tool [Hack #7] works well because you meet the automation halfway: you tell it where the table is, and it creates a table from the given data. This approach can be scaled to fit the larger problem of converting entire documents.
Copy/Paste works fine for a few items, but it grows cumbersome when processing several pages of data. AutoPasteLoop is a Word macro that watches the clipboard for new data and then immediately pastes it into your new document. Instead of copy/paste, copy/paste, copy/paste, you can just copy, copy, copy. Word automatically pastes, pastes, pastes.
Create a new Word macro named AutoPasteLoop in Normal.dot and program it like this:
'AutoPasteLoop, version 1.0
'Visit: http://www.pdfhacks.com/autopaste/
'
'Start AutoPasteLoop from MS Word and switch to Adobe Reader or Acrobat.
'Copy the material you want, and AutoPasteLoop will automatically
'paste it into the target Word document. When you are done, switch back
'to MS Word and AutoPasteLoop will stop.
Option Explicit

' declare the Win32 API functions that we need
' (Sleep returns no value, so it is declared as a Sub)
Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Declare Function GetForegroundWindow Lib "user32" () As Long
Declare Function GetOpenClipboardWindow Lib "user32" () As Long
Declare Function GetClipboardOwner Lib "user32" () As Long

Sub AutoPasteLoop()

    'the HWND of the application we're pasting into (MS Word)
    Dim AppHwnd As Long
    'assume that we are executed from the target app.
    AppHwnd = GetForegroundWindow()

    'keep track of whether the user switches out
    'of the target application (MS Word).
    Dim SwitchedApp As Boolean
    SwitchedApp = False

    'reset this to stop looping
    Dim KeepLooping As Boolean
    KeepLooping = True

    'the HWND of our target document; GetClipboardOwner returns the
    'HWND of the app. that most recently owned the clipboard;
    'changing the clipboard's contents (Cut) makes us the "owner"
    '
    'note that "owning" the clipboard doesn't mean that it's locked
    '
    Dim DocHwnd As Long
    Selection.TypeText Text:="abc"
    Selection.MoveLeft Unit:=wdCharacter, Count:=3, Extend:=wdExtend
    Selection.Cut
    DocHwnd = GetClipboardOwner()

    Do While KeepLooping

        Sleep 200 'milliseconds; 100 msec == 1/10 sec

        'if the user switches away from the target
        'application and then switches back, stop looping
        '
        Dim ActiveHwnd As Long
        ActiveHwnd = GetForegroundWindow()
        If ActiveHwnd = AppHwnd Then
            If SwitchedApp Then KeepLooping = False
        Else
            SwitchedApp = True
        End If

        'if the clipboard owner has changed, then somebody else
        'has put something on it; if the clipboard resource isn't
        'locked (GetOpenClipboardWindow), then paste its contents
        'into our document; use Copy to change the clipboard owner
        'back to DocHwnd
        '
        If GetClipboardOwner() <> DocHwnd And _
           GetOpenClipboardWindow() = 0 Then
            Selection.Paste
            Selection.MoveLeft Unit:=wdCharacter, Count:=1, Extend:=wdExtend
            Selection.Copy
            Selection.Collapse wdCollapseEnd
        End If

    Loop
End Sub

Open a new Word document. Start AutoPasteLoop by opening the Macros dialog box (Tools → Macros → Macros...), selecting the macro name AutoPasteLoop, and clicking Run. While the loop is running, you cannot interact with Word. Stop the loop by switching to another application and then switching back to Word.
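The Declare statements in the listing are written for 32-bit VBA. On 64-bit editions of Office, a Declare statement must carry the PtrSafe keyword, and window handles are pointer-sized. The following is a minimal sketch of how the declarations might be adapted with conditional compilation. ShowForegroundHwnd is a hypothetical helper added only to check that the declarations compile; it is not part of the original macro.

```vba
Option Explicit

#If VBA7 Then
    'Office 2010 and later: PtrSafe is required on 64-bit builds,
    'and window handles are pointer-sized (LongPtr)
    Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
    Declare PtrSafe Function GetForegroundWindow Lib "user32" () As LongPtr
    Declare PtrSafe Function GetOpenClipboardWindow Lib "user32" () As LongPtr
    Declare PtrSafe Function GetClipboardOwner Lib "user32" () As LongPtr
#Else
    'older versions of Word: handles fit in a 32-bit Long
    Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
    Declare Function GetForegroundWindow Lib "user32" () As Long
    Declare Function GetOpenClipboardWindow Lib "user32" () As Long
    Declare Function GetClipboardOwner Lib "user32" () As Long
#End If

'Hypothetical check, not part of AutoPasteLoop: prints the handle of
'the foreground window to the Immediate window
Sub ShowForegroundHwnd()
#If VBA7 Then
    Dim Hwnd As LongPtr
#Else
    Dim Hwnd As Long
#End If
    Hwnd = GetForegroundWindow()
    Debug.Print Hwnd
End Sub
```

If you adopt these declarations, the three handle variables in AutoPasteLoop (AppHwnd, DocHwnd, and ActiveHwnd) would need the same conditional Dim so that their type matches what the API functions return.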
Start the loop. Switch to Acrobat (or Reader) and use its tools to individually select and copy its columns, tables, paragraphs, and images. Switch back to Word and you should find all of your selections pasted into the new document. Start AutoPasteLoop again if you want to copy more material.
Add content filters or your own inference logic to the AutoPasteLoop macro. Use your knowledge of the input documents to tailor the loop, so it creates documents that require less postprocessing.
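As one illustration of a content filter, the sketch below unwraps the hard line breaks that text copied from a PDF usually carries, while keeping blank lines as paragraph breaks. The function name CleanPastedText and the Chr(1) placeholder are assumptions made for this example; neither comes from the original macro, and the rules are deliberately simple.

```vba
Option Explicit

'Hypothetical filter, not part of the original macro: joins lines that
'were broken by the PDF layout and keeps blank lines as paragraph breaks
Function CleanPastedText(ByVal RawText As String) As String
    Dim s As String
    Dim Marker As String
    Marker = Chr(1)   'placeholder assumed not to occur in the input

    'normalize every line ending to a single carriage return
    s = Replace(RawText, vbCrLf, vbCr)
    s = Replace(s, vbLf, vbCr)

    'collapse runs of three or more breaks down to two
    Do While InStr(s, vbCr & vbCr & vbCr) > 0
        s = Replace(s, vbCr & vbCr & vbCr, vbCr & vbCr)
    Loop

    'protect paragraph breaks, turn the remaining breaks into spaces,
    'then restore the paragraph breaks
    s = Replace(s, vbCr & vbCr, Marker)
    s = Replace(s, vbCr, " ")
    s = Replace(s, Marker, vbCr)

    CleanPastedText = s
End Function
```

You can try it from the Immediate window: Debug.Print CleanPastedText("one" & vbCrLf & "two") prints one two. One way to apply it inside the loop, assuming you need only plain text, is to note Selection.Start before Selection.Paste and then rewrite the text of the range between that position and the new Selection.End. Rewriting a range's text this way discards its formatting and any inline images, so it suits text-only extractions.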
AutoPasteLoop isn't just a PDF hack. It works with any program that can copy content to the clipboard.