Teach Windows XP or 2000 how to search the full text of your PDFs along with your other documents. Or, use Adobe Reader to search PDFs only.
Search is essential for using document archives. Search can also find things where you might not have thought to look. The problem is that, by default, Windows search doesn't know how to read PDF files. We present a couple of solutions.
The free Adobe Reader 6.0 provides the easiest solution. It enables you to perform searches across your entire PDF collection (Edit → Search). Its detailed query results include links to individual PDF pages and snippets of the text surrounding your query, as shown in Figure 2-5. Its Fast Find setting, enabled by default, caches the results of your searches, so subsequent searches go much faster. View or change the Reader search preferences by selecting Edit → Preferences → Search.
The downside to Adobe Reader search is that it searches PDF documents only.
It makes sense to search across all file types from a single interface. Newer versions of Windows let you extend their built-in search feature to include PDF documents. With Windows 2000, all you need to do is install the freely available PDF IFilter from Adobe. With Windows XP, you must also apply a couple of workarounds. In both cases, you can use the Windows Indexing Service to speed up searches.
The Windows Indexing Service is powerful but needs to be configured for best performance. The next section introduces you to the Indexing Service. We then discuss installing and troubleshooting Adobe's PDF IFilter.
You don't need Indexing Service to search your computer, but it can be handy. Queries run much faster, and you can use advanced search features such as Boolean operators (e.g., AND, OR, and NOT), metadata searches (e.g., @DocTitle Contains "pdf"), and pattern matching. The downside is that the Indexing Service always runs in the background, using resources to index new or updated documents. A little configuration ensures that you get the best performance.
First off, do you have Indexing Service? If not, how do you install it? Both questions are answered in the Windows Components Wizard window. In Windows XP or 2000, open this wizard by selecting Start → Settings → Control Panel → Add or Remove Programs and clicking the Add/Remove Windows Components button on the left. Find the Indexing Service component and place a check in its box, if it is empty, as shown in Figure 2-6. Click Next and proceed through the wizard.
Access Indexing Service configuration and documentation from the Computer Management window, shown in Figure 2-7. Right-click My Computer and select Manage. In the left pane, expand Services and Applications and then Indexing Service.
Sometimes you must stop or start the Indexing Service. Right-click the Indexing Service node and select Stop or Start from the context menu.
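The same stop and start can be scripted instead of clicked. The sketch below is only an illustration: it assumes the Indexing Service is registered under the short service name cisvc (check the Services list on your own machine) and that the script runs with administrator rights. The helper names indexing_service_command and control_indexing_service are hypothetical.

```python
import subprocess
import sys

# Assumption: the Indexing Service's short service name is "cisvc".
SERVICE_NAME = "cisvc"


def indexing_service_command(action):
    """Build the `net` command line that stops or starts the service."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return ["net", action, SERVICE_NAME]


def control_indexing_service(action):
    """Run the command. This only makes sense on Windows."""
    if sys.platform != "win32":
        raise RuntimeError("the Indexing Service exists only on Windows")
    # check_call raises CalledProcessError if `net` reports a failure.
    return subprocess.check_call(indexing_service_command(action))
```

Calling control_indexing_service("stop") before a registry change and control_indexing_service("start") afterward mirrors the Stop and Start items on the context menu.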
Under the Indexing Service node you'll find index catalogs, such as System. Add, delete, and configure these catalogs so that they index only the directories you need. For details on how to do this, I highly recommend the documentation under Help → Help Topics → Indexing Service. This documentation also details the advanced query language.
You can still search the directories you do not index by selecting Start → Search → For Files or Folders, so don't feel compelled to index your entire computer.
Before installing the PDF IFilter, create a special catalog for testing purposes. Put a few PDFs in its directory. Disable indexing on all other catalog directories by double-clicking these directories and selecting "Include in Index? No." This will simplify testing because indexing many documents can take a long time.
On Windows XP and 2000, you have two kinds of searches: indexed and unindexed. An indexed search relies on the Indexing Service, as we have discussed. An unindexed search takes a brute-force approach, scanning all files for your queried text, as shown in Figure 2-8. In both cases, the system uses filters to handle the numerous file types. These filters use the IFilter API to interface with the system.
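To make the difference between the two kinds of search concrete, here is a minimal Python sketch. It is not how Windows implements either search; it only illustrates the idea. The function names are hypothetical, and the code reads every file as plain text, which is roughly what happens to a PDF when no filter is installed.

```python
import os
import re
from collections import defaultdict


def unindexed_search(root, query):
    """Brute force: open every file under root and scan it for the query."""
    needle = query.lower().encode("utf-8")
    hits = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as handle:
                    if needle in handle.read().lower():
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(hits)


def build_index(root):
    """Read every file once and map each word to the files containing it."""
    index = defaultdict(set)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as handle:
                    text = handle.read().decode("utf-8", errors="ignore")
            except OSError:
                continue
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(path)
    return index


def indexed_search(index, word):
    """Answer a single-word query from the index without rereading files."""
    return sorted(index.get(word.lower(), ()))
```

The unindexed search pays the full cost of reading every file on every query. The indexed search pays that cost once, in build_index, and then answers each query with a dictionary lookup, which is why the index must be kept up to date in the background.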
A PDF IFilter is freely available from Adobe. Visit http://www.adobe.com/support/salesdocs/1043a.htm and download ifilter50.exe. Adobe's web page states that this PDF IFilter works only on servers. In fact, it works on XP Home Edition, too.
If you run Windows 2000, you can install the PDF IFilter and it will work for both indexed and unindexed PDF searching.
If you run Windows XP Home Edition and install the PDF IFilter (Version 5.0), you might need to disable the PDF IFilter for unindexed PDF searches. Unindexed searching of PDFs on XP Home Edition with the PDF IFilter can leave open file handles lying around, which will cause all sorts of problems. Visit http://www.pdfhacks.com/ifilter/ and download PDFFilt_FileHandleLeakFix.reg. We will use it in our installation instructions, later in this hack. This registry hack ensures that only the Indexing Service uses the PDF IFilter. After you apply this hack, PDFs will be treated like plain-text files during unindexed searches. You can undo this registry hack with PDFFilt_FileHandleLeakFix.uninstall.reg.
On XP, installing the PDF IFilter might require a couple of registry hacks. First we'll install it, then we'll troubleshoot.
In the Computer Management window (right-click My Computer and select Manage), right-click Services and Applications → Indexing Service and select Stop.
Run the Adobe PDF IFilter installer through to completion.
Windows XP Home users: install PDFFilt_FileHandleLeakFix.reg by double-clicking it and selecting Yes to confirm installation. (If you need to undo this registry hack, run PDFFilt_FileHandleLeakFix.uninstall.reg.)
Start Indexing Service back up again (right-click Services and Applications → Indexing Service and select Start).
Rescan your test catalog. Do this by selecting the catalog's Directories node, right-clicking your test directory, and selecting All Tasks → Rescan (Full).
Wait for the rescan to complete.
To test your index, don't select Start → Search. Instead, in the Computer Management window, select the Query Catalog node listed under your test catalog. Submit a few queries that would work only on the full text of your PDFs. Avoid using document headings or titles. Did it work? If so, you're done! If you get no results, as shown in Figure 2-9, work through the next section, which explains a common workaround for Windows XP.
PDF IFilter and Indexing Service don't see eye to eye on Windows XP. If queries against your indexed PDFs return no results, give this a try:
In the Computer Management window (right-click My Computer and select Manage), right-click Services and Applications → Indexing Service and select Stop.
Open the Registry Editor (Start → Run... → Open: regedit → OK).
Select HKEY_CLASSES_ROOT and then search for pdffilt.dll in the registry data (Edit → Find... → Find what: pdffilt.dll → Look at: Data → Find Next).
You should hit upon an InprocServer32 key that references pdffilt.dll and specifies its ThreadingModel. Double-click the ThreadingModel value and change it from Apartment to Both.
Select HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex and double-click the DLLsToRegister value to edit it.
In the list of DLLs, delete the following line:
C:\Program Files\Adobe\PDF IFilter 5.0\PDFFilt.dll
Click OK, and then close the Registry Editor.
Start the Indexing Service back up (right-click Services and Applications → Indexing Service and select Start).
Rescan your test catalog. Do this by opening the catalog's Directories node, right-clicking your test directory, and selecting All Tasks → Rescan (Full).
Wait for the rescan to complete.
Your test query should now work, as shown in Figure 2-10.
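If you need to repeat the DLLsToRegister edit on several machines, it can be scripted. The following is a hedged sketch, not a tested installer: it assumes DLLsToRegister is a multi-string value that Python's standard winreg module returns as a list of paths, and the function names are hypothetical. It does not touch ThreadingModel, because the InprocServer32 key must still be located by searching, as described above. Stop the Indexing Service and back up the registry before trying anything like this.

```python
PDF_FILTER_DLL = "pdffilt.dll"
CONTENT_INDEX_KEY = r"SYSTEM\CurrentControlSet\Control\ContentIndex"


def without_pdf_filter(dlls):
    """Return the DLL list minus any entry whose file name is pdffilt.dll."""
    return [
        path for path in dlls
        if path.replace("/", "\\").rsplit("\\", 1)[-1].lower() != PDF_FILTER_DLL
    ]


def apply_to_registry():
    """Windows only: rewrite DLLsToRegister without the PDF IFilter entry."""
    import winreg  # standard library module, available only on Windows

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, CONTENT_INDEX_KEY, 0, access
    ) as key:
        # Assumption: the value is REG_MULTI_SZ, so `dlls` is a list of strings.
        dlls, value_type = winreg.QueryValueEx(key, "DLLsToRegister")
        winreg.SetValueEx(
            key, "DLLsToRegister", 0, value_type, without_pdf_filter(dlls)
        )
```

Keeping the list filtering in its own function means you can check it against a copy of your DLL list before letting apply_to_registry write anything.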
When searching PDFs by selecting Start → Search → For Files and Folders, don't search for Documents. Search All Files and Folders instead. The Documents search overlooks PDFs.
If you indexed a specific folder instead of an entire drive, that folder (or one of its subfolders) must be given in the Look In: field when using Start → Search → For Files and Folders. Otherwise, the index won't be consulted; an unindexed search will be performed instead, even within the indexed folder. Set the Look In: field to a specific folder by clicking the drop-down box and selecting Browse..., as demonstrated in Figure 2-11.
When searching within an indexed folder, you can use advanced search terms (e.g., @DocTitle Contains "earnings"). Consult the Indexing Service online documentation, described earlier, for details.
Using the older Windows search tool on PDFs can still be useful, even if it doesn't access the full text of your documents. If the PDF documents are not encrypted, their metadata (Title, Author, etc.) and bookmarks are visible to the search tool as plain text. PDF shortcut titles [Hack #17] also are searched.
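To see why a plain-text scan can find this metadata, consider the rough sketch below. It looks for Title entries written as readable strings in the raw bytes of a PDF, which is how both the document Title and bookmark titles commonly appear in unencrypted files. It is a simplification with a hypothetical function name: it ignores titles that contain parentheses or backslash escapes, titles stored as hex or UTF-16 strings, and anything inside compressed streams.

```python
import re

# Matches a literal-string Title entry such as: /Title (Annual Report)
TITLE_PATTERN = re.compile(rb"/Title\s*\(([^()\\]*)\)")


def plain_text_titles(pdf_bytes):
    """Return Title strings stored as readable text in the raw file."""
    return [
        match.group(1).decode("latin-1")
        for match in TITLE_PATTERN.finditer(pdf_bytes)
    ]
```

Read a file with open(path, "rb").read() and pass the result to plain_text_titles. If the list comes back empty for a PDF that you know has a title, the metadata is probably encrypted, compressed, or encoded in a form that a plain-text search cannot see either.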