Hack 35 Read Web Pages Offline


Take the Web with you wherever you go, and put it into an easily searchable database on your PC.

One of the main problems with doing research on the Web is that there's no easy way to save all the information you find and no simple way to read web pages when you're offline. Internet Explorer includes some basic tools for saving web pages and reading through them when you're not connected to the Internet. If you need to save only occasional pages and don't need to do searches through those pages, then these tools will work reasonably well for you. But if you want to store pages in categories and folders and need to do full-text searches, then you'll need a third-party program. This hack shows you how to do both.

4.4.1 Reading Web Pages Offline Using IE

To save your current web page to your hard disk so you can read it again in Internet Explorer when you're not connected to the Internet, choose File → Save As. You'll be given several options for how to save it. If you're not planning to edit the HTML of the file, your best bet is to save it as a "Web Archive, single file" (.mht). That way, you don't clutter up your hard disk with extra folders and files stored in different locations; everything is saved to a single file. Saving it as a "Web Page, complete" stores the HTML file as well as associated graphics, in a folder structure. Saving it as a "Web Page, HTML only" saves just the HTML file itself, with no associated graphics and no folder structure. You can also save it as a text file, but if you do, expect to spend time cleaning it up, because it saves all the text on the page, often in an unstructured way. To read the page after you've saved it to your disk, choose File → Open, browse to the directory where you've saved the page, and open it.
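
To see why a text-only save tends to need cleanup, here is a minimal Python sketch of what stripping a page down to its text involves. It is not how Internet Explorer does the conversion; the class and function names (TextExtractor, page_to_text) are made up for this illustration, and it uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of a page and drops all markup.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or stylesheet content.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def page_to_text(html):
    """Return the page's text, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>Body <b>text</b></p>"
    print(page_to_text(sample))
```

Headings, paragraphs, and inline formatting all come out as undifferentiated lines of text, which is the kind of unstructured result you should expect to tidy up by hand.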

There are times when you want to save not just the page you're on, but also the pages linked off it. To do that, you'll have to save your pages another way. First, save the page to your Favorites list by pressing Ctrl-D or choosing Favorites → Add to Favorites. Then, right-click on the page where it's listed in Favorites and choose Make Available Offline. A wizard will appear. Follow its instructions, and when you get to the screen shown in Figure 4-6, tell it how many links deep you want pages saved. Be very careful when doing this, because even choosing to keep one link level can take up a substantial amount of hard disk space.

Figure 4-6. Saving web pages offline several links deep using the Offline Favorite Wizard
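
To get a feel for how quickly the page count grows with link depth, the following Python sketch counts the pages a breadth-first walk would collect. It is only an illustration under stated assumptions, not the wizard's actual logic: the names LinkCollector and pages_to_save are hypothetical, and fetch is a function you supply that returns a page's HTML for a given URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gathers the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pages_to_save(start_url, depth, fetch):
    """Return the set of URLs reachable within `depth` links of start_url.

    `fetch` is a caller-supplied function mapping a URL to its HTML text.
    """
    seen = {start_url}
    frontier = [start_url]
    for _ in range(depth):
        next_frontier = []
        for url in frontier:
            collector = LinkCollector()
            collector.feed(fetch(url))
            for href in collector.links:
                target = urljoin(url, href)  # resolve relative links
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return seen


if __name__ == "__main__":
    # A tiny in-memory "site" so the sketch runs without a network connection.
    site = {
        "http://example.com/": '<a href="a.html">A</a> <a href="b.html">B</a>',
        "http://example.com/a.html": '<a href="c.html">C</a>',
    }
    for depth in range(3):
        found = pages_to_save("http://example.com/", depth, lambda u: site.get(u, ""))
        print(depth, len(found))
```

On a real site, where a typical page carries dozens of links, each extra level can multiply the number of saved pages, which is why even one level can take up so much disk space.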

When you finish the wizard, you're asked how you want to synchronize the page or pages you've chosen to save to disk. When you synchronize a web page, IE grabs the latest version of the page or pages, and overwrites your existing page or pages. If you want to keep a permanent copy of the page or pages, and don't want them updated, choose "Only when I choose Synchronize from the Tools menu." Then, simply don't synchronize the page. If you instead do want to synchronize the page so that a more current version is available on your hard disk, choose "I would like to create a new schedule," and follow the instructions for creating a schedule.

4.4.2 Save Web Pages in an Offline Database with SurfSaver

If you need to save many web pages and want to search them by keyword or by full text, you'll have to use a third-party program. My favorite is SurfSaver, available from http://www.surfsaver.com (see Figure 4-7). It integrates directly into Internet Explorer and lets you save pages in separate folders within the program. You can add keywords and notes to each page, and then search by keyword or full text, or browse by folder.

When you visit a web page you want to save locally, right-click on the page, choose SurfSaver → Save, and choose which SurfSaver folder you want to save it in. You can save the page with or without graphics. When you want to search, right-click on the page, choose SurfSaver → Search, and then search by keyword, through notes, or through the full text on the page to easily find the page and information you want. SurfSaver also integrates directly with the freeform askSam database.

Figure 4-7. Saving web pages in a database with SurfSaver
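
The idea of filing pages in folders and searching them by keyword, notes, or full text can be sketched in a few lines of Python with the standard sqlite3 module. This is not how SurfSaver or askSam stores its data; the table layout and the function names (open_store, save_page, search) are assumptions made purely for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small page store; the schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               url TEXT PRIMARY KEY,
               folder TEXT,
               keywords TEXT,
               notes TEXT,
               body TEXT
           )"""
    )
    return db


def save_page(db, url, folder, body, keywords="", notes=""):
    """File a page's text under a folder, replacing any earlier copy."""
    db.execute(
        "INSERT OR REPLACE INTO pages (url, folder, keywords, notes, body) "
        "VALUES (?, ?, ?, ?, ?)",
        (url, folder, keywords, notes, body),
    )
    db.commit()


def search(db, term, field="body"):
    """Return the URLs whose chosen field contains `term`."""
    if field not in ("body", "keywords", "notes"):
        raise ValueError("field must be 'body', 'keywords', or 'notes'")
    # The column name is checked against a fixed list above; the search
    # term itself is passed as a bound parameter.
    rows = db.execute(
        f"SELECT url FROM pages WHERE {field} LIKE ? ORDER BY url",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_store()
    save_page(db, "http://example.com/a", "Research",
              "Notes on reading web pages offline", keywords="offline")
    print(search(db, "offline"))
    print(search(db, "offline", field="keywords"))
```

A plain substring match with LIKE keeps the sketch short; it ignores case for ASCII letters and treats % and _ in the search term as wildcards. A real tool that has to search thousands of pages would more likely build a full-text index.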