Hack 86 Saving Web Pages for Offline Reading

[Hack difficulty: beginner]

Save a single web page or even clusters of web pages in their entirety for reading on public transportation, at 35,000 feet, or anywhere else you happen to be.

There comes a time when we happen across a web page that is so uproariously funny, we simply must archive it forever. On the other hand, sometimes we want to save a few online transactions for proof of purchase. Even more commonly, we may run across a large site that we want to read in its entirety, but we don't want to tie up our phone line or incur bandwidth charges. Thankfully, OS X satisfies our offline-reading desires in a number of ways.

When we need to archive a web page or site quickly, a few options present themselves, depending on our goal. The quickest and closest at hand is Microsoft Internet Explorer, which comes bundled with Mac OS X. Within this popular browser lie a Scrapbook and the ability to create Web Archives.

The concept of a Scrapbook harkens back to the pre-OS X days with a built-in system accessory called, conveniently enough, Scrapbook. With it, you could drop in files, text, sounds, movies, and pictures and then flip through the pages, viewing each item as part of a grander book.

A similar concept is built into Internet Explorer. At any time, you can take the web page you're currently looking at and save it into the IE Scrapbook for later viewing. To do so, make sure the Explorer Bar is enabled (View → Explorer Bar) and click the Scrapbook tab to slide out its panel. To add the current web page to the Scrapbook, click the now-visible Add button.

You'll immediately see the title of the current web page show up in the Scrapbook panel along with a camera icon, signifying that this is a snapshot of the current page. Now, or any time in the future, you can click an item in your Scrapbook and see an exact copy of what you were looking at, along with the time it was archived and its original URL. Just as with bookmarks, you can organize your Scrapbook items into folders, rename and delete them, and give them comments.

The Scrapbook excels at saving a single page but doesn't handle multiple pages well; you'll need to create a new entry for each page of a site manually. If you're looking to archive a whole site (the chapters of a book, news items in central Florida, etc.), you'll want to look at Internet Explorer's Save As feature, which has an easily ignored Web Archive output.

There are a number of options available for a Web Archive, all of which concern how much you want to save to disk. By default, a Web Archive does the same thing as a Scrapbook item: it takes the current page (and all its images), wraps it up into one proprietary file, and saves it to your hard drive. The various options allow you to save sounds and movies but, more importantly, to specify how many levels of linked pages you want to archive along with the current one.

Of course, the more you want to archive, the larger the archive is going to become. The Save As Web Archive option is certainly more powerful than the Scrapbook; as its name suggests, it's more an archive than a single page in a book. However, it does have two limitations.

Since the Web Archive feature creates a single, Internet Explorer-only file, it's not ideal if you're looking to collect and store only certain data, like the illustrations of your favorite artist: you'll get the pages whether you want them or not, and there's no easy way to extract the images from the resultant archive. The second limitation is depth; you can't archive more than five levels deep, which, granted, is probably enough.

If these limitations are deal breakers, there are many other utilities for you to explore. The shareware Web Devil (http://www.chaoticsoftware.com/ProductPages/WebDevil.html) has been around for years and provides a handy, powerful GUI for your web-sucking needs. If you prefer shell utilities, look at GNU wget (http://apple.com/downloads/macosx/unix_open_source/wget.html), which provides a powerful, automated interface for downloading [Hack #61] and mirroring. Both utilities support filtering (e.g., save only .jpg and .gif files).
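
If you go the wget route, a single command covers most offline-reading needs. The following is only a sketch, using standard GNU wget options, with http://www.example.com/ standing in for whichever site you actually want to mirror:

    % wget --recursive --level=5 --convert-links --page-requisites \
        --no-parent http://www.example.com/

This follows links up to five levels deep (--level=5), rewrites the links in the saved pages so they work offline (--convert-links), grabs the images and stylesheets each page needs (--page-requisites), and refuses to wander above the starting URL (--no-parent). To filter rather than mirror, say, to keep only an artist's illustrations, add an accept list:

    % wget --recursive --level=5 --no-parent --accept jpg,gif \
        http://www.example.com/

wget still crawls the site's HTML to find links, but only the .jpg and .gif files it encounters are kept on disk.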