13.6 RSS Aggregators

Newspapers, magazines, and other traditional forms of publication are increasingly putting their data online. Add to this the steadily growing number of weblogs, and you have an enormous pool of information sources to draw on.

To help you keep up with these sources, many of these publications publish RSS files providing headlines, entry links, and brief descriptions of newly published writing. News aggregators gather this data from numerous sources, presenting it to you in one page for quick perusal. If you see a story of interest, you can then click on the provided link to get to the publication.

This section takes a look at three popular aggregators: one cross-platform desktop-based reader (AmphetaDesk), one web-based service (Meerkat), and one Mac-only aggregator (NetNewsWire). First, though, a brief discussion about RSS autodiscovery is in order.

At its simplest, you can view RSS files in a web browser, formatting the data with an associated stylesheet, though you won't have the aggregation capability with this approach.

13.6.1 RSS Autodiscovery

How do you discover an RSS file, RDF based or otherwise? A person could embed a graphic or a link to the file in every page that has an associated RSS file, but this forces the person to add the button manually and the subscriber to go to the linked page and then add it as an RSS feed. A better approach is to use RSS autodiscovery.

RSS autodiscovery is enabled by adding a link element in your primary web page that provides the link to your syndication RSS. The HTML format is as follows:

<link rel="alternate" type="application/rss+xml" title="RSS" href="url/to/rss/file">

In my weblog, the autodiscovery is set to the following, using the XHTML version of link, which is the preferred approach:

<link rel="alternate" type="application/rss+xml" title="RSS" href="http://weblog.burningbird.net/index.rdf" />

With autodiscovery, your readers can just provide the web page URL to an aggregator that supports autodiscovery, and the tool uses this to find the RSS file on its own. Note that the format of the link must be consistent: the only thing that should change is the URL to the RSS file. Also note that this link should go in your document's HEAD section.
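To make the mechanism concrete, here is a minimal Python sketch of how an aggregator might locate the feed from a page's markup. It is an illustration only, not the code any of the aggregators in this section actually use; the names FeedLinkFinder and find_feeds are invented for this example, and the sample page is made up around the link element shown earlier.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collects the href of every RSS autodiscovery link element."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing <link ... /> tags here.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        is_rss = attr.get("type", "").lower() == "application/rss+xml"
        if "alternate" in rel_values and is_rss and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feeds(html_text):
    """Return the feed URLs advertised by a page, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds


# Hypothetical page containing the autodiscovery link from this section.
page = """<html><head>
<title>Example weblog</title>
<link rel="alternate" type="application/rss+xml" title="RSS"
      href="http://weblog.burningbird.net/index.rdf" />
</head><body></body></html>"""

print(find_feeds(page))
```

A real tool would also need to fetch the page and resolve a relative href against the page URL; both steps are left out of this sketch.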

13.6.2 AmphetaDesk

AmphetaDesk is a news aggregator that resides on your (Linux, Windows, or Mac OS) computer and checks RSS files at set intervals, looking for updates. Its creator is Kevin Hemenway, who goes by the pseudonym of Morbus Iff.

AmphetaDesk is open source and freely available, supported only by donations (download and read more about the application at http://www.disobey.com/amphetadesk/ and the open source project at http://sourceforge.net/projects/amphetadesk/).

In addition to creating software such as AmphetaDesk, Kevin is also a prolific author of both books and articles, primarily for O'Reilly. Kevin's main web site is at disobey.com.

AmphetaDesk is one of the simplest-to-use aggregators, but that's not its only appeal. AmphetaDesk is also customizable, from changing channels to changing the appearance to changing the code itself (AmphetaDesk is written in Perl).

13.6.2.1 Using AmphetaDesk

Once installed, running the application for the first time opens a web browser with a page containing preset aggregated news items. You can continue using the settings as is or you can customize the data. For instance, if you have several weblogs you're interested in keeping abreast of, you can add each of them as channels; the tool will track changes for you.

I customized AmphetaDesk by removing all of the preselected channels, and then added my own favorites. From the browser page I clicked the link labeled My Channels and then checked all of the boxes next to the items listed, as shown in Figure 13-1.

Figure 13-1. Customizing AmphetaDesk by removing existing channels
figs/prdf_1301.gif

The channels are RSS files. To add a new channel, you can enter the URL of the feed in the text box shown in Figure 13-1 and click the button labeled Add This Channel, or you can use RSS autodiscovery by providing the URL of a web page that contains the autodiscovery link. Additionally, you can use the built-in Add a Channels page that contains 9000+ prediscovered channels.

AmphetaDesk, like many other aggregators, supports adding a variation of a Subscribe link to your links toolbar on your browser, by dragging a link from a web page to the bar. I did this with my Mozilla browser. Now when I visit a page I'm interested in subscribing to, I just click the Subscribe link on the toolbar and the RSS feed for the site is added as a channel to my customized AmphetaDesk.

I visited several of my favorite news sources and weblogs and added each of them as an RSS channel. Next, I went to the application settings (accessible through the My Settings link) and further customized the tool by changing the time to check for updates to one hour (60 minutes). The application checks the RSS sources every 60 minutes and posts new entries to the Channels page.
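The update check itself is easy to picture. The following Python sketch shows one way a desktop aggregator could decide which channels are due for another fetch under a 60-minute setting. It is an assumption-laden illustration, not AmphetaDesk's own logic (AmphetaDesk is written in Perl); the function name channels_due and the example.org feed URLs are hypothetical.

```python
from datetime import datetime, timedelta

# Mirrors the 60-minute setting chosen in My Settings.
CHECK_INTERVAL = timedelta(minutes=60)


def channels_due(last_checked, now, interval=CHECK_INTERVAL):
    """Return the channel URLs whose last check is at least `interval` old."""
    return sorted(url for url, checked in last_checked.items()
                  if now - checked >= interval)


# Made-up channels and timestamps for illustration.
now = datetime(2003, 1, 1, 12, 0)
last_checked = {
    "http://example.org/a.rdf": datetime(2003, 1, 1, 10, 45),  # 75 minutes ago
    "http://example.org/b.rdf": datetime(2003, 1, 1, 11, 30),  # 30 minutes ago
}

print(channels_due(last_checked, now))
```

Only the first channel is reported as due, because the second was checked within the last hour.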

13.6.2.2 More advanced customization

If you like to play around with your software, you can further customize AmphetaDesk in several different ways. For instance, you can add a new "skin" to the application, which means controlling the look of the application by adjusting the templates the tool uses. However, use caution and always back up the files as you experiment. The documentation for customizing the application skin is included with the product.

If you're a developer, you can further customize the application, either working within the SourceForge project or creating your own customized version of AmphetaDesk. However, be forewarned that if you change the code base for your own installation, you'll need to merge the changed code in with new releases of AmphetaDesk. Find out more about AmphetaDesk code customization at http://www.decafbad.com/twiki/bin/view/Main/AmphetaOutlines and http://www.cantoni.org/software/AmphetaDesk.html.

A particularly interesting aspect to AmphetaDesk is that you can modify the source code for the application and run it locally, without having Perl installed.

13.6.3 Meerkat

Meerkat is an online news aggregation service that can be customized, though customization is limited to querying from existing news sources. It's accessible online and requires no installation on your machine, only a browser.

Access the main site at http://oreillynet.com/meerkat/. Meerkat was written by Rael Dornfest, from O'Reilly.

After accessing the Meerkat site, the first page that opens is the application page. It has a form at the top that allows you to pick predefined filters, labeled Profiles/Mobs. Selecting one of the predefined filters limits the news headlines to just those that are related to the filter. In Figure 13-2, the filter picked was labeled Apache, and the headlines shown are all Apache related.

Figure 13-2. Meerkat news filtered on Apache-related items
figs/prdf_1302.gif

You can also filter the results based on categories and channels or search for specific terms. Additionally, you can modify how many entries show by selecting a time frame. Clicking on the Refresh button updates the display after you've changed your selections.

I've been a Meerkat subscriber for more than a year, yet I hadn't accessed the site directly in months, not until I started working on this chapter. This seemingly contradictory statement can be explained by discussing the real power behind Meerkat: the ability to incorporate the Meerkat services into your own applications, web pages, or desktop.

Meerkat has exposed APIs, based on different technologies, to access the news feed (more on this at http://www.oreillynet.com/pub/a/rss/2000/05/09/meerkat_api.html). These flavors, as Meerkat calls its different open APIs, allow you to incorporate Meerkat into your like-flavored application.

For instance, I incorporate a JavaScript-based Meerkat feed into one of my web sites, using the following code:

<div style="font-size: 8pt; font-family: Times New Roman">
<script language="JavaScript"
src="http://meerkat.oreillynet.com/?p=1&_fl=js&_de=0"></script>
</div>

This code describes a JavaScript-flavored news feed, with full descriptions turned off (to save space), using a profile of 1, which is all news stories. I could have further modified the feed by using other parameters. For instance, the following parameters impact which stories show:

s

Specify search, using plus sign (+) to delimit keywords

sw

To specify what is searched, such as title, description, and so on

c

To display a specific channel

t

To set time period of displayed items

p

To specify a particular profile

m

To specify a particular mob

i

To specify a particular story

In addition, the following parameters influence the display:

_fl

Flavor

_de

Whether descriptions are shown

_ca

Whether category is shown

_ch

Which channel the story came from

_da

Story date

_dc

The Dublin Core metadata associated with the stories

So, I can change my current setting to the following:

<div style="font-size: 8pt; font-family: Times New Roman">
<script language="JavaScript"
src="http://meerkat.oreillynet.com/?p=1&_fl=js&_de=1&t=1DAY"></script>
</div>

Notice that the parameters are separated by the ampersand.

The setting just shown instructs Meerkat to give me the stories from profile 1, one day's worth, JavaScript formatted, and with full descriptions. All other parameters are set to their default settings. The result of this JavaScript embedded into a web page is shown in Figure 13-3.

Figure 13-3. Meerkat display using JavaScript to access feed
figs/prdf_1303.gif

You can see how uncomplicated the Meerkat feed is to access. Best of all, the feed uses CSS, so you can modify the display of the feed by incorporating CSS settings for predefined Meerkat values.
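If you generate pages from a script, the query string can be assembled programmatically rather than typed by hand. The following Python sketch builds the same kind of URL from the parameters described above. The helper name meerkat_url is invented for this example, and the sketch only constructs the address; it does not contact the Meerkat service.

```python
from urllib.parse import urlencode

# Base address taken from the script examples in this section.
MEERKAT_BASE = "http://meerkat.oreillynet.com/"


def meerkat_url(**params):
    """Join the given Meerkat parameters into a query URL.

    urlencode separates the name=value pairs with ampersands and
    turns spaces into plus signs, which suits the s (search) parameter.
    """
    return MEERKAT_BASE + "?" + urlencode(params)


# Profile 1, JavaScript flavor, descriptions on, one day's worth of stories.
url = meerkat_url(p=1, _fl="js", _de=1, t="1DAY")
print(url)

# A two-keyword search; the space becomes the plus-sign delimiter.
print(meerkat_url(s="rdf rss"))
```

The first call reproduces the address used in the second script example, so the result can be dropped straight into the src attribute.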

The Meerkat API has been ported for use with many languages and technologies including a raw RSS feed, XML, PHP, Sherlock plug-in, N3, HTML, and others. In addition, you can access Meerkat services using XML-RPC.

13.6.4 NetNewsWire and NetNewsWire Lite

An RSS aggregator, plus more, that is gaining considerable popularity with Mac OS X users is Ranchero Software's NetNewsWire and its lighter version, NetNewsWire Lite. I use it myself on my PowerBook and am very impressed with its ease of use. And as with so many other aggregation tools, you can do more than just read RSS feeds with the tool: with the commercial version, still in beta when I wrote this book, you can edit weblog postings and post them to your weblog.

You can download NetNewsWire from http://ranchero.com/netnewswire/. The RSS aggregator-only version is NetNewsWire Lite; the commercial version NetNewsWire is the one with the extra features such as weblog posting. I used the NetNewsWire beta 1.01b for this chapter.

When you first install and open NetNewsWire, it comes with only a few sites already subscribed. However, it's an easy matter to subscribe to new feeds, particularly if the web page that provides the feed supports autodiscovery.

Figure 13-4 shows NetNewsWire Pro with some of my favorite weblogs, including my own, subscribed.

Figure 13-4. NetNewsWire Pro with a few subscribed RSS feeds
figs/prdf_1304.gif

You can use the tool to jump to unread items, or you can click on any of the subscriptions to display the current RSS contents. Clicking on any of these opens the excerpt associated with the item. To add new items, just click the Subscribe button and fill in the URL of the RSS file or the (X)HTML page that has the RSS autodiscovery entry. That's it. The tool will fill in the necessary information from the file, including the title associated with the RSS feed.

If you double-click on any one of the items, the actual page opens in your default browser, which you can set via your System Preferences.

You can also open the actual RSS XML of the feed in TextEdit as shown in Figure 13-5. This is a handy way of getting a little more familiar with RDF/RSS: use the tool to take a look at RDF/RSS files out on the Web.

Figure 13-5. Examining a feed in TextEdit
figs/prdf_1305.gif
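Once you've looked at a few feeds this way, it can be instructive to pull the headlines out yourself. The following Python sketch reads a small RDF/RSS (RSS 1.0) document and lists each item's title and link, which is the core of what any aggregator does with a feed. The sample feed and its example.org URLs are made up for illustration, and list_items is a name invented here; this is not how NetNewsWire itself parses feeds.

```python
import xml.etree.ElementTree as ET

# RSS 1.0 elements live in this namespace.
RSS_NS = {"rss": "http://purl.org/rss/1.0/"}

# A made-up, minimal RSS 1.0 feed used only for this example.
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/index.rdf">
    <title>Example Weblog</title>
    <link>http://example.org/</link>
    <description>A made-up channel</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/1"/>
        <rdf:li rdf:resource="http://example.org/2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/1">
    <title>First entry</title>
    <link>http://example.org/1</link>
  </item>
  <item rdf:about="http://example.org/2">
    <title>Second entry</title>
    <link>http://example.org/2</link>
  </item>
</rdf:RDF>"""


def list_items(feed_xml):
    """Return (title, link) pairs for each item in an RSS 1.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("rss:title", default="", namespaces=RSS_NS),
         item.findtext("rss:link", default="", namespaces=RSS_NS))
        for item in root.findall("rss:item", RSS_NS)
    ]


for title, link in list_items(SAMPLE_FEED):
    print(title, link)
```

Note that this treats the feed as plain XML rather than as RDF; feeds in other RSS versions use different element names and would need different lookups.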

Other functionality NetNewsWire Pro supports includes the ability to create weblog postings, check their spelling, and then post directly from the tool.