Section 9.11. Aggregating RSS with Blagg

Blagg (short for Blosxom Aggregator) lets Blosxom aggregate (i.e., read and blog) RSS syndicated feeds of many flavors (0.9, 0.91, 0.92, 1.0) via a simple command-line interface. Blagg builds on the lightweight framework of Blosxom, interspersing aggregated stories among your original entries.

Blagg is maintained as a separate project from Blosxom. Although it was designed with Blosxom in mind and is fully integrated with it, Blagg has its own place as the basis for just about anything needing a simple, lightweight RSS reader/aggregator. Indeed, this is already the case: via plug-ins (Blaggplugs), Blagg is able to push aggregated RSS to email recipients, instant messenger buddies, and weblog software such as Movable Type and Blogger.

Blagg has two modes: interactive and automatic. Interactive mode occurs when Blagg is run from the command line. It picks up, displays, and prompts you for inclusion of individual stories from elsewhere into your weblog.

The automatic mode is designed to be run unattended on a regular basis from a service such as cron. The aggregator simply drops new stories from your favorite feeds into your weblog. One use of automatic aggregation is to create a "daily dose" of your favorite feeds in one place to be read at your leisure. A group of like-minded individuals could produce a weblog composed of entries from their individual weblogs. An author can have her main weblog reflect the writing she does across various sites and weblogs.

9.11.1 Requirements

Blagg doesn't need much, not even an XML parser, for those of you who know what that means. It has only a single requirement on top of Blosxom: a command-line application capable of fetching a remote resource over the Internet. cURL, Lynx, and wget are all good choices.

If you're running Mac OS X or some form of Unix, check for the availability of one or more of these applications by typing which for each in turn on the command line:

% which curl

which: no curl in (/usr/local/bin...

% which lynx

which: no lynx in (/usr/local/bin...

% which wget

/usr/bin/wget
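The three checks can also be run in one pass. What follows is only a minimal sketch, assuming a POSIX shell; the function name find_fetchers is made up for illustration and is not part of Blagg. It uses command -v rather than which, because command -v reports a missing program through its exit status instead of a message whose wording varies between systems.

```shell
# find_fetchers: hypothetical helper, not part of Blagg.
# Prints one line per candidate program: its path, or "not found".
find_fetchers() {
    for prog in curl lynx wget; do
        if path=$(command -v "$prog" 2>/dev/null); then
            echo "$prog: $path"
        else
            echo "$prog: not found"
        fi
    done
}

find_fetchers
```

Any program reported with a path is a candidate for Blagg to use when fetching feeds.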

Windows users can pick up a precompiled version of wget at:

ftp://sunsite.dk/projects/wget/windows/

9.11.2 Downloading

Blagg lives at http://www.raelity.org/lang/perl/blagg/, the Blagg home page. The latest version of the script itself is always available for download at http://www.raelity.org/lang/perl/blagg/downloads/blagg.

Download Blagg by visiting the Blagg home page, right-clicking (that's Ctrl-clicking, if you're on a Macintosh) the download link, and saving the file to your hard drive. Alternatively, from the OS X or Unix command line, you can use the same application Blagg will be using to grab remote RSS files. Here's how to use wget from the command line to download Blagg:

% wget http://www.raelity.org/lang/perl/blagg/downloads/blagg

--23:55:10--  http://www.oreillynet.com/%7Erael/lang/perl/blagg/downloads/blagg

           => `blagg'

Connecting to www.oreillynet.com:80... connected!

HTTP request sent, awaiting response... 200 OK

Length: 3,347 [text/plain]

    0K -> ...                                                    [100%]

23:55:10 (272.38 KB/s) - `blagg' saved [3347/3347]

9.11.3 Installing Blagg

Blagg setup is all but identical to that of Blosxom: simply open the blagg script in your favorite text editor and adjust a few lines.

First make sure the first line of the script (#!/usr/bin/perl -w) correctly identifies the location of your Perl interpreter. This will be the same as it was in Section 9.3.

Blagg needs to know the location of your Blosxom install's data directory. Copy the $datadir line from blosxom.cgi and paste it in place of the default:

my $datadir = "/Library/WebServer/Documents/blosxom";

Specify which command-line application you'd like Blagg to use (the one you found in Section 9.11.1) to retrieve remote RSS documents by changing the following line as appropriate:

my $get_prog = 'curl';

Mac OS X users can leave the line as it is:

my $get_prog = 'curl';

Unix users should find either lynx or wget close at hand:

my $get_prog = 'lynx -source';

or:

my $get_prog = 'wget --quiet -O -';

Windows users, if you grabbed a precompiled version of wget, should use:

my $get_prog = 'wget --quiet -O -';

Save the blagg script. Don't put it anywhere beneath your web server's directory; it should live in your home directory or wherever you would usually put executables (a bin directory under your home directory is always a good place). Ensure that the blagg script is executable with:

% chmod 700 blagg
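Moving the script into place and setting its permissions can be combined into one step. The following is a rough sketch, assuming a POSIX shell; install_blagg and its two arguments are hypothetical names chosen for illustration, not something Blagg provides.

```shell
# install_blagg: hypothetical helper, not part of Blagg.
# Copies the edited script into a private bin directory and makes it
# readable, writable, and executable by its owner only (mode 700).
install_blagg() {
    src=$1       # path to the edited blagg script
    bindir=$2    # destination directory, e.g. "$HOME/bin"
    mkdir -p "$bindir" || return 1
    cp "$src" "$bindir/blagg" || return 1
    chmod 700 "$bindir/blagg"
}

# Example call (assumes the edited script is in the current directory):
#   install_blagg ./blagg "$HOME/bin"
```

Mode 700 keeps the script private to your account, which suits a script that is run from the command line or from cron rather than by the web server.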

9.11.4 Configuring

First, you need to tell Blagg about your favorite RSS feeds. Fire up your favorite text editor and create an RSS datafile (rss.dat) consisting of one line per feed like so:

nickname  URL  [interactive or automatic]

These are broken down as follows:

Nickname

A short alphanumeric nickname for the feed (e.g., raelitybytes). This nickname is prepended to the filenames of all aggregated entries (e.g., raelitybytes.a_title.txt).

URL

The URL of the RSS feed to aggregate (e.g., http://www.raelity.org?flav=rss).

interactive or automatic

Whether this feed should be aggregated interactively (on the command line, story by story) or automatically (on a regular basis, blogging each and every story).

Example 9-6 shows a sample rss.dat file that automatically blogs anything it finds on the Raelity Bytes weblog, yet prompts interactively for particular stories to blog from the Boing Boing weblog.

Example 9-6. A sample rss.dat file
raelitybytes  http://www.raelity.org?flav=rss   automatic

boingboing  http://www.newsisfree.com/HPE/xml/feeds/33/2733.xml  interactive

Save the RSS datafile as rss.dat in your main Blosxom $datadir or in the subdirectory of the weblog into which you'd like to aggregate these feeds. You can add, edit, or remove feed entries at any time simply by re-editing the rss.dat file.
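Because rss.dat is edited by hand, a quick sanity check before running Blagg can catch typos. The following is a minimal sketch, assuming a POSIX shell with awk; check_rss_dat is a hypothetical helper, not part of Blagg. It makes several assumptions that the format description does not confirm: that all three fields are required (the bracket notation may mean the mode is optional), that feed URLs begin with http:// or https://, and that blank lines are harmless.

```shell
# check_rss_dat: hypothetical helper, not part of Blagg.
# Reports each line that is not "nickname URL mode" and returns
# non-zero if any problem is found. Blank lines are skipped.
# Assumption: the mode field is treated as required.
check_rss_dat() {
    awk '
        NF == 0 { next }
        NF != 3 { printf "line %d: expected 3 fields, found %d\n", NR, NF; bad = 1; next }
        $1 !~ /^[A-Za-z0-9]+$/ { printf "line %d: nickname is not alphanumeric\n", NR; bad = 1 }
        $2 !~ /^https?:\/\// { printf "line %d: URL does not start with http:// or https://\n", NR; bad = 1 }
        $3 != "interactive" && $3 != "automatic" { printf "line %d: unknown mode %s\n", NR, $3; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Example call (the path is an assumption; use your own $datadir):
#   check_rss_dat /Library/WebServer/Documents/blosxom/rss.dat
```

The function prints nothing and returns zero for a file like Example 9-6; anything it does print names the line to revisit.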