1.4 Legal Implications

The copyright implications for RSS feeds are quite simple. Feed publishers have two choices, and each one determines what users may do with the feed.

First, the publisher can decide that the feed must be licensed in some way. In this case, only authorized users can use the feed. It is good manners on the part of the publisher to make this as obvious as possible: at the very least by providing a copyright notice in an XML comment, and preferably by making it difficult for unauthorized users to get to the feed. Registering a pay-only feed with all the aggregators is asking for trouble.
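As a minimal sketch of the notice described above, the following Python builds a small RSS 2.0 document with the standard library's xml.etree.ElementTree and places the copyright notice in an XML comment just inside the root element. It also fills in the optional <copyright> channel element that RSS 2.0 defines, since aggregators can read that element even though they ignore comments. The function name build_licensed_feed and all of the titles, URLs, and notice text are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

def build_licensed_feed(title, link, description, notice):
    """Build a minimal RSS 2.0 document carrying a copyright notice (hypothetical helper)."""
    # XML comments may not contain "--", so reject notices that would break the feed.
    if "--" in notice:
        raise ValueError("notice must not contain '--'")
    rss = ET.Element("rss", version="2.0")
    # The comment is visible to anyone who opens the raw feed.
    rss.append(ET.Comment(f" {notice} "))
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # Machine-readable notice in the optional RSS 2.0 <copyright> element.
    ET.SubElement(channel, "copyright").text = notice
    return ET.tostring(rss, encoding="unicode")

# Example with placeholder values for a licensed, subscribers-only feed.
feed = build_licensed_feed(
    title="Example Corp Market Briefing",
    link="http://www.example.com/briefing/",
    description="Daily briefing for licensed subscribers",
    notice="Copyright 2005 Example Corp. Licensed subscribers only; do not redistribute.",
)
print(feed)
```

The comment alone is only a courtesy to human readers; restricting access to the feed URL itself is what actually keeps unauthorized users out.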

Second, and most commonly, the publisher can decide that the RSS feed is entirely free to use. In this case, it is only polite for the publishers of public RSS feeds to consider the feed entirely in the public domain: free to be used by anyone, for anything. This might sound a little radical to the average company vice president, but remember: there is nothing in the RSS feed that is not, in some way, in the actual source information in the first place. It is rather futile to get upset that someone might not be using your headlines in the company-approved font, or committing a similar infraction, and somewhat against the spirit of the exercise.

Screen scraping a site to create a feed, by writing a script to read the site-specific layout, is a different matter. U.S. courts, at least, have already held (in the Ticketmaster versus Tickets.com case of October 1999 to March 2000) that linking to a page is not in itself a breach of copyright. And one could argue, perhaps less convincingly, that reproducing headlines and excerpts from a site comes under fair-use guidelines for review purposes. However, it is extremely bad form, and not at all encouraged, to continue scraping a site if the site owner asks you to stop. Instead, try to evangelize RSS to the site owner, and get him to start a proper feed. Buy him this book: it's great for gifts!

If You Are Being Scraped

If you are being scraped heavily and want to stop it, there are four ways to do so. First, well-behaved scrapers obey robots.txt, so placing a robots.txt file in the root directory of your site should keep them out of the areas you list. Second, you can contact the scraper and ask them to stop. If they are professional, they will do so immediately. Third, you can block the IP address of the scraper, although this is sometimes rather like herding cats, because scrapers can move around.
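To see how the robots.txt route works from the scraper's side, here is a minimal sketch using Python's standard urllib.robotparser module, which is what a polite Python script might consult before fetching a page. The robots.txt contents, the /news/ path, the user-agent name ExampleScraper, and the helper allowed are all hypothetical; the file is parsed from a string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as it might appear at http://www.example.com/robots.txt,
# asking all robots to stay out of the news pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a live scraper would use set_url() and read().
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for url in ("http://www.example.com/news/today.html",
            "http://www.example.com/about.html"):
    verdict = "fetch" if allowed(ROBOTS_TXT, "ExampleScraper", url) else "skip"
    print(verdict, url)
```

Bear in mind that robots.txt is purely advisory: nothing forces a scraper to run a check like this, which is why the other remedies are still needed.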

The fourth and best way is to make an RSS feed of your own. We'll show how to do this in Chapter 2.