6.2 Defining the Vocabulary: Business and Scope

As the Web has matured, more and more of the posted content is aging beyond usefulness. In many cases, this aged content is just deleted from a web site, resulting in "404 Page not found" errors when you click through to the content from some search engine or via a link from another web page. Hitting a missing page is particularly frustrating if you've come to the page because of a description associated with it that exactly fits your current interest, and you don't even know why the page was deleted or if the resource might exist somewhere else.

A further problem with maturing web sites is that site structure doesn't remain constant: because of new technologies or new directions in content management, resources may be moved around within the site or even moved to new domains. When you try to access the content, the less-than-helpful sites return something along the lines of:

404 Not Found 
We're sorry, the file that you requested does not exist or has moved. 

Well, which is it? Is the page missing, or was the request invalid because the content's moved? If you get this message as a result of clicking on a link from another site, is it because the content's really been deleted or moved, or because the linking site made a mistake with the link? Is the site that owns the content using a new system of cataloging its resources, breaking existing links?

Other sites provide a page with a forwarding message and a link to redirect you to the new content. As important as these redirections are, though, knowing the reasons behind a move can help you decide whether the resource is worth pursuing through what could end up being a chain of redirections, with each link in the chain reflecting a different move.

Unfortunately, the reasons for the move aren't maintained with the redirection in most cases.

Another problem is aging content that isn't deleted. With this type of page, you could be halfway through reading it only to realize that it talks about a product or technology that's been obsolete for years. There's nothing to indicate the relevance of the page, and external factors associated with the page, such as the page title or label, may not provide enough context to determine whether the resource is useful for your purposes or not.

Netscape's support of Dynamic HTML (DHTML) for the company's browser is a classic case of content being under one label, DHTML, with two drastically different implementations based on browser version. DHTML for Version 4.x of Netscape won't work with the current Netscape 6.x products and vice versa. The only way to determine whether a page titled "Working with DHTML in Netscape" is useful for your purposes is to read it and hope you know enough about the subject to know whether you're wasting your time.

Content management systems such as FrontPage, Vignette, and others help with creating, posting, and managing the original content, but do not help provide information about the context of the resource. HTML meta tags can be attached to each resource to provide copyright information, keywords, or authorship, but nothing regarding the expected lifespan of the resource or its move history, including reasons for the move, unless you put this information into the description, an approach that isn't standardized and therefore not useful.

These systems are as helpless as web browsers at determining whether a 404 error occurred because of a typo, a relocation, or a resource no longer being maintained at the site.

What's needed is a content system that takes over after the content management systems have finished their task of posting the content: a postcontent information system that can be accessed by a runtime application and provide information about the resource to the resource consumers. Such a system must provide information that is useful for humans and is also usable by automated processes.
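To make the idea concrete, here is a minimal Python sketch of the kind of record such a system might keep and how a runtime application might query it. Everything in it is an assumption made for illustration: the class names `Movement` and `ResourceRecord`, the `describe` function, the status values, and the sample paths are hypothetical and are not part of any vocabulary defined in this chapter.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Movement:
    """One relocation of a resource, kept together with its reason (hypothetical)."""
    moved_on: date
    new_location: str
    reason: str


@dataclass
class ResourceRecord:
    """Hypothetical post-publication record for a single resource."""
    original_url: str
    status: str                      # assumed values: "active", "moved", "removed"
    reason: str = ""                 # why the resource was removed, if it was
    expires: Optional[date] = None   # date after which the content is expected to be stale
    history: List[Movement] = field(default_factory=list)


def describe(url: str, records: Dict[str, ResourceRecord]) -> str:
    """Return a human-readable explanation of what happened to a resource."""
    record = records.get(url)
    if record is None:
        # No record at all: the request itself is the likely problem.
        return f"{url}: no record; the link itself may be wrong."
    if record.status == "moved" and record.history:
        last = record.history[-1]
        return (f"{url}: moved to {last.new_location} "
                f"on {last.moved_on.isoformat()} ({last.reason}).")
    if record.status == "removed":
        return f"{url}: removed ({record.reason})."
    return f"{url}: still current."


# Sample data, invented for this sketch.
records = {
    "/old/dhtml.htm": ResourceRecord(
        original_url="/old/dhtml.htm",
        status="moved",
        history=[Movement(date(2002, 5, 1), "/archive/dhtml.htm",
                          "site reorganization")],
    ),
    "/old/beta.htm": ResourceRecord(
        original_url="/old/beta.htm",
        status="removed",
        reason="product discontinued",
    ),
}

print(describe("/old/dhtml.htm", records))
print(describe("/old/typo.htm", records))
```

The same records serve both audiences: the structured fields can be read by an automated process, while `describe` turns them into a message a person can act on, distinguishing a moved page from a removed one and from a mistyped link.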

We'll use this type of system to demonstrate how to create an RDF vocabulary and, eventually, how to use the vocabulary just created. For simplicity in this chapter (and later in the book), I'll refer to this system as PostCon.