5.1 Stylesheets

XML and stylesheets go together like naked people and clothes. Let's take a moment to familiarize ourselves with the general concepts behind stylesheets. First, why do you want them? Second, how do they work? Finally, are there limitations, and what can we do about them?

5.1.1 Why We Need Them

I can rant about why it's important to keep information pure and separate presentation into stylesheets, but this would ignore a critical question: if it's easier to write in presentational markup (and I admit that it is), why would you want to bother with stylesheets? After all, the Web itself testifies to the fact that presentational markup is working quite well for what it was designed to do. For that matter, what's wrong with plain text?

If you are already familiar with the sermon, then skip this section, because I'm going to preach the religion of stylesheets now.

XML was inspired, to a large extent, by the limitations of HTML. Tim Berners-Lee, the inventor of HTML, always had stylesheets in mind, but for some reason they were forgotten in the huge initial surge of webification. Although HTML had only limited presentational capabilities built in, it was enough to satisfy the hordes of new web authors. Easy to implement and even easier to learn, HTML was soon stretched far beyond its original intentions as a simple report-formatting language, forced to encode everything from product catalogs to corporate portals. But the very thing that led to its rapid uptake, presentational markup, is also holding HTML back.

Here are some problems associated with presentational markup and some solutions provided by stylesheets:

Low information content

Presentational markup is not much better than plain text. A machine can't understand the difference between a typical body paragraph and a poem or code sample. Nor does it know that one thing is marked bold because it's a stock price and another is bold because it's the name of a town. Consequently, you can't easily mine pages for information. Search engines can only try to match character strings, since any sense of context is missing from the markup.

With stylesheets, you are free to mark up a document in a way that preserves information. XML markup languages are tailored for their data, using element names that describe what things are rather than how they should look. This makes it possible to create data mining software or search engines that use markup information in addition to content.
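To make the point concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. Both sample documents and their element names (company, stockprice, town) are made up for illustration; they are not taken from any real markup language.

```python
import xml.etree.ElementTree as ET

# Hypothetical presentational markup: everything important is just "bold".
PRESENTATIONAL = (
    "<p>Shares of <b>Acme</b> closed at <b>41.50</b> in <b>Springfield</b>.</p>"
)

# Hypothetical descriptive markup: element names say what each thing is.
DESCRIPTIVE = (
    "<para>Shares of <company>Acme</company> closed at "
    "<stockprice>41.50</stockprice> in <town>Springfield</town>.</para>"
)

def find_all(xml_text, element_name):
    """Return the text of every element with the given name."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(element_name)]

bold_things = find_all(PRESENTATIONAL, "b")
prices = find_all(DESCRIPTIVE, "stockprice")
towns = find_all(DESCRIPTIVE, "town")

print(bold_things)  # a mixed bag: company, price, and town all look alike
print(prices)
print(towns)
```

With the presentational version, the best a program can do is collect everything that happens to be bold. With the descriptive version, asking for stock prices or towns is a one-line query.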

Management nightmare

When your markup is presentational, the design details are inextricably mixed with the content. You have to get into the document itself anytime you need to make a change to the design. This can be quite laborious. For example, changing italicized proper nouns to a bold typeface means editing each case manually, since no automatic search-and-replace utility can distinguish a proper noun from an emphasized phrase.

Stylesheets untangle presentation details from the document and store them in a separate place. If you need to make a change to the design of a document, you can change one place in the stylesheet, rather than manually fix every hard-coded tag in the markup. What's more, since one stylesheet can be used for many documents (see Figure 5-1), one simple change can affect many pages. In a setting where design changes occur frequently, this is a godsend.

Figure 5-1. One stylesheet used by many documents

Dubious vendor extensions

Seeking to extend their market share by addressing the frustrations of web authors, some tool vendors have taken the initiative of extending HTML on their own, circumventing the standards process. Differentiating their products this way seems good at first, but it leads to a horrible fracturing of your audience. People have to use the same tools as you in order to view your information. Web pages with messages like "best when viewed by browser X" have fallen into this trap.

Stylesheets are regulated by standards bodies. Instead of inventing ad hoc extensions that fragment the community, vendors are encouraged to implement the same standards as everyone else. If something that people need is missing, it's better to tell the standards body than to go it alone. (That's the theory, anyway. In practice, it doesn't always work. CSS has been around for a long time, and to date no one has completely implemented it correctly. There are also a few vendor extensions.)

Device dependence

Designers often focus too much on getting something to look just right instead of making it good enough for a range of uses. In the process, they end up making their document too inflexible for viewers. For example, it's tempting to set the column widths of a table with absolute widths. It may look terrific on the designer's 17-inch screen, but simply awful on a smaller, 14-inch monitor, not to mention PDAs and cell phones.

Instead of trying to exploit side-effects of tags or tinkering with minute details in markup to get the right effect, designers now can work with a stylesheet. Stylesheets typically are designed with multiple purposes in mind. For example, typeface selection is flexible, taking into account the capabilities of the reader's local system. Stylesheets supply more options for specifying sizes, lengths, colors, margins, and other properties, usually with the option for flexible, relative dimensions.

Limited reusability

Sometimes you want your document to be used in different ways, but are limited by the hardwired presentation details. Write a document in HTML and it's only good for viewing online. You can print it out, but the typeface looks big and blocky because it's designed for a computer screen. You could cut and paste the content into another program, but then you'd lose design information or you'd carry in some unwanted artifacts (spurious whitespace, for example).

Again, the separation of stylesheet from document will help you here. You can write as many stylesheets as you want for different purposes (see Figure 5-2). In the document, you only need to change a simple setting to associate it with a different stylesheet. A web server can detect what device is requesting a page and select an appropriate stylesheet. The user can also make changes to the presentation, substituting their own stylesheet, or just overriding a few styles.

Figure 5-2. Mixing and matching stylesheets for different purposes

To summarize, the three principal ways in which stylesheets help you are:

  • Making design changes easier by disentangling the details from the document.

  • Enabling multiple uses for one document.

  • Giving the end user more control and accessibility.

The key to all this is what we in XML intellectual circles call late binding. Keep the document as far away from its final product as possible, and you maximize its flexibility. The stylesheet extends the information into a particular realm, be it online viewing or print or spoken text.
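Late binding is easier to see than to describe. The following Python sketch keeps one document untouched and binds it to a presentation only at the last moment. The two "stylesheets" here are hypothetical lookup tables invented for this example, not a real stylesheet language.

```python
import xml.etree.ElementTree as ET

# The document never changes, whatever the output medium.
DOCUMENT = "<note><title>Launch</title><para>We fly at dawn.</para></note>"

# Two hypothetical stylesheets: each maps an element name to a text template.
SCREEN = {"title": "== {} ==", "para": "{}"}
SPEECH = {"title": "Heading: {}.", "para": "{}"}

def render(xml_text, stylesheet):
    """Format each element that the stylesheet knows about."""
    root = ET.fromstring(xml_text)
    lines = []
    for el in root.iter():
        template = stylesheet.get(el.tag)
        if template and el.text:
            lines.append(template.format(el.text))
    return "\n".join(lines)

print(render(DOCUMENT, SCREEN))
print(render(DOCUMENT, SPEECH))
```

Swapping the second argument is the "simple setting" that retargets the document. Nothing inside DOCUMENT has to know whether it will end up on a screen or be read aloud.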

5.1.2 How They Work

Think of applying a stylesheet to a document as preparing a meal from a cookbook. Your XML document is a bunch of raw, unprocessed ingredients; the stylesheet is a recipe, telling the chef how to handle each ingredient and how to combine them. The software that transmutes XML into another format, based on the stylesheet's instructions, is the chef in our analogy. After the dicing, mixing, and baking, we have something palatable and easily digested.

5.1.2.1 Applying properties

In the simplest sense, a stylesheet is like a table that maps style properties to elements in the document (see Figure 5-3). A property is anything that affects the appearance or behavior of the document, such as typeface, color, size, or decoration. Each mapping from element to property set is called a rule. It consists of a part that matches parts of a document, and another that lists the properties to use.

Figure 5-3. A stylesheet used to help produce a formatted document

The software that uses a stylesheet to generate formatted output is a stylesheet processor. As it reads the XML document, it keeps consulting the stylesheet to find rules to apply to elements. For each element, there may be multiple rules that match, so the processor may apply them all, or it may just try to find one that is the best fit. Getting rules to match the right elements can be one of the more complex parts of writing a stylesheet.
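As a rough illustration of the table-of-rules idea, here is a toy stylesheet processor in Python. It assumes the simplest possible matcher (an exact element name) and the "apply every matching rule" strategy. The rule table and property values are invented for the example, and real stylesheet languages match in far richer ways.

```python
import xml.etree.ElementTree as ET

# Each rule pairs a matcher (here, just an element name) with a property set.
RULES = [
    ("title", {"font-size": "18pt", "font-weight": "bold"}),
    ("para", {"font-size": "10pt"}),
    ("emphasis", {"font-style": "italic"}),
]

def properties_for(element):
    """Merge the properties of every rule that matches this element."""
    props = {}
    for name, rule_props in RULES:
        if element.tag == name:
            props.update(rule_props)
    return props

def apply_stylesheet(xml_text):
    """Walk the document and return (element name, properties) pairs."""
    root = ET.fromstring(xml_text)
    return [(el.tag, properties_for(el)) for el in root.iter()]

styled = apply_stylesheet(
    "<chapter><title>The Dirigible</title>"
    "<para>It was <emphasis>huge</emphasis>.</para></chapter>"
)
for tag, props in styled:
    print(tag, props)
```

Elements with no matching rule, such as chapter here, simply come back with an empty property set. A real processor would fall back on defaults or inherited values at that point.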

5.1.2.2 Client and server-side processing

The result of this processing is either a generated display or another document. The first case is what you would see in a web browser on screen, or a printout from the browser. It's a device-dependent result, created on the client end of the transaction. Client-side processing like this takes the load off the server, making information propagate faster. It also gives the end user more control over the appearance, since the user can override some style settings.

The other kind of output from a stylesheet processor is a new document. XSLT, which we will explore in Chapter 7, is a stylesheet language that works this way. This sort of process is also known as a transformation because it effectively transforms the original document into a different form. Transformations can be performed on either the server or the client end, before or during the transaction. For example, a DocBook document can be transformed into HTML for presentation on a browser. It's a very powerful technique that we will have fun talking about later.
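To give a feel for what a transformation does, here is a small Python stand-in. It is not XSLT, and the mapping from a handful of DocBook-style element names to HTML ones is an assumption made up for this sketch. The point is only that the output is a new document rather than a display.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few DocBook-style names to HTML element names.
TAG_MAP = {"chapter": "div", "title": "h1", "para": "p", "emphasis": "em"}

def transform(source):
    """Recursively copy an element into a new tree with renamed tags."""
    target = ET.Element(TAG_MAP.get(source.tag, "span"))
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(transform(child))
    return target

docbook = ET.fromstring(
    "<chapter><title>The Dirigible</title>"
    "<para>It was <emphasis>huge</emphasis>.</para></chapter>"
)
html = ET.tostring(transform(docbook), encoding="unicode")
print(html)
# <div><h1>The Dirigible</h1><p>It was <em>huge</em>.</p></div>
```

The original tree is left untouched, so the same source could be fed to a second transformation aimed at print or some other format.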

5.1.2.3 Cascading styles

Stylesheets can be modularized to mix and match rules. This is the source of the term "cascading" in CSS. The idea is that no stylesheet ought to be monolithic. Instead, you should be able to combine styles from different sources, create subsets, and override rules in different situations. For example, you can use a general-purpose stylesheet combined with one that fine-tunes the style settings for a specific product (see Figure 5-4).

Figure 5-4. A cascade of stylesheets

One reason for doing this is to make it easier to manage a huge set of style rules. The fix-once-see-everywhere principle is enhanced when changes to one set of rules are inherited by many stylesheets.

Cascading rules also make it possible to change the result at any point in the transaction. All browsers in fact have a default client-side stylesheet that specifies how to render HTML in the absence of any specifications from the server. Some browsers, such as Mozilla, allow you to edit this local stylesheet, or user stylesheet, so you can customize the appearance to suit your tastes. For example, if you think most text in pages is hard to read, you can increase the default size of fonts and save yourself some eyestrain.
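A cascade can be pictured as layering property tables on top of one another. The Python sketch below assumes the crudest possible policy, where a later stylesheet simply wins over an earlier one. Real CSS also weighs where a rule came from, how specific its selector is, and whether it is marked important, so treat this as a cartoon. All stylesheet contents are invented.

```python
def cascade(*stylesheets):
    """Combine stylesheets in order; later ones override earlier ones."""
    combined = {}
    for sheet in stylesheets:
        for selector, props in sheet.items():
            combined.setdefault(selector, {}).update(props)
    return combined

# Hypothetical layers, from most general to most specific.
browser_default = {"body": {"font-size": "12pt", "color": "black"}}
general = {"body": {"font-family": "sans-serif"},
           "title": {"font-weight": "bold"}}
product = {"title": {"color": "navy"}}
user = {"body": {"font-size": "16pt"}}  # the reader wants bigger text

final = cascade(browser_default, general, product, user)
print(final["body"])
print(final["title"])
```

Because each layer only states what it wants to change, a fix to the general stylesheet shows up in every product that builds on it, and the reader's larger font size survives without touching anyone else's rules.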

5.1.2.4 Associating a stylesheet with a document

There is no hard and fast rule for associating resources with documents in XML. Each markup language may or may not provide support for stylesheets. Some languages, like MoDL, have their own software to display or process the data, so it wouldn't make any sense to use stylesheets. Others, such as DocBook, benefit from stylesheets but don't define an explicit method of their own for association.

HTML happens to define an element for linking resources like stylesheets to the document. The link element will bind a web page to a CSS stylesheet like this:

<html>
  <head>
    <title>The Behavior of Martian Bees</title>
    <link rel="stylesheet" type="text/css" href="honey.css"/>
  </head>
  ...

Alternatively, in HTML you can embed the stylesheet inside the document:

<html>
  <head>
    <title>The Behavior of Martian Bees</title>
    <style type="text/css">
      body { background-color: wheat;
             color: brown;
             font-family: sans-serif; }
      .sect { border: thin solid red; 
              padding: 0.5em; }
    </style>
  </head>
  ...

Of course, this is more limiting, since the stylesheet can't be applied to more than one document and the document is stuck with this stylesheet. In general, I think keeping the two separate is a better idea, but there may be advantages to putting them together, such as making it easier to transport a document through email.

These two solutions work because HTML evolved together with stylesheets. Not all languages are designed with them in mind. For this purpose, XML provides a generic way to associate a stylesheet with a document, using a processing instruction with the target xml-stylesheet, whose syntax is shown in Figure 5-5.

Figure 5-5. Syntax for a stylesheet declaration

The declaration begins with the processing instruction delimiter and target, <?xml-stylesheet (1). The PI (processing instruction) includes several property assignments similar to attributes, two of which are required: type and href. The first, type (2), is set to the MIME type (3) of the stylesheet (for CSS, this is text/css). The value of the other property, href (4), is the URL of the stylesheet (5), which can be on the same system or anywhere on the Internet. The declaration ends with the closing delimiter (6).

Here's how it would be used in a document:

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="bookStyle.css"?>
<book>
  <title>Tom Swift's Aerial Adventures</title>
  <chapter>
    <title>The Dirigible</title>
  ...

Using a processing instruction for this purpose is smart for a few reasons. First, it doesn't "pollute" the language with an extra element that has to be declared in a DTD. Second, it can be ignored by processors that don't care about stylesheets, or older ones that don't know how to work with them. Third, it isn't really part of the document anyway, but rather a recommended way to work with the information.
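If you are curious how a program might find the declaration, here is a short Python sketch using the standard xml.dom.minidom module. The function name stylesheet_links is my own invention. The regular expression is needed because the property assignments only look like attributes; to the parser, the body of a PI is plain text.

```python
import re
from xml.dom import minidom

# name="value" or name='value' pairs inside the PI body.
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def stylesheet_links(xml_text):
    """Return the properties of each xml-stylesheet PI as a dictionary."""
    doc = minidom.parseString(xml_text)
    links = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            pairs = PSEUDO_ATTR.findall(node.data)
            links.append({name: dq or sq for name, dq, sq in pairs})
    return links

book = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="bookStyle.css"?>
<book><title>Tom Swift's Aerial Adventures</title></book>"""

links = stylesheet_links(book)
print(links)
```

A program that doesn't care about stylesheets never has to run anything like this; it just skips the processing instruction and reads the book element as usual.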

5.1.3 Limitations

Not surprisingly, there are limits to what you can do with stylesheets. Languages for stylesheets are optimized for different purposes. You need to be aware of how a stylesheet language works to use it most effectively.

CSS, for example, is designed to be compact and efficient. Documents have to be rendered quickly because people don't want to wait a long time for something to read. The stylesheet processor is on the client end, and doesn't have a lot of computing power at its disposal. So the algorithm for applying styles needs to be very simple. Each rule that matches an element can only apply a set of styles. There is no other processing allowed, no looking backward or forward in the document for extra information. You have only one pass through the document to get it right.

Sometimes, information is stored in an order other than the way you want it to be rendered. If that is the case, then you need something more powerful than CSS. XSLT works on a tree representation of the document. It provides the luxury of looking ahead or behind to pull together all the data you need to generate output. This freedom comes at the price of increased computational requirements. Although some browsers (e.g., Internet Explorer) support client-side XSLT processing, it's more likely you'll want transformations to be done on the server side, where you have more control and can cache results.
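The classic case of needing to look ahead is a table of contents: it belongs at the front of the output, but the chapter titles it lists sit further on in the source. The Python sketch below stands in for what a tree-based processor can do. It is not XSLT, and the second chapter title is made up for the example.

```python
import xml.etree.ElementTree as ET

# The whole document is held in memory as a tree before any output is made.
book = ET.fromstring(
    "<book><title>Tom Swift's Aerial Adventures</title>"
    "<chapter><title>The Dirigible</title></chapter>"
    "<chapter><title>The Storm</title></chapter></book>"
)

def table_of_contents(root):
    """Gather chapter titles from anywhere in the tree."""
    return [chapter.findtext("title") for chapter in root.iter("chapter")]

toc = table_of_contents(book)
print(toc)
```

A processor that makes a single pass and can only style the element in front of it has no way to produce this list at the top of the page, which is exactly the sort of job that pushes you from CSS toward a transformation.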

Property sets are finite, so no matter how many features are built into a stylesheet language, there will always be something lacking, some effect you want to achieve but can't. When that happens, you should be open to other options, such as post-processing with custom software. In Chapter 10, I'll talk about strategies for programming with XML. This is the ultimate and most work-intensive solution, but sometimes there just is no other way to get what you want.

Unquestionably, implementation among clients has been the biggest obstacle. The pace of standards development was much faster than actual implementation. Browsers either didn't support the standards or had buggy and incomplete implementations. This is quite frustrating for designers who want to support multiple platforms but are stymied by differing behaviors among user agents. Behavior varies not only among vendors, but among versions and platforms too. Internet Explorer, for example, behaves very differently on Macintosh than it does on Windows for versions that came out at the same time.

When I wrote the first edition of this book, I was quite disappointed by the level of support for CSS. Any but the simplest example would not work on more than one browser. Since then, the situation has improved a little. Mozilla has much better support for CSS now. Internet Explorer, which used to be the leader, has inexplicably remained stuck for over a year, and Microsoft has suggested its development as a standalone application is complete. Sure, CSS is rich and full-featured, but it shouldn't be that difficult to implement. I think the open source movement offers the most hope, because there will always be an opportunity for some hacker, fed up with an unimplemented option, to go in and get it working; whereas with corporations, we have to wait until the marketing department deems it a high enough priority to put it on the schedule.