8.5 An Example: TEI

Let us take a break now from terminology and see an actual example. The document to format is a Shakespearean sonnet encoded in TEI-XML, a markup language for scholarly documents, shown in Example 8-2. It consists of a header section with title and other metadata, followed by the text itself, which is broken into individual lines of poetry.

Example 8-2. A TEI-XML document
<?xml version="1.0"?>
<!DOCTYPE TEI.2 SYSTEM "http://www.uic.edu/orgs/tei/lite/teixlite.dtd">
<TEI.2>
<!-- The metadata. TEI has a rich vocabulary for describing a 
     document, which is important for scholarly work.
-->
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Shall I Compare Thee to a Summer's Day?</title>
        <author>William Shakespeare</author>
      </titleStmt>
      <publicationStmt>
        <p>
Electronic version by Erik Ray 2003-03-09. This transcription is in
the public domain.
        </p>
      </publicationStmt>
      <sourceDesc>
        <p>Shakespeare's Sonnets XVIII.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>

  <!-- The body of the document, where the sonnet lives. <lg> is a
       group of lines, and <l> is a line of text.
   -->
  <text>
    <body>
      <lg>
<l>Shall I compare thee to a summer's day?</l>
<l>Thou art more lovely and more temperate:</l>
<l>Rough winds do shake the darling buds of May,</l>
<l>And summer's lease hath all too short a date:</l>
<l>Sometime too hot the eye of heaven shines,</l>
<l>And often is his gold complexion dimm'd;</l>
<l>And every fair from fair sometime declines,</l>
<l>By chance or nature's changing course untrimm'd;</l>
<l>But thy eternal summer shall not fade</l>
<l>Nor lose possession of that fair thou owest;</l>
<l>Nor shall Death brag thou wander'st in his shade,</l>
<l>When in eternal lines to time thou growest:</l>
<l>So long as men can breathe or eyes can see,</l>
<l>So long lives this and this gives life to thee.</l>
      </lg>
    </body>
  </text>

</TEI.2>

A typical TEI document consists of two parts: a metadata section in a teiHeader element and the text body in a text element. Using an XSLT stylesheet, we will transform this into an FO tree, then run the tree through a formatter to generate a formatted page. In practice, I used FOP, which can perform the XSLT transformation and the XSL-FO formatting in a single step.

TEI

The Text Encoding Initiative (TEI) is an international standard that libraries, museums, and academic institutions use to encode scholarly texts for online research and sharing. The TEI Consortium, founded in 2000 by several collaborating research groups, is funded by the National Endowment for the Humanities, the European Union, the Social Sciences and Humanities Research Council of Canada, and private donors. It publishes a DTD for marking up texts and a detailed tutorial.

TEI-XML is a markup language that can be applied to anything from groups of books to plays and poetry. Its hallmarks include a header with a very detailed set of metadata elements, and a text body whose structure is reminiscent of DocBook. Although the DTD is very large, it is modular and can be customized for specific needs.

The effort has been very successful, with tens of thousands of books, articles, poems, and plays now encoded in TEI-XML. Huge repositories have made these documents available to the public, such as the Electronic Text Center (http://etext.lib.virginia.edu/) at the University of Virginia Library. It holds approximately 70,000 humanities texts in 13 languages.

The TEI Consortium's web site (http://www.tei-c.org/) is brimming with helpful documents and pointers to resources. The tutorial, TEI P4: Guidelines for Electronic Text Encoding and Interchange, is available online at http://www.tei-c.org/P4X/SG.html.

In designing the stylesheet, I like to start with the page layout. This is a very small document that fits entirely on one page, so we need to define only one page master. (Later in this chapter, I'll show you a more complex example that uses many different page masters.) Below, I've created a page master that defines two regions, a header and a body. My style is to define this information in an XSLT template matching the root node.

<xsl:template match="/">
  <fo:root>   <!-- The root element contains 
                   everything in the FO tree -->

    <fo:layout-master-set>

      <!-- The page master object with settings for
           the page's content rectangle -->

      <fo:simple-page-master 
            master-name="the-only-page-type"
            page-height="11in" page-width="8.5in"
            margin-top="1in" margin-bottom="1in"
            margin-left="1.2in" margin-right="1.2in">

        <!-- A body region 20 millimeters below the top 
             of the page's content rectangle.  -->

        <fo:region-body margin-top="20mm"/>

        <!-- A header that is 3/10 inch tall -->

        <fo:region-before extent="0.3in"/>

      </fo:simple-page-master>
    </fo:layout-master-set>
    <xsl:apply-templates/>
  </fo:root>
</xsl:template>

Note that if you set the margin-top of the region-body to less than the extent of the region-before, the header and the body text may overlap. Here, the body's 20mm top margin (about 0.79in) leaves ample room below the 0.3-inch header.
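One way to keep that rule from being broken by accident is to derive both lengths from the same values. The sketch below is a hypothetical variant of the root template, not the one used in this chapter: it assumes a header extent of 8mm and a 12mm gap (both made-up numbers) held in top-level variables named header-extent-mm and header-gap-mm, and it builds the region-body's margin-top from their sum with attribute value templates. As long as the gap is not negative, the body always starts below the header.

```xml
<!-- Hypothetical sizes, in millimeters, chosen only for illustration.
     These variables must be top-level children of xsl:stylesheet. -->
<xsl:variable name="header-extent-mm" select="8"/>
<xsl:variable name="header-gap-mm" select="12"/>

<xsl:template match="/">
  <fo:root>
    <fo:layout-master-set>
      <fo:simple-page-master
            master-name="the-only-page-type"
            page-height="11in" page-width="8.5in"
            margin-top="1in" margin-bottom="1in"
            margin-left="1.2in" margin-right="1.2in">
        <!-- margin-top = extent + gap (8 + 12 = 20mm here), so it can
             never be smaller than the header's extent -->
        <fo:region-body
            margin-top="{$header-extent-mm + $header-gap-mm}mm"/>
        <fo:region-before extent="{$header-extent-mm}mm"/>
      </fo:simple-page-master>
    </fo:layout-master-set>
    <xsl:apply-templates/>
  </fo:root>
</xsl:template>
```

Changing the header height then means editing a single variable, and the body margin follows automatically.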

Next, we have to define the flows. There is one static-content flow for the header and one regular flow for the body. In my stylesheet, I chose to output the page-sequence FO from a template matching the document element. First to be defined is the header flow:

<xsl:template match="TEI.2">
  <fo:page-sequence master-reference="the-only-page-type">
    <fo:static-content flow-name="xsl-region-before">
      <fo:block
        font-family="geneva, sans-serif"
        font-size="8pt"
        text-align="left"
        border-bottom="solid 1pt black"
     >
        <xsl:value-of select="teiHeader//sourceDesc"/>
      </fo:block>
    </fo:static-content>

The content of this flow is a single block. It sets an 8-point sans serif font, left alignment, and a thin rule underneath. The text displayed here is taken from the metadata: the content of the sourceDesc element (i.e., the source of the document).
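Because xsl:value-of copies the whole string value of sourceDesc, the line breaks and indentation around its p element come along too. XSL-FO's default whitespace handling collapses most of that, but if you prefer a tidy FO tree, one option (a variation, not what Example 8-3 does) is to wrap the path in XPath's normalize-space() function, which strips leading and trailing whitespace and reduces internal runs of whitespace to single spaces:

```xml
<fo:static-content flow-name="xsl-region-before">
  <fo:block
    font-family="geneva, sans-serif"
    font-size="8pt"
    text-align="left"
    border-bottom="solid 1pt black">
    <!-- normalize-space() trims the indentation and line breaks
         that surround the <p> inside <sourceDesc> -->
    <xsl:value-of select="normalize-space(teiHeader//sourceDesc)"/>
  </fo:block>
</fo:static-content>
```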

This is followed by the body flow object:

    <fo:flow flow-name="xsl-region-body"
      font-family="didot, serif"
      font-size="10pt"
    >
      <xsl:apply-templates select="teiHeader//title"/>
      <xsl:apply-templates select="teiHeader//author"/>
      <xsl:apply-templates select="text/body"/>
    </fo:flow>
  </fo:page-sequence>
</xsl:template>

The main flow formats the title, author, and finally the body of the document, with apply-templates statements.

All that remains is to write templates for the other element types we wish to format. These are included in the complete XSLT stylesheet listing in Example 8-3.

Example 8-3. XSLT stylesheet for the TEI-XML document
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="1.0"
>
<xsl:output method="xml"/>

<!-- Start the FO tree and layout the page -->

<xsl:template match="/">
  <fo:root>
    <fo:layout-master-set>
      <fo:simple-page-master 
            master-name="the-only-page-type"
            page-height="11in" page-width="8.5in"
            margin-top="1in" margin-bottom="1in"
            margin-left="1.2in" margin-right="1.2in">
        <fo:region-body margin-top="20mm"/>
        <fo:region-before extent="0.3in"/>
      </fo:simple-page-master>
    </fo:layout-master-set>
    <xsl:apply-templates/>
  </fo:root>
</xsl:template>

<!-- Begin the flows -->

<xsl:template match="TEI.2">
  <fo:page-sequence master-reference="the-only-page-type">
    <fo:static-content flow-name="xsl-region-before">
      <fo:block
        font-family="geneva, sans-serif"
        font-size="8pt"
        text-align="left"
        border-bottom="solid 1pt black"
     >
        <xsl:value-of select="teiHeader//sourceDesc"/>
      </fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body"
      font-family="didot, serif"
      font-size="10pt"
   >
      <xsl:apply-templates select="teiHeader//title"/>
      <xsl:apply-templates select="teiHeader//author"/>
      <xsl:apply-templates select="text/body"/>
    </fo:flow>
  </fo:page-sequence>
</xsl:template>

<!-- Render the title in bold to stand out from the sonnet. -->

<xsl:template match="teiHeader//title">
  <fo:block
    font-weight="bold"
    space-before="2mm"
    space-after="4mm"
 >
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

<!-- Format the author's name in italic to set it apart
     from the sonnet itself. Add the word "by" for a
     further decorative touch. 
-->

<xsl:template match="teiHeader//author">
  <fo:block
    font-style="italic"
    space-after="4mm"
 >
    <xsl:text>by </xsl:text>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

<xsl:template match="text/body">
  <xsl:apply-templates/>
</xsl:template>

<!-- Put the lines in one block to group them together. Later, we
     might want to do something special with this grouping, such
     as to surround it in a border. Hint: we use "select" here so that
     we can use position() later to select lines for indenting.
-->

<xsl:template match="lg">
  <fo:block>
    <xsl:apply-templates select="l"/>
  </fo:block>
</xsl:template>

<!-- As a nice formatting effect, we will indent or space certain
     lines. Every other line will be indented 5mm. Add space after
     every fourth line. The final two lines are traditionally 
     indented in sonnets. 
-->

  <xsl:template match="l">
    <fo:block>
      <xsl:if test="position() mod 4 = 0">
        <xsl:attribute name="space-after">4mm</xsl:attribute>
      </xsl:if>
      <xsl:if test="position()> 11 or
                    position() mod 2 = 0">
        <xsl:attribute name="start-indent">5mm</xsl:attribute>
      </xsl:if>
      <xsl:apply-templates />
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
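The hint in the lg template matters because position() counts nodes in the list that apply-templates selected. With a plain <xsl:apply-templates/>, most processors would also select the whitespace text nodes between the l elements, so the lines would land at even positions (2, 4, 6, ...) and the indentation pattern would go wrong. If you would rather not depend on how the template is invoked, the following sketch of an alternative l template (a variation, not part of Example 8-3) computes the line number by counting preceding l siblings. For a single 14-line lg, it produces the same spacing and indentation as the original.

```xml
<!-- Hypothetical variant of the "l" template: the line number comes
     from the document itself, so it stays correct even if lg is
     processed with a plain <xsl:apply-templates/> -->
<xsl:template match="l">
  <xsl:variable name="n" select="count(preceding-sibling::l) + 1"/>
  <fo:block>
    <!-- Add space after every fourth line -->
    <xsl:if test="$n mod 4 = 0">
      <xsl:attribute name="space-after">4mm</xsl:attribute>
    </xsl:if>
    <!-- Indent even-numbered lines and the closing couplet
         (lines 13 and 14 of the sonnet) -->
    <xsl:if test="$n &gt; 12 or $n mod 2 = 0">
      <xsl:attribute name="start-indent">5mm</xsl:attribute>
    </xsl:if>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
```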

Now run the XSLT engine on the document using this stylesheet, and you will get the result file in Example 8-4. (I've added some whitespace in places to make it more readable.)

Example 8-4. Formatting object tree of the TEI document
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="the-only-page-type"
          page-height="11in" page-width="8.5in" margin-top="1in"
          margin-bottom="1in" margin-left="1.2in" margin-right="1.2in">
      <fo:region-body margin-top="20mm"/>
      <fo:region-before extent="0.3in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="the-only-page-type">
  <fo:static-content flow-name="xsl-region-before">
    <fo:block font-family="geneva, sans-serif" font-size="8pt" 
          text-align="left" border-bottom="solid 1pt black">
      Shakespeare's Sonnets XVIII.
    </fo:block>
  </fo:static-content>
  <fo:flow flow-name="xsl-region-body" 
        font-family="didot, serif" font-size="10pt">
    <fo:block font-weight="bold" space-before="2mm" space-after="4mm">
      Shall I Compare Thee to a Summer's Day?
    </fo:block>
    <fo:block font-style="italic" space-after="4mm">
      by William Shakespeare
    </fo:block>
    <fo:block>
      <fo:block>
Shall I compare thee to a summer's day?
      </fo:block>
      <fo:block start-indent="5mm">
Thou art more lovely and more temperate:
      </fo:block>
      <fo:block>
Rough winds do shake the darling buds of May,
      </fo:block>
      <fo:block space-after="4mm" start-indent="5mm">
And summer's lease hath all too short a date:
      </fo:block>
      <!-- Rest removed because it's more of the same -->
    </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>

I formatted this with FOP and got a PDF file as output. Figure 8-8 shows how it looks in Acrobat Reader.

Figure 8-8. The sonnet in PDF