Use Apache's FOP engine together with XSL-FO to generate PDF output.
Apache's Formatting Objects Processor (FOP, available at http://xml.apache.org/fop/) is an open source Java application that reads an XSL-FO (http://www.w3.org/TR/xsl/) tree and renders the result, primarily as PDF. Other output formats are possible, including Printer Control Language (PCL), PostScript (PS), Scalable Vector Graphics (SVG), an XML representation of the area tree, Java Abstract Window Toolkit (AWT), FrameMaker's Maker Interchange Format (MIF), and text.
XSL-FO defines formatting objects that help describe blocks, paragraphs, pages, tables, and such. These formatting objects are aided by a large set of formatting properties that control things such as fonts, text alignment, spacing, and the like, many of which match the properties used in CSS (http://www.w3.org/Style/CSS/). XSL-FO's formatting objects and properties provide a framework for creating attractive, printable pages.
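As a small illustration of formatting objects and CSS-like properties working together, here is a sketch of a two-column table. It is a fragment that would sit inside an fo:flow; the cell contents and widths are made up for illustration. Column widths are given explicitly, which is the safer choice with formatters that lack automatic table layout.

```xml
<!-- Illustrative fragment only: belongs inside an fo:flow element -->
<fo:table xmlns:fo="http://www.w3.org/1999/XSL/Format"
          table-layout="fixed" width="3in">
  <!-- One fo:table-column per column, each with a fixed width -->
  <fo:table-column column-width="1.5in"/>
  <fo:table-column column-width="1.5in"/>
  <fo:table-body>
    <fo:table-row>
      <!-- font-weight and text-align behave like their CSS namesakes -->
      <fo:table-cell>
        <fo:block font-family="sans-serif" font-weight="bold">Hour</fo:block>
      </fo:table-cell>
      <fo:table-cell>
        <fo:block font-family="sans-serif" text-align="end">11</fo:block>
      </fo:table-cell>
    </fo:table-row>
  </fo:table-body>
</fo:table>
```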
XSL-FO is a huge, richly detailed XML vocabulary for formatting documents for presentation. XSL-FO is the common name for the XSL specification produced by the W3C. The spec is nearly 400 pages long. At one time, XSL-FO and XSLT (whose finished spec is fewer than 100 pages) were part of the same specification, but they split into two specs in April 1999. XSLT became a recommendation in November 1999, but XSL-FO did not achieve recommendation status until October 2001.
To get you started, we'll go over a few simple examples. The first example, time.fo, is an XSL-FO document that formats the contents of the elements in time.xml:
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="Time"
        page-height="11in" page-width="8.5in"
        margin-top="1in" margin-bottom="1in"
        margin-left="1in" margin-right="1in">
      <fo:region-body margin-top=".5in"/>
      <fo:region-before extent="1.5in"/>
      <fo:region-after extent="1.5in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="Time">
    <fo:flow flow-name="xsl-region-body">
      <!-- Heading -->
      <fo:block font-size="24px" font-family="sans-serif" line-height="26px"
          space-after.optimum="20px" text-align="center" font-weight="bold"
          color="#0050B2">Time</fo:block>
      <!-- Blocks for hour/minute/second/atomic status -->
      <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
          space-after.optimum="10px" text-align="start">Hour: 11</fo:block>
      <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
          space-after.optimum="10px" text-align="start">Minute: 59</fo:block>
      <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
          space-after.optimum="10px" text-align="start">Second: 59</fo:block>
      <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
          space-after.optimum="10px" text-align="start">Meridiem: p. m.</fo:block>
      <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
          space-after.optimum="10px" text-align="start">Atomic? true</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
The root element of an XSL-FO document is (surprise) root. The namespace name is http://www.w3.org/1999/XSL/Format, and the conventional prefix is fo. Following root is the layout-master-set element, where basic page layout is defined. simple-page-master holds a few formatting properties such as page-width and page-height, and some margin settings (for more complex page layouts, you could add page-sequence-master, which combines simple page masters). The region-related elements such as region-body are used to lay out the underlying regions of a simple page master. The master-name attribute on simple-page-master links with the master-reference attribute on the page-sequence element.
The page-sequence element contains a flow element that holds the text that will appear on the page. Inside the flow is a series of block elements, each of which carries properties for the text it contains (blocks are used for formatting things such as headings, paragraphs, and figure captions). Properties specify formatting such as the font size, font family, and text alignment.
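To make the page-sequence-master remark concrete, here is a minimal sketch of a layout-master-set that uses one simple page master for the first page and another for the rest. The master names TimeFirst, TimeRest, and TimeDoc and the margin values are hypothetical; they do not appear in time.fo.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- First page: extra top margin (values are illustrative) -->
  <fo:simple-page-master master-name="TimeFirst"
      page-height="11in" page-width="8.5in" margin-top="2in"
      margin-bottom="1in" margin-left="1in" margin-right="1in">
    <fo:region-body/>
  </fo:simple-page-master>
  <!-- All following pages -->
  <fo:simple-page-master master-name="TimeRest"
      page-height="11in" page-width="8.5in" margin-top="1in"
      margin-bottom="1in" margin-left="1in" margin-right="1in">
    <fo:region-body/>
  </fo:simple-page-master>
  <!-- The sequence master chains the two simple page masters -->
  <fo:page-sequence-master master-name="TimeDoc">
    <fo:single-page-master-reference master-reference="TimeFirst"/>
    <fo:repeatable-page-master-reference master-reference="TimeRest"/>
  </fo:page-sequence-master>
</fo:layout-master-set>
```

A page-sequence would then point at the sequence master with master-reference="TimeDoc" rather than at either simple page master directly.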
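The region-before and region-after regions declared in the page master stay empty unless something is placed in them. As a sketch of how that is commonly done, the fragment below adds a static-content element that targets xsl-region-before, the default name of that region. The header text is invented for illustration, and static-content must come before flow inside page-sequence.

```xml
<fo:page-sequence xmlns:fo="http://www.w3.org/1999/XSL/Format"
                  master-reference="Time">
  <!-- Repeated on every page in the region-before area (header text is hypothetical) -->
  <fo:static-content flow-name="xsl-region-before">
    <fo:block font-size="10px" font-family="sans-serif"
        text-align="end">Time report</fo:block>
  </fo:static-content>
  <!-- The main text still flows into the body region -->
  <fo:flow flow-name="xsl-region-body">
    <fo:block font-size="12px" font-family="sans-serif">Hour: 11</fo:block>
  </fo:flow>
</fo:page-sequence>
```

If you try this, keep in mind that the body region's margin-top should be at least as large as the extent of region-before, or the header and body text may overlap.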
FOP is pretty easy to use. To generate a PDF from this XSL-FO file, download and install FOP from http://xml.apache.org/fop/download.html. At the time of this writing, FOP is at Version 0.20.5. In the main directory, you'll find a fop.bat file for Windows or a fop.sh file for Unix. You can run FOP using these scripts.
To create a PDF from time.fo, enter this command:
fop time.fo time-fo.pdf
time.fo is the input file, and time-fo.pdf is the output file. FOP will let you know of its progress with a report such as this:
[INFO] Using org.apache.xerces.parsers.SAXParser as SAX2 Parser
[INFO] FOP 0.20.5
[INFO] Using org.apache.xerces.parsers.SAXParser as SAX2 Parser
[INFO] building formatting object tree
[INFO] setting up fonts
[INFO] [1]
[INFO] Parsing of document complete, stopping renderer
Figure 6-21 shows the result of formatting time.fo with FOP in Adobe Acrobat.
You can also incorporate XSL-FO markup into an XSLT stylesheet, then transform and format a document with just one FOP command. Here is a stylesheet (time-fo.xsl) that incorporates XSL-FO:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="Time"
            page-height="11in" page-width="8.5in"
            margin-top="1in" margin-bottom="1in"
            margin-left="1in" margin-right="1in">
          <fo:region-body margin-top=".5in"/>
          <fo:region-before extent="1.5in"/>
          <fo:region-after extent="1.5in"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="Time">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="time"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <xsl:template match="time">
    <!-- Heading -->
    <fo:block font-size="24px" font-family="sans-serif" line-height="26px"
        space-after.optimum="20px" text-align="center" font-weight="bold"
        color="#0050B2"> Time </fo:block>
    <!-- Blocks for hour/minute/second/atomic status -->
    <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
        space-after.optimum="10px" text-align="start">
      Hour: <xsl:value-of select="hour"/>
    </fo:block>
    <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
        space-after.optimum="10px" text-align="start">
      Minute: <xsl:value-of select="minute"/>
    </fo:block>
    <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
        space-after.optimum="10px" text-align="start">
      Second: <xsl:value-of select="second"/>
    </fo:block>
    <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
        space-after.optimum="10px" text-align="start">
      Meridiem: <xsl:value-of select="meridiem"/>
    </fo:block>
    <fo:block font-size="12px" font-family="sans-serif" line-height="16px"
        space-after.optimum="10px" text-align="start">
      Atomic? <xsl:value-of select="atomic/@signal"/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
The same XSL-FO markup you saw in time.fo is interspersed with templates and instructions that transform time.xml. Now with this command, you can generate a PDF like the one you generated with time.fo:
fop -xsl time-fo.xsl -xml time.xml -pdf time-fo.pdf
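The stylesheet reads its data from time.xml, which is not reproduced here. Judging only from the select expressions in time-fo.xsl (time, hour, minute, second, meridiem, and atomic/@signal) and the values shown in time.fo, a document along these lines would work as input. Treat the exact layout as an assumption rather than the original file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed shape of time.xml, inferred from the stylesheet's XPath expressions -->
<time>
  <hour>11</hour>
  <minute>59</minute>
  <second>59</second>
  <meridiem>p. m.</meridiem>
  <!-- The stylesheet reads the signal attribute, not element content -->
  <atomic signal="true"/>
</time>
```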
For more information on these technologies, see Michael Fitzgerald's Learning XSLT (O'Reilly) or Dave Pawson's XSL-FO (O'Reilly).
-Michael Fitzgerald