13.1 Creating a Report

After you've decided you need to produce printable output, the next step is to decide what format to produce it in and what tools to use to create it. As Andrew Tanenbaum wrote in the first edition of his book, Computer Networks, "The nice thing about standards is that you have so many to choose from; furthermore, if you do not like any of them, you can just wait for next year's model." He might have foreseen web reporting!

So, there are many things to think about in deciding on reporting tools and formats:


Middle-tier platform

What platform is your middle tier installed on? If it's Microsoft Windows, one choice for creating reports is Microsoft Word, and a portable format to produce is RTF (Rich Text Format). If it's a Unix environment, PostScript can be produced with several tools and is a format that's well supported among Unix users. However, for almost all platforms, Adobe's PDF (Portable Document Format) has a wide range of tools and libraries for production.


Client platform

What platform do your users use? Most of them are likely to use Microsoft Windows, and so a format that's friendly to those users is essential. Importantly, reporting tools are similar to browsers: you are unlikely to have control over the user's environment, and the best approach is to choose a format that the majority of users are likely to be able to open.


Richness of content

What features do you need? Are you producing reports that contain images, text, graphics, tables, forms, graphs, or a combination of those? Do you only need to produce a printable copy of the web page? The answers determine whether you can use a simple library (or a template) for text and tables, or whether you need the full power of tools that can create pixels and lines.


Speed

How fast does reporting have to be? There are several easy-to-use tools that are slow to create a report file, and several hard-to-use tools that are fast. However, most tools allow you to save output in a file or database so that it can be delivered to many clients without recreating the report.


Price

Do you want to pay? Are you prepared to purchase tools for reporting, or do you want free or open source software?


Flexibility

Do you need to be flexible? Do you want to offer more than one format to minimize the chance that a user will need to install a third-party tool?
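The idea raised under Speed, saving a report so that it can be delivered to many clients without being recreated, can be sketched in a few lines of PHP. This is a minimal sketch rather than a definitive implementation: the function name cached_report(), the cache file location, and the maximum age are hypothetical, and the report builder is assumed to be any callback that returns the finished report as a string.

```php
<?php
// Sketch only: cached_report() is a hypothetical helper, not part of PHP.
// It returns the saved report if the cache file is younger than
// $maxAgeSeconds; otherwise it calls $buildReport, saves the result,
// and returns it.
function cached_report($cacheFile, $maxAgeSeconds, $buildReport)
{
    // Make sure is_file() and filemtime() see the file as it is now
    clearstatcache();

    if (is_file($cacheFile)
        && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }

    // The slow step: build the report, then save it for later requests
    $report = call_user_func($buildReport);
    file_put_contents($cacheFile, $report, LOCK_EX);

    return $report;
}
```

A script might call this as cached_report($file, 3600, 'build_sales_report'), where build_sales_report() stands in for whatever slow tool creates the report. A database table could play the same role as the cache file.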

We discuss these issues in the remainder of this section.

13.1.1 Formats

There are many possible formats for reports, and this section discusses most of the popular choices for web reporting.

13.1.1.1 Portable Document Format (PDF)

Adobe's Portable Document Format (PDF) is a well-documented, well-understood and powerful format for reporting. It's now the dominant reporting format on the Web and we use it in this chapter because it meets most of our criteria in the previous section:

  • It's ideal for reporting because it supports a wide range of fonts, colors, and graphics. Moreover, whatever tools are used to create or view a report, the result is the same high-quality output.

  • It's portable. Adobe's free PDF viewer (known as Adobe Reader) is available for almost all platforms, including Mac OS X, Linux, FreeBSD, Solaris, all Microsoft Windows variants, Pocket PC, and Palm. There are also open source viewers available, such as xpdf and GhostView.

  • It's full of features. It's simple to use, but it's also powerful: fonts can be embedded in a document, it can be combined with XML markup (which is discussed later in this section), embedded links can be included, forms are easy to integrate, and multimedia can be linked in. Adobe's Acrobat Distiller (a commercial product) is a powerful tool for creating PDFs, and it also allows you to create templates that you can later populate with data.

  • It's used by very large organizations. For example, the U.S. government (including the IRS) delivers most of its documents to its users in PDF, as do newswire services such as Associated Press (AP). This means most of your users will already be familiar with the format.

  • It's flexible for the Web. You can deliver one page from a large document and it can be rendered at the client without retrieving the rest of the document. (However, this requires some configuration that we don't discuss.)

  • There's a wide variety of tools to produce it. We discuss this next.

You can read the PDF specification at http://partners.adobe.com/asn/tech/pdf/specifications.jsp.

There are two major external libraries that can be used to create PDF with PHP: PDFlib (available from http://www.pdflib.com/) and ClibPDF (available from http://www.fastio.com/). Both are function libraries that integrate into PHP, but both need to be downloaded, purchased (if you're doing commercial work), and configured, and then PHP needs to be recompiled to support them. The integration process is sometimes tricky, but good notes on the process can be found in the user-contributed comments in the online PHP manual. At the time of writing, PDFlib was more popular.

Both PDFlib and ClibPDF allow creation of low- and high-level report features. For example, you can create a text-only document using a few lines of code, or you can draw lines and shapes by moving a cursor with tens or hundreds of lines of code in a complex program. Both libraries also allow you to include external graphics in reports, and to use almost all of the features of PDF.

Because both function libraries are commercial products and require integration, we favor other, free solutions that are now becoming popular. Later in this chapter, we show you how to use the R&OS PDF class library. It's almost as powerful as PDFlib, and we show you how to use it to create and format documents that contain tables, images, and reports.

There are also other, simpler libraries. For example, RustyParts' HTML_ToPDF is a simple tool that turns an HTML page into a PDF document for printing, and it makes use of freely available tools to carry out the process. You can find out more from http://www.rustyparts.com/pdf.php.

13.1.1.2 Rich Text Format (RTF)

Microsoft's Rich Text Format (RTF) is an interchange format for documents. Like PDF, it's an open standard that's implemented in a wide range of tools on many platforms. For example, Microsoft Word can save and read documents in RTF, as can tools such as the writers in OpenOffice, StarOffice, and most commercial word processors. However, much like HTML, there's no guarantee that an RTF document will look the same in a different word processor or on a different platform.

Reports in RTF are different from those in PDF. An RTF document is designed to be opened, edited, and manipulated in the same way as any other word processor document. It's therefore a good format for reports that need to be edited or documents that need to be exchanged, but it's not a good format when you want to produce a report that looks the same on all platforms. However, as a reporting format, it's preferable to Microsoft Word's proprietary .doc binary format.
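Because RTF is plain text with backslash control words, a simple report can be assembled with ordinary string functions. The following is a minimal sketch under stated assumptions, not a complete RTF writer: the function names rtf_escape() and build_rtf_report() are hypothetical, the report is assumed to be a title followed by lines of ASCII text, and characters outside ASCII would need additional escaping that isn't shown.

```php
<?php
// Sketch only: both functions are hypothetical helpers, not part of PHP.

// Backslash and braces are special in RTF, so escape them in report text.
function rtf_escape($text)
{
    return str_replace(
        array('\\', '{', '}'),
        array('\\\\', '\\{', '\\}'),
        $text
    );
}

// Build a one-font RTF document: a bold 16-point title, then 12-point lines.
// (\fs sizes are in half-points, so \fs32 is 16 points.)
function build_rtf_report($title, $lines)
{
    $rtf  = '{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}' . "\n";
    $rtf .= '\f0\b\fs32 ' . rtf_escape($title) . '\b0\par' . "\n";

    foreach ($lines as $line) {
        $rtf .= '\fs24 ' . rtf_escape($line) . '\par' . "\n";
    }

    return $rtf . '}';
}
```

The returned string can be saved with an .rtf extension and opened in a word processor, where the user is free to edit it. That is exactly the property that distinguishes an RTF report from a PDF one.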

You can find out more about the RTF specification from http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnrtfspec/html/rtfspec.asp.

13.1.1.3 PostScript

Adobe's PostScript format is a printer language. Most laser printers understand PostScript, and can convert a PostScript description into a high-quality printout. PostScript has within it tools to control whether printing is simplex or duplex, what paper to use, and even whether to staple. It's not designed for users in the same way as Adobe's PDF: for example, it doesn't support hypertext-style linking, embedding of sounds and movies, or pages being downloaded individually.

Despite its focus as a printer language, most Unix users are familiar with PostScript and happy with it as a report format. Tools such as GhostView (or GSView or ggv) are commonly installed on Unix platforms, and do a good job of rendering PostScript documents on a screen. Adobe's Reader and Mac OS X's Preview also display PostScript documents.
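To give a feel for PostScript as a language, here is a minimal sketch that writes a one-page, text-only PostScript report from PHP. The function names ps_escape() and build_postscript_report() are hypothetical, and the sketch assumes a US Letter page, the standard Helvetica font, and few enough lines to fit on one page.

```php
<?php
// Sketch only: both functions are hypothetical helpers, not part of PHP.

// Backslash and parentheses are special inside PostScript strings.
function ps_escape($text)
{
    return str_replace(
        array('\\', '(', ')'),
        array('\\\\', '\\(', '\\)'),
        $text
    );
}

// Build a single-page document with one line of text per array element.
function build_postscript_report($lines)
{
    $ps  = "%!PS-Adobe-3.0\n%%Pages: 1\n%%Page: 1 1\n";
    $ps .= "/Helvetica findfont 12 scalefont setfont\n";

    // Coordinates are in points (1/72 inch) from the bottom-left corner.
    // A Letter page is 792 points tall, so 720 is one inch from the top.
    $y = 720;
    foreach ($lines as $line) {
        $ps .= "72 $y moveto (" . ps_escape($line) . ") show\n";
        $y -= 14; // move down one line
    }

    return $ps . "showpage\n%%EOF\n";
}
```

The output can be sent to a PostScript printer or opened in a viewer such as GhostView. No page breaks are handled here, so a longer report would need to emit showpage and start a new page whenever $y nears the bottom margin.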

You can find out more about the PostScript language from http://partners.adobe.com/asn/tech/ps/index.jsp.

13.1.1.4 HTML and XML

Perhaps the most obvious report type for a web database application is the web page itself.

This works as follows: PHP code in your application produces HTML, the HTML is sent to the user, the user's browser renders the page, and (in most browsers) the user can then print the page directly. Despite its simplicity, this doesn't work well for most reporting: different browsers render pages differently, window width and depth don't usually align with paper width and depth, and there's no guarantee that colors, fonts, and images will translate well to the printed page. However, as discussed previously, there are some good tools available to convert HTML to PDF for printing.

So far in this section, we've described several different formats in which documents or reports are described using a language or markup. The Extensible Markup Language, XML, is another markup language designed to identify structure in text and it is a sibling of HTML (their parent is SGML). XML is conceptually simple, yet developers have found uses for it in a wide range of applications:


Storing content in large and dynamic web sites

Storing content marked-up with XML can make content re-use and management much easier.


Standardizing data transport between applications

When applications are difficult to integrate, XML provides a common protocol that allows data to be shared.


Defining new standards

Scalable Vector Graphics (SVG) and XSL Formatting Objects (XSL-FO) are both examples of standards that are represented with XML. It's also used in conjunction with PDF to, for example, mark up forms within a document.


As a component for other technologies

The Simple Object Access Protocol (SOAP) provides a mechanism for manipulating objects over a wide area network, such as the Web, using XML to encode the object messages.

Much like RTF, XML is a possible choice for a reporting format (and for many other tasks): it's powerful and independent of presentation, platform, and operating system. PHP has excellent XML support, and this has been completely redeveloped in PHP5. However, a detailed discussion of XML is outside the scope of this book.

13.1.1.5 Email and plain text

Plain text without markup is a simple report format, as is a plain text email to a user. What's more, text is compact, easy to format, and fast to send by email or to a browser. However, you have even less control than with HTML over presentation or printing, and it's unlikely to be an effective way to lay out information except for the shortest reports. Despite this, as we show in Chapter 19, email receipts are still a useful reporting tool to acknowledge actions in a web database application.
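As a small illustration of how little is needed for a short plain text report, the following sketch lays out a receipt in fixed-width columns. It is an assumption-laden example, not the approach used in Chapter 19: the function name format_text_receipt() is hypothetical, and the items are assumed to arrive as an array that maps item names to prices.

```php
<?php
// Sketch only: format_text_receipt() is a hypothetical helper.
// It expects an array of item name => price and returns a 40-column
// plain text table with a total line.
function format_text_receipt($items)
{
    $lines   = array();
    $lines[] = str_pad('Item', 30) . str_pad('Price', 10, ' ', STR_PAD_LEFT);
    $lines[] = str_repeat('-', 40);

    $total = 0.0;
    foreach ($items as $name => $price) {
        // Truncate long names so the price column stays aligned
        $lines[] = str_pad(substr($name, 0, 30), 30)
                 . str_pad(number_format($price, 2), 10, ' ', STR_PAD_LEFT);
        $total += $price;
    }

    $lines[] = str_repeat('-', 40);
    $lines[] = str_pad('Total', 30)
             . str_pad(number_format($total, 2), 10, ' ', STR_PAD_LEFT);

    return implode("\n", $lines) . "\n";
}
```

The resulting string could be output to a browser with a text/plain content type, or passed as the message body to PHP's mail() function. The columns only line up when the text is viewed in a fixed-width font, which is one example of the limited control over presentation that plain text gives you.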