Using DBI in Web Applications

The DBI scripts developed thus far have been designed for use from the shell in a command-line environment, but DBI is useful in other contexts as well, such as in the development of Web-based applications. When you write DBI scripts that can be invoked from a Web browser, you open up new and interesting possibilities for interacting with your databases. For example, if you display data in tabular form, you can easily turn each column heading into a link that you can select to re-sort the data on that column. This allows you to view your data in a different way with a single click, without entering any queries. Or you can provide a form into which a user can enter criteria for a database search, and then display a page containing the results of the search. Simple capabilities like this can dramatically alter the level of interactivity you provide for accessing the contents of your databases. In addition, Web browser display capabilities typically are better than what you get with a terminal window, so you can create nicer-looking output as well.

In this section, we'll create the following Web-based scripts:

  • A general browser for the tables in the sampdb database. This isn't related to any specific task we want to accomplish with the database, but it illustrates several Web programming concepts and provides a convenient means of seeing what information the tables contain.

  • A score browser allowing us to see the scores for any given quiz or test. This is handy as a quick means of reviewing grade event results for the grade-keeping project, and it's useful when we need to establish the grading curve for a test so we can mark papers with letter grades.

  • A script to find Historical League members who share a common interest. This is done by allowing the user to enter a search phrase, and then searching the interests column of the member table for that phrase. We already wrote a command-line script, interests.pl, to do this in the earlier section "Finding Historical League Members with Common Interests." But the command-line version can be executed only by people on the machine where it is installed. Providing a Web-based version opens up the directory to anyone who has a Web browser. Having another version also provides an instructive point of reference, allowing comparison of multiple approaches to the same task. (Actually, we'll develop two Web-based implementations. One is based on pattern matching, just like interests.pl. The other performs FULLTEXT searches.)

To write these scripts, we'll use the CGI.pm Perl module, which provides an easy way to link DBI to the Web. (For instructions on getting the CGI.pm module, see Appendix A.) CGI.pm is so called because it helps you write scripts that use the Common Gateway Interface protocol that defines how a Web server communicates with other programs. CGI.pm handles the details involved in a number of common housekeeping tasks, such as collecting the values of parameters passed as input to your script by the Web server. CGI.pm also provides convenient methods for generating HTML output, which reduces the chance of writing out malformed HTML compared to writing raw HTML tags yourself.

You'll learn enough about CGI.pm in this chapter to write your own Web applications, but of course not all of its capabilities are covered. To learn more about this module, see Official Guide to Programming with CGI.pm, by Lincoln Stein (John Wiley, 1998), or check the online documentation at http://stein.cshl.org/WWW/software/CGI/.

Another text covering CGI.pm that's specifically targeted to MySQL and DBI is my book MySQL and Perl for the Web (New Riders, 2000).

The Web-based scripts described in the remainder of this chapter are found under the perlapi/web directory of the sampdb distribution.

Setting up Apache for CGI Scripts

In addition to DBI and CGI.pm, there's one more component we need for writing Web-based scripts?a Web server. The instructions here are geared toward using scripts with the Apache server, but you should be able to use a different server if you like by adapting the instructions a bit.

I will assume here that the various parts of your Apache installation are located under /usr/local/apache for UNIX and under C:\Apache for Windows. For our purposes, the most important subdirectories of the Apache top-level directory are htdocs (for the HTML document tree), cgi-bin (for executable scripts and programs to be invoked by the Web server), and conf (for configuration files). These directories may be located somewhere else on your system. If so, make the appropriate adjustments to the following notes.

You should verify that the cgi-bin directory is not located within the Apache document tree. This is a safety precaution that prevents clients from requesting the source code for your scripts as plain text. You don't want malicious clients to be able to examine your scripts for security holes by siphoning off the text of the scripts and studying them.

To install a CGI script for use with Apache, copy it to your cgi-bin directory. Under UNIX, the script must begin with a #! line and have its mode set to be executable, just as for a command line script. In addition, it's a good idea to set the script to be owned by the user that Apache runs as and to be accessible only to that user. For example, if Apache runs as a user named www, use the following commands to make a script named myscript.pl owned by and executable and readable only by that user:

# chown www myscript.pl 
# chmod 500 myscript.pl

You may need to run these commands as root. If you don't have permission to install scripts in the cgi-bin directory, ask your system administrator to do so on your behalf.

Under Windows, the chown and chmod commands are unnecessary, but the script should still begin with a #! line. The line can list the full pathname to your Perl program. For example, if Perl is installed as C:\Perl\bin\perl.exe, the #! line can be written as follows:

#! C:/Perl/bin/perl -w 

Alternatively, on Windows NT-based systems, you can write the line more simply as follows if your PATH environment variable is set to include the directory in which Perl is installed:

#! perl -w 

The Perl scripts in the sampdb distribution all specify the pathname of Perl on the #! line as /usr/bin/perl. You'll need to modify each script as necessary for your own system.

After a script has been installed in the cgi-bin directory, you can request it from your browser by sending the appropriate URL to your Web server. For example, if the Web server host is www.snake.net, you would request myscript.pl from it using a URL like the following:

http://www.snake.net/cgi-bin/myscript.pl

Requesting a script with your browser causes it to be executed by the Web server. The script's output is sent back to you, and the result appears as a page in your browser.

When you run DBI scripts from the command line, warnings and error messages go to your terminal. In a Web environment, there is no terminal, so these messages go to the Apache error log. You should determine where this log is located because it can provide useful information to help debug your scripts. On my system, it's the error_log file in the logs directory under the Apache root, /usr/local/apache. It may be somewhere else on your system. The location of the log is determined by the ErrorLog directive in the httpd.conf configuration file, which is located in Apache's conf directory.

A Brief Introduction to CGI.pm

To write a Perl script that uses the CGI.pm module, put a use CGI statement near the beginning of the script that imports the module's function names. The standard set of the most commonly used functions can be imported as follows:

use CGI qw(:standard); 

Then you can invoke CGI.pm functions to produce various kinds of HTML structures. In general, the function names are the same as the corresponding HTML elements. For example, to produce a level 1 header and a paragraph, invoke h1() and p():

print h1 ("This is a header"); 
print p ("This is a paragraph");

CGI.pm also supports an object-oriented style of use that allows you to invoke its functions without importing the names. To do this, include a use statement and create a CGI object:

use CGI; 
my $cgi = new CGI;

The object gives you access to CGI.pm functions, which you invoke as methods of the object:

print $cgi->h1 ("This is a header"); 
print $cgi->p ("This is a paragraph");

The object-oriented interface requires that you write the $cgi-> prefix all the time; in this book I'll use the simpler function call interface. However, one disadvantage of the function call interface is that if a CGI.pm function has the same name as a Perl built-in function, you must invoke it in a non-conflicting way. For example, CGI.pm has a function named tr() that produces the <tr> and </tr> tags that surround the cells in a row of an HTML table. That function's name conflicts with the name of Perl's built-in tr transliteration function. To work around this problem when using the CGI.pm function call interface, invoke tr() as Tr() or as TR(). When you use the object-oriented interface, this problem does not occur because you invoke tr() as a method of your $cgi object (that is, as $cgi->tr()), which makes it clear that you're not referring to the built-in function.

Checking for Web Input Parameters

One of the things CGI.pm does for you is to take care of all the ugly details involved in collecting input information provided by the Web server to your script. All you need to do to get that information is invoke the param() function. You can get the names of all available parameters as follows:

my @param = param (); 

To retrieve the value of a particular parameter, pass its name to param(). If the parameter is set, param() will return its value, or undef if it isn't set:

my $my_param = param ("my_param"); 
if (defined ($my_param))
{
    print "my_param value: $my_param\n";
}
else
{
    print "my_param is not set\n";
}
Writing Web Output

Many of CGI.pm's functions are used for generating output to be sent to the client browser. Consider the following HTML document:

<html> 
<head>
<title>My Simple Page</title>
</head>
<body bgcolor="white">
<h1>Page Heading</h1>
<p>Paragraph 1.</p>
<p>Paragraph 2.</p>
</body>
</html>

The following script uses CGI.pm output functions to produce the equivalent document:

#! /usr/bin/perl -w 
# simple_doc.pl - produce simple HTML page

use strict;
use CGI qw(:standard);

print header ();
print start_html (-title => "My Simple Page", -bgcolor => "white");
print h1 ("Page Heading");
print p ("Paragraph 1.");
print p ("Paragraph 2.");
print end_html ();

The header() function generates a Content-Type: header that precedes the page content. It's necessary to write this header when producing Web pages from scripts to let the browser know what kind of document to expect. (This differs from the way you write static HTML pages, for which it's not necessary to produce a header; the Web server sends one to the browser automatically.) By default, header() writes a header that looks like this:

Content-Type: text/html 

Following the header() invocation are calls to functions that generate the page content. start_html() produces the tags from the opening <html> tag through the opening <body> tag, h1() and p() write the heading and paragraph elements, and end_html() adds the closing document tags.

As illustrated by the start_html() call, many CGI.pm functions allow you to specify named parameters, with each parameter given in -name=>value format. This is advantageous for functions that take many parameters that are optional, because you can specify just those parameters you need, and you can list them in any order.

Use of CGI.pm output-generating functions doesn't preclude you from writing out raw HTML yourself if you want. You can mix the two approaches, combining calls to CGI.pm functions with print statements that generate literal tags. However, one of the advantages of using CGI.pm to generate output instead of writing HTML yourself is that you can think in logical units rather than in terms of individual markup tags, and your HTML is less likely to contain errors. (The reason I say "less likely" is that CGI.pm won't prevent you from doing bizarre things, such as including a list inside of a heading.)

CGI.pm also provides some portability advantages that you don't get by writing your own HTML. For example, as of version 2.69, CGI.pm automatically writes XHTML output. If you're using an older version of CGI.pm that writes plain HTML, you can upgrade your scripts to start writing XHTML instead; all you need to do is update CGI.pm itself.

XHTML is similar to HTML but has a more well-defined format. HTML is easy to learn and use, but one of its problems is that browser implementations tend to differ in how they interpret HTML. For example, they are forgiving of malformed HTML in different ways. This means that a not-quite-correct page may display properly in one browser but incorrectly in another. XHTML's requirements are stricter, to help ensure that documents are well formed. Some of the differences between HTML and XHTML are as follows:

  • Unlike HTML, every opening tag in XHTML must have a closing tag. For example, paragraphs are written using <p> and </p> tags, but the closing </p> tag often is omitted in HTML documents. In XHTML, the </p> tag is required. For HTML tags that don't have any body, such as <br> and <hr>, the XHTML requirement that all tags be closed leads to ungainly constructs like <br></br> and <hr></hr>. To deal with this, XHTML allows single-tag shortcut forms (<br/>, <hr/>) that serve for both the opening and closing tags. However, older browsers that see tags like these will sometimes mistake the tag names as br/ and hr/. Inserting a space before the slash and writing the tags as <br /> and <hr /> helps to minimize the occurrence of such problems.

  • In HTML, tag and attribute names are not case sensitive. For example, <tr> and <TR> are the same. In XHTML, tag and attribute names should be lowercase, so only <tr> is allowable.

  • HTML attribute values can be unquoted or even missing. For example, the following table data cell construct is legal in HTML:

    <td width=40 nowrap>Some text</td> 
    

    In XHTML, attributes must have values, and they must be quoted. A common convention for HTML attributes that normally are used without a value is to use the attribute name as its value. The XHTML equivalent of the preceding <tr> element looks like the following:

    <td width="40" nowrap="nowrap">Some text</td> 
    

All the Web scripts in this book generate output that conforms to XHTML rules. In this chapter, we'll rely on CGI.pm to generate properly-formatted XHTML markup. The scripts discussed in Chapter 8 will generate the markup tags for themselves, because PHP doesn't provide tag-generating functions the way CGI.pm does.

Escaping HTML and URL Text

If text that you write to a Web page may contain special characters, you should make sure they are escaped properly by processing the text with escapeHTML(). This is also true when you construct URLs that may contain special characters, although in that case you should use the escape() function instead. It's important to use the appropriate encoding function because each one treats different sets of characters as special and encodes special characters using formats that differ from one another. Consider the following short Perl script, escape_demo.pl:

#! /usr/bin/perl -w 
# escape_demo.pl - demonstrate CGI.pm output-encoding functions

use CGI qw(escapeHTML escape);  # import escapeHTML() and escape()

$s = "x<=y, right?";
print escapeHTML ($s) . "\n";   # encode for use as HTML text
print escape ($s) . "\n";       # encode for use in a URL

The script encodes the string $s using each function and prints the result. When you run it, the script produces the following output, from which you can see that encoding conventions for HTML text are not the same as encoding for URLs:

x&lt;=y, right? 
x%3C%3Dy%2C%20right%3F

The escape_demo.pl script imports the names of the encoding functions in the use CGI statement. They are not included in the standard set of functions, so you'll need to import them even if you also import the standard set, like this:

use CGI qw (:standard escapeHTML escape); 
Writing Multiple-Purpose Pages

One of the primary reasons to write Web-based scripts that generate HTML instead of writing static HTML documents is that a script can produce different kinds of pages depending on the way it's invoked. All of the CGI scripts we're going to write have that property. Each one operates as follows:

  • When you first request the script from your browser, it generates an initial page that allows you to select what kind of information you want.

  • When you make a selection, your browser sends a request back to the Web server that causes the script to be re-invoked. The script then retrieves and displays in a second page the specific information you requested.

The primary problem here is that you want the selection that you make from the first page to determine the contents of the second page, but Web pages normally are independent of one another unless you make some sort of special arrangements. The solution is to have the script generate pages that set a parameter to a value that tells the next invocation of the script what you want. When you first invoke the script, the parameter will have no value; this tells the script to present its initial page. When you indicate what information you'd like to see, the script is invoked again, but this time the parameter will be set to a value that instructs the script what to do.

There are different ways for Web pages to pass instructions to a script. One way is for the page to include a form that the user fills in. When the user submits the form, its contents are submitted to the Web server. The server passes the information along to the script, which can find out what was submitted by invoking the param() function. This is what we'll do to implement keyword searches of the Historical League directory: The search page includes a form into which the user enters the word to search for.

Another way of specifying instructions for a script is to add parameter values to the end of the URL that you send to the Web server when you request the script. This is the approach we'll use for our sampdb table browser and score browser scripts. The way this works is that the script generates a page containing hyperlinks. When you select a link, it invokes the script again, but the link includes a parameter value that instructs the script what to do. In effect, the script invokes itself in different ways to provide different kinds of results, depending on which link you select.

A script can allow itself to be invoked by sending to the browser a page containing a self-referential hyperlink?that is, a link to its own URL. For example, if a script myscript.pl is installed in the Web server's cgi-bin directory, it can produce a page that contains the following link:

<a href="/cgi-bin/myscript.pl">Click Me!</a> 

When the user clicks the text Click Me! in the page, the user's browser sends a request for myscript.pl back to the Web server. Of course, in and of itself, all that will do is cause the script to send out the same page again because no other information is supplied in the URL. However, if you add a parameter to it, that parameter is sent back to the Web server when the user selects the link. The server invokes the script and the script can call param() to detect that the parameter was set and take action according to its value.

To attach a parameter to the end of the URL, add a '?' character followed by a name=value pair indicating the parameter name and its value. For example, to add a size parameter with a value of large, write the URL as follows:

/cgi-bin/myscript.pl?size=large 

To attach multiple parameters, separate them by ';' characters:[1]

[1] CGI.pm also understands '&' as a parameter separator character. Other language APIs for Web programming vary in their conventions, so you'll need to know whether they expect ';' or '&' and construct URLs accordingly.

/cgi-bin/myscript.pl?size=large;color=blue 

To construct a self-referencing URL with attached parameters, a script should begin by calling the url() function to obtain its own URL, and then append parameters to it as follows:

$url = url ();          # get URL for script 
$url .= "?size=large";  # add first parameter
$url .= ";color=blue";  # add second parameter

Using url() to get the script path allows you to avoid hardwiring the path into the code.

To generate a hyperlink, pass the URL to CGI.pm's a() function:

print a ({-href => $url}, "Click Me!"); 

The print statement produces a hyperlink that looks like the following:

<a href="/cgi-bin/url.pl?size=large;color=blue">Click Me!</a> 

The preceding example constructs the value of $url in somewhat cavalier fashion, because it doesn't take into account the possibility that the parameter values or the link label might contain special characters. Unless you're certain that the values and the label don't require any encoding, it's best to use the CGI.pm encoding functions. The escape() function encodes values to be appended to a URL, and escapeHTML() encodes regular HTML text. For example, if the value of the hyperlink label is stored in $label, and the values for the size and color parameters are stored in the variables $size and $color, you can perform the proper encoding as follows:

$url = sprintf ("%s?size=%s;color=%s", 
                url (), escape ($size), escape ($color));
print a ({-href => $url}, escapeHTML ($label));

To see how self-referential URL construction works in the context of an application, consider the following short CGI script, flip_flop.pl. When first invoked, it presents a page called Page A that contains a single hyperlink. Selecting the link invokes the script again, but the link also includes a pageb parameter to tell flip_flop.pl to display Page B. (In this case, we don't care about the parameter's value, just whether or not it's set.) Page B will also contain a link to the script, but without a pageb parameter. This means that selecting the link in Page B causes the original page to be redisplayed. In other words, subsequent invocations of the script flip the page back and forth between Page A and Page B:

#! /usr/bin/perl -w 
# flip_flop.pl - simple multiple-output-page CGI.pm script

use CGI qw(:standard);

my $url;

# determine which page to display based on absence or presence
# of the pageb parameter

if (!defined (param ("pageb"))) # display page A w/link to page B
{
    $this_page = "A";
    $next_page = "B";
    $url = url () . "?pageb=1";
}
else                            # display page B w/link to page A
{
    $this_page = "B";
    $next_page = "A";
    $url = url ();
}

print header ();
print start_html (-title => "Flip-Flop: Page $this_page",
                    -bgcolor => "white");
print p ("This is Page $this_page. To select Page $next_page, "
        . a ({-href => $url}, "click here"));
print end_html ();

Install the script in your cgi-bin directory, and then request it from your browser using the appropriate URL:

http://www.snake.net/cgi-bin/flip_flop.pl

Select the link in the page several times to see how the script alternates the pages that it generates.

Now, suppose another client comes along and starts requesting flip_flop.pl. What happens? Will the two of you interfere with each other? No, because the initial request from each of you will include no pageb parameter, and the script will respond with its initial page. Thereafter, the requests sent by each of you will include or omit the parameter according to which page you currently happen to be viewing. flip_flop.pl generates an alternating series of pages properly for each client, independent of the actions of any other client.

Connecting to the MySQL Server from Web Scripts

The command-line scripts developed in the earlier section "Putting DBI to Work" shared a common preamble for establishing a connection to the MySQL server. Most of our CGI scripts will share some code, too, but it's a little different:

#! /usr/bin/perl -w 
use strict;
use DBI;
use CGI qw(:standard);

use Cwd;
# option file that should contain connection parameters for UNIX
my $option_file = "/usr/local/apache/conf/sampdb.cnf";
my $option_drive_root;
# override file values for Windows
if ($^O =~ /^MSWin/i || $^O =~ /^dos/)
{
    $option_drive_root = "C:/";
    $option_file = "/Apache/conf/sampdb.cnf";
}
# construct data source and connect to server (under Windows, save
# current working directory first, change location to option file
# drive, connect, then restore current directory)
my $orig_dir;
if (defined ($option_drive_root))
{
    $orig_dir = cwd ();
    chdir ($option_drive_root)
        or die "Cannot chdir to $option_drive_root: $!\n";
}
my $dsn = "DBI:mysql:sampdb;mysql_read_default_file=$option_file";
my $dbh = DBI->connect ($dsn, undef, undef,
                        { RaiseError => 1, PrintError => 0 });
if (defined ($option_drive_root))
{
    chdir ($orig_dir)
        or die "Cannot chdir to $orig_dir: $!\n";
}

This preamble differs from the one we used for command-line scripts in the following respects:

  • The first section now contains use CGI and use Cwd statements. The first is for the CGI.pm module. The second is for the module that returns the pathname of the current working directory; it's used in case the script is running under Windows, as described shortly.

  • No connection parameters are parsed from the command-line arguments. Instead, the code assumes they'll be listed in an option file.

  • Instead of using mysql_read_default_group to read the standard option files, we use mysql_read_default_file to read a single file intended specifically for options to be used by Web scripts that access the sampdb database. As shown, the code looks for options stored in /usr/local/apache/conf/sampdb.cnf under UNIX or in C:\Apache\conf\sampdb.cnf under Windows. Note that under Windows the code changes location to the root directory of the drive where the option file is located before connecting and back to the original directory afterward. The rationale for this ugly hack is described in the "Specifying Connection Parameters" section earlier in this chapter.

The sampdb distribution contains a sampdb.cnf file that you can install for use by your DBI-based Web scripts. It looks like this:

[client] 
host=cobra.snake.net
user=sampadm
password=secret

To use the Web-based scripts developed in this chapter on your own system, you should change the option filename in the preamble if you use a different name. You should also list in the option file those values that reflect the MySQL server host and the MySQL account name and password you want to use.

Under UNIX, you should set the option file to be owned by the account used to run Apache and set the file's mode to 400 or 600 so that no other user can read it. This keeps other users who have login accounts on the Web server host from reading the option file directly and prevents one form of security exploit.

Unfortunately, the option file can still be read by other users who can install a script for the Web server to execute. Scripts invoked by the Web server execute with the privileges of the account used for running the Web server. This means that another user who can install a Web script can write the script so that it opens the option file and displays its contents in a Web page. Because that script runs as the Web server user, it will have full permission to read the file, which exposes the connection parameters necessary to connect to MySQL and access the sampdb database. If you don't trust the other users on your machine, you may find it prudent to create a MySQL account that has read-only (SELECT) privileges on the sampdb database, and then list that account's name and password in the sampdb.cnf file, rather than your own name and password. That way you don't risk allowing scripts to connect to your database through a MySQL account that has permission to modify its tables. Chapter 11, "General MySQL Administration," discusses how to create a MySQL user account with restricted privileges. The downside of this strategy is that with a read-only MySQL account, you can write scripts only for data retrieval, not for data entry.

Alternatively, you can arrange to execute scripts under Apache's suEXEC mechanism. This allows you to execute a script as a specific trusted user, and then write the script to get the connection parameters from an option file that is readable only to that user. Another approach is available with Apache 2.x, which makes it possible to arrange for scripts located in particular directories to run with the privileges of a specific user?such as yourself. Then you can set the ownership and mode of sampdb.cnf and your scripts to be accessible only to you; other users won't be able to get at them.

Still another option for writing a script is to have it solicit a username and password from the client user, and then use those values to establish a connection to the MySQL server. This is more suitable for scripts that you create for administrative purposes than for scripts that you provide for general use. In any case, you should be aware that some methods of name and password solicitation are subject to attack by anyone who can put a sniffer on the network between the Web server and your browser, so you may want to set up a secure connection. That is beyond the scope of this book.

As you may gather from the preceding paragraphs, Web script security can be a tricky thing. It's definitely a topic about which you should read more for yourself, because it's a big subject that I really cannot do justice to here. The book MySQL and Perl for the Web mentioned earlier includes a chapter devoted specifically to Web security, including instructions for setting up secure connections using SSL. Other good sources of information are the security material in the Apache manual, and the WWW security FAQ available at http://www.w3.org/Security/Faq/.

A Web-Based Database Browser

Our first Web-based application is a simple script, db_browse.pl, that allows you to see what tables exist in the sampdb database and to examine the contents of any of these tables interactively from your Web browser. The script works like this:

  • When you first request db_browse.pl from your browser, it connects to the MySQL server, retrieves a list of tables in the sampdb database, and sends your browser a page in which each table is presented as a hyperlink. When you select a table name link from this page, your browser sends a request to the Web server asking db_browse.pl to display the contents of that table.

  • If db_browse.pl finds when it's invoked that you've selected a table name, it retrieves the contents of the table and presents the information to your Web browser. The heading for each column of data is the name of the column in the table. Headings are presented as hyperlinks; if you select one of them, your browser sends a request to the Web server to redisplay the same table, but this time sorted by the column you selected.

Before we go any further, you should be aware that although db_browse.pl is instructive in terms of illustrating several useful Web programming concepts, it also represents a security hole that can be exploited by unfriendly visitors to your site. The script is easily fooled into displaying any table that is accessible to the MySQL account named in the sampdb.cnf file. I'll describe how you can trick the script later, but for now what you should know is that you should install this script only if that account can access non-sensitive data. And it's a good idea to remove the script from your cgi-bin directory after you've tried it and understand how it works. (Alternatively, install it on a private server that is not accessible to untrusted users.) Here's a specific example of why db_browse.pl can be a problem. In Chapter 8, we'll write a script that Historical League members can use to edit their membership entries over the Web. Access to the entries is controlled through passwords that are stored in a member_pass table. Having db_browse.pl enabled at that point would allow anyone to look through the password table, and thus gain access to the information necessary to edit any member table entry.

Okay, assuming that you haven't been spooked by the preceding dire warnings, let's see how db_browse.pl works. The main body of the script puts out the initial part of the Web page and then checks the tbl_name parameter to see whether or not it's supposed to display some particular table:

#! /usr/bin/perl -w 
# db_browse.pl - Allow sampdb database browsing over the Web

use strict;
use DBI;
use CGI qw (:standard escapeHTML escape);

# ... set up connection to database (not shown) ...

my $db_name = "sampdb";

# put out initial part of page
my $title = "$db_name Database Browser";
print header ();
print start_html (-title => $title, -bgcolor => "white");
print h1 ($title);

# parameters to look for in URL
my $tbl_name = param ("tbl_name");
my $sort_col = param ("sort_col");

# If $tbl_name has no value, display a clickable list of tables.
# Otherwise, display contents of the given table.  $sort_col, if
# set, indicates which column to sort by.

if (!defined ($tbl_name))
{
    display_table_names ($dbh, $db_name)
}
else
{
    display_table_contents ($dbh, $tbl_name, $sort_col);
}

print end_html ();

It's easy to find out what value a parameter has because CGI.pm does all the work of finding out what information the Web server passes to the script. We need only call param() with the name of the parameter in which we're interested. In the main body of db_browse.pl, that parameter is tbl_name. If it's not set, this is the initial invocation of the script and it displays the table list. Otherwise, it displays the contents of the table named by the tbl_name parameter, sorted by the column named in the sort_col parameter.

The display_table_names() function generates the initial page. display_table_names() retrieves the table list and writes out a bullet list in which each item is the name of a table in the sampdb database:

sub display_table_names 
{
my ($dbh, $db_name) = @_;

    print p ("Select a table by clicking on its name:");

    # retrieve reference to single-column array of table names
    my $ary_ref = $dbh->selectcol_arrayref (qq{ SHOW TABLES FROM $db_name });

    # Construct a bullet list using the ul() (unordered list) and
    # li() (list item) functions.  Each item is a hyperlink that
    # re-invokes the script to display a particular table.
    my @item;
    foreach my $tbl_name (@{$ary_ref})
    {
        my $url = sprintf ("%s?tbl_name=%s", url (), escape ($tbl_name));
        my $link = a ({-href => $url}, escapeHTML ($tbl_name));
        push (@item, li ($link));
    }
    print ul (@item);
}

The li() function adds <li> and </li> tags around each list item and ul() adds the <ul> and </ul> tags around the set of items. Each table name in the list is presented as a hyperlink that reinvokes the script to display the contents of the named table. The resulting list generated by display_table_names() looks like this:

<ul> 
<li><a href="/cgi-bin/localhost/db_browse.pl?tbl_name=absence">absence</a></li>
<li><a href="/cgi-bin/localhost/db_browse.pl?tbl_name=event">event</a></li>
<li><a href="/cgi-bin/localhost/db_browse.pl?tbl_name=member">member</a></li>
...
</ul>

If the tbl_name parameter has a value when db_browse.pl is invoked, the script passes that value to display_table_contents(), along with the name of the column to sort the results by:

sub display_table_contents 
{
my ($dbh, $tbl_name, $sort_col) = @_;
my @rows;
my @cells;
    # if sort column not specified, use first column
    $sort_col = "1" if !defined ($sort_col);

    # present a link that returns user to table list page
    print p (a ({-href => url ()}, "Show Table List"));

    print p (strong ("Contents of $tbl_name table:"));

    my $sth = $dbh->prepare (qq{
                SELECT * FROM $tbl_name ORDER BY $sort_col
                LIMIT 200
            });
    $sth->execute ();

    # Use the names of the columns in the database table as the
    # headings in an HTML table.  Make each name a hyperlink that
    # causes the script to be reinvoked to redisplay the table,
    # sorted by the named column.

    foreach my $col_name (@{$sth->{NAME}})
    {
        my $url = sprintf ("%s?tbl_name=%s;sort_col=%s",
                            url (),
                            escape ($tbl_name),
                            escape ($col_name));
        my $link = a ({-href => $url}, escapeHTML ($col_name));
        push (@cells, th ($link));
    }
    push (@rows, Tr (@cells));

    # display table rows
    while (my @ary = $sth->fetchrow_array ())
    {
        @cells = ();
        foreach my $val (@ary)
        {
            # display value if non-empty, else display non-breaking space
            if (defined ($val) && $val ne "")
            {
                $val = escapeHTML ($val);
            }
            else
            {
                $val = "&nbsp;";
            }
            push (@cells, td ($val));
        }
        push (@rows, Tr (@cells));
    }

    # display table with a border
    print table ({-border => "1"}, @rows);
}

If no column was named, display_table_contents() adds an ORDER BY 1 clause to the query to sort the results using the first column in the table. The query also includes a LIMIT 200 clause as a simple precaution against the script sending huge amounts of data to your browser. (That's not likely to happen for the tables in the sampdb database, but it might occur if you adapt the script to display the contents of tables in other databases.) display_table_contents() shows the rows from the table as an HTML table, using the th() and td() functions to produce table header and data cells, Tr() to group cells into rows, and table() to produce the <table> tags that surround the rows.

The HTML table presents column headings as hyperlinks that redisplay the database table; these links include a sort_col parameter that explicitly specifies the column to sort on. For example, for a page that displays the contents of the event table, the column heading links look like the following:

<a href="/cgi-bin/db_browse.pl?tbl_name=event&sort_col=date">date</a> 
<a href="/cgi-bin/db_browse.pl?tbl_name=event&sort_col=type">type</a>
<a href="/cgi-bin/db_browse.pl?tbl_name=event&sort_col=event_id">event_id</a>

display_table_contents() uses a little trick of turning empty values into a non-breaking space (&nbsp;). In a bordered table, some browsers won't display borders for empty cells properly; putting a non-breaking space in the cell fixes that problem.

If you want to write a more general script, you could alter db_browse.pl to browse multiple databases. For example, you could have the script begin by displaying a list of databases on the server, rather than a list of tables within a particular database. Then you could pick a database to get a list of its tables and go from there.

Near the beginning of this section, I mentioned that the db_browse.pl script is easily fooled into displaying any table that is accessible to the sampadm account through which the script connects to the MySQL server. Suppose that account can access not only the sampdb database but also a database named hr that contains human resources information for a company, such as employment records. This can lead to a security breach. db_browse.pl does its work by constructing URLs containing the names of tables that are assumed to be in the sampdb database, but there's nothing to stop anyone from directly sending a request using a similar URL that names a table in the hr database:

http://www.snake.net/cgi-bin/db_browse.pl?tbl_name=hr.employee

In this case, the script will connect to MySQL and make sampdb the default database, but the SELECT statement that it constructs will refer explicitly to a table in the hr database:

SELECT * FROM hr.employee ORDER BY 1 

The result is that the db_browse.pl script presents the contents of a sensitive table to anyone who can reach your Web server.

A Grade-Keeping Project Score Browser

Our next Web script, score_browse.pl, is designed to display scores that have been recorded for the grade-keeping project. Strictly speaking, we should have a way of entering the scores before we create a way of retrieving them. I'm saving the score entry script until the next chapter. In the meantime, we do have several sets of scores in the database already from the early part of the grading period. We can use the script to display those scores, even in the absence of a convenient score entry method. The script displays an ordered list of scores for any test or quiz, which is useful for determining the grading curve and assigning letter grades.

score_browse.pl has some similarities to db_browse.pl (both serve as information browsers), but is intended for the more specific purpose of looking at scores for a given quiz or test. The initial page presents a list of the possible grade events from which to choose and allows the user to select any of them to see the scores associated with the event. Scores for a given event are sorted by score with the highest scores first, so you can use the result to determine the grading curve.

The score_browse.pl script needs to examine only one parameter, event_id, to see whether or not a grade event was specified. If not, score_browse.pl displays the rows of the event table so that the user can select one. Otherwise, it displays the scores associated with the chosen event:

# ... set up connection to database (not shown) ... 

# put out initial part of page
my $title = "Grade-Keeping Project -- Score Browser";
print header ();
print start_html (-title => $title, -bgcolor => "white");
print h1 ($title);

# parameter that tells us which event to display scores for
my $event_id = param ("event_id");
# if $event_id has no value, display the event list.
# otherwise display the scores for the given event.
if (!defined ($event_id))
{
    display_events ($dbh)
}
else
{
    display_scores ($dbh, $event_id);
}

print end_html ();

The display_events() function pulls out information from the event table and displays it in tabular form, using column names from the query for the table column headings. Within each row, the event_id value is displayed as a hyperlink that can be selected to trigger a query that retrieves the scores for the event. The URL for each event is simply the path to score_browse.pl with a parameter attached that specifies the event number:

/cgi-bin/score_browse.pl?event_id=n 

display_events() is implemented as follows:

sub display_events 
{
my $dbh = shift;
my @rows;
my @cells;

    print p ("Select an event by clicking on its number:");

    # get list of events
    my $sth = $dbh->prepare (qq{
        SELECT event_id, date, type
        FROM event
        ORDER BY event_id
    });
    $sth->execute ();

    # use column names for table column headings
    for (my $i = 0; $i < $sth->{NUM_OF_FIELDS}; $i++)
    {
        push (@cells, th (escapeHTML ($sth->{NAME}->[$i])));
    }
    push (@rows, Tr (@cells));

    # display information for each event as a row in a table
    while (my ($event_id, $date, $type) = $sth->fetchrow_array ())
    {
        @cells = ();
        # display event ID as a hyperlink that reinvokes the script
        # to show the event's scores
        my $url = sprintf ("%s?event_id=%s", url (), escape ($event_id));
        my $link = a ({-href => $url}, escapeHTML ($event_id));
        push (@cells, td ($link));
        # display event date and type
        push (@cells, td (escapeHTML ($date)));
        push (@cells, td (escapeHTML ($type)));
        push (@rows, Tr (@cells));
    }

    # display table with a border
    print table ({-border => "1"}, @rows);
}

When the user selects an event, the browser sends a request for score_browse.pl that has an event ID at the end. score_browse.pl finds the event_id parameter set and calls the display_scores() function to list all the scores for the specified event. This function also displays the text "Show Event List" as a hyperlink back to the initial page so that the user can easily return to the event list page and select a different event. display_scores() looks like the following:

sub display_scores 
{
my ($dbh, $event_id) = @_;
my @rows;
my @cells;

    # Generate a link to the script that does not include any event_id
    # parameter.  If the user selects this link, the script will display
    # the event list.
    print p (a ({-href => url ()}, "Show Event List"));

    # select scores for the given event
    my $sth = $dbh->prepare (qq{
        SELECT
            student.name, event.date, score.score, event.type
        FROM
            student, score, event
        WHERE
            student.student_id = score.student_id
            AND score.event_id = event.event_id
            AND event.event_id = ?
        ORDER BY
            event.date ASC, event.type ASC, score.score DESC
    });
    $sth->execute ($event_id);  # bind event ID to placeholder in query

    print p (strong ("Scores for event $event_id"));
    # use column names for table column headings
    for (my $i = 0; $i < $sth->{NUM_OF_FIELDS}; $i++)
    {
        push (@cells, th (escapeHTML ($sth->{NAME}->[$i])));
    }
    push (@rows, Tr (@cells));

    while (my @ary = $sth->fetchrow_array ())
    {
        @cells = ();
        foreach my $val (@ary)
        {
            # display value if non-empty, else display non-breaking space
            if (defined ($val) && $val ne "")
            {
                $val = escapeHTML ($val);
            }
            else
            {
                $val = "&nbsp;";
            }
            push (@cells, td ($val));
        }
        push (@rows, Tr (@cells));
    }

    # display table with a border
    print table ({-border => "1"}, @rows);
}

The query that display_scores() runs is quite similar to one that we developed way back in Chapter 1 in the "Retrieving Information From Multiple Tables" section that describes how to write joins. In that chapter, we asked for scores for a given date because dates are more meaningful than event ID values. In contrast, when we use score_browse.pl, we know the exact event ID. That's not because we think in terms of event IDs (we don't) but because the script presents a list of them from which to choose, along with their dates and types. You can see that this type of interface reduces the need to know particular details. You don't need to know an event ID; it's necessary only to be able to recognize the date of the event you want and the script will provide the ID for you.

Historical League Common-Interest Searching

The db_browse.pl and score_browse.pl scripts allow the user to make a selection from a list of choices in an initial page where the choices are presented as hyperlinks that re-invoke the script with particular parameter values. Another way to allow users to provide information is to present a form that the user fills in. This is more appropriate when the range of possible choices isn't constrained to some easily determined set of values. Our next script demonstrates this method of soliciting user input.

In the earlier section "Putting DBI to Work," we constructed a command-line script, interests.pl, for finding Historical League members who share a particular interest. However, that script isn't something that League members have access to; the League secretary must run the script from the command prompt and then mail the result to the member who requested the list. It'd be nice to make this search capability more widely available so that members could use it themselves. Writing a Web-based script is one way to do that. The rest of this section discusses two approaches to table searching. The first is based on pattern matching, and the second uses MySQL FULLTEXT search capabilities.

Performing Searches Using Pattern Matching

The first search script, ushl_browse.pl, displays a form into which the user can enter a keyword. When the user submits the form, the script is re-invoked to search the member table for qualifying members and display the results. The search is done by adding the '%' wildcard character to both ends of the keyword and performing a LIKE pattern match, which finds records that have the keyword anywhere in the interests column value.

The main part of the script displays the keyword form. It also checks to see if a keyword was just submitted and, if so, performs a search:

my $title = "U.S. Historical League Interest Search"; 
print header ();
print start_html (-title => $title, -bgcolor => "white");
print h1 ($title);

# parameter to look for
my $keyword = param ("keyword");

# Display a keyword entry form.  In addition, if $keyword is defined,
# search for and display a list of members who have that interest.

print start_form (-method => "POST");
print p ("Enter a keyword to search for:");
print textfield (-name => "keyword", -value => "", -size => 40);
print submit (-name => "button", -value => "Search");
print end_form ();

# connect to server and run a search if a keyword was specified
if (defined ($keyword) && $keyword !~ /^\s*$/)
{
    # ... set up connection to database (not shown) ...
    search_members ($dbh, $keyword);
    # ... disconnect (not shown) ...
}

The script communicates information to itself a little differently than db_browse.pl or score_browse.pl. The keyword parameter is not added to the end of a URL. Instead, the information in the form is encoded by the browser and sent as part of a POST request. However, CGI.pm makes it irrelevant how the information is sent because param() returns the parameter value no matter how it was sent?just one more thing that CGI.pm does to make Web programming easier.

Keyword searches are performed by the search_members() function. It takes a database handle and the keyword as arguments, and then runs the search query and displays the list of matching member records:

sub search_members 
{
my ($dbh, $interest) = @_;

    print p ("Search results for keyword: " . escapeHTML ($interest));
    my $sth = $dbh->prepare (qq{
                SELECT * FROM member WHERE interests LIKE ?
                ORDER BY last_name, first_name
            });
    # look for string anywhere in interest field
    $sth->execute ("%" . $interest . "%");
    my $count = 0;
    while (my $ref = $sth->fetchrow_hashref ())
    {
        format_html_entry ($ref);
        ++$count;
    }
    print p ("Number of matching entries: $count");
}

When you run the ushl_browse.pl script, you'll notice that each time you submit a keyword value, it's redisplayed in the form on the next page. This happens even though the script specifies an empty string as the value of the keyword field when it generates the form. The reason is that CGI.pm automatically fills in form fields with values from the script execution environment if they are present. If you want to defeat this behavior and make the field blank every time, include an override parameter in the textfield() call:

print textfield (-name => "keyword", 
                        -value => "",
                        -override => 1,
                        -size => 40);

search_members() uses a helper function format_html_entry() to display individual entries. That function is much like the one of the same name that we wrote earlier for the gen_dir.pl script. (See the "Generating the Historical League Directory" section earlier in this chapter.) However, whereas the earlier version of the function generated HTML by printing markup tags directly, the version used by ushl_browse.pl uses CGI.pm functions to produce the tags:

sub format_html_entry 
{
my $entry_ref = shift;

    # encode characters that are special in HTML
    foreach my $key (keys (%{$entry_ref}))
    {
        next unless defined ($entry_ref->{$key});
        $entry_ref->{$key} = escapeHTML ($entry_ref->{$key});
    }
    print strong ("Name: " . format_name ($entry_ref)) . br ();
    my $address = "";
    $address .= $entry_ref->{street}
                                if defined ($entry_ref->{street});
    $address .= ", " . $entry_ref->{city}
                                if defined ($entry_ref->{city});
    $address .= ", " . $entry_ref->{state}
                                if defined ($entry_ref->{state});
    $address .= " " . $entry_ref->{zip}
                                if defined ($entry_ref->{zip});
    print "Address: $address" . br ()
                                if $address ne "";
    print "Telephone: $entry_ref->{phone}" . br ()
                                if defined ($entry_ref->{phone});
    print "Email: $entry_ref->{email}" . br ()
                                if defined ($entry_ref->{email});
    print "Interests: $entry_ref->{interests}" . br ()
                                if defined ($entry_ref->{interests});
    print br ();
}

format_html_entry() uses the format_name() function to glue the first_name, last_name, and suffix column values together. It's identical to the function of the same name in gen_dir.pl.

Performing Searches Using a FULLTEXT Index

Historical League members may have multiple interests. If so, they are separated by commas in the interests column of the member table. For example

Revolutionary War,Spanish-American War,Colonial period,Gold rush,Lincoln 

Can you use ushl_browse.pl to search for records that match multiple interests? Sort of, but not really. You can enter several words into the search form, but records won't match unless you separate the words by commas and unless the interests in the records occur in the same order you list them. A more flexible way to approach the interest-searching task is to use a FULLTEXT index.[2] This section describes a script ushl_ft_browse.pl that does so. It requires very little work.

[2] MySQL's FULLTEXT capabilities are described in Chapter 3.

The requirements for using ushl_ft_browse.pl are that you must have MySQL 3.23.23 or later and that the member table must be a MyISAM table. If you created member as some other table type, you can convert it to a MyISAM table with ALTER TABLE:

mysql> ALTER TABLE member TYPE = MYISAM; 

Next, it's necessary to index the member table properly. To do that, use the following statement:

mysql> ALTER TABLE member ADD FULLTEXT (interests); 

That allows the interests column to be used for FULLTEXT searches. The ushl_ft_browse.pl script in the sampdb distribution is based on ushl_browse.pl and differs from it only in the search_members() function that constructs the search query. The modified version of the function looks like this:

sub search_members 
{
my ($dbh, $interest) = @_;

    print p ("Search results for keyword: " . escapeHTML ($interest));
    my $sth = $dbh->prepare (qq{
                SELECT * FROM member WHERE MATCH(interests) AGAINST(?)
                ORDER BY last_name, first_name
            });
    # look for string anywhere in interest field
    $sth->execute ($interest);
    my $count = 0;
    while (my $ref = $sth->fetchrow_hashref ())
    {
        format_html_entry ($ref);
        ++$count;
    }
    print p ("Number of matching entries: $count");
}

This version of search_members() has the following changes relative to the earlier one:

  • The query uses MATCH()...AGAINST() rather than LIKE.

  • No "%" wildcard characters are added to the keyword string to convert it to a pattern.

With these changes, you can invoke ushl_ft_browse.pl from your Web browser and enter multiple keywords into the search form (with or without commas). The script will find member entries that match any of them.

You could of course get a lot fancier with this script. One possibility would be to take advantage of the fact that FULLTEXT searches can search multiple columns at once by setting up the index to span several columns and then modifying ushl_ft_browse.pl to search them all. For example, you could drop the original FULLTEXT index and add another that uses the last_name and full_name columns in addition to the interests column:

mysql> ALTER TABLE member DROP INDEX interests; 
mysql> ALTER TABLE member ADD FULLTEXT (interests,last_name,first_name);

To use the new index, modify the SELECT query in the search_members() function to change MATCH(interests) to MATCH(interests,last_name,first_name).

Another