Putting DBI to Work

At this point you've seen a number of the concepts involved in DBI programming, so let's move on to some of the things we wanted to be able to do with our sample database. Our goals were outlined initially in Chapter 1. Those that we'll tackle by writing DBI scripts in this chapter are listed here.

For the grade-keeping project, we want to be able to retrieve scores for any given quiz or test.

For the Historical League, we want to do the following:

  • Produce the member directory in different formats. We want a names-only list for use in the program distributed at the League's annual banquet and in a format we can use for generating the printed directory.

  • Find League members that need to renew their memberships soon, and then send email to let them know about it.

  • Edit member entries. (We'll need to update their expiration dates when they renew their memberships, after all.)

  • Find members that share a common interest.

  • Put the directory online.

For some of these tasks, we'll write scripts that run from the command line. For the others, we'll create scripts in the next section, "Using DBI in Web Applications," that you can use in conjunction with your Web server. At the end of the chapter, we'll still have a number of goals left to accomplish, but we'll finish up those that remain in Chapter 8, "The PHP API."

Generating the Historical League Directory

One of our goals is to be able to produce information from the Historical League directory in different formats. The simplest format to be generated is a list of member names for the printed program distributed to guests at the annual League banquet. The format can be a simple plain text listing. It will become part of the larger document used to create the banquet program, so all we need is something that can be pasted into that document.

For the printable directory, a better representation than plain text is needed because we want something nicely formatted. A reasonable choice here is RTF (Rich Text Format), a format developed by Microsoft that is understood by many word processors. Word is one such program, of course, but many others, such as WordPerfect and AppleWorks, understand it as well. Different word processors support RTF to varying degrees, but we'll use a basic subset of the full RTF specification that should be understandable by any word processor that is RTF-aware to even a minimal degree. (For example, the Mac OS X TextEdit application can read the RTF output we'll be generating in this section.)

The procedures for generating the banquet list and RTF directory formats are essentially the same?issue a query to retrieve the entries, and then run a loop that fetches and formats each entry. Given that basic similarity, it would be nice to avoid writing separate scripts for each format. To that end, let's write a single script (gen_dir.pl) that can generate different types of output. We can structure the script as follows:

  1. Before writing out member entries, perform any initialization that might be necessary for the output format. No special initialization is necessary for the banquet program member list, but we'll need to write out some initial control language for the RTF version.

  2. Fetch and print each entry, formatted appropriately for the type of output we want.

  3. After all the entries have been processed, perform any necessary cleanup and termination. Again, no special handling is needed for the banquet list, but some closing control language is required for the RTF version.

It's possible that in the future we'll want to use this script to write output in other formats, so let's make it extensible by setting up a "switchbox," that is, a hash with an element for each output format. Each element specifies which functions to invoke to carry out each output generation phase for a given format?an initialization function, an entry-writing function, and a cleanup function:

# switchbox containing formatting functions for each output format 
my %switchbox =
(
    "banquet" =>                        # functions for banquet list
    {
        "init"      => undef,           # no initialization needed
        "entry"     => \&format_banquet_entry,
        "cleanup"   => undef            # no cleanup needed
    },
    "rtf" =>                            # functions for RTF format
    {
        "init"      => \&rtf_init,
        "entry"     => \&format_rtf_entry,
        "cleanup"   => \&rtf_cleanup
    }
);

Each element of the switchbox is keyed by a format name ("banquet" or "rtf"). We'll write the script so that you just specify the format you want on the command line when you run it:

% ./gen_dir.pl banquet 
% ./gen_dir.pl rtf

By setting up a switchbox this way, we'll be able to add the capability for a new format easily, should we want to do so:

  1. Write three formatting functions for the output generation phases.

  2. Add a new element to the switchbox that defines a format name and that points to the output functions.

  3. To produce output in the new format, invoke gen_dir.pl and specify the format name on the command line.

The code for selecting the proper switchbox entry according to the first argument on the command line is shown next. If no format name or an invalid name is specified on the command line, the script produces an error message and displays a list of the allowable names. Otherwise, $func_hashref will point to the appropriate switchbox entry:

# make sure one argument was specified on the command line 
@ARGV == 1
    or die "Usage: gen_dir format_type\nAllowable formats: "
            . join (" ", sort (keys (%switchbox))) . "\n";

# determine proper switchbox entry from argument on command line;
# if no entry is found, the format type is invalid
my $func_hashref = $switchbox{$ARGV[0]};

defined ($func_hashref)
    or die "Unknown format: $ARGV[0]\nAllowable formats: "
            . join (" ", sort (keys (%switchbox))) . "\n";

The format selection code is based on the fact that the output format names are the keys in the %switchbox hash. If a valid format name is given, the corresponding switchbox entry points to the output functions. If an invalid name is given, no entry will exist. This makes it unnecessary to hardwire any names into the format selection code. It also means that when you add a new entry to the switchbox, the code will detect it automatically with no change.

If a valid format name is specified on the command line, the preceding code sets $func_hashref. Its value will be a reference to the hash that points to the output writing functions for the selected format. We can invoke the initialization function, fetch and print the entries, and invoke the cleanup function:

# invoke the initialization function if there is one 
&{$func_hashref->{init}} if defined ($func_hashref->{init});

# fetch and print entries if there is an entry formatting function
if (defined ($func_hashref->{entry}))
{
    my $sth = $dbh->prepare (qq{
        SELECT * FROM member ORDER BY last_name, first_name
    });
    $sth->execute ();
    while (my $entry_ref = $sth->fetchrow_hashref ("NAME_lc"))
    {
        # pass entry by reference to the formatting function
        &{$func_hashref->{entry}} ($entry_ref);
    }
}

# invoke the cleanup function if there is one
&{$func_hashref->{cleanup}} if defined ($func_hashref->{cleanup});

The entry-fetching loop uses fetchrow_hashref() for a reason. If the loop fetched an array, each formatting function would have to know the order of the columns. It's possible to figure that out by accessing the $sth->{NAME} attribute (which contains column names in the order in which they are returned), but why bother? By using a hash reference, formatting functions can just name the column values they want using $entry_ref->{col_name}. That technique is much easier than using the NAME attribute and it can be used for any format we want to generate because we know that any fields we need will be in the hash.

All that remains is to write the functions for each output format (that is, for the functions named by the switchbox entries).

Generating the Banquet Program Member List

For this output format, no initialization or cleanup calls are necessary; we only need an entry formatting function (format_banquet_entry()) that takes a reference to a member entry and prints the member's name. An outline of the function looks like this:

sub format_banquet_entry 
{
    # print member name here, using first_name, last_name, and suffix
    elements of the hash printed to by the function argument
}

The tricky part of printing names is dealing with the suffix part. Suffixes such as Jr. or Sr. should be preceded by a comma and a space, whereas suffixes such as II or III should be preceded only by a space:

Michael Alvis IV 
Clarence Elgar, Jr.
Bill Matthews, Sr.
Mark York II

The letters I, V, and X are the only ones used in the roman numerals for the 1st to the 39th generation. It's unlikely that we'll need any numerals beyond that range, so we can determine whether or not to add a comma by checking whether the suffix value matches the following pattern:

/^[IVX]+$/ 

The code in format_banquet_entry() that puts the parts of the name together in the proper order is something we'll need for the RTF version of the directory as well. So instead of duplicating that code in format_rtf_entry(), let's stuff it into a helper function:

sub format_name 
{
my $entry_ref = shift;

    my $name = $entry_ref->{first_name} . " " . $entry_ref->{last_name};
    if (defined ($entry_ref->{suffix}))         # there is a name suffix
    {
        # no comma for suffixes of I, II, III, etc.
        $name .= "," unless $entry_ref->{suffix} =~ /^[IVX]+$/;
        $name .= " " . $entry_ref->{suffix}
    }
    return ($name);
}

With format_name() in place, the implementation of the format_ banquet_entry() function that prints an entry becomes almost completely trivial:

sub format_banquet_entry 
{
    printf "%s\n", format_name ($_[0]);
}

Generating the Print-Format Directory

Generating the RTF version of the directory is a little more involved than generating the member list for the banquet program. For one thing, we need to print more information from each entry. For another, we need to put out some RTF control language with each entry to achieve the effects that we want and some control language at the beginning and end of the document. A minimal framework for an RTF document looks like the following:

{\rtf0 
{\fonttbl {\f0 Times;}}
\plain \f0 \fs24
    ...document content goes here...
}

The document begins and ends with curly braces '{' and '}'. RTF keywords begin with a backslash, and the first keyword of the document must be \rtfn, where n is the RTF specification version number the document corresponds to. Version 0 is fine for our purposes.

Within the document, we specify a font table to indicate the font to use for the entries. Font table information is listed in a group consisting of curly braces containing a leading \fonttbl keyword and some font information. The font table shown in the framework defines font number 0 to be in Times. (We only need one font, but you could use more if you wanted to be fancier.)

The next few directives set up the default formatting style: \plain selects plain format, \f0 selects font 0 (which we've defined as Times in the font table), and \fs24 sets the font size to 12 points (the number following \fs indicates the size in half-points). It's not necessary to set up margins because most word processors will supply reasonable defaults.

The framework is provided by the initialization and cleanup functions, which look like the following (note the double backslashes to get single backslashes in the output):

sub rtf_init 
{
    print "{\\rtf0\n";
    print "{\\fonttbl {\\f0 Times;}}\n";
    print "\\plain \\f0 \\fs24\n";
}

sub rtf_cleanup
{
    print "}\n";
}

The content of the document is produced by the entry formatting function. To take a very simple approach, we can print each entry as a series of lines, with a label on each line. If the information corresponding to a particular output line is missing, the line is omitted. (For example, the Email: line does not need to be printed for members who have no email address.) Some lines (such as the Address: line) are composed of the information in multiple columns (street, city, state, zip), so the script must be able to deal with various combinations of missing values. The following is a sample of the output format we'll use:

Name: Mike Artel 
Address: 4264 Lovering Rd., Miami, FL 12777
Telephone: 075-961-0712
Email: mike_artel@venus.org
Interests: Civil Rights,Education,Revolutionary War

For that entry, the RTF representation looks like this:

\b Name: Mike Artel\b0} 
Address: 4264 Lovering Rd., Miami, FL 12777}
Telephone: 075-961-0712}
Email: mike_artel@venus.org}
Interests: Civil Rights,Education,Revolutionary War}

To make the Name: line bold, it's surrounded by \b (with a space afterward) to turn boldface on and \b0 to turn boldface off. The member name is formatted by the format_name() function shown earlier in the "Generating the Banquet Program Member List" section. Each line has a paragraph marker (\par) at the end to tell the word processor to move to the next line?nothing too complicated. The primary difficulties lie in formatting the address string and determining which output lines should be printed:

sub format_rtf_entry 
{
my $entry_ref = shift;

    printf "\\b Name: %s\\b0\\par\n", format_name ($entry_ref);
    my $address = "";
    $address .= $entry_ref->{street}
                                if defined ($entry_ref->{street});
    $address .= ", " . $entry_ref->{city}
                                if defined ($entry_ref->{city});
    $address .= ", " . $entry_ref->{state}
                                if defined ($entry_ref->{state});
    $address .= " " . $entry_ref->{zip}
                                if defined ($entry_ref->{zip});
    print "Address: $address\\par\n"
                                if $address ne "";
    print "Telephone: $entry_ref->{phone}\\par\n"
                                if defined ($entry_ref->{phone});
    print "Email: $entry_ref->{email}\\par\n"
                                if defined ($entry_ref->{email});
    print "Interests: $entry_ref->{interests}\\par\n"
                                if defined ($entry_ref->{interests});
    print "\\par\n";
}

You're not locked into this particular formatting style, of course. You can change how you print any of the fields, so you can change the style of your printed directory almost at will, simply by changing format_rtf_entry(). With the directory in its original form (a word processing document), that's something not so easily done.

The gen_dir.pl script is now complete, and you can generate the directory in either output format by running commands such as the following:

% ./gen_dir.pl banquet > names.txt 
% ./gen_dir.pl rtf > directory.rtf

At this point, it's a simple step to read the name list and paste it into the annual banquet program document or to read the RTF file into any word processor that understands RTF.

DBI made it easy to extract the information we wanted from MySQL, and Perl's text-processing capabilities made it easy to put that information into the format we wanted to see. MySQL doesn't provide any particularly fancy way of formatting output, but it doesn't matter because of the ease with which you can integrate MySQL's database handling abilities into a language such as Perl, which has excellent text manipulation capabilities.

Sending Membership Renewal Notices

With the Historical League directory maintained in its original form (as a word processing document), it's a time-consuming and error-prone activity to determine which members need to be notified that their memberships should be renewed. Now that we have the information in a database, it's possible to automate the renewal-notification process a bit. We want to identify members who need to renew, and send them a message via email so that we don't have to contact them by phone or surface mail.

What we need to do is determine which members are due for renewal within a certain number of days. The query for this involves a date calculation that's relatively simple:

SELECT ... FROM member 
WHERE expiration < DATE_ADD(CURDATE(), INTERVAL cutoff DAY)

cutoff signifies the number of days of leeway we want to grant. The query selects member entries that are due for renewal in fewer than that many days. To find memberships that have actually expired, a cutoff value of 0 identifies records with expiration dates in the past.

After we've identified the records that qualify for notification, what should we do with them? One option would be to send mail directly from the same script, but it might be useful first to be able to review the list without sending any messages. For this reason, we'll use a two-stage approach:

  1. Run a script, need_renewal.pl, to identify members that need to renew. You can examine this list to verify it, and then use it as input to the second stage that sends the renewal notices.

  2. Run a script, renewal_notify.pl, that sends members a "please renew" notice by email. The script should warn you about members without email addresses so that you can contact them by other means.

For the first part of this task, the need_renewal.pl script must identify which members need to renew. The main part of the script that does this is as follows:

# Use default cutoff of 30 days... 
my $cutoff = 30;
# ...but reset if a numeric argument is given on the command line
$cutoff = shift (@ARGV) if @ARGV && $ARGV[0] =~ /^\d+$/;

warn "Using cutoff of $cutoff days\n";

my $sth = $dbh->prepare (qq{
        SELECT
            member_id, email, last_name, first_name, expiration,
            TO_DAYS(expiration) - TO_DAYS(CURDATE()) AS days
        FROM member
        WHERE expiration < DATE_ADD(CURDATE(), INTERVAL ? DAY)
        ORDER BY expiration, last_name, first_name
});
$sth->execute ($cutoff);    # pass cutoff as placeholder value

while (my $entry_ref = $sth->fetchrow_hashref ())
{
    # convert undef values to empty strings for printing
    foreach my $key (keys (%{$entry_ref}))
    {
        $entry_ref->{$key} = "" if !defined ($entry_ref->{$key});
    }
    print join ("\t",
                $entry_ref->{member_id},
                $entry_ref->{email},
                $entry_ref->{last_name},
                $entry_ref->{first_name},
                $entry_ref->{expiration},
                $entry_ref->{days} . " days")
        . "\n";
}

The output from the need_renewal.pl script looks something like the following (you'll get different output because the results are determined against the current date, which will be different for you while reading this book than for me while writing it):

89  g.steve@pluto.com       Garner  Steve   2002-08-03  -32 days
18  york_mark@earth.com     York    Mark    2002-08-24  -11 days
82  john_edwards@venus.org  Edwards John    2002-09-12  8 days

Observe that some memberships need to be renewed in a negative number of days. That means they've already expired! (This happens when you maintain records manually; people slip through the cracks. Now that we have the information in a database, we're finding out that we missed a few people before.)

The second part of the renewal notification task involves a script, renewal_notify.pl, that sends out the notices by email. To make renewal_notify.pl a little easier to use, we can make it understand three kinds of command-line arguments: membership ID numbers, email addresses, and filenames. Numeric arguments signify membership ID values, and arguments containing a '@' character signify email addresses. Anything else is interpreted as the name of a file that should be read to find ID numbers or email addresses. This method enables you to specify members by their ID number or email address, and you can do so either directly on the command line or by listing them in a file. (In particular, you can save the output of need_renewal.pl in a file, and then use the file as input to renewal_notify.pl.)

For each member who is to be sent a notice, the script looks up the relevant member table entry, extracts the email address, and sends a message to that address. If there is no address in the entry, renewal_notify.pl generates a warning message that you need to contact these members in some other way.

The main argument-processing loop operates as follows. If no arguments were specified on the command line, we read the standard input for input. Otherwise, we process each argument by passing it to interpret_ argument() for classification as an ID number, an email address, or a filename:

if (@ARGV == 0)     # no arguments, read STDIN for values 
{
    read_file (\*STDIN);
}
else
{
    while (my $arg = shift (@ARGV))
    {
        # interpret argument, with filename recursion
        interpret_argument ($arg, 1);
    }
}

The read_file() function reads the contents of a file (assumed to be open already) and looks at the first field of each line. (If we feed the output of need_renewal.pl to renewal_notify.pl, each line has several fields, but we want to look only at the first one, which will contain a member ID number.)

sub read_file 
{
my $fh = shift;     # handle to open file
my $arg;

    while (defined ($arg = <$fh>))
    {
        # strip off everything past column 1, including newline
        $arg =~ s/\s.*//s;
        # interpret argument, without filename recursion
        interpret_argument ($arg, 0);
    }
}

The interpret_argument() function classifies each argument to determine whether it's an ID number, an email address, or a filename. For ID numbers and email addresses, it looks up the appropriate member entry and passes it to notify_member(). We have to be careful with members specified by email address. It's possible that two members have the same address (for example, a husband and wife), and we don't want to send a message to someone to whom it doesn't apply. To avoid this, we look up the member ID corresponding to an email address to make sure there is exactly one. If the address matches more than one ID number, it's ambiguous and we ignore it after printing a warning.

If an argument doesn't look like an ID number or email address, it's taken to be the name of a file to read for further input. We have to be careful here, too?we don't want to read a file if we're already reading a file in order to avoid the possibility of an infinite loop:

sub interpret_argument 
{
my ($arg, $recurse) = @_;

    if ($arg =~ /^\d+$/)        # numeric membership ID
    {
        notify_member ($arg);
    }
    elsif ($arg =~ /@/)         # email address
    {
        # get member_id associated with address
        # (there should be exactly one)
        my $query = qq{ SELECT member_id FROM member WHERE email = ? };
        my $ary_ref = $dbh->selectcol_arrayref ($query, undef, $arg);
        if (scalar (@{$ary_ref}) == 0)
        {
            warn "Email address $arg matches no entry: ignored\n";
        }
        elsif (scalar (@{$ary_ref}) > 1)
        {
            warn "Email address $arg matches multiple entries: ignored\n";
        }
        else
        {
            notify_member ($ary_ref->[0]);
        }
    }
    else                        # filename
    {
        if (!$recurse)
        {
            warn "filename $arg inside file: ignored\n";
        }
        else
        {
            open (IN, $arg) or die "Cannot open $arg: $!\n";
            read_file (\*IN);
            close (IN);
        }
    }
}

The notify_member() function is responsible for actually sending the renewal notice. If it turns out that the member has no email address, notify_member() can't send any message, but it prints a warning so that you know you need to contact the member in some other way. (You can invoke show_member.pl with the membership ID number shown in the message to see the full entry, to find out what the member's phone number and address are, for example.) notify_member() looks like this:

sub notify_member 
{
my $member_id = shift;

    warn "Notifying $member_id...\n";
    my $query = qq{ SELECT * FROM member WHERE member_id = ? };
    my $sth = $dbh->prepare ($query);
    $sth->execute ($member_id);
    my @col_name = @{$sth->{NAME}};
    my $entry_ref = $sth->fetchrow_hashref ();
    $sth->finish ();
    if (!$entry_ref)                        # no member found!
    {
        warn "NO ENTRY found for member $member_id!\n";
        return;
    }
    if (!defined ($entry_ref->{email}))     # no email address in entry
    {
        warn "Member $member_id has no email address; no message was sent\n";
        return;
    }
    open (OUT, "| $sendmail") or die "Cannot open mailer\n";
    print OUT <<EOF;
To: $entry_ref->{email}
Subject: Your USHL membership is in need of renewal

Greetings.  Your membership in the U.S. Historical League is
due to expire soon.  We hope that you'll take a few minutes to
contact the League office to renew your membership.  The
contents of your member entry are shown below.  Please note
particularly the expiration date.

Thank you.

EOF
    foreach my $col_name (@col_name)
    {
        printf OUT "$col_name:";
        printf OUT " $entry_ref->{$col_name}"
                        if defined ($entry_ref->{$col_name});
        printf OUT "\n";
    }
    close (OUT);
}

The notify_member() function sends mail by opening a pipe to the sendmail program and shoving the mail message into the pipe. The pathname to sendmail is set as a parameter near the beginning of the renewal_notify.pl script. You may need to change this path because the location of sendmail varies from system to system:

# change path to match your system 
my $sendmail = "/usr/sbin/sendmail -t -oi";

If you don't have sendmail, the script will not work properly. (For example, Windows systems typically do not have sendmail installed.) To handle this case, the sampdb distribution contains a modified version of renewal_notify.pl that uses the Mail::Sendmail module that works without the sendmail program. If you install that module, you can use the modified version instead.

You could get fancier with this script?for example, by adding a column to the member table to record when the most recent renewal reminder was sent out and then having renewal_notify.pl update that column when it sends mail. Doing so would help you to not send out notices too frequently. As it is, we'll just assume you won't run this program more than once a month or so.

The two scripts are done now, so you can use them as follows. First, run need_renewal.pl to generate a list of memberships that have expired or will soon do so:

% ./need_renewal.pl > tmp 

Then take a look at tmp to see if it looks reasonable. If so, use it as input to renewal_notify.pl to send renewal messages:

% ./renewal_notify.pl tmp 

To notify individual members, you can specify them by ID number or email address. For example, the following command notifies member 18 and the member having the email address g.steve@pluto.com:

% ./renewal_notify.pl 18 g.steve@pluto.com 

Historical League Member Entry Editing

After we start sending out renewal notices, it's safe to assume that some of the people we notify will renew their memberships. When that happens, we'll need a way to update their entries with new expiration dates. In the next chapter, we'll develop a way to edit member records over the Web, but here we'll develop a command-line script (edit_member.pl) that enables you to update entries using a simple approach of prompting for new values for each part of an entry. It works like this:

  • If invoked with no argument on the command line, edit_member.pl assumes you want to enter a new member, prompts for the initial information to be placed in the member's entry, and creates a new entry.

  • If invoked with a membership ID number on the command line, edit_member.pl looks up the existing contents of the entry, and then prompts for updates to each column. If you enter a value for a column, it replaces the current value. If you press Enter, the column is not changed. If you enter the word none, it clears the column's current value. (If you don't know a member's ID number, you can run show_member.pl last_name to see which entries match the given last name and from that determine the proper ID.)

It's probably overkill to allow an entire entry to be edited this way if all you want to do is update a member's expiration date. On the other hand, a script like this also provides a simple general-purpose way to update any part of an entry without knowing any SQL. (One special case is that edit_member.pl won't allow you to change the member_id field because that's automatically assigned when an entry is created and shouldn't change thereafter.)

The first thing edit_member.pl needs to know is the names of the columns in the member table:

# get member table column names 
my $sth = $dbh->prepare (qq{ SELECT * FROM member WHERE 0 });
$sth->execute ();
my @col_name = @{$sth->{NAME}};
$sth->finish ();
Then we can enter the main loop:
if (@ARGV == 0) # if no arguments were given, create a new entry
{
    # pass reference to array of column names
    new_member (\@col_name);
}
else            # otherwise edit entries using arguments as member IDs
{
    # save @ARGV, then empty it so that when the script reads from
    # STDIN, it doesn't interpret @ARGV contents as input filenames
    my @id = @ARGV;
    @ARGV = ();
    # for each ID value, look up the entry, then edit it
    while (my $id = shift (@id))
    {
        $sth = $dbh->prepare (qq{
                    SELECT * FROM member WHERE member_id = ?
                });
        $sth->execute ($id);
        my $entry_ref = $sth->fetchrow_hashref ();
        $sth->finish ();
        if (!$entry_ref)
        {
            warn "No member with member ID = $id\n";
            next;
        }
        # pass reference to array of column names and reference to entry
        edit_member (\@col_name, $entry_ref);
    }
}

The code for creating a new member entry is as follows. It solicits values for each member table column, and then issues an INSERT statement to add a new record:

sub new_member 
{
my $col_name_ref = shift;   # reference to array of column names
my $entry_ref = { };        # create new entry as a hash

    return unless prompt ("Create new entry (y/n)? ") =~ /^y/i;
    # prompt for new values; user types in new value, or Enter
    # to leave value unchanged, "none" to clear the value, or
    # "exit" to exit without creating the record.
    foreach my $col_name (@{$col_name_ref})
    {
        next if $col_name eq "member_id";   # skip key field
        my $col_val = col_prompt ($col_name, undef);
        next if $col_val eq "";             # user pressed Enter
        return if $col_val eq lc ("exit");  # early exit
        $col_val = undef if $col_val eq lc ("none");
        $entry_ref->{$col_name} = $col_val;
    }
    # show values, ask for confirmation before inserting
    show_member ($col_name_ref, $entry_ref);
    return unless prompt ("\nInsert this entry (y/n)? ") =~ /^y/i;

    # construct an INSERT query, then issue it.
    my $query = "INSERT INTO member";
    my $delim = " SET "; # put "SET" before first column, "," before others
    foreach my $col_name (@{$col_name_ref})
    {
        # only specify values for columns that were given one
        next if !defined ($entry_ref->{$col_name});
        # quote() quotes undef as the word NULL (without quotes),
        # which is what we want.  Columns that are NOT NULL will
        # be assigned their default values.
        $query .= sprintf ("%s %s=%s", $delim, $col_name,
                            $dbh->quote ($entry_ref->{$col_name}));
        $delim = ",";
    }
    $dbh->do ($query) or warn "Warning: new entry not created?\n"
}

edit_member.pl uses two routines to prompt the user for information. prompt() asks a question and returns the answer:

sub prompt 
{
my $str = shift;

    print STDERR $str;
    chomp ($str = <STDIN>);
    return ($str);
}

col_prompt() takes a column name argument. It prints the name as a prompt to solicit a new column value and returns the value entered by the user:

sub col_prompt 
{
my ($col_name, $entry_ref) = @_;

    my $prompt = $col_name;
    if (defined ($entry_ref))
    {
        my $cur_val = $entry_ref->{$col_name};
        $cur_val = "NULL" if !defined ($cur_val);
        $prompt .= " [$cur_val]";
    }
    $prompt .= ": ";
    print STDERR $prompt;
    my $str = <STDIN>;
    chomp ($str);
    return ($str);
}

The second argument to col_prompt() is a reference to the hash that represents the member entry. For creating a new entry, this value will be undef, but when editing existing records, it will point to the current contents of the entry. In that case, col_prompt() includes the current value of the column that it's prompting for in the prompt string so that the user can see what it is. The user can accept the value simply by pressing Enter.

The code for editing an existing member is similar to that for creating a new member. However, we have an entry to work with, so the prompt routine displays the current entry values, and the edit_member() function issues an UPDATE statement rather than an INSERT:

sub edit_member 
{
# references to array of column names and to entry hash
my ($col_name_ref, $entry_ref) = @_;

    # show initial values, ask for okay to go ahead and edit
    show_member ($col_name_ref, $entry_ref);
    return unless prompt ("\nEdit this entry (y/n)? ") =~ /^y/i;
    # prompt for new values; user types in new value, or Enter
    # to leave value unchanged, "none" to clear the value, or
    # "exit" to exit without changing the record.
    foreach my $col_name (@{$col_name_ref})
    {
        next if $col_name eq "member_id";   # skip key field
        my $col_val = col_prompt ($col_name, $entry_ref);
        next if $col_val eq "";             # user pressed Enter
        return if $col_val eq lc ("exit");  # early exit
        $col_val = undef if $col_val eq lc ("none");
        $entry_ref->{$col_name} = $col_val;
    }
    # show new values, ask for confirmation before updating
    show_member ($col_name_ref, $entry_ref);
    return unless prompt ("\nUpdate this entry (y/n)? ") =~ /^y/i;

    # construct an UPDATE query, then issue it.
    my $query = "UPDATE member";
    my $delim = " SET "; # put "SET" before first column, "," before others
    foreach my $col_name (@{$col_name_ref})
    {
        next if $col_name eq "member_id";   # skip key field
        # quote() quotes undef as the word NULL (without quotes),
        # which is what we want.  Columns that are NOT NULL will
        # be assigned their default values.
        $query .= sprintf ("%s %s=%s", $delim, $col_name,
                            $dbh->quote ($entry_ref->{$col_name}));
        $delim = ",";
    }
    $query .= " WHERE member_id = " . $dbh->quote ($entry_ref->{member_id});
    $dbh->do ($query) or warn "Warning: entry not undated?\n"
}

A problem with edit_member.pl is that it doesn't do any input value validation. For most fields in the member table, there isn't much to validate?they're just string fields. But for the expiration column, input values really should be checked to make sure they look like dates. In a general-purpose data entry application, you'd probably want to extract information about a table to determine the types of all its columns. Then you could base validation constraints on those types. That's more involved than I want to go into here, so I'm just going to add a quick hack to the col_prompt() function to check the format of the input if the column is expiration. A minimal date value check can be done as follows:

sub col_prompt 
{
my ($col_name, $entry_ref) = @_;

loop:
    my $prompt = $col_name;
    if (defined ($entry_ref))
    {
        my $cur_val = $entry_ref->{$col_name};
        $cur_val = "NULL" if !defined ($cur_val);
        $prompt .= " [$cur_val]";
    }
    $prompt .= ": ";
    print STDERR $prompt;
    my $str = <STDIN>;
    chomp ($str);
    # perform rudimentary check on the expiration date
    if ($str && $col_name eq "expiration")  # check expiration date format
    {
        if ($str !~ /^\d+\D\d+\D\d+$/)
        {
            warn "$str is not a legal date, try again\n";
            goto loop;
        }
    }
    return ($str);
}

The pattern tests for three sequences of digits separated by non-digit characters. This is only a partial check because it doesn't detect values such as "1999-14-92" as being illegal. To make the script better, you could give it more stringent date checks or add other checks, such as requiring the first and last name fields to be given non-empty values.

An improvement might be to skip the update operation for an existing entry if the user made no changes. You could do this by saving the original values of the member entry columns and then writing the UPDATE statement to update only those columns that have changed. If there were none, the statement need not even be issued. Another improvement would be to notify the user if the record was already changed by someone else while the user was editing it. To do this, write the WHERE clause to include AND col_name =col_val for each original column value. This will cause the UPDATE to fail if someone else had changed the record, which provides feedback that two people are trying to change the entry at the same time.

There's another shortcoming of the edit_member.pl script that you might consider how to address: As written, the script opens a connection to the database before executing the prompt loop and doesn't close it until writing out the record after the loop. If the user takes a long time to enter or update the record, or just happens to do something else for a while, the connection can remain open for a long time. How would you modify edit_member.pl to hold the connection open only as long as necessary?

Finding Historical League Members with Common Interests

One of the duties of the Historical League secretary is to process requests from members who'd like a list of other members who share a particular interest within the field of U.S. history, such as the Great Depression or the life of Abraham Lincoln. It's easy enough to find such members when the directory is maintained in a word processor document by using the word processor's Find function. However, producing a list consisting only of the qualifying member entries is more difficult because it involves a lot of copy and paste. With MySQL, the job becomes much easier because we can just run a query like the following:

SELECT * FROM member WHERE interests LIKE '%lincoln%' 
ORDER BY last_name, first_name

Unfortunately, the results don't look very nice if we run this query from the mysql client. Let's put together a little DBI script, interests.pl, that produces better-looking output. The script first checks to make sure there is at least one argument named on the command line, because there is nothing to search for otherwise. Then, for each argument, the script runs a search on the interests column of the member table:

@ARGV or die "Usage: interests.pl keyword\n"; 
search_members (shift (@ARGV)) while @ARGV;

To search for the keyword string, we put '%' wildcard characters on each side and perform a pattern match so that the string can be found anywhere in the interests column. Then we print the matching entries:

sub search_members 
{
my $interest = shift;

    print "Search results for keyword: $interest\n\n";
    my $sth = $dbh->prepare (qq{
                SELECT * FROM member WHERE interests LIKE ?
                ORDER BY last_name, first_name
            });
    # look for string anywhere in interest field
    $sth->execute ("%" . $interest . "%");
    my $count = 0;
    while (my $hash_ref = $sth->fetchrow_hashref ())
    {
        format_entry ($hash_ref);
        ++$count;
    }
    print "Number of matching entries: $count\n\n";
}

The format_entry() function turns an entry into its printable representation. I won't show it here because it's essentially the same as the format_rtf_entry() function from the gen_dir.pl script, with the RTF control words stripped out. Take a look at the interests.pl script in the sampdb distribution to see the implementation.

Putting the Historical League Directory Online

In the next section, "Using DBI in Web Applications," we'll start writing scripts that connect to the MySQL server to extract information and write that information in the form of Web pages that appear in a client's Web browser. Those scripts generate HTML dynamically according to what the client requested. Before we reach that point, let's begin thinking about HTML by writing a DBI script that generates a static HTML document that can be loaded into a Web server's document tree. A good candidate for this task is to produce the Historical League directory in HTML format (after all, one of our goals was to put the directory online).

A simple HTML document has a structure something like the following:

<html>                             beginning of document 
<head>                             beginning of document head
<title>My Page Title</title>       title of document
</head>                            end of document head
<body bgcolor="white">             beginning of document body
                                     (white background)
<h1>My Level 1 Heading</h1>        a level 1 heading

... content of document body ...

</body>                            end of document body
</html>                            end of document

It's not necessary to write a completely new script to generate the directory in HTML format. Recall that when we wrote the gen_dir.pl script, we used an extensible framework so that we'd be able to plug in code for producing the directory in additional formats. Let's take advantage of that extensibility now by adding code for generating HTML output. To do this, we need to make the following modifications to gen_dir.pl:

  • Write document initialization and cleanup functions.

  • Write a function to format individual entries.

  • Add a switchbox element that identifies the format name and associates it with the functions that produce output in that format.

The HTML document outline just shown breaks down pretty easily into prolog and epilog sections that can be handled by the initialization and cleanup functions, as well as a middle part that can be generated by the entry-formatting function. The HTML initialization function generates everything up through the level 1 heading, and the cleanup function generates the closing </body> and </html> tags:

sub html_init 
{
    print "<html>\n";
    print "<head>\n";
    print "<title>U.S. Historical League Member Directory</title>\n";
    print "</head>\n";
    print "<body bgcolor=\"white\">\n";
    print "<h1>U.S. Historical League Member Directory</h1>\n";
}

sub html_cleanup
{
    print "</body>\n";
    print "</html>\n";
}

The real work, as usual, lies in formatting the entries. But even this isn't very difficult. We can copy the format_rtf_entry() function, make sure any special characters in the entry are encoded, and replace the RTF control words with HTML markup tags:

sub format_html_entry 
{
my $entry_ref = shift;

    # Convert <, >, ", and & to the corresponding HTML entities
    # (&lt;, &gt;, &quot, &amp;)
    foreach my $key (keys (%{$entry_ref}))
    {
        next unless defined ($entry_ref->{$key});
        $entry_ref->{$key} =~ s/&/&amp;/g;
        $entry_ref->{$key} =~ s/\"/&quot;/g;
        $entry_ref->{$key} =~ s/>/&gt;/g;
        $entry_ref->{$key} =~ s/</&lt;/g;
    }
    printf "<strong>Name: %s</strong><br />\n", format_name ($entry_ref);
    my $address = "";
    $address .= $entry_ref->{street}
                                if defined ($entry_ref->{street});
    $address .= ", " . $entry_ref->{city}
                                if defined ($entry_ref->{city});
    $address .= ", " . $entry_ref->{state}
                                if defined ($entry_ref->{state});
    $address .= " " . $entry_ref->{zip}
                                if defined ($entry_ref->{zip});
    print "Address: $address<br />\n"
                                if $address ne "";
    print "Telephone: $entry_ref->{phone}<br />\n"
                                if defined ($entry_ref->{phone});
    print "Email: $entry_ref->{email}<br />\n"
                                if defined ($entry_ref->{email});
    print "Interests: $entry_ref->{interests}<br />\n"
                                if defined ($entry_ref->{interests});
    print "<br />\n";
}

The function produces output that looks like this:

<strong>Name: Mike Artel</strong><br /> 
Address: 4264 Lovering Rd., Miami, FL 12777<br />
Telephone: 075-961-0712<br />
Email: mike_artel@venus.org<br />
Interests: Civil Rights,Education,Revolutionary War<br />
<br />

The reason for using <br /> rather than <br> is to write the document as valid XHTML, which is more strict than HTML. Some distinctions between HTML and XHTML are discussed briefly in the "Writing Web Output" section later in this chapter.

The last modification needed for gen_dir.pl is to add to the switchbox another element that points to the HTML-writing functions. The modified switchbox looks like the following, where the final element defines a format named html that points to the functions that produce the various parts of an HTML-format document:

# switchbox containing formatting functions for each output format 
my %switchbox =
(
    "banquet" =>                        # functions for banquet list
    {
        "init"      => undef,           # no initialization needed
        "entry"     => \&format_banquet_entry,
        "cleanup"   => undef            # no cleanup needed
    },
    "rtf" =>                            # functions for RTF format
    {
        "init"      => \&rtf_init,
        "entry"     => \&format_rtf_entry,
        "cleanup"   => \&rtf_cleanup
    },
    "html" =>                           # functions for HTML format
    {
        "init"      => \&html_init,
        "entry"     => \&format_html_entry,
        "cleanup"   => \&html_cleanup
    }
);

To make the directory available in HTML format, run the following command and install the resulting output file in your Web server's document tree:

% ./gen_dir.pl html > directory.html 

When you update the member table in the database, you can run the command again to update the online version. If you want to avoid running the command manually, another strategy is to set up a cron job that executes periodically to update the online directory automatically. Suppose that the gen_dir.pl script is installed in /u/paul/bin and the Historical League directory in the Web server document tree is /usr/local/apache/htdocs/ushl. Then I might use a crontab entry like the following to update the directory every morning at 4 a.m.:

0 4 * * * /u/paul/bin/gen_dir.pl > /usr/local/apache/htdocs/ushl/directory.html

Note: The user that this cron job runs as must have permission both to execute scripts that are located in my bin directory and to write files into the document tree directory.