Perl DBI Overview

This section provides background information for DBI?the information you'll need for writing your own scripts and for understanding scripts written by others. If you're already familiar with DBI, you may want to skip directly to the section "Putting DBI to Work."

DBI Data Types

In some ways, using the Perl DBI API is similar to using the C client library described in Chapter 6. When you use the C client library, you call functions and access MySQL-related data primarily by means of pointers to structures or arrays. When you use the DBI API, you also call functions and use pointers to structures, except that functions are called methods, pointers are called references, pointer variables are called handles, and the structures that handles point to are called objects.

DBI uses several kinds of handles. These tend to be referred to in DBI documentation by the conventional names shown in Table 7.1. The way you use these handles will become clear as we go along. Several conventional names for non-handle variables are used as well (see Table 7.2). This chapter doesn't actually use every one of these variable names, but it's useful to know them when you read DBI scripts written by other people.

Table 7.1. Conventional Perl DBI Handle Variable Names
Name Meaning
$dbh A handle to a database object
$sth A handle to a statement (query) object
$fh A handle to an open file
$h A "generic" handle; the meaning depends on context

Table 7.2. Conventional Perl DBI Non-Handle Variable Names
Name Meaning
$rc The return code from operations that return true or false
$rv The return value from operations that return an integer
$rows The return value from operations that return a row count
@ary An array (list) representing a row of values returned by a query

A Simple DBI Script

Let's start with a simple script, dump_members.pl, that illustrates several standard concepts in DBI programming, such as connecting to and disconnecting from the MySQL server, issuing queries, and retrieving data. This script produces output consisting of the Historical League's member list in tab-delimited format. The format is not so interesting in itself; at this point, it's more important see how to use DBI than to produce pretty output.

dump_members.pl looks like this:

#! /usr/bin/perl -w 
# dump_members.pl - dump Historical League's membership list

use strict;
use DBI;

my $dsn = "DBI:mysql:sampdb:cobra.snake.net";   # data source name
my $user_name = "sampadm";                      # user name
my $password = "secret";                        # password
# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password,
                        { RaiseError => 1, PrintError => 0 });

# issue query
my $sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email,"
    . "street, city, state, zip, phone FROM member ORDER BY last_name");
$sth->execute ();

# read results of query, then clean up
while (my @ary = $sth->fetchrow_array ())
{
    print join ("\t", @ary), "\n";
}
$sth->finish ();

$dbh->disconnect ();
exit (0);

To try out the script for yourself, either use the copy that's included in the sampdb distribution or create it using a text editor. (If you use a word processor, be sure to save the script as plain text. Don't save it in the word processor's native format, which is likely to be some proprietary binary format.) You'll probably need to change at least some of the connection parameters, such as the hostname, database name, username, or password. (That will be true for other scripts that name the connection parameters as well.) Later, in the "Specifying Connection Parameters" section, we'll see how to get parameters from an option file instead of putting them directly in the script.

Now let's go through the script a piece at a time. The first line contains the standard where-to-find-Perl indicator:

#! /usr/bin/perl -w 

This line is part of every script we'll discuss in this chapter; I won't mention it further.

It's a good idea to include in a script at least a minimal description of its purpose, so the next line is a comment to give anyone who looks at the script a clue about what it does:

# dump_members.pl - dump Historical League's membership list 

Text from a '#' character to the end of a line is considered a comment. It's a useful practice to sprinkle comments throughout your scripts to explain how they work.

Next we have a couple of use statements:

use strict; 
use DBI;

use strict tells Perl to require you to declare variables before using them. You can write scripts without putting in a use strict line, but it's useful for catching mistakes, so I recommend you always include it. For example, if you declare a variable $my_var but then later refer to it erroneously as $mv_var, the following message will appear when you run the script in strict mode:

Global symbol "$mv_var" requires explicit package name at line n 

When you see that, you think, "Huh? I never used any variable named $mv_var!" Then you look at line n of your script, see that you misspelled $my_var as $mv_var, and fix it. Without strict mode, Perl won't squawk about $mv_var; it simply creates a new variable by that name with a value of undef (undefined) and uses it without complaint. And you're left to wonder why your script doesn't work.

use DBI tells the Perl interpreter that it needs to pull in the DBI module. Without this line, an error occurs as soon as you try to do anything DBI-related in the script. You don't have to indicate which DBD-level driver module to use, though. DBI activates the right one for you when you connect to your database.

Because we're operating in strict mode, we must declare the variables the script uses by means of the my keyword (think of it as though the script is saying "I am explicitly indicating that these are my variables"). The next section of the script sets up the variables that specify connection parameters, and then uses them to connect to the database:

my $dsn = "DBI:mysql:sampdb:cobra.snake.net";   # data source name 
my $user_name = "sampadm";                      # user name
my $password = "secret";                        # password

# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password,
                        { RaiseError => 1, PrintError => 0 });

The connect() call is invoked as DBI->connect() because it's a method of the DBI class. You don't really have to know what that means; it's just a little object-oriented jargon to make your head hurt. (If you do want to know, it means that connect() is a function that "belongs" to DBI.) connect() takes several arguments:

  • The data source. (Also known as the data source name, or DSN.) Data source formats are determined by the requirements of the particular DBD module you want to use. For the MySQL driver, allowable formats include either of the following:

    DBI:mysql:db_name 
    DBI:mysql:db_name:host_name
    

    The capitalization of DBI doesn't matter, but mysql must be lowercase. db_name represents the name of the database you want to use, and host_name indicates the host where the server is running. If you omit the hostname, it defaults to localhost. (There actually are other allowable data source formats, which we'll discuss later in the "Specifying Connection Parameters" section.)

  • The username and password for your MySQL account. This is self-explanatory.

  • An optional argument indicating additional connection attributes. If it is given, this argument controls DBI's error-handling behavior. It should be passed as a reference to a hash that specifies connection attribute names and values. The weird-looking construct we've specified creates a reference to a hash that enables the RaiseError attribute and disables PrintError. These settings cause DBI to check for database-related errors and exit with an error message if it detects one. (That's why you don't see error-checking code anywhere in the dump_ members.pl script; DBI handles it all.) The "Handling Errors" section later in this chapter covers alternate methods of responding to errors.

If the connect() call succeeds, it returns a database handle, which we assign to $dbh. By default, connect() returns undef if it fails. However, because the script enables RaiseError, DBI will exit after displaying an error message if something goes wrong in the connect() call. (This is true for other DBI methods, too; I'll describe what they return to indicate an error, but they won't return at all if RaiseError is enabled.)

After connecting to the database, dump_members.pl issues a SELECT query to retrieve the membership list, and then executes a loop to process each of the rows returned. These rows constitute the result set. To perform a SELECT, you prepare the query first, and then execute it:

# issue query 
my $sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email,"
    . "street, city, state, zip, phone FROM member ORDER BY last_name");
$sth->execute ();

prepare() is invoked using the database handle; it passes the SQL statement to the driver for preprocessing before execution. Some drivers actually do something with the statement at this point. Others just remember it until you invoke execute() to cause the statement to be performed. The return value from prepare() is a statement handle, here assigned to $sth. The statement handle is used for all further processing related to the statement.

Notice that the query string is specified with no terminating semicolon. You no doubt have developed the habit (through long hours of interaction with the mysql program) of terminating SQL statements with a ';' character. However, it's best to break yourself of that habit when using DBI because semicolons often cause queries to fail with syntax errors. The same applies to adding \g to queries?don't. Those statement terminators are conventions of mysql and are not used when issuing queries in DBI scripts. The end of the query string implicitly terminates the query and no explicit terminator is necessary.

When you invoke a method without passing it any arguments, you can leave off the parentheses. The following two calls are equivalent:

$sth->execute (); 
$sth->execute;

I prefer to include the parentheses because it makes the call look less like a variable reference to me. Your preference may be different.

After you call execute(), the rows of the membership list are available for processing. In the dump_members.pl script, the row-fetching loop simply prints the contents of each row as a tab-delimited set of values:

# read results of query, then clean up 
while (my @ary = $sth->fetchrow_array ())
{
    print join ("\t", @ary), "\n";
}
$sth->finish ();

fetchrow_array() returns an array containing the column values of the current row, or an empty array when there are no more rows. Thus, the loop fetches successive rows returned by the SELECT statement and prints each one with tabs between column values. NULL values in the database are returned as undef values to the Perl script, but these print as empty strings, not as the word NULL. undef column values also have another effect when you run the script; they result in warnings like the following from the Perl interpreter:

Use of uninitialized value in join at dump_members.pl line n. 

These warnings are triggered by the use of the -w option on the first line of the script. If you remove the option and run the script again, the warnings will go away. However, -w is useful for discovering problems (such as printing uninitialized variables!), so a better way to eliminate the warnings is to detect and deal with undef values. Some techniques for this are discussed in the "Handling Queries That Return a Result Set" section later in this chapter.

In the print statement, note that the tab and newline characters (represented as the \t and \n sequences) are enclosed in double quotes. In Perl, escape sequences are interpreted only when they occur within double quotes, not within single quotes. If single quotes had been used, the output would be full of literal instances of \t and \n.

After the row-fetching loop terminates, the call to finish() indicates that the statement handle is no longer needed and that any temporary resources allocated to it can be freed. In this script, the call to finish() is for illustrative purposes only. It's not actually necessary to invoke it here because the row-fetching call will do so automatically when it encounters the end of the result set. finish() is more useful in situations where you fetch only part of the result set and do not reach its end (for example, if for some reason you fetch only the first row). The examples from this point on will not use finish() unless it's necessary.

Having printed the membership list, we're done, so we can disconnect from the server and exit:

$dbh->disconnect (); 
exit (0);

dump_members.pl illustrates a number of concepts that are common to most DBI programs, and at this point you could probably start writing your own DBI programs without knowing anything more. For example, to write out the contents of some other table, all you'd need to do is change the text of the SELECT statement that is passed to the prepare() method. In fact, if you want to see some applications of this technique, you can skip ahead immediately to the part of the "Putting DBI to Work" section that discusses how to generate the member list for the Historical League annual banquet program and the League's printed directory. However, DBI provides many other useful capabilities. The next sections cover some of these in more detail so that you can see how to do more than run simple SELECT statements in your Perl scripts.

Handling Errors

dump_members.pl turned on the RaiseError error-handling attribute when it invoked the connect() method so that errors would automatically terminate the script with an error message rather than just returning error codes. It's possible to handle errors in other ways. For example, you can check for errors yourself rather than having DBI do it.

To see how to control DBI's error-handling behavior, let's take a closer look at the connect() call's final argument. The two relevant attributes are RaiseError and PrintError:

  • If RaiseError is enabled (set to a non-zero value), DBI calls die() to print a message and to exit if an error occurs in a DBI method.

  • If PrintError is enabled, DBI calls warn() to print a message when a DBI error occurs, but the script continues executing.

By default, RaiseError is disabled and PrintError is enabled. In this case, if the connect() call fails, DBI prints a message but continues executing. Thus, with the default error-handling behavior that you get if you omit the fourth argument to connect(), you might check for errors as follows:

my $dbh = DBI->connect ($dsn, $user_name, $password) 
    or exit (1);

If an error occurs, connect() returns undef to indicate failure, and that triggers the call to exit(). You don't have to print an error message because DBI already will have printed one.

If you were to explicitly specify the default values for the error-checking attributes, the call to connect() would look like this:

my $dbh = DBI->connect ($dsn, $user_name, $password, 
                        { RaiseError => 0, PrintError => 1 })
    or exit (1);

That's more work to write out, but it's also more obvious to the casual reader what the error-handling behavior is.

If you want to check for errors yourself and print your own messages, disable both RaiseError and PrintError:

my $dbh = DBI->connect ($dsn, $user_name, $password, 
                        { RaiseError => 0, PrintError => 0 })
    or die "Could not connect to server: $DBI::err ($DBI::errstr)\en";

The variables $DBI::err and $DBI::errstr used in the code just shown are useful for constructing error messages. They contain the MySQL error code and error string, much like the C API functions mysql_errno() and mysql_error(). If no error occurred, $DBI::err will be 0 or undef, and $DBI::errstr will be the empty string or undef. (In other words, both variables will be false.)

If you want DBI to handle errors for you so that you don't have to check for them yourself, enable RaiseError and disable PrintError:

my $dbh = DBI->connect ($dsn, $user_name, $password, 
                        { RaiseError => 1, PrintError => 0 });

This is by far the easiest approach, and it is how almost all scripts presented in this chapter are written. The reason for disabling PrintError when enabling RaiseError is to prevent the possibility of having error messages being printed twice. (If both attributes are enabled, the DBI handlers for both may be called under some circumstances.)

Enabling RaiseError may not be appropriate if you want to execute some sort of cleanup code of your own when the script exits, although in this case you may be able to do what you want by redefining the $SIG{__DIE__} signal handler. Another reason you might want to avoid enabling the RaiseError attribute is that DBI prints technical information in its messages, like this:

disconnect(DBI::db=HASH(0x197aae4)) invalidates 1 active statement. Either 
destroy statement handles or call finish on them before disconnecting.

That's useful information for a programmer, but it might not be the kind of thing you want to present to the everyday user. In that case, it can be better to check for errors yourself so that you can display messages that are more meaningful to the people you expect to be using the script. Or you might consider redefining the $SIG{__DIE__} handler here, too. That may be useful because it allows you to enable RaiseError to simplify error handling but replace the default error messages that DBI presents with your own messages. To provide your own __DIE__ handler, do something like the following before executing any DBI calls:

$SIG{__DIE__} = sub { die "Sorry, an error occurred\n"; }; 

You can also declare a subroutine in the usual fashion and set the signal handler value using a reference to the subroutine:

sub die_handler 
{
    die "Sorry, an error occurred\n";
}
$SIG{__DIE__} = \&die_handler;

As an alternative to passing error-handling attributes literally in the connect() call, you can define them using a hash and pass a reference to the hash. Some people find that breaking out the attribute settings this way makes scripts easier to read and edit, but operationally both approaches are the same. The following example shows how to use an attribute hash:

my %attr = 
(
    PrintError => 0,
    RaiseError => 0
);
my $dbh = DBI->connect ($dsn, $user_name, $password, \%attr)
    or die "Could not connect to server: $DBI::err ($DBI::errstr)\n";

The following script, dump_members2.pl, illustrates how you might write a script when you want to check for errors yourself and print your own messages. dump_members2.pl processes the same query as dump_members.pl but explicitly disables PrintError and RaiseError and then tests the result of every DBI call. When an error occurs, the script invokes the subroutine bail_out() to print a message and the contents of $DBI::err and $DBI::errstr before exiting:

#! /usr/bin/perl -w 
# dump_members2.pl - dump Historical League's membership list

use strict;
use DBI;

my $dsn = "DBI:mysql:sampdb:cobra.snake.net";   # data source name
my $user_name = "sampadm";                      # user name
my $password = "secret";                        # password
my %attr =                                      # error-handling attributes
(
    PrintError => 0,
    RaiseError => 0
);

# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password, \%attr)
    or bail_out ("Cannot connect to database");

# issue query
my $sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email,"
    . "street, city, state, zip, phone FROM member ORDER BY last_name")
    or bail_out ("Cannot prepare query");
$sth->execute ()
    or bail_out ("Cannot execute query");

# read results of query
while (my @ary = $sth->fetchrow_array ())
{
    print join ("\t", @ary), "\n";
}
!$DBI::err
    or bail_out ("Error during retrieval");

$dbh->disconnect ()
    or bail_out ("Cannot disconnect from database");
exit (0);

# bail_out() subroutine - print error code and string, then exit
sub bail_out
{
my $message = shift;

    die "$message\nError $DBI::err ($DBI::errstr)\n";
}

bail_out() is similar to the print_error() function we used for writing C programs in Chapter 6, except that bail_out() exits rather than returning to the caller. bail_out() saves you the trouble of writing out the values of $DBI::err and $DBI::errstr every time you want to print an error message. Also, by encapsulating error message printing into a subroutine, you can change the format of your error messages uniformly throughout your script simply by making a change to the subroutine.

The dump_members2.pl script has a test following the row-fetching loop. Because the script doesn't automatically exit if an error occurs in fetchrow_array(), it's prudent to determine whether the loop terminated because the result set was read completely (normal termination) or because an error occurred. The loop terminates either way, of course, but if an error occurs, output from the script will be truncated. Without an error check, the person running the script wouldn't have any idea that anything was wrong. If you're checking for errors yourself, be sure to test the result of your fetch loops.

Handling Queries That Return No Result Set

Statements that do not return rows, such as DELETE, INSERT, REPLACE, and UPDATE, are relatively easy to process compared to statements that do return rows, such as SELECT, DESCRIBE, EXPLAIN, and SHOW. To process a non-SELECT statement, pass it to do() by using the database handle. The do() method prepares and executes the query in one step. For example, to create a new member entry for Marcia Brown with an expiration date of June 3, 2005, you can do the following:

$rows = $dbh->do ("INSERT INTO member (last_name,first_name,expiration)" 
                    . " VALUES('Brown','Marcia','2005-6-3')");

The do() method returns a count of the number of rows affected, undef if something goes wrong, and ?1 if the number of rows is unknown for some reason. Errors can occur for various reasons. (For example, the query may be malformed or you may not have permission to access the table.) For a non-undef return, watch out for the case in which no rows are affected. When this happens, do() doesn't return the number 0; instead, it returns the string "0E0" (Perl's scientific notation form of zero). "0E0" evaluates to 0 in a numeric context but is considered true in conditional tests so that it can be distinguished easily from undef. If do() returned 0, it would be more difficult to distinguish between the occurrence of an error (undef) and the "no rows affected" case. You can check for an error using either of the following tests:

if (!defined ($rows)) 
{
    print "An error occurred\n";
}
if (!$rows)
{
    print "An error occurred\n";
}

In numeric contexts, "0E0" evaluates as 0, so the following code will correctly print the number of rows for any non-undef value of $rows:

if (!$rows) 
{
    print "An error occurred\n";
}
else
{
    $rows += 0; # force conversion to number if value is "0E0"
    print "$rows rows affected\n";
}

You could also print $rows using a %d format specifier with printf() to force an implicit conversion to a number:

if (!$rows) 
{
    print "An error occurred\n";
}
else
{
    printf "%d rows affected\n", $rows;
}

The do() method is equivalent to using prepare() followed by execute(). This means that the preceding INSERT statement could be issued as follows rather than by invoking do():

$sth = $dbh->prepare ("INSERT INTO member (last_name,first_name,expiration)" 
                    . " VALUES('Brown','Marcia','2005-6-3')");
$rows = $sth->execute ();

Handling Queries That Return a Result Set

This section provides more information on several options that you have for executing row-fetching loops for SELECT queries (or for other SELECT-like queries that return rows, such as DESCRIBE, EXPLAIN, and SHOW). It also discusses how to get a count of the number of rows in a result, how to handle result sets for which no loop is necessary, and how to retrieve an entire result set all at once.

Writing Row-Fetching Loops

The dump_members.pl script retrieved data using a standard sequence of DBI methods: prepare() lets the driver preprocess the query, execute() begins executing the query, and fetchrow_array() fetches each row of the result set.

prepare() and execute() are fairly standard parts of processing any query that returns rows. However, for fetching the rows, fetchrow_array() is actually only one choice from among several available methods (see Table 7.3).

Table 7.3. DBI Row-Fetching Methods
Method Name Return Value
fetchrow_array() Array of row values
fetchrow_arrayref() Reference to array of row values
fetch() Same as fetchrow_arrayref()
fetchrow_hashref() Reference to hash of row values, keyed by column name

The following examples show how to use each row-fetching method. The examples loop through the rows of a result set, and for each row, print the column values separated by commas. There are more efficient ways to write the code in some cases, but the examples are written the way they are for illustrative purposes (to show the syntax for accessing individual column values), not for efficiency.

fetchrow_array() is used as follows:

while (my @ary = $sth->fetchrow_array ()) 
{
    my $delim = "";
    for (my $i = 0; $i < @ary; $i++)
    {
        $ary[$i] = "" if !defined ($ary[$i]);   # NULL value?
        print $delim . $ary[$i];
        $delim = ",";
    }
    print "\n";
}

Each call to fetchrow_array() returns an array of row values, or an empty array when there are no more rows. The inner loop tests each column value to see if it's defined and sets it to the empty string if not. This converts NULL values (which are represented by DBI as undef) to empty strings. It may seem that this is an entirely superfluous action; after all, Perl prints nothing for both undef and the empty string. The reason for the test is that if the script is run with the -w option, Perl will issue a Use of uninitialized value warning message if you attempt to print an undef value. Converting undef to the empty string eliminates the warnings. You'll see a similar construct used elsewhere throughout this chapter.

If you prefer to print another value for undef values, such as the string "NULL", just change the test a little bit:

while (my @ary = $sth->fetchrow_array ()) 
{
    my $delim = "";
    for (my $i = 0; $i < @ary; $i++)
    {
        $ary[$i] = "NULL" if !defined ($ary[$i]);   # NULL value?
        print $delim . $ary[$i];
        $delim = ",";
    }
    print "\n";
}

When working with an array of values, you can simplify the code a bit by using map to convert all the undef array elements at once:

while (my @ary = $sth->fetchrow_array ()) 
{
    @ary = map { defined ($_) ? $_ : "NULL" } @ary;
    print join (",", @ary) . "\n";
}

map processes each element of the array using the expression within the braces and returns an array of the resulting values.

An alternative to assigning the return value of fetchrow_array() to an array variable is to fetch column values into a set of scalar variables. This allows you to work with variable names that are more meaningful than $ary[0], $ary[1], and so forth. Suppose you want to retrieve member name and email values into variables. Using fetchrow_array(), you could select and fetch rows like this:

my $sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email" 
                        . " FROM member ORDER BY last_name");
$sth->execute ();
while (my ($last_name, $first_name, $suffix, $email) = $sth->fetchrow_array ())
{
    # do something with variables
}

When you use a list of variables this way, you must make sure that the order of the columns selected by the query matches the order of the variables into which you fetch the values. DBI has no idea of the order in which columns are named by your SELECT statement, so it's up to you to assign variables correctly. You can also cause column values to be assigned to individual variables automatically when you fetch a row, using a technique known as parameter binding. This is discussed further in the "Placeholders and Parameter Binding" section later in this chapter.

If you fetch a single value into a variable, be careful how you write the assignment. If you write the beginning of your loop as follows, it will work correctly:

while (my ($val) = $sth->fetchrow->array ()) ... 

The value is fetched in list context, so the test will fail only when there are no more rows. But if instead you write the test like this, it will fail in mysterious ways:

while (my $val = $sth->fetchrow->array ()) ... 

The difference here is that the value is fetched in scalar context, so if $val happens to be zero, undef, or the empty string, the test will evaluate as false and terminate the loop, even though you have not yet reached the end of the result set.

The second row-fetching method, fetchrow_arrayref(), is similar to fetchrow_array(), but instead of returning an array containing the column values for the current row, it returns a reference to the array, or undef when there are no more rows. Use it as follows:

while (my $ary_ref = $sth->fetchrow_arrayref ()) 
{
    my $delim = "";
    for (my $i = 0; $i < @{$ary_ref}; $i++)
    {
        $ary_ref->[$i] = "" if !defined ($ary_ref->[$i]);   # NULL value?
        print $delim . $ary_ref->[$i];
        $delim = ",";
    }
    print "\n";
}

You access array elements through the array reference $ary_ref. This is something like dereferencing a pointer, so you use $ary_ref->[$i] rather than $ary[$i]. To convert the reference to an array, use the @{$ary_ref} construct.

fetchrow_arrayref() is unsuitable for fetching variables into a list. For example, the following loop does not work:

while (my ($var1, $var2, $var3, $var4) = @{$sth->fetchrow_arrayref ()}) 
{
    # do something with variables
}

As long as fetchrow_arrayref() actually fetches a row, the loop functions properly. But when there are no more rows, fetchrow_arrayref() returns undef, and @{undef} isn't legal. (It's like trying to de-reference a NULL pointer in a C program.)

The third row-fetching method, fetchrow_hashref(), is used like this:

while (my $hash_ref = $sth->fetchrow_hashref ()) 
{
    my $delim = "";
    foreach my $key (keys (%{$hash_ref}))
    {
        $hash_ref->{$key} = "" if !defined ($hash_ref->{$key}); # NULL value?
        print $delim . $hash_ref->{$key};
        $delim = ",";
    }
    print "\n";
}

Each call to fetchrow_hashref() returns a reference to a hash of row values keyed on column names or undef when there are no more rows. In this case, column values don't come out in any particular order because members of Perl hashes are unordered. However, DBI keys the hash elements using the column names, so $hash_ref gives you a single variable through which you can access any column value by name. This means you can pull out the values (or any subset of them) in any order you want, and you don't have to know the order in which the columns were retrieved by the SELECT query. For example, to access the name and email fields, you can do the following:

while (my $hash_ref = $sth->fetchrow_hashref ()) 
{
    my $delim = "";
    foreach my $key ("last_name", "first_name", "suffix", "email")
    {
        $hash_ref->{$key} = "" if !defined ($hash_ref->{$key}); # NULL value?
        print $delim . $hash_ref->{$key};
        $delim = ",";
    }
    print "\n";
}

fetchrow_hashref() is especially useful when you want to pass a row of values to a function without requiring the function to know the order in which columns are named in the SELECT statement. In this case, you would call fetchrow_hashref() to retrieve rows and write a function that accesses values from the row hash using column names.

Keep in mind the following caveats when you use fetchrow_hashref():

  • If you need every bit of performance, fetchrow_hashref() is not the best choice because it's less efficient than fetchrow_array() or fetchrow_arrayref().

  • By default, the column names are used as key values with the same lettercase as the column names written in the SELECT statement. In MySQL, column names are not case sensitive, so the query will work the same way no matter what lettercase you use to write column names. But Perl hash key names are case sensitive, which may cause you problems. To avoid potential lettercase mismatch problems, you can tell fetchrow_hashref() to force column names into a particular lettercase by passing it a NAME_lc or NAME_uc attribute:

    $hash_ref = $sth->fetchrow_hashref ("NAME_lc"); # use lowercase names 
    $hash_ref = $sth->fetchrow_hashref ("NAME_uc"); # use uppercase names
    
  • The hash contains one element per unique column name. If you're performing a join that returns columns from multiple tables with overlapping names, you won't be able to access all the column values. For example, if you issue the following query, fetchrow_hashref() will return a hash having only one element, name:

    SELECT a.name, b.name FROM a, b WHERE a.name = b.name 
    

    To avoid this problem, use aliases to make sure each column has a distinct name. For example, if you rewrite the query as follows, fetchrow_hashref() will return a reference to a hash with two elements, name and name2:

    SELECT a.name, b.name AS name2 FROM a, b WHERE a.name = b.name 
    
Determining the Number of Rows Returned by a Query

How can you tell the number of rows returned by a SELECT or SELECT-like query? One way is to count the rows as you fetch them. In fact, this is the only portable way in DBI to know how many rows a SELECT query returned. Using the MySQL driver, you can call the rows() method using the statement handle after invoking execute(). But this is not portable to other database engines, and the DBI documentation explicitly discourages using rows() for SELECT statements. And even for MySQL, rows() doesn't return the correct result until you've fetched all the rows if you've set the mysql_use_result attribute. So you may as well just count the rows as you fetch them. (See Appendix G for more information about mysql_use_result.)

Fetching Single-Row Results

It's not necessary to run a loop to get your results if the result set consists of a single row. Suppose you want to write a script, count_members.pl, that tells you the current number of Historical League members. The code to perform the query looks like this:

# issue query 
my $sth = $dbh->prepare ("SELECT COUNT(*) FROM member");
$sth->execute ();

# read results of query
my $count = $sth->fetchrow_array ();
$sth->finish ();
$count = "can't tell" if !defined ($count);
print "$count\n";

The SELECT statement will return only one row, so no loop is required; we call fetchrow_array() just once. In addition, because we're selecting only one column, it's not even necessary to assign the return value to an array. When fetchrow_array() is called in a scalar context (where a single value rather than a list is expected), it returns one column of the row or undef if no row is available. DBI does not define which element of the row fetchrow_array() returns in scalar context, but that's all right for the query just shown. It retrieves only a single value, so there is no ambiguity about what value is returned.

This code invokes finish() to free the result set, even though the set consists of just one row. (fetchrow_array() frees a result set implicitly when it reaches the end of the set, but that would happen here only if you called it a second time.)

Another type of query for which you expect at most a single record is one that contains LIMIT 1 to restrict the number of rows returned. A common use for this is to return the row that contains the maximum or minimum value for a particular column. For example, the following query prints the name and birth date of the president who was born most recently:

my $query = "SELECT last_name, first_name, birth" 
        . " FROM president ORDER BY birth DESC LIMIT 1";
my $sth = $dbh->prepare ($query);
$sth->execute ();

# read results of query
my ($last_name, $first_name, $birth) = $sth->fetchrow_array ();
$sth->finish ();
if (!defined ($last_name))
{
    print "Query returned no result\n";
}
else
{
    print "Most recently born president: $first_name $last_name ($birth)\n";
}

Other types of queries for which no fetch loop is necessary are those that use MAX() or MIN() to select a single value. But in all these cases, an even easier way to get a single-row result is to use the database handle method selectrow_array(), which combines prepare(), execute(), and row fetching into a single call. It returns an array (not a reference) or an empty array if the query returned no row or an error occurred. The previous example can be rewritten as follows using selectrow_array():

my $query = "SELECT last_name, first_name, birth" 
        . " FROM president ORDER BY birth DESC LIMIT 1";
my ($last_name, $first_name, $birth) = $dbh->selectrow_array ($query);
if (!defined ($last_name))
{
    print "Query returned no result\n";
}
else
{
    print "Most recently born president: $first_name $last_name ($birth)\n";
}
Working with Complete Result Sets

When you use a fetch loop, DBI doesn't provide any way to process the rows in any order other than that in which they are returned by the loop. Also, after you fetch a row, the previous row is lost unless you take steps to maintain it in memory. These behaviors aren't always suitable. For example, they're undesirable if you need to make multiple passes through the rows to perform a statistical calculation. (You might go through the result set once to assess some general numeric characteristics of your data, and then step through the rows again performing a more specific analysis.)

It's possible to access your result set as a whole in a couple different ways. You can perform the usual fetch loop and save each row as you fetch it, or you can use a method that returns an entire result set all at once. Either way, you end up with a matrix containing one row per row in the result set and as many columns as you selected. You can process elements of the matrix in any order you want, as many times as you want. The following discussion describes both approaches.

One way to use a fetch loop to capture the result set is to use fetchrow_array() and save an array of references to the rows. The following code does the same thing as the fetch-and-print loop in dump_members.pl except that it saves all the rows, and then prints the matrix. It illustrates how to determine the number of rows and columns in the matrix and how to access individual members of the matrix:

my @matrix = (); # array of array references 

while (my @ary = $sth->fetchrow_array ())   # fetch each row
{
    push (@matrix, [ @ary ]); # save reference to just-fetched row
}

# determine dimensions of matrix
my $rows = scalar (@matrix);
my $cols = ($rows == 0 ? 0 : scalar (@{$matrix[0]}));

for (my $i = 0; $i < $rows; $i++)           # print each row
{
    my $delim = "";
    for (my $j = 0; $j < $cols; $j++)
    {
        $matrix[$i][$j] = "" if !defined ($matrix[$i][$j]); # NULL value?
        print $delim . $matrix[$i][$j];
        $delim = ",";
    }
    print "\n";
}

When determining the dimensions of the matrix, the number of rows must be determined first because calculation of the number of columns is contingent on whether or not the matrix is empty. If $rows is 0, the matrix is empty and $cols becomes 0 as well. Otherwise, the number of columns can be calculated as the number of elements in the first row by using the syntax @{$matrix[0]} to access the row as a whole.

The preceding example fetches each row as an array and then saves a reference to it. You might suppose that it would be more efficient to call fetchrow_arrayref() instead to retrieve row references directly:

my @matrix = (); # array of array references 

while (my $ary_ref = $sth->fetchrow_arrayref ())
{
    push (@matrix, $ary_ref); # save reference to just-fetched row
}

That doesn't work because fetchrow_arrayref() reuses the array to which the reference points. The resulting matrix is an array of references, each of which points to the same row?the final row retrieved. Therefore, if you want to construct a matrix by fetching a row at a time, use fetchrow_array() rather than fetchrow_arrayref().

As an alternative to using a fetch loop, you can use one of the DBI methods that return the entire result set. For example, fetchall_arrayref() returns a reference to an array of references, each of which points to the contents of one row of the result set. That's a mouthful, but in effect, the return value is a reference to a matrix. To use fetchall_arrayref(), call prepare() and execute(), and then retrieve the result as follows:

# fetch all rows into a reference to an array of references 
my $matrix_ref = $sth->fetchall_arrayref ();

You can determine the dimensions of the array and access its elements as follows:

# determine dimensions of matrix 
my $rows = (!defined ($matrix_ref) ? 0 : scalar (@{$matrix_ref}));
my $cols = ($rows == 0 ? 0 : scalar (@{$matrix_ref->[0]}));

for (my $i = 0; $i < $rows; $i++)           # print each row
{
    my $delim = "";
    for (my $j = 0; $j < $cols; $j++)
    {
        $matrix_ref->[$i][$j] = "" if !defined ($matrix_ref->[$i][$j]); # NULL?
        print $delim . $matrix_ref->[$i][$j];
        $delim = ",";
    }
    print "\n";
}

fetchall_arrayref() returns a reference to an empty array if the result set is empty. The result is undef if an error occurs, so if you don't have RaiseError enabled, be sure to check the return value before you start using it.

The number of rows and columns are determined by whether the matrix is empty. If you want to access an entire row $i of the matrix as an array, use the syntax @{$matrix_ref->[$i]}.

It's certainly simpler to use fetchall_arrayref() to retrieve a result set than to write a row-fetching loop, although the syntax for accessing array elements becomes a little trickier. A method that's similar to fetchall_arrayref() but that does even more work is selectall_arrayref(). This method performs the entire prepare(), execute(), fetch loop sequence for you. To use selectall_arrayref(), pass your query directly to it using the database handle:

# fetch all rows into a reference to an array of references 
my $matrix_ref = $dbh->selectall_arrayref ($query);

# determine dimensions of matrix
my $rows = (!defined ($matrix_ref) ? 0 : scalar (@{$matrix_ref}));
my $cols = ($rows == 0 ? 0 : scalar (@{$matrix_ref->[0]}));

for (my $i = 0; $i < $rows; $i++)           # print each row
{
    my $delim = "";
    for (my $j = 0; $j < $cols; $j++)
    {
        $matrix_ref->[$i][$j] = "" if !defined ($matrix_ref->[$i][$j]); # NULL?
        print $delim . $matrix_ref->[$i][$j];
        $delim = ",";
    }
    print "\n";
}
Checking for NULL Values

When you retrieve information from a database, you may need to distinguish between column values that are NULL and those that are zero or empty strings. This is easy to do because DBI returns NULL column values as undef. However, you must be sure to use the correct test. If you try the following code fragment, it prints "false!" all three times:

$col_val = undef; if (!$col_val) { print "false!\n"; } 
$col_val = 0;     if (!$col_val) { print "false!\n"; }
$col_val = "";    if (!$col_val) { print "false!\n"; }

What that demonstrates is that the form of the test is unable to distinguish between undef, 0, and the empty string. The next fragment prints "false!" for both tests, indicating that the test cannot distinguish undef from the empty string:

$col_val = undef; if ($col_val eq "") { print "false!\n"; } 
$col_val = "";    if ($col_val eq "") { print "false!\n"; }

This fragment prints the same output, showing that the second test fails to distinguish 0 from the empty string:

$col_val = ""; 
if ($col_val eq "") { print "false!\n"; }
if ($col_val == 0)  { print "false!\n"; }

To distinguish between undef (NULL) column values and non-undef values, use defined(). After you know a value doesn't represent NULL, you can distinguish between other types of values using appropriate tests?for example:

if (!defined ($col_val)) { print "NULL\n"; } 
elsif ($col_val eq "")   { print "empty string\n"; }
elsif ($col_val == 0)    { print "zero\n"; }
else                     { print "other\n"; }

It's important to perform the tests in the proper order because both the second and third comparisons are true if $col_val is an empty string. If you reverse the order of those comparisons, you'll incorrectly interpret empty strings as zero.

Quoting Issues

Thus far, we have constructed queries in the most basic way possible by using simple quoted strings. That causes a problem at the Perl lexical level when your quoted strings contain quoted values. You can also have problems at the SQL level when you want to insert or select values that contain quotes, backslashes, or binary data. If you specify a query as a Perl quoted string, you must escape any occurrences of the quoting character that occur within the query string itself:

$query = 'INSERT INTO absence VALUES(14,\'2002-9-16\')'; 
$query = "INSERT INTO absence VALUES(14,\"2002-9-16\")";

Both Perl and MySQL allow you to quote strings using either single or double quotes, so you can sometimes avoid escaping by mixing quote characters:

$query = 'INSERT INTO absence VALUES(14,"2002-9-16")'; 
$query = "INSERT INTO absence VALUES(14,'2002-9-16')";

However, the two types of quotes are not equivalent in Perl. Variable references are interpreted only within double quotes. Therefore, single quotes are not very useful when you want to construct queries by embedding variable references in the query string. For example, if the value of $var is 14, the following two strings are not equivalent:

"SELECT * FROM member WHERE id = $var" 
'SELECT * FROM member WHERE id = $var'

The strings are interpreted as follows; clearly, the first string is more like something you'd want to pass to the MySQL server:

"SELECT * FROM member WHERE id = 14" 
'SELECT * FROM member WHERE id = $var'

An alternative to quoting strings with double quotes is to use the qq{} construct, which tells Perl to treat everything between qq{ and } as a double-quoted string. (Think of double-q as meaning "double-quote.") For example, the following two lines are equivalent:

$date = "2002-9-16"; 
$date = qq{2002-9-16};

You can construct queries without thinking so much about quoting issues when you use qq{} because you can use quotes (single or double) freely within the query string without having to escape them. In addition, variable references are interpreted. Both properties of qq{} are illustrated by the following INSERT statement:

$id = 14; 
$date = "2002-9-16";
$query = qq{INSERT INTO absence VALUES($id,'$date')};

You don't have to use '{' and '}' as the qq delimiters. Other forms, such as qq() and qq//, will work, too, as long as the closing delimiter doesn't occur within the string. I prefer qq{} because the '}' character is less likely than ')' or '/' to occur within the text of the query and be mistaken for the end of the query string. For example, ')' occurs within the INSERT statement just shown, so qq() would not be a useful construct for quoting the query string.

The qq{} construct can cross line boundaries, which is useful if you want to make the query string stand out from the surrounding Perl code:

$id = 14; 
$date = "2002-9-16";
$query = qq{
    INSERT INTO absence VALUES($id,'$date')
};

This is also useful if you simply want to format your query on multiple lines to make it more readable. For example, the SELECT statement in the dump_members.pl script looks like this:

$sth = $dbh->prepare ("SELECT last_name, first_name, suffix, email," 
    . "street, city, state, zip, phone FROM member ORDER BY last_name");

With qq{}, it could be written as follows instead:

$sth = $dbh->prepare (qq{ 
            SELECT
                last_name, first_name, suffix, email,
                street, city, state, zip, phone
            FROM member
            ORDER BY last_name
        });

It's true that double-quoted strings can cross line boundaries, too. But I find that qq{ and } stand out better than two lone '"' characters and make the statement easier to read. This book uses both forms; see which you prefer.

The qq{} construct takes care of quoting issues at the Perl lexical level so that you can get quotes into a string easily without having Perl complain about them. However, you must also think about SQL-level syntax. Consider the following attempt to insert a record into the member table:

$last = "O'Malley"; 
$first = "Brian";
$expiration = "2005-9-1";
$rows = $dbh->do (qq{
    INSERT INTO member (last_name,first_name,expiration)
    VALUES('$last','$first','$expiration')
});

The resulting string that do() sends to MySQL looks like this:

INSERT INTO member (last_name,first_name,expiration) 
VALUES('O'Malley','Brian','2005-9-1')

That is not legal SQL because a single quote occurs within a single-quoted string. We encountered a similar quoting problem in Chapter 6. There we dealt with the issue by using mysql_real_escape_string(). DBI provides a similar mechanism?for each quoted value you want to use literally in a statement, call the quote() method and use its return value instead. The preceding example is more properly written as follows:

$last = $dbh->quote ("O'Malley"); 
$first = $dbh->quote ("Brian");
$expiration = $dbh->quote ("2005-9-1");
$rows = $dbh->do (qq{
    INSERT INTO member (last_name,first_name,expiration)
    VALUES($last,$first,$expiration)
});

Now the string that do() sends to MySQL looks like the following, with the quote that occurs within the quoted string properly escaped:

INSERT INTO member (last_name,first_name,expiration) 
VALUES('O\'Malley','Brian','2005-9-1')

Note that when you refer to $last and $first in the query string, you do not add any surrounding quotes; the quote() method supplies them for you. If you add quotes yourself, your query will have too many of them, as shown by the following example:

$value = "paul"; 
$quoted_value = $dbh->quote ($value);

print "... WHERE name = $quoted_value\n";
print "... WHERE name = '$quoted_value'\n";

These statements produce the following output:

... WHERE name = 'paul' 
... WHERE name = ''paul''

In the second case, the string contains too many quotes.

Placeholders and Parameter Binding

In the preceding sections, we've constructed queries by putting values to be inserted or used as selection criteria directly into the query string. It's not necessary to do this. DBI allows you to place special markers called placeholders into a query string, and then supply the values to be used in place of those markers when the query is executed. One reason for doing this is that you get the character-quoting benefits of the quote() method without having to invoke quote() explicitly. Another reason is for improved performance when you're executing a query over and over within a loop.

As an illustration of how placeholders work, suppose you're beginning a new semester at school and you want to clear out the student table for your grade book and then initialize it with the new students by using a list of student names contained in a file. Without placeholders, you can delete the existing table contents and load the new names as follows:

$dbh->do (qq{ DELETE FROM student } );  # delete existing rows 
while (<>)                              # add new rows
{
    chomp;
    $_ = $dbh->quote ($_);
    $dbh->do (qq{ INSERT INTO student SET name = $_ });
}

This approach requires that you handle special characters in the data values yourself by calling quote(). It's also inefficient, because the basic form of the INSERT query is the same each time, and do() calls prepare() and execute() each time through the loop. It's more efficient to call prepare() a single time to set up the INSERT statement before entering the loop and invoke only execute() within the loop. That avoids all invocations of prepare() but one. DBI allows us to do this as follows:

$dbh->do (qq{ DELETE FROM student } );  # delete existing rows 
my $sth = $dbh->prepare (qq{ INSERT INTO student SET name = ? });
while (<>)                              # add new rows
{
    chomp;
    $sth->execute ($_);
}

In general, if you find yourself calling do() inside a loop, it's better to invoke prepare() prior to the loop and execute() inside it. Note the '?' character in the INSERT query. That's the placeholder. When execute() is invoked, you pass the value to be substituted for the placeholder when the query is sent to the server. DBI will automatically quote special characters in the value, so there is no need to call quote().

Some things to note about placeholders:

  • Do not enclose the placeholder character in quotes within the query string. If you do, it will not be recognized as a placeholder.

  • Do not use the quote() method to specify placeholder values, or you will get extra quotes in the values you're inserting.

  • You can have more than one placeholder in a query string, but be sure to pass as many values to execute() as there are placeholder markers.

  • Each placeholder must specify a single