8.2 Issues in Writing Data to Databases

In this section, we discuss issues that emerge in web database applications when multiple users access an application. Typically, a few users are inserting, updating, or deleting data, while most are running queries. This environment requires careful code design: without it, data can unexpectedly or unreliably change. This may lead to database inconsistencies and confused users.

Some of the problems we describe in this section can be solved with restrictive system requirements, knowledge of how the DBMS behaves, and careful script development. Others solutions require an understanding of database theory. We discuss both types of solution in the next section.

8.2.1 Transactions and Concurrency

Problems can occur when users read and write to a web database at the same time, that is, concurrently. The management of groups of SQL statements that read and write, or transactions , is one important area of the theory and practice of relational databases. Here are four of the more common problems of concurrent read and write transactions:

Lost update problem

User A reads from the database, recording a value. User B reads the same value, then updates the value immediately. User A then updates the value, overwriting the update written by User B.

Consider an example. Imagine that a winestore manager wants to order one dozen more bottles of a popular wine, but only if there are less than two dozen bottles currently in stock. The manager runs a query to sum the total stock for that wine from the inventory. The result is that there are fifteen bottles left, so the manager decides to place an order. However, he heads off to fill his coffee cup first, leaving the system displaying the query result.

A second stock manager arrives at her desk with the same intention: to order more of this popular wine if there are less than two dozen bottles. The result of the query is the same: fifteen bottles. The second manager orders a dozen bottles, and updates the inventory to 27, knowing the bottles will arrive in the afternoon. The problem occurs when the first manager returns: he doesn't rerun the query and he too orders 12 bottles and updates the inventory to 27. Now the system has record of 27 bottles, but 24 will arrive in the afternoon to take the actual stock total to 39!

Dirty read problem

User A reads a value from the database, changes the value, and writes it back to the database. User B then reads the value, changes the value, and writes it back to the database. User A then decides not to confirm the changes for some reason and undoes the changes he made. The problem is that User B has read and used the changed value, resulting in a dirty read problem.

Consider an example. A manager decides to add a 3% surcharge to a particular wine inventory, so she reads and updates the cost of that wine. Another manager decides to apply a 10% discount to all wines made by a particular winery, which happens to include the wine just surcharged. After all this, the first manager realizes she has made a mistake: the wrong wine was updated! Unfortunately, the second manager has already used this incorrect value as input into his update, and the change can't be undone correctly.

Incorrect summary problem

User A updates values while User B reads and summarizes the same values. Values summarized may be read before or after each individual update, resulting in unpredictable results.

Consider an example in the online winestore where a manager wants to produce a management stock report. The report details wine sales, winery sales, wine region sales, and total sales. The reporting process has four steps: first, the sales of each wine are tallied; second, the total sales of wines for each winery are tallied; third, the total sales of wines for each region are tallied; and, last, the overall total sales of wines is determined. The report uses four queries and takes a few minutes to run.

Now, imagine that during this process, a customer purchases a bottle of Paradise Pinot Noir wine from the Paradise Enough winery. Specifically, let's imagine this happens after the total sales of the Paradise Pinot Noir wine are tallied but before the Paradise Enough winery sales are tallied. The result is that the tally of the Pinot Noir's sales doesn't include this purchase, but the tally of Paradise Enough winery sales does. The result is an inconsistency: adding together all of the wine sales won't give the same value that's reported for the winery.

Unrepeatable read problem

A value is read in by User A, updated by User B, and subsequently reread by User A for verification. Despite not modifying the value, User A encounters two different values, that is, the read operation is unrepeatable.

Consider an example. Imagine a user of an online winestore wants to buy the last bottle of an expensive, rare wine that's in stock. He browses the database and finds the wine. There is only bottle left, and he quickly adds this to his shopping cart; in our implementation, this creates new rows in two tables in the database. Now, he decides to finalize the purchase and is presented with a summary of the shopping cart.

However, while the user fumbles about finding his password to log in, another user enters the system. She quickly locates the same wine, sees that there is only one bottle left, adds it to her shopping cart, logs in to the system, and purchases the wine. When our first user finally logs in to finalize the order, all the details look fine, but the wine has actually been sold. Our database operation to deduct from the inventory reports an error because the stock value is already zero (the value has changed during the transaction), and we end up reporting the error to our original (now very unhappy and confused!) user.

Fortunately, most of these problems can be solved through locking or careful design of scripts that carry out database transactions. However, you might choose not to solve some problems because they restrict the system requirements or add unnecessary complexity. We discuss locking in the next section.

8.2.2 Locking to Achieve Concurrency in MySQL

It has been shown that a simple scheme called locking (actually, it's correctly known as two-phase locking ) solves the four transaction problems identified in the last section.

8.2.2.1 When and how to lock tables

Locking is needed only when multiple steps must be performed together, and when two or more operations can be going on at the same time. If scripts are being implemented that write to the database but aren't multi-step operations susceptible to the problems described in the previous section, locks aren't needed.

Specifically, the following situations do not require a lock:

Simple queries that insert rows, delete rows, or update rows, and that don't use results of a previous SELECT or data entered by the user as input. For example, updating a customer's details, adding a new phonebook entry, or unconditionally deleting a row do not require a lock.
Single-user applications or applications where only one user can alter the data do not require locks regardless of what queries are used.

The following situations do require locks:

Multi-user applications require locks, but only if either of the next two points are true.
A script first reads a value from a database and later writes that value to the database. For example, to create a row without using MySQL proprietary features, you first need to find the highest value used for the primary key using a SELECT and then INSERT a new row with the next available key value.
A script first writes a value to a database and later reads that value from the database. For example, to update and display an inventory, you might first add an extra quantity with an UPDATE and then read it back with a SELECT to check the total and show it to the user.

Locking may not be required for all parts of a web database application: parts of the application can still be safely used without violating any locking conditions.

With its default settings, each MySQL table has two associated lock variables. If a user sets or holds a lock variable for a particular table, no other user can perform particular actions on that table. There are two kinds of locks for each table: read locks, when a user is only reading from a table, and write locks, when a user is both reading and writing to a table.

Having locks in a DBMS leads to four rules of use:

If a user wants to write to a table, and she is performing a transaction susceptible to a concurrency problem, she must obtain a write lock on that table.
If a user only wants to read from a table, and she is performing a transaction susceptible to a concurrency problem, she must obtain a read lock on that table.
If a user requires a lock, she must lock all tables used in the transaction in a single LOCK statement.
A user must release all locks when a database transaction is complete using the UNLOCK statement.

When a user holds a write lock on a table, no other users can read or write to that table. When a user holds a read lock on a table, other users can also read or hold a read lock, but no user can hold a write lock on that table, or write to that table.

SELECT, UPDATE, INSERT, or DELETE operations that don't use LOCK TABLES are held up if locks are held in other transactions that would logically prevent their operation. For example, if a user holds a write lock on a table, no other user can issue a SELECT, UPDATE, INSERT, DELETE, or LOCK operation on that table.

The following segment of an interaction with the MySQL command interpreter illustrates the use of locks in a summarization task that requires locking:

mysql> LOCK TABLES items READ, temp_report WRITE;



mysql> SELECT sum(price) FROM items WHERE cust_id=1;

+------------+

| sum(price) |

+------------+

|     438.65 |

+------------+

1 row in set (0.04 sec)



mysql> UPDATE temp_report SET purchases=438.65

       WHERE cust_id=1;

mysql> UNLOCK TABLES;

In this example, a temporary table called temp_report is updated with the result of a SELECT operation on an items table. If locks aren't used, the items table can be modified by another user, possibly altering the summary value of $438.65 used as input to the UPDATE operation. There are two locks obtained for this transaction: first, a read lock on items because we don't need to change items but we don't want another user to make a change to it; and, second, a write lock on temp_report because we want to change the table, and we don't want other users to read or write to the report while we make changes. The UNLOCK TABLES operation releases all locks held; locks can't be progressively released.

MySQL doesn't permit us to lock only one of the two tables used in the transaction above. The following rules apply to locks:

If a lock is held, all other tables that are to be used must also be locked. Failing to do so results in a MySQL error.
If aliases are used in queries, the alias must be locked. For example, in the following query:
```
SELECT * from customer c where c.custid=1
```
the alias must be locked with one of:
```
LOCK TABLES customer c READ
```
or:
```
LOCK TABLES customer c WRITE
```
If different aliases for the same table are used, each different alias must be locked. Aliases are discussed in Chapter 15.

In many cases, locking can be avoided through careful query design:

Use MySQL's auto_increment feature to create new primary key values. Alternatively, use PEAR DB's DB::nextId( ) method that we discuss later in this chapter.
Use mysql_insert_id( ) (as opposed to using the max( ) function in a SELECT query) to find the value of a newly-created primary key. Again, PEAR DB's DB::nextId( ) method can be alternatively used.
Use advanced features of SQL to combine two queries into one; these features are discussed in Chapter 15. For example, you can use a single nested query to discover the total value in our previous example and then use that to create a new row in the temporary report.
Perform updates that are relative. For example, UPDATE customer SET discount = discount*1.1.

8.2.2.2 The LOCK TABLES and UNLOCK TABLES statements in MySQL

The LOCK TABLES statement is used to lock the listed tables in either READ or WRITE mode. As discussed earlier, all tables that are accessed in the transaction must be locked in either READ or WRITE mode, and must be listed in a single LOCK TABLES statement.

A script that issues a LOCK TABLES statement is suspended until all locks listed are successfully obtained. There is no time limit in waiting for locks. If the lock is held by another user or an operation is running on the table already, a request is placed at the back of either the write- or read-lock queue for the table, depending on the lock required. The write-lock queue has priority over the read-lock queue, so a user who wants a write lock obtains it when it becomes available, regardless of how long another user has been waiting in the read-lock queue.

MySQL gives priority to database modifications over read queries. This can lead to a problem called starvation , where a transaction never completes because it can't obtain its required read locks. However, most web database applications read from databases much more than they write, and locks are required in only a few situations, so starvation is very uncommon in practice.

If low-priority writing is essential to an application, a LOW_PRIORITY option can be prefixed before the WRITE clause. If a transaction is queued for a LOW_PRIORITY WRITE , it receives the lock only when the read lock queue is empty and no other users are reading from the table. Again, consideration of possible starvation is important.

Locks can't be progressively obtained through several LOCK TABLES statements. Indeed, issuing a second LOCK TABLES is the same as issuing an UNLOCK TABLES to release all locks and then issuing the second LOCK TABLES. There are good reasons for this strict rule, related to a locking problem called deadlock, which we don't discuss here. However, MySQL is deadlock-free because it enforces the risk-free use of the LOCK TABLES and UNLOCK TABLES statements.

If an unlocked table needs to be accessed or locking must be avoided for a particular table, a second server connection can be opened and used.

MySQL has a feature called INSERT DELAYED for insertion that is described in the MySQL manual.

Don't use locking with INSERT DELAYED for insert operations. The INSERT DELAYED process is carried out by the MySQL server at a later time and the locks held by the user can't be used by the server.

8.2.2.3 Locking for performance

Locking is primarily designed to ensure that concurrent transactions can execute safely. However, locking is also a useful performance tool to optimize the performance of important transactions.

Consider, for example, a situation where we urgently require a complex report that uses a slow query. With other users running queries and using system resources, this query may run even slower. A solution is to use LOCK TABLES with the WRITE option to stop other users running queries or database updates, and to have exclusive access to the database for the query duration. This permits better optimization of the query processing by the server, dedication of all the system resources to the query, and faster disk access.

The downside of locking for performance is the reduction in concurrent access to the database. Users may be inconvenienced by slow responses or timeouts from the web database application. Locking for performance should be used sparingly.

8.2.3 Locking Tables in Web Database Applications

Example 8-13 shows a PHP script that requires locking to ensure that the value returned from the SELECT query can't change before the INSERT operation. The script adds a row to the phonebook table and does exactly same thing as Example 8-9. However, it doesn't use the MySQL proprietary auto_increment modifier and so it needs to read the maximum primary key value that's in use and then write a new row based on that value.

Without the auto_increment modifier and with no locking, it's possible that two rows could be created with the same phonebook_id. This can happen if two or more users run the script at the same time and get the same result from the SELECT query. Both users would then attempt to INSERT a new row with the same primary key value and, if this happens, MySQL will report an error because the primary key value must be unique. Locking solves the problem because it stops users running the queries in the script at the same time.

Example 8-13. Creating a phonebook entry using locking

<?php

require 'db.inc';

require_once "HTML/Template/ITX.php";



function formerror(&$template, $message, &$errors)

{

  $errors = true;

  $template->setCurrentBlock("error");

  $template->setVariable("ERROR", $message);

  $template->parseCurrentBlock("error");

}



if (!($connection = @ mysql_connect("localhost", "fred", "shhh")))

   die("Could not connect to database");



$firstname = mysqlclean($_POST, "firstname", 50, $connection);

$surname = mysqlclean($_POST, "surname", 50, $connection);

$phone = mysqlclean($_POST, "phone", 20, $connection);



$template = new HTML_Template_ITX("./templates");

$template->loadTemplatefile("example.8-10.tpl", true, true);



$errors = false;



if (empty($firstname))

  formerror($template, "The first name field cannot be blank.", $errors);



if (empty($surname))

  formerror($template, "The surname field cannot be blank.", $errors);



if (empty($phone))

  formerror($template, "The phone field cannot be blank", $errors);



// Now the script has finished the validation, show any errors

if ($errors)

{

  $template->show( );

  exit;

}



// If we made it here, then the data is valid

if (!mysql_select_db("telephone", $connection))

  showerror( );



// Lock the table

$query = "LOCK TABLES phonebook WRITE";

if (!(@ mysql_query ($query, $connection)))

   showerror( );



// Find the maximum phonebook_id value that's in use

$query = "SELECT max(phonebook_id) FROM phonebook";

if (!($result = @ mysql_query ($query, $connection)))

   showerror( );



$row = @ mysql_fetch_array($result);



// Set the new value for the primary key

$phonebook_id = $row["max(phonebook_id)"] + 1;



// Insert the new phonebook entry

$query = "INSERT INTO phonebook VALUES

          ({$phonebook_id}, '{$surname}', '{$firstname}', '{$phone}')";

if (!(@ mysql_query ($query, $connection)))

   showerror( );



// Unlock the table

$query = "UNLOCK TABLES";

if (!(@ mysql_query ($query, $connection)))

   showerror( );



// Show the phonebook receipt

header("Location: example.8-5.php?status=T&phonebook_id={$phonebook_id}");

?>

The locking of the phonebook table is performed before the SELECT query, and the UNLOCK TABLES statement is issued after the INSERT. As you can see, the lock and unlock statements are executed just like any other query using mysql_query( ).

8.2.3.1 Locking methods that don't work in web database applications

There are several locking paradigms that don't work in a web database application because of the statelessness of HTTP. Each approach fails because there is either no guarantee or no possibility that the locked tables will be unlocked. If tables are locked indefinitely, other transactions can't proceed, and the DBMS will most likely need to be shut down and restarted.

Be careful with locking in web database applications. Remember the basic rule that all locks should be unlocked by the same script during the same execution of the script.

All web scripts that require locking should have the sequence 1) lock, 2) query, 3) update, delete, or insert, and 4) unlock. There must be no user interaction or intervening calls to other scripts that require input.

The following must be avoided in web database applications:

Failing to issue an UNLOCK TABLES on a locked persistent database connection (such as one that opened with mysql_pconnect( )). The locks aren't released when the script terminates.
It isn't necessary to issue an UNLOCK TABLES if a nonpersistent connection is used (such as one opened with mysql_connect( )). Locks are automatically released when the script finishes and the connection closes. However, it is good practice to include the UNLOCK TABLES statement.
Locking one or more tables during the first execution of a script, leaving them locked, and then querying or updating during a second or subsequent execution of the script. Remember that each database connection in a script is independent and is treated as a different user by MySQL.
Retrieving a value such as the next available primary key value, presenting this to the user, waiting for the user to enter further details, and then adding a row to the database with that identifier. Remember that another user may add a row while the first user is entering the required details, and locks should never be carried across several scripts or different executions of the same script.

8.2.3.2 Locking with an auxiliary table

Locking limits concurrency in your web database application. If tables are locked, then other users won't be able to run the same script at the same time and other scripts may also not be able to proceed. For example, suppose you write lock the phonebook table we've used in our examples throughout this chapter. With the table locked, any other query on the phonebook table in any script won't proceed until you unlock the table; this means, for example, while you insert one row, no other users can search for a phone number. Sometimes, you want to avoid this and this section shows you how.

One technique you can use to minimize locking of your frequently used tables is to add an additional table to the database. This additional table stores and manages the next available primary key values for all other tables in the database. The additional table is then locked, queried, updated, and unlocked each time a new primary key value is needed; the main tables in the database are then never locked when data is inserted. In the remainder of this section, we show you how to do this using the MySQL function library; the next section shows you how to do the same thing using PEAR DB.

Let's consider an example. Suppose you want to add new rows to the phonebook table without locking it and without using the proprietary MySQL auto_increment modifier. You first create an additional table in the telephone database using the following CREATE TABLE statement:

CREATE TABLE identifiers (phonebook_id int(5));

As we show you next, this table only contains one row and therefore there's no need to declare or use a primary key.

The new identifiers table stores one row that contains the next available value of the phonebook_id primary key attribute from the phonebook table. To set this up, you add the row to the table and set the phonebook_id attribute to the next available value. Let's suppose your phonebook table is empty, and so the next primary key value for phonebook_id is 1. Here's the INSERT statement you use to set up the table:

INSERT INTO identifiers VALUES (1);

Now you can use the identifiers table to read and write a primary key value for the phonebook table. Having done this, you use the primary key value to create a new row without locking the phonebook table. Here's how you do it using the MySQL command interpreter:

mysql> LOCK TABLES identifiers WRITE;

Query OK, 0 rows affected (0.00 sec)



mysql> SELECT phonebook_id FROM identifiers;

+--------------+

| phonebook_id |

+--------------+

|            1 |

+--------------+

1 row in set (0.00 sec)



mysql> UPDATE identifiers SET phonebook_id = phonebook_id + 1;

Query OK, 1 row affected (0.00 sec)

Rows matched: 1  Changed: 1  Warnings: 0



mysql> UNLOCK TABLES;

Query OK, 0 rows affected (0.00 sec)



mysql> INSERT INTO phonebook VALUES (1, "Williams", 

       "Lucy", "61388763452");

Query OK, 1 row affected (0.01 sec)

The locking, querying, modifying, and unlocking process proceeds similarly to our example in the previous section, except that it doesn't use the phonebook table. Instead, the new row is inserted into the phonebook table without a lock using the value discovered with the SELECT query from the identifiers table, thereby maximizing concurrency (but requiring three SQL queries instead of two). Example 8-14 shows a rewritten version of Example 8-13 that uses this approach.

Example 8-14. Maintaining the phonebook table using an external identifiers table

<?php

require 'db.inc';

require_once "HTML/Template/ITX.php";



function formerror(&$template, $message, &$errors)

{

  $errors = true;

  $template->setCurrentBlock("error");

  $template->setVariable("ERROR", $message);

  $template->parseCurrentBlock("error");

}



if (!($connection = @ mysql_connect("localhost", "fred", "shhh")))

   die("Could not connect to database");



$firstname = mysqlclean($_POST, "firstname", 50, $connection);

$surname = mysqlclean($_POST, "surname", 50, $connection);

$phone = mysqlclean($_POST, "phone", 20, $connection);



$template = new HTML_Template_ITX("./templates");

$template->loadTemplatefile("example.8-10.tpl", true, true);



$errors = false;



if (empty($firstname))

  formerror($template, "The first name field cannot be blank.", $errors);



if (empty($surname))

  formerror($template, "The surname field cannot be blank.", $errors);



if (empty($phone))

  formerror($template, "The phone field cannot be blank", $errors);



// Now the script has finished the validation, show any errors

if ($errors)

{

  $template->show( );

  exit;

}



// If we made it here, then the data is valid

if (!mysql_select_db("telephone", $connection))

  showerror( );



// Lock the identifiers table

$query = "LOCK TABLES identifiers WRITE";

if (!(@ mysql_query ($query, $connection)))

   showerror( );



// Find the maximum phonebook_id value that's in use

$query = "SELECT phonebook_id FROM identifiers";

if (!($result = @ mysql_query ($query, $connection)))

   showerror( );



$row = @ mysql_fetch_array($result);



$phonebook_id = $row["phonebook_id"];



// Update the phonebook_id identifier

$query = "UPDATE identifiers SET phonebook_id = phonebook_id + 1";

if (!($result = @ mysql_query ($query, $connection)))

   showerror( );



// Unlock the table

$query = "UNLOCK TABLES";

if (!(@ mysql_query ($query, $connection)))

   showerror( );



// Insert the new phonebook entry

$query = "INSERT INTO phonebook VALUES

          ({$phonebook_id}, '{$surname}', '{$firstname}', '{$phone}')";

if (!(@ mysql_query ($query, $connection)))

   showerror( );



// Show the phonebook receipt

header("Location: example.8-5.php?status=T&phonebook_id={$phonebook_id}");

?>

To extend this scheme for a database containing several tables, there are two possible approaches: first, add an additional attribute (or more than one attribute if the primary key isn't on only one attribute) to the identifiers table for each additional table; or, second, add an additional identifier table for each additional table. The first approach is the simplest (and the one we recommend) but it does have the potential disadvantage that concurrency could be limited by excessive locking of the identifiers table if too many tables are maintained by using it. The second approach maximizes concurrency but is probably only necessary for high-throughput applications.

8.2.3.3 Managing identifiers with PEAR DB

In the previous section, we showed you how to maintain identifiers using an additional table. PEAR DB allows you to do the same thing using its DB::nextId( ) method and this is useful if you want to write database independent code. We show you how to use it in this section. The PEAR DB sequence methods are also briefly described in Chapter 7.

A sequence is a value associated with a name and it's typically used to create primary key values. A sequence is always initialized to 1, and increments each time you access it with DB::nextId( ). For example, suppose you want to maintain the primary key value for the phonebook_id from the phonebook table that we've used in our examples in this chapter. To do this, you can use the DB::nextID( ) method as shown in Example 8-15:

// Get a new primary key value for phonebook_id

$phonebook_id = $connection->nextId("phonebook_id");

When this is called for the first time, DB::nextId( ) creates a new sequence named phonebook_id, assigns it the value 1, and returns the value. When you call it for the second time, it returns 2, and so on. It performs exactly the same function as our identifiers table approach in Example 8-14.

Example 8-15. Using PEAR DB to maintain primary key values

<?php

require "db.inc";

require_once "HTML/Template/ITX.php";

require_once "DB.php";



function formerror(&$template, $message, &$errors)

{

  $errors = true;

  $template->setCurrentBlock("error");

  $template->setVariable("ERROR", $message);

  $template->parseCurrentBlock("error");

}



$dsn = "mysql://fred:shhh@localhost/telephone";



$connection = DB::connect($dsn, false);

if (DB::isError($connection))

  die($connection->getMessage( ));



$firstname = mysqlclean($_POST["firstname"], 50, $connection);

$surname = mysqlclean($_POST["surname"], 50, $connection);

$phone = mysqlclean($_POST["phone"], 20, $connection);



$template = new HTML_Template_ITX("./templates");

$template->loadTemplatefile("example.8-10.tpl", true, true);



$errors = false;



if (empty($firstname))

  formerror($template, "The first name field cannot be blank.", $errors);



if (empty($surname))

  formerror($template, "The surname field cannot be blank.", $errors);



if (empty($phone))

  formerror($template, "The phone field cannot be blank", $errors);



// Now the script has finished the validation, show any errors

if ($errors)

{

  $template->show( );

  exit;

}



// Get a new primary key value for phonebook_id

$phonebook_id = $connection->nextId("phonebook_id");

if (DB::isError($connection))

   die($connection->getMessage( ));



// Insert the new phonebook entry

$query = "INSERT INTO phonebook VALUES

          ({$phonebook_id}, {$surname}, {$firstname}, {$phone})";

$result = $connection->query($query);

if (DB::isError($result))

   die($result->getMessage( ));



// Show the phonebook receipt

header("Location: example.8-5.php?status=T&phonebook_id={$phonebook_id}");

?>

Behind the scenes, PEAR DB maintains a sequence in a table of the same name. When you create a sequence, it creates a table and an attribute and initializes the attribute to 1. When you call DB::nextId( ), it adds 1 and returns the value. PEAR DB correctly looks after safe concurrent access.

If you call DB::nextID( ) without its optional second parameter or with the second parameter set to true, a sequence with the name supplied as the first parameter is created if it doesn't exist. You can also manually create a sequence using DB::createSequence( ) and you can remove it using DB::dropSequence( ).