Performing Transactions

A transaction is a set of SQL statements that are executed as a unit without interruption. One use for transactions is to make sure that the records involved in an operation are not modified by other clients while you're working with them. MySQL automatically performs locking for single SQL statements to keep clients from interfering with each other. (For example, two clients cannot update the same record in a table simultaneously.) But automatic single-statement locking is not always sufficient to guarantee that a database operation achieves its intended result, because some operations are performed over the course of several statements. In this case, different operations may interfere with each other. A transaction groups statements into a single execution unit to prevent concurrency problems that might otherwise occur in a multiple-client environment.

Transaction support also includes commit and rollback capabilities, which allows you to require that the statements must execute as a unit or not at all. That is, if the transaction succeeds, you know that all the statements within it executed successfully. If any part of the transaction fails, any statements executed up to that point within it are undone, leaving the database in the state it was in prior to the point at which the transaction began.

Transactional systems typically are characterized as providing ACID properties. ACID is an acronym for Atomic, Consistent, Isolated, and Durable, referring to four properties that transactions should have:

  • Atomicity. The statements a transaction consists of form a logical unit. You can't have just some of them execute.

  • Consistency. The database is consistent before and after the transaction executes. In other words, the transaction doesn't make a mess of your database.

  • Isolation. One transaction has no effect on another.

  • Durability. When a transaction executes successfully to completion, its effects are recorded permanently in the database.

Some of MySQL's table types are non-transactional (ISAM, MyISAM, and HEAP), and some are transactional (BDB and InnoDB). This section describes the types of problems that can occur if you don't pay attention to transactional issues, as well as how to address them using both non-transactional and transactional approaches.

Why Transactions Are Useful

The following example illustrates how concurrency problems can occur when multiple clients attempt to make changes to a database using operations that each require several statements. Suppose you're in the garment sales business and your cash register software automatically updates your inventory levels whenever one of your salesmen processes a sale. The sequence of events shown here outlines the operations that take place when multiple sales occur. For the example, assume that the initial shirt inventory level is 47.

  1. Salesman A sells three shirts and registers the sale. The register software begins to update the database by selecting the current shirt count (47):

    SELECT quantity FROM inventory WHERE item = 'shirt'; 
    
  2. In the meantime, Salesman B has sold two shirts and registered the sale. The software at the second register also begins to update the database:

    SELECT quantity FROM inventory WHERE item = 'shirt'; 
    
  3. The first register computes the new inventory level to be 47?3 = 44 and updates the shirt count accordingly:

    UPDATE inventory SET quantity = 44 WHERE item = 'shirt'; 
    
  4. The second register computes the new inventory level to be 47?2 = 45 and updates the count:

    UPDATE inventory SET quantity = 45 WHERE item = 'shirt'; 
    

At the end of this sequence of events, you've sold five shirts (that's good), but the inventory level says 45 (that's bad, because it should be 42). The problem is that if you look up the inventory level in one statement and update the value in another statement, you have a multiple-statement operation. The action taken in the second statement is dependent on the value retrieved in the first. If separate multiple-statement operations occur during overlapping time frames, the statements from each operation intertwine and interfere with each other. To solve this problem, it's necessary that the statements for a given operation execute without interference from other operations. A transactional system ensures this by executing each salesman's statements as a unit. As a result, Salesman B's statements won't execute until those for Salesman A have completed.

Another issue that occurs in database processing with multiple-statement operations is that, unless handled properly, an error occurring partway through the operation can leave your database in a halfway-updated (inconsistent) state. The typical example of this involves a financial transfer where money from one account is placed into another account. Suppose Bill writes a check to Bob for $100.00 and Bob cashes the check. Bill's account should be decremented by $100.00 and Bob's account incremented by the same amount:

UPDATE account SET balance = balance - 100 WHERE name = 'Bill'; 
UPDATE account SET balance = balance + 100 WHERE name = 'Bob';

If a crash occurs between the two statements, the operation is incomplete. If transactional capabilities are not available to you, you have to figure out the state of ongoing operations at crash time by examining the update log manually to determine how to undo them or complete them. The rollback capabilities of transaction support allow you to handle this situation properly by undoing the effect of the statements that executed before the error occurred. (You may still have to determine which transactions weren't entered and re-issue them, but at least you don't have to worry about half-transactions making your database inconsistent.)

Non-Transactional Approaches to Transactional Problems

In a non-transactional environment, some transactional issues can be dealt with and some cannot. The following discussion covers what you can and cannot achieve without using transactions. Consider once again the shirt inventory scenario described earlier. To deal with the concurrency issues inherent in that situation, you can take a couple of approaches:

  • Lock the tables explicitly. You can group statements and execute them as a unit by surrounding them with LOCK TABLES and UNLOCK TABLES statements. Lock all the tables that you need to use, issue your queries, and release the locks. This prevents anyone else from changing the tables while you have them locked. Using table locking, the inventory update scenario might be handled as follows:

    1. Salesman A sells three shirts and registers the sale. The register software begins the inventory process by acquiring a table lock and retrieving the current shirt count (47):

      LOCK TABLES inventory WRITE; 
      SELECT quantity FROM inventory WHERE item = 'shirt';
      

      A WRITE lock is necessary here because the ultimate goal of the operation is to modify the inventory table, which involves writing to it.

    2. In the meantime, Salesman B has sold two shirts and registered the sale. The software at the second register also begins to update the database by acquiring a lock:

      LOCK TABLES inventory WRITE; 
      

      In this case, this statement will block because Salesman A already holds a lock on the table.

    3. The first register computes the new inventory level to be 47?3 = 44, updates the shirt count, and releases the lock:

      UPDATE inventory SET quantity = 44 WHERE item = 'shirt'; 
      UNLOCK TABLES;
      
    4. When the first register releases the lock, the second register's lock request succeeds, and it can proceed to retrieve the current shirt count (44):

      SELECT quantity FROM inventory WHERE item = 'shirt'; 
      
    5. The second register computes the new inventory level to be 44?2 = 42, updates the shirt count, and releases the lock:

      UPDATE inventory SET quantity = 42 WHERE item = 'shirt'; 
      UNLOCK TABLES;
      

      Now the statements from the two operations don't get mixed up and the inventory level is set properly.

      If you're using multiple tables, you must lock all of them before you execute the grouped queries. If you only read from a particular table, however, you need only a read lock on it, not a write lock. (This lets other clients read the tables while you're using them, but prevents clients from writing to them.) Suppose you have a set of queries in which you want to make some changes to the inventory table, and you also need to read some data from a customer table. In this case, you need a write lock on the inventory table and a read lock on the customer table:

      LOCK TABLES inventory WRITE, customer READ; 
      ... use the tables here ...
      UNLOCK TABLES;
      
  • Use relative updates, not absolute updates. For the inventory updating method that uses explicit table locking, the operation involves looking up the current inventory level with one statement, computing the new value based on the number of shirts sold, and then updating the level to the new value with another statement. Another way to keep operations performed by multiple clients from interfering with each other is to reduce each operation to a single statement. This eliminates inter-statement dependencies that arise in multiple-statement operations. Not every operation can be handled by a single statement, but for the inventory update scenario, this strategy works well. It's possible to perform each inventory update in one step simply by modifying the shirt count relative to its current value:

    1. Salesman A sells three shirts and the register software decrements the shirt count by three:

      UPDATE inventory SET quantity = quantity - 3 WHERE item = 'shirt'; 
      
    2. Salesman B sells two shirts and the register software decrements the shirt count by two:

      UPDATE inventory SET quantity = quantity - 2 WHERE item = 'shirt'; 
      

      With this method, each modification to the database no longer requires multiple statements and thus eliminates concurrency issues. This means there is no need to use explicit table locks. If an operation you want to perform is similar to this, there may be no need for transactions at all.

These non-transactional approaches can be applied successfully to many types of problems, but they have certain limitations:

  • Not every operation can be written in terms of relative updates. Sometimes you must use multiple statements, in which case concurrency issues have to be considered and dealt with.

  • You may be able to keep clients from interfering with each other by locking tables for the duration of a multiple-statement operation, but what happens if an error occurs in the middle of the operation? In this case, you'd want the effects of the earlier statements to be undone so that the database isn't left in a half-modified and inconsistent state. Unfortunately, although table locking can help you address concurrency issues, it provides no assistance in recovering from errors.

  • The locking strategy requires you to lock and unlock your tables yourself. If you revise the operation to be performed in such a way that the set of tables affected changes, you must remember to modify the LOCK TABLES statement accordingly. A database system with transaction support would determine which locks are necessary and acquire them automatically.

Transactional capabilities help you deal with all these issues. A transaction handler executes a set of statements as a unit and manages concurrency issues by preventing clients from getting in the way of each other. It also allows rollback in the case of failure to keep half-executed operations from damaging your database, and it automatically acquires any locks that are necessary.

Using Transactions to Ensure Safe Statement Execution

To use transactions, you must use a transactional table type. The ISAM, MyISAM, and HEAP table types will not work; you must use either BDB or InnoDB tables. The BDB and InnoDB handlers first appeared in binary distributions in MySQL 3.23.17 and 3.23.29, respectively, and were added to source distributions as of MySQL 3.23.34. However, it's best to use more recent distributions if possible, to take advantage of the improvements that have been made since then. If you're not sure whether your server includes the BDB or InnoDB table handlers, see the "Determining Which Table Types Your Server Supports" section earlier in this chapter.

By default, MySQL runs in auto-commit mode, which means that changes made by individual statements are committed to the database immediately to make them permanent. In effect, each statement is its own transaction. To perform transactions explicitly, disable auto-commit mode and then tell MySQL when to commit or roll back changes.

One way to perform a transaction is to issue a BEGIN statement to disable auto-commit mode, execute the statements that make up the transaction, and end the transaction with a COMMIT statement to make the changes permanent. If an error occurs during the transaction, cancel it by issuing a ROLLBACK statement instead to undo the changes. BEGIN suspends the current auto-commit mode, so after the transaction has been committed or rolled back, the mode reverts to its state prior to the BEGIN. (If auto-commit was enabled beforehand, ending the transaction puts you back in auto-commit mode. If it was disabled, ending the current transaction causes you to begin the next one.)

The following example illustrates this approach. First, create a table to use:

mysql> CREATE TABLE t (name CHAR(20), UNIQUE (name)) TYPE = INNODB; 

The statement creates an InnoDB table, but you can use BDB if you like. Next, initiate a transaction with BEGIN, add a couple of rows to the table, commit the transaction, and see what the table looks like:

mysql> BEGIN; 
mysql> INSERT INTO t SET name = 'William';
mysql> INSERT INTO t SET name = 'Wallace';
mysql> COMMIT;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

You can see that the rows have been recorded in the table. If you had started up another instance of mysql and selected the contents of t after the inserts but before the commit, the rows would not show up. They would not become visible to the other mysql process until the COMMIT statement had been issued by the first process.

If an error occurs during a transaction, you can cancel it with ROLLBACK. Using the t table again, you can see this by issuing the following statements:

mysql> BEGIN; 
mysql> INSERT INTO t SET name = 'Gromit';
mysql> INSERT INTO t SET name = 'Wallace';
ERROR 1062: Duplicate entry 'Wallace' for key 1
mysql> ROLLBACK;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

The second INSERT attempts to place a row into the table that duplicates an existing name value. The statement fails because name has a UNIQUE index. After issuing the ROLLBACK, the table has only the two rows that it contains prior to the failed transaction. In particular, the INSERT that was performed just prior to the point of the error has been undone and its effect is not recorded in the table.

Issuing a BEGIN statement while a transaction is in process commits the current transaction implicitly before beginning a new one.

Another way to perform transactions is to manipulate the auto-commit mode directly using SET statements:

SET AUTOCOMMIT = 0; 
SET AUTOCOMMIT = 1;

Setting AUTOCOMMIT to zero disables auto-commit mode. The effect of any following statements become part of the current transaction, which you end by issuing a COMMIT or ROLLBACK statement to commit or cancel it. With this method, auto-commit mode remains off until you turn it back on, so ending one transaction also begins the next one. You can also commit a transaction by re-enabling auto-commit mode.

To see how this approach works, begin with the same table as for the previous examples:

mysql> DROP TABLE t; 
mysql> CREATE TABLE t (name CHAR(20), UNIQUE (name)) TYPE = INNODB;

Then disable auto-commit mode, insert some records, and commit the transaction:

mysql> SET AUTOCOMMIT = 0; 
mysql> INSERT INTO t SET name = 'William';
mysql> INSERT INTO t SET name = 'Wallace';
mysql> COMMIT;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

At this point, the two records have been committed to the table, but auto-commit mode remains disabled. If you issue further statements, they become part of a new transaction, which may be committed or rolled back independently of the first transaction. To verify that auto-commit is still off and that ROLLBACK will cancel uncommitted statements, issue the following queries:

mysql> INSERT INTO t SET name = 'Gromit'; 
mysql> INSERT INTO t SET name = 'Wallace';
ERROR 1062: Duplicate entry 'Wallace' for key 1
mysql> ROLLBACK;
mysql> SELECT * FROM t;
+---------+
| name    |
+---------+
| Wallace |
| William |
+---------+

To restore auto-commit mode, use the following statement:

SET AUTOCOMMIT = 1; 

Transactions also end under the following circumstances:

  • In addition to statements like SET AUTOCOMMIT, BEGIN, COMMIT, and ROLLBACK that affect transactions explicitly, certain other statements do so implicitly because they cannot be part of a transaction. If you issue any of these while a transaction is in progress, the server commits the transaction first before executing the statement. Statements that cause a commit are as follows:

    ALTER TABLE 
    CREATE INDEX
    DROP DATABASE
    DROP INDEX
    DROP TABLE
    LOAD MASTER DATA
    LOCK TABLES
    RENAME TABLE
    TRUNCATE TABLE
    UNLOCK TABLES (if tables currently are locked)
    
  • If the client connection ends or is broken during a transaction before a commit occurs, the server rolls back the transaction automatically.

Transactions are useful in all kinds of situations. For example, suppose you're working with the score table that is part of the grade-keeping project and you discover that the grades for two students have gotten mixed up and need to be switched. The grades as entered incorrectly are as follows:

mysql> SELECT * FROM score WHERE event_id = 5 AND student_id IN (8,9); 
+------------+----------+-------+
| student_id | event_id | score |
+------------+----------+-------+
|          8 |        5 |    18 |
|          9 |        5 |    13 |
+------------+----------+-------+

To fix this, student 8 should be given a score of 13 and student 9 a score of 18. That can be done easily with two statements:

UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8; 
UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;

However, it's necessary to ensure that both statements succeed as a unit?a problem to which transactional methods can be applied. To use BEGIN, do the following:

mysql> BEGIN; 
mysql> UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
mysql> UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;
mysql> COMMIT;

To accomplish the same thing by manipulating the auto-commit mode explicitly instead, do this:

mysql> SET AUTOCOMMIT = 0; 
mysql> UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
mysql> UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;
mysql> COMMIT;
mysql> SET AUTOCOMMIT = 1;

Either way, the result is that the scores are swapped properly:

mysql> SELECT * FROM score WHERE event_id = 5 AND student_id IN (8,9); 
+------------+----------+-------+
| student_id | event_id | score |
+------------+----------+-------+
|          8 |        5 |    13 |
|          9 |        5 |    18 |
+------------+----------+-------+