Optimistic Locking

With many applications, clients need to fetch the data to browse through it, make modifications to one or more rows, and then post the changes back to the database in SQL Server. These human-speed operations are slow in comparison to machine-speed operations, and the time lag between the fetch and post might be significant. (Consider the user who goes to lunch after retrieving the data.)

For these applications, you would not want to use normal locking schemes such as SERIALIZABLE or HOLDLOCK to lock the data so it can't be changed from the time the user retrieves it to the time he applies any updates. This would violate one of the cardinal rules of not allowing user interaction within transactions. You also would lose all control over the duration of the transaction. In a multiuser, OLTP environment, this could significantly affect concurrency and overall application performance due to blocking on locks and locking contention.

So how do you implement such an application? How do you allow users to retrieve information without holding locks on the data and still ensure that when they apply the updates, no other process has modified the information since it was initially retrieved?

Optimistic locking is a technique used in such situations where reading and modifying data processes are widely separated in time. Optimistic locking helps a client avoid overwriting another client's changes to a row without holding locks in the database.

Optimistic Locking Using the Timestamp Datatype

SQL Server provides a special datatype called timestamp. A timestamp is an eight-byte binary datatype. SQL Server automatically generates the value for a timestamp whenever a row that contains a column of this type is inserted or updated. Other than guaranteeing that the value is unique and monotonically increasing, the value is not meaningful; you cannot look at the individual bytes and make any sense out of them. Despite the name of the datatype, the value has no relation to the time that the record was modified. The purpose of the timestamp datatype is to serve as a version number in optimistic locking schemes.

The following two conditions must be met to utilize optimistic locking:

The table must have a primary key so that each row can be uniquely identified.
The table must have a column defined with the timestamp datatype.

In an application that uses optimistic locking, the client reads one or more records from the table, being sure to retrieve the current value of the timestamp column for each row. Any locks are released after the data has been read. At some later time, when the client wants to update a row, it must ensure that no other client has changed the same row in the intervening time. (Because no locks exist, it is the client's responsibility to make sure that the other client's changes are preserved.) The UPDATE statement must include a WHERE clause that compares the timestamp value held in the client application with the current timestamp value for the record in the database. If the timestamp values match?that is, if the value that was read is the same as the value in the database?then no changes to that row have occurred since it was originally retrieved. As such, the change attempted by the application can proceed. If the timestamp value in the client application does not match the value in the database, then that particular row has been changed since the original retrieval of the record. As a result, the state of the row that the application is attempting to modify is not the same as the row that currently exists in the database. As a result, the transaction should not be allowed to take place to avoid the "phantom values" problem.

To ensure that the client application does not overwrite the changes made by another process, the client needs to prepare the T-SQL UPDATE statement in a special way, using the timestamp column as a versioning marker. The following pseudo-code represents the general structure of such an update:

UPDATE theTable 
   SET theChangedColumns = theirNewValues
   WHERE primaryKeyColumns = theirOldValues
     AND timestamp = itsOldValue

Because the WHERE clause includes the primary key, the UPDATE can only apply to exactly one row or to no rows; it cannot apply to more than one row because the primary key is unique. The second part of the WHERE clause is what provides the optimistic "locking." If another client has updated the row, the timestamp will no longer have its old value (remember that the server changes the timestamp value automatically with each update), and the WHERE clause will not match any rows. The client needs to check to see whether any rows were updated. If the number of rows affected by the update statement is 0, the row has been modified since it was originally retrieved. The application can then choose to reread the data or do whatever recovery it deems appropriate.

This approach has one problem. How does the application know that it didn't match the row because the timestamp was changed, or because the primary key had changed or the row had been deleted altogether?

In SQL Server 2000, the tsequal() function is available that can be used in the WHERE clause to compare the timestamp value retrieved by the client application with the timestamp value in the database. If the two match, the update will proceed. If not, the update will fail because the where clause cannot be satisfied.

The following pseudo-code illustrates how this works:

The client reads a row:

select * from data_table where primary_key_field = <value>

The client prepares an UPDATE statement with new data values for this row.

The client submits the following UPDATE statement with the additional search clause using the tsequal() function:

update data_table set data_field_1 = foo 
   where primary_key_field = <value>
     and tsequal (timestamp, <retrieved timstamp value>)

The difference between this approach and the one presented earlier is that rather than simply not matching any rows when the timestamp value is different, the tsequal function evaluates to FALSE and causes the update statement to fail with an error message similar to the following:

Server: Msg 532, Level 16, State 1, Line 1 
The timestamp (changed to 0x00000000000001f7) shows that the row has been
 updated by another user.
The statement has been terminated.

At this point, the application will receive the error message and know for sure that the reason the update didn't take place is because the timestamp values didn't match. If no rows were found and updated, then it clearly would be due to the original row having been deleted or the primary key having been changed. If the row is found and the timestamp values match, then tsequal() evaluates to TRUE and the update proceeds normally.

NOTE

Curiously enough, even though the tsequal() function has been around for as long as I can remember and it still works as it always has, it is not documented in SQL Server Books Online except as a reserved keyword. Although it appears safe to continue to use the tsequal() function, it is possible that it is no longer supported or will be dropped in future releases of SQL Server. For this reason, use tsequal() with caution because your code might no longer work as expected in future releases.