Batch Updates

The DataSet object is designed with the assumption that optimistic concurrency is the default. As a container of disconnected data, the DataSet object accepts any sort of update to the rows it contains, so you can add new rows to any child table and update or delete existing rows. All these changes are persisted in memory and never get passed on to the database until the batch update is conducted. The batch update process starts when you call the Update method on a data adapter object and pass the DataSet as the argument. Prior to that, no change is applied to the database. When the batch update is completed, the DataSet changes are automatically committed to ensure consistency between the in-memory cache and the underlying database.

A batch update is the submission of an entire set of changes to the database. More exactly, the batch update repeats the user actions (typically, SQL commands) that produced the changes but targets the database rather than the DataSet object. In a multiuser environment, the batch update poses a potential conflict and design issue for you to resolve: when users concurrently access data, they can change the original database values while the rows are processed in memory. The batch update in such an environment is problematic if in the time elapsed between your fetch and the batch update, another user changes one of the rows you plan to modify. If your update is somehow dependent upon the current row value, then you might want to reconsider, and even reject, your most recent updates. If your update can simply overwrite any existing value (this depends on the application’s logic) then batch update is not problematic at all.

At update time, conflicts detected between the original values and the current values of certain database rows might introduce significant overhead that can invalidate the batch update. On the other hand, when you mix together data disconnection, batch updates, and a distributed, multiuser environment, you serve up exciting recipes for an effective Web architecture. Unfortunately, since we live in an imperfect world, you pay a price for them: row conflicts show up, but they are a natural and almost inevitable drawback.

Locking Rows

To update a database row in a multiuser environment, you must first lock it to prevent other users from accessing it. You can follow one of two lock models, each representing a different design philosophy: optimistic or pessimistic concurrency. Each lock model is not necessarily appropriate for every application. In some cases, system requirements force you to opt for one particular policy.

The Pessimistic Lock Model

Pessimistic concurrency is a defensive approach that tends to eliminate at the root any occurrence of conflicts. You set the lock on the row in a way that prevents other users from performing any action that can affect the current user. Other users will be given full access to the row only when the lock is released. The pessimistic model is mostly used in environments where the contention for data is quite heavy and the consistency of the application is mined with conflicts. In such systems, rolling back transactions—often distributed transactions—is typically more complicated than preventing conflicts by applying more durable locks. In a pessimistic model, a user planning to modify a row keeps the row locked until the change is committed, and the row is unavailable to other users. The pessimistic model promotes a transactional vision of the read and write actions.

The Optimistic Lock Model

Optimistic concurrency considers the reading and writing of information as two distinct, non-transactional actions. Typically, when a user reads a row, no lock is applied. This allows more users, at any given time, to concurrently read the contents of the row. The lock is set only when one user needs to write information, so you have no guarantee that the content of the row is exactly the same as it was when read earlier. This is just where the notion of optimism fits in. The optimistic lock policy assumes that the row has not been changed in the meantime. This assumption is reasonable for systems with a relatively low contention for data. If you can determine with some certainty that no user changed the row under your nose, the batch update is an effective technique. It reduces the hits on the database server, doesn’t lock records for very long, and contributes to establishing a much more responsive and forgiving application.

The application using the optimistic lock strategy is more responsive because all changes take place in memory prior to committing the update. Modifying a row is just a matter of overwriting a memory location. The application is more forgiving towards the user because you can design it to show a summary of the changes, giving the user a chance to accept or reject them in memory. In any case, optimistic concurrency leads to rather disconnected applications that use the database only as a downstream storage medium.

Depending on the ultimate goal of the application, conflicts might or might not be critical. When they are, you can apply a few techniques to detect and prevent them. I’ll discuss these later in the chapter.

Applying In-Memory Changes

Figure 7-2 shows the grid of data that users of this chapter’s sample application will see. The grid is based on the one I discussed in Chapter 4. It allows for new rows to be added and deleted, and supports in-place row editing. The user adds a new row by moving to the last page and clicking the link button at the bottom of the page. The user can edit and delete each row by clicking the two rightmost buttons.

Figure 7-2

The editable grid whose content can be loaded from either the database or an XML resume file.

In Chapter 4, while discussing the features of the DataGrid control’s in-place editing feature, I repeatedly mentioned the batch update as an alternative technique. It can be used in lieu of direct SQL commands that run on a per-row basis as soon as each modification takes place. In this chapter, you’ll see in action a fully editable DataGrid control that uses a double level of cache to park changes temporarily, waiting for the final submission to the database.

Inserting New Rows

The three basic operations you can execute on the data are INSERT, UPDATE, and DELETE. Each operation is performed on the DataSet object that is currently in the ASP.NET cache. In the sample application, the DataSet object is retrieved using the GetDataFromCache method. This method is based on the following code:

String keyUserName = Session["USERNAME"].ToString();
DataSet ds = (DataSet) Cache[keyUserName];

I discussed the details of insertion in Chapter 4, but I’ll review them here. You first add a new row to the data source that the DataGrid control is bound to, and then you update its content by using typical in-place editing. You can also give default values to some of the row’s columns.

DataSet ds = GetDataFromCache();
DataTable dt = ds.Tables["Employees"];
DataRow dr = dt.NewRow();
dt.Rows.Add(dr);


	You can configure the data adapter object to automatically inherit schema information from the native database tables. Either by setting the MissingSchemaAction property to AddWithKey or by invoking the FillSchema method, the adapter loads and applies some settings from the source tables to the DataTable objects. Not all column settings can be imported, however. You can automatically set the properties AllowDBNull, ReadOnly, Unique, and AutoIncrement, but not the auto-increment seed, step, and default values. You’ll learn about auto-increment columns later in this chapter.

When adding new rows to in-memory tables, you should carefully consider the case of auto-numbered columns. I’ll say more on this point in a moment.

Updating Existing Rows

Updating a row is as easy as retrieving the DataRow object involved and changing the value of its columns. You can accomplish this by using the DataTable object’s Select method on a primary key value. The following is the typical code you use:

DataRow[] rgRows;
rgRows = dt.Select("employeeid=" + grid.DataKeys[e.Item.ItemIndex]); 
DataRow dr = rgRows[0];

The Select method retrieves an array of rows that match the specified criteria. If the condition involves a primary key, the array contains at most one row. If the value for the primary key is obtained through the DataGrid control’s DataKeys collection, exactly one DataRow object will be selected.

Deleting Existing Rows

From the standpoint of the application, deleting an existing row is not much different from updating it. The application must retrieve the correct DataRow object and call the Delete method on it. However, beware of record deletion in in-memory tables. Deletion can be logical or physical. Logical deletion takes place using the Delete method of the DataRow object. Physical removal means that the row is removed from the DataTable object. It is conducted using the DataRowCollection object’s Remove method. A logically deleted row can be restored using the RejectChanges method. You can call RejectChanges for the specific DataRow object as well as for all the rows in a DataTable object and for all tables in a DataSet object. A physically removed row no longer exists in the table.


	A row removed from the Rows collection of the DataTable object is not destroyed at all. It is only detached from the table and is in the same condition as a newly created row not yet added to any table. This means that hypothetically, you can add the DataRow object to the same table again as long as the object does not go out of scope. Adding the DataRow object again to the table’s Rows collection is sufficient. So what does it mean to delete both the logical and the physical? Only logically deleted rows participate in the batch update process and have the corresponding database row deleted. Removed rows are simply ignored and the corresponding database row remains intact.

Auto-Increment Fields

Auto-increment fields are database columns whose values are automatically determined by the database at insertion time. These values are progressive and increased according to the specified step attribute each time a new row is added. The initial value is known as the seed value of the field. Because of the nature of the auto-increment feature, you can’t know the value in advance when you insert a new row into an in-memory table, but leaving the field blank or filling it with potentially duplicated values can cause subtle troubles for the application.

The value you assign to an auto-increment field should be ignored when you submit that row to the database. More exactly, you should avoid specifying an explicit value for that field when preparing an INSERT command; otherwise, you are at risk of getting a runtime error from the database. Databases, in fact, do not allow explicit values for auto-increment fields. (In order to disable the auto-increment feature for SQL Server, you must set the IDENTITY_INSERT attribute to OFF.)

The ADO.NET DataColumn object supports the AutoIncrement property, with seed and step attributes, so why not enable this property when adding rows to an in-memory DataTable object? When you do, your code doesn’t have to strive to maintain the uniqueness of the identity column. To enable the auto-increment feature automatically for the column that represents the table’s identity, set the MissingSchemaAction property to AddWithKey prior to filling the DataSet object:

SqlDataAdapter da = new SqlDataAdapter(strCommand, conn);
da.MissingSchemaAction = MissingSchemaAction.AddWithKey;
da.Fill(ds, "Employees");

For example, the employeeid column of the Northwind Employees table is set as the identity column, and this setting is automatically imported in the resulting DataTable object that contains rows from that table. Unfortunately, the data adapter is not able to automatically import the seed and the step values of the column. You need to set these properties separately:

ds.Tables["Employees"].Columns["employeeid"].AutoIncrementSeed = 1000;
ds.Tables["Employees"].Columns["employeeid"].AutoIncrementStep = 1;

The seed determines the first value that will be generated. As you can guess, this seed value is not persistent across sessions, nor is it compared against the values already in the DataTable object. So a typical problem you’ll face is choosing a seed value that does not collide with other rows. Since the seed is session-specific, you can safely choose either of these approaches: you can use a seed value that is clearly out of the current range (say, 1000), or you can calculate the maximum value in the current range of rows. The following code shows how to compute the maximum value of the employeeid column. Figure 7-3 shows the sample application at work.

DataTable dt = ds.Tables["Employees"];
int nSeed = (int) dt.Compute("Max(employeeid)", "");
dt.Columns["employeeid"].AutoIncrementSeed = nSeed + 1;

Figure 7-3

The newly added row features an automatic new value for the auto-increment column.

Bear in mind that the auto-increment value has a very short life. You need it only to unambiguously identify the row within the boundaries of the application. The value is discarded as soon as the change is submitted to the database. If you read the table back, you can easily find a completely different value in the identity column. Although they appear similar, the auto-increment feature of the database and the DataColumn object are completely separate and independent.

Delete and Remove

As mentioned earlier, you can delete a row from an in-memory table in two ways: by using the Remove method of the DataRowCollection object, which deletes a DataRow object from the collection of rows of a given table; or by using the Delete method of the DataRow object, which simply marks the row for deletion. When a row is marked for deletion, its RowState property is set to Deleted. You can view all the deleted rows at any time by setting a row state filter on the table. (More on this in a moment.)


	To avoid confusion, for the remainder of the chapter, deleted rows refer to rows logically deleted using the Delete method. Removed rows refer to rows physically removed from the DataSet object using the Remove method.

Marked rows are actually removed from the table only when the application calls the AcceptChanges method for the row. If the application calls RejectChanges, the row reverts to its pre-deletion state. When you logically delete a row that has just been added to the table (that is, the row’s RowState property is set to Added), the row is removed from the table instead of being marked as deleted.

Logical and physical deletion affects the batch update process. When you submit changes to the database, the data adapter selects the rows involved in the update by looking at their RowState properties. The values of rows marked as added are transmitted to the database via INSERT commands. Likewise, rows marked as modified are mapped onto the source rows via UPDATE commands, and deleted rows are removed from the database using DELETE commands. This is a key point. Only rows marked as deleted are taken into account during the batch update, so only rows logically deleted in memory have the rows corresponding to the database physically removed from the database.

The DataSet object—the all-encompassing ADO.NET container object—maintains original and current values for each row of each table. When a row is removed from a table, its row state information is lost and, consequently, the row is no longer available for the batch update. Calling the AcceptChanges method commits the state of the involved rows, turning any Added or Modified flag to Unchanged and removing any row marked as deleted. In light of this behavior, AcceptChanges must be called only at the end of the batch update process. Calling AcceptChanges prior to calling the Update method of the data adapter clears all the changes before they are applied to the database. Not calling AcceptChanges after the update is completed maintains pending changes in the DataSet object, which is re-issued next time. When the batch update operation ends successfully, the Update method automatically calls AcceptChanges and returns.

// ds stores the application's data
dataAdapter.Update(ds);

Notice, though, that Update can only call AcceptChanges on the DataSet you provide for the batch update. Sometimes the DataSet used for batch update is only a small subset of the original object where all the application’s data are stored—for example, a subset containing only the changed rows. In this case, you should explicitly accept changes on the original DataSet, too. So the correct sequence for an effective batch update is the one outlined in the following pseudo-code:

// Get changes in the original DataSet
dsChanges = ds.GetChanges();
// Performs the batch update for the given table.
// Changes are committed on the copy but not on the original! 
dataAdapter.Update(dsChanges, strTable);
// Clears any pending change in memory
ds.AcceptChanges();

States of a Row

So far the DataRow object’s RowState property is emerging as the key player in the batch update process. The RowState property influences the return value of the HasChanges method on the DataSet object. HasChanges returns a Boolean value that indicates whether any constituent table has pending changes. Related to HasChanges is the GetChanges method, which returns a DataSet object that includes only the rows (and the tables) with pending changes. By tracking the current state, the RowState property indicates whether a row has pending changes. The feasible values for RowState are summarized in Table 7-1.

The AcceptChanges method has the power to commit all the changes and accept the current values as the new original values of the table, clearing pending changes. RejectChanges rolls back all the pending changes.

Part III: Interoperability