DataGrid Controls and Data Persistence

ASP.NET applications are built on the Web Forms model, which brings the familiar Visual Basic form-based, client/server style of interaction to the Web. The ASP.NET run time shields you from the structural differences between the two models: it takes care of serializing and deserializing the state of the form, and any server-side processing takes place in an environment that maintains the state of the client browser.

When you use DataGrid controls, you need to retrieve and process the control’s data source every time you execute postback code on the server. As mentioned earlier in the chapter, the DataGrid control does not cache its data source in the control’s view state. This makes sense because the data source can be too large to be effectively transferred back and forth between the Web server and the browser. Bear in mind that all the information Web controls store in their ViewState properties makes the HTML page larger, and this information is posted back from the browser to the server whenever a postback event occurs. Basically, you have two options for repopulating the DataGrid control’s DataSource property:

  • Cache the data source, as a whole or in part, on the Web server and read it back

  • Reload all the records from the physical data storage (typically, a database)

When you cache the data source, data is retrieved from storage only once and stored in a cache; subsequent postback events read from that cache. You can use in-memory global objects such as Session and Cache, or alternatively you can use XML files stored on the Web server or on another accessible share.

If you plan to reload the records each time a postback event occurs, consider that using a DataReader class is more efficient than using a data adapter. Don’t forget to close both the reader and the connection as soon as possible, as I previously explained.
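As an illustration of this point (the connection string, query, and grid ID below are placeholders, not taken from the chapter's listings), rebinding the grid from a data reader on each postback might be sketched like this:

```csharp
// Sketch only: rebind the DataGrid from a SqlDataReader on each postback.
void BindGrid()
{
    SqlConnection conn = new SqlConnection(
        "server=(local);database=Northwind;Integrated Security=SSPI");
    SqlCommand cmd = new SqlCommand(
        "SELECT employeeid, firstname, lastname FROM employees", conn);

    conn.Open();
    // CommandBehavior.CloseConnection ties the connection's lifetime
    // to the reader: closing the reader also closes the connection
    SqlDataReader dr = cmd.ExecuteReader(CommandBehavior.CloseConnection);
    grid.DataSource = dr;
    grid.DataBind();
    dr.Close();   // releases both the reader and the connection
}
```

Passing CommandBehavior.CloseConnection is a convenient way to honor the advice about closing both objects as soon as possible.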

Scalability? What Was That?

The way in which you decide to retrieve the grid’s data source might seriously affect the overall scalability of the application. However, years of real-world experience should have taught you that scalability is affected by many factors. Scalability is precious like a diamond and, like a diamond, has many facets that contribute to its value. Scalability can be described as the system’s ability to maintain, or even improve, its responsiveness as the number of clients grows. Queuing theory states that a queue forms when requests arrive faster than the system can serve them. You cannot take measures to reduce the number of user requests, but you can try to lessen the response time.

You normally adjust the scalability level of Web applications by mixing together, in application-specific doses, heterogeneous and even contradictory measures such as the following:

  • Limiting the number of calls to the database

  • Delegating as many tasks as possible to the database

  • Limiting the occupation of the Web server’s memory

  • Using relatively simple and stateless components

Writing fast and optimized code would also certainly help a lot! For a good result, each ingredient is extremely important and, with the right doses, even otherwise lethal ingredients are acceptable. Limiting the number of calls to the database implies that you are not delegating data processing to it and are therefore placing a load on the Web server. Limiting the server memory occupation implies that you don’t cache data and, consequently, that you have to call the database whenever data is required. And the list could go on and on. Scalability is a sort of philosopher’s stone, and programmers, like medieval alchemists, can only try remedies again and again, learning from their errors and fine-tuning their skills.

Let’s review the most common options you have for persisting the DataGrid control’s content on the server.

Using the Session Object

In ASP and ASP.NET, the Session object is a global repository for data and objects that belong to the session. The visibility of the data is limited to the pages invoked within the session. Using the Session object is a delicate matter any way you look at it. It guarantees quick access to data and returns ready-to-use objects, but all the session data is duplicated for each active session and connected user. In general, you should be extremely careful when it comes to using the Session object in production code. My advice about being careful does not mean that Microsoft would have been better off dropping the Session object, though. Using Session is still the fastest way to access session-specific data; just keep the amount of data stored in Session under strict control.

The ASP.NET Session object has two major advantages over its ASP counterpart. First, any .NET object can now be safely stored in a Session slot. This overcomes the thread-affinity problem you might have experienced with ASP and Visual Basic COM components. Second, the Session object is the programming interface for a module—the Session Manager—that can work in-process and out-of-process, and can even rely on SQL Server for data storage. This is probably the best reason to opt for Session: it now works well with Web farm architectures.
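To make the Session approach concrete, here is an illustrative sketch (not from the chapter's original listing; PhysicalDataRead, the slot name, and the grid ID are placeholders):

```csharp
// Sketch: cache the DataSet in Session on first load,
// and read it back on subsequent postbacks.
void Page_Load(Object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        DataSet ds = PhysicalDataRead();   // hits the database only once
        Session["MyDataSet"] = ds;         // any .NET object can be stored
        BindGrid();
    }
}

void BindGrid()
{
    // Cast back from Object; the slot may be null if the session expired
    DataSet ds = (DataSet) Session["MyDataSet"];
    grid.DataSource = ds;
    grid.DataBind();
}
```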

Using the Cache Object

The majority of ASP.NET applications will take advantage of the Cache object for all of their caching needs. The Cache object is new to ASP.NET and provides unique and powerful features. It is a global, thread-safe object that does not store information on a per-session basis. In addition, the Cache object is designed to ensure it does not tax the server’s memory. If low memory does become an issue, the Cache object will automatically purge its least recently used items based on a priority defined by the developer. Like the familiar Application object, the Cache object does not share its state across the machines of a Web farm. In terms of the programming interface, using the Cache object is not at all different from using Session or Application objects.

What really differentiates the Cache object from Application is its ability to automatically remove the least-used items when memory runs low. To help the built-in scavenging routines of the Cache object, you can assign some of your cache items a priority, and even a decay factor that lowers the priority of items that see limited use. When working with the Cache object, you should never assume that an item is there when you need it. Always be ready to handle null or invalid values. If your application needs to be notified when an item is removed, register a removal callback by creating an instance of the CacheItemRemovedCallback delegate and passing it to the Cache object’s Insert or Add method.
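A removal callback might be wired up as sketched below (the key name and the DataSetRemoved handler are illustrative; the decay factor mentioned above is omitted here because its availability varies between ASP.NET builds):

```csharp
// Sketch: insert an item with a priority and a removal callback.
void CacheDataSet(DataSet ds)
{
    CacheItemRemovedCallback onRemove =
        new CacheItemRemovedCallback(DataSetRemoved);
    Cache.Insert("MyDataSet", ds,
        null,                         // no dependencies
        Cache.NoAbsoluteExpiration,
        TimeSpan.FromMinutes(10),     // sliding expiration
        CacheItemPriority.High,       // scavenge this item last
        onRemove);
}

void DataSetRemoved(String key, Object value, CacheItemRemovedReason reason)
{
    // The item is gone; the next request for it must reload from storage
}
```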

In addition, items stored in the Cache object can be bound to the timestamp of one or more files or to other cached items. When any of these linked resources change, the cached item becomes obsolete and is removed from the cache. By checking for a null item (or catching the resulting exception), you can detect the invalid item and reload it from persistent storage.
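This file-dependency mechanism relies on the CacheDependency class, as in the following sketch (the file name, key, and PhysicalDataRead helper are illustrative):

```csharp
// Sketch: bind a cached item to a file's timestamp. When the file
// changes, the item is silently evicted from the cache.
void CacheWithDependency(DataSet ds)
{
    String file = Server.MapPath("employees.xml");   // illustrative file
    CacheDependency dep = new CacheDependency(file);
    Cache.Insert("MyDataSet", ds, dep);
}

DataSet GetDataSet()
{
    DataSet ds = (DataSet) Cache["MyDataSet"];
    if (ds == null)
    {
        // The item was evicted (or never cached): reload from storage
        ds = PhysicalDataRead();
        CacheWithDependency(ds);
    }
    return ds;
}
```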

Aside from Web farms, in resource-constrained scenarios you might want to consider alternatives to Cache. Admittedly, even when you have large data sets to store on a per-session basis, storing and reloading them from memory will be much faster than any other approach. But with many users connected at the same time, each storing large blocks of data, you might want to help the Cache object do its job better. An application-specific, layered caching system built around the Cache object is an option to evaluate. In this case, hot and sensitive data goes into the Cache and is efficiently managed by the ASP.NET run time, while the rest of the data is cached in slower, but memory-free, storage such as session-specific XML files.

Using XML Files

ADO.NET classes, and the DataSet class in particular, are tightly integrated with XML. This means that saving the content of a DataSet to a disk-based XML document is a snap, and rebuilding a living instance of a DataSet object from a persistent XML file is not particularly hard either. If you don’t want to re-read the DataGrid control’s data from a database every time it’s needed, and you don’t want to load the data once and leave it stored in Session or Cache, persisting the data to XML files is an interesting option to consider. The more free memory the Web server has available, the more quickly it can serve new requests.

Earlier in the chapter, I showed a page that based the persistence of the DataGrid control’s content on the Session object. Let’s rewrite the code to use XML files instead.

<script runat="server">
void Page_Load(Object sender, EventArgs e)
{
    if (!IsPostBack)
        DataFromSourceToMemory();
}

void DataFromSourceToMemory()
{
    // Gets rows from the data source
    DataSet oDS = PhysicalDataRead();
    // Serializes the DataSet to a session-specific XML file
    SerializeDataSource(oDS);
}
</script>
When the page is first loaded, the code calls DataFromSourceToMemory which, in turn, performs a physical read from the storage medium and then, instead of storing the information to Session, calls a new routine named SerializeDataSource.

void SerializeDataSource(DataSet ds)
{
    String strFile = Server.MapPath(Session.SessionID + ".xml");
    XmlTextWriter xtw = new XmlTextWriter(strFile, null);
    ds.WriteXml(xtw);
    xtw.Close();
}

The code creates a new XML file whose name is based on the session ID. The file is populated by using the DataSet object’s WriteXml method. Reading information back is easy, too. The task is accomplished by the DeserializeDataSource function.

private DataSet DeserializeDataSource()
{
    String strFile = Server.MapPath(Session.SessionID + ".xml");
    XmlTextReader xtr = new XmlTextReader(strFile);
    DataSet ds = new DataSet();
    ds.ReadXml(xtr);
    xtr.Close();
    return ds;
}

Instead of restoring DataSet from Session, your code would use DeserializeDataSource.

Using Data Readers and Adapters

Another, more radical choice for data persistence is re-reading the data from the database whenever needed. This choice is good when you have to deal with a very large data set and plan to implement a custom pagination service. Now might be a good time to refresh your memory about the differences between ADO.NET data adapters and data readers. You should use adapters when you want to work on your data in a disconnected manner. You should use readers if you don’t plan to persist the retrieved data.

A data adapter populates a DataSet object, which provides the main tools you need to work in the absence of a database: filters, sorting, indexing, searching, cloning, and even in-memory relations. Given this arsenal of programming tools, the DataSet class is not optimized for a simple read of a few records. On the other hand, the DataSet class can be persisted, either on disk or in memory, and survive across multiple page requests.
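A few of those in-memory services can be sketched as follows (the table, column, and relation names are illustrative and assume the DataSet has already been filled by an adapter):

```csharp
// Sketch: in-memory services a populated DataSet offers
// without any further round-trip to the database.
DataTable dt = ds.Tables["Employees"];

// Filtering and sorting entirely in memory
DataRow[] rows = dt.Select("country = 'USA'", "lastname ASC");

// An in-memory relation between two tables
ds.Relations.Add("Emp2Orders",
    ds.Tables["Employees"].Columns["employeeid"],
    ds.Tables["Orders"].Columns["employeeid"]);
```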

But if your goal is to read only the records that fill one grid page, you are much better off using one of the data reader classes. ADO.NET provides a data reader for each supported .NET–managed data provider; in fact, every .NET data provider is expected to expose both a data adapter and a data reader class. You use SqlDataReader when your target database is SQL Server version 7.0 or later, OleDbDataReader when your target is an OLE DB provider, and OdbcDataReader when you are working against the ODBC .NET–managed data provider. The data reader works like a firehose: it provides an open channel through which read-only records flow, and the client reads them, one after the next, in a forward-only direction.

The Paradox of Pagination

Let’s review what happens when you decide to retrieve records to populate a DataGrid control at every postback event using a data adapter object. You typically use a CreateDataSource function that returns a DataSet object, and you assign the result to the grid and rebind. If you don’t cache the DataSet in any way, you end up loading all the records available in the data source for each page scrolled. Get the point? Now do you see the paradox? You load, say, 100 records to display only the 10 or fewer that fit in the page real estate. And this happens for each page and for each postback event triggered. You probably architected things this way to exploit the DataGrid control’s built-in pagination but, in doing so, you pay the cost without ever reaping the benefit. The only concrete gain is that the DataGrid control selects for you the 10 or fewer records to display for the current page—and that is at least something.
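The wasteful pattern just described can be sketched as follows (the connection string, query, and grid ID are illustrative placeholders):

```csharp
// Sketch of the paradox: every page change reloads the whole result
// set, even though only one page of rows is ever displayed.
DataSet CreateDataSource()
{
    SqlDataAdapter da = new SqlDataAdapter(
        "SELECT * FROM employees",   // returns, say, 100 rows
        "server=(local);database=Northwind;Integrated Security=SSPI");
    DataSet ds = new DataSet();
    da.Fill(ds, "Employees");
    return ds;
}

void Grid_PageIndexChanged(Object sender, DataGridPageChangedEventArgs e)
{
    grid.CurrentPageIndex = e.NewPageIndex;
    grid.DataSource = CreateDataSource();  // all rows fetched again...
    grid.DataBind();                       // ...but only one page shows
}
```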