What Is the DataSet Object Really For?

The DataSet object represents an in-memory cache of data and is the critical object that gets passed between the middle tier and the client application (for example, a Microsoft Windows Forms application) or between the middle tier and the Web Services layer. DataSet objects easily and efficiently serialize themselves into and out of XML. This means that data, as well as the related schema information, can be moved between tiers in a loosely coupled manner.
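
To make the XML round trip concrete, here is a minimal sketch in C#. The Catalog data set, its Products table, and the helper names ToXml and FromXml are hypothetical and exist only for illustration; the calls themselves (WriteXml with XmlWriteMode.WriteSchema and ReadXml with XmlReadMode.ReadSchema) are the standard DataSet methods for writing and reading data together with an inline schema.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetXmlDemo
{
    // Hypothetical catalog data; table and column names are illustrative only.
    public static DataSet BuildCatalog()
    {
        DataSet ds = new DataSet("Catalog");
        DataTable products = ds.Tables.Add("Products");
        products.Columns.Add("ProductID", typeof(int));
        products.Columns.Add("Name", typeof(string));
        products.Columns.Add("Price", typeof(decimal));
        products.Rows.Add(1, "Chai", 18.00m);
        products.Rows.Add(2, "Chang", 19.00m);
        return ds;
    }

    // Serializes the data plus an inline XSD schema, so the receiving
    // tier needs no prior knowledge of the table layout.
    public static string ToXml(DataSet ds)
    {
        using (StringWriter writer = new StringWriter())
        {
            ds.WriteXml(writer, XmlWriteMode.WriteSchema);
            return writer.ToString();
        }
    }

    // Rebuilds a DataSet on the receiving tier from the XML text alone.
    public static DataSet FromXml(string xml)
    {
        DataSet ds = new DataSet();
        using (StringReader reader = new StringReader(xml))
        {
            ds.ReadXml(reader, XmlReadMode.ReadSchema);
        }
        return ds;
    }

    public static void Main()
    {
        string wire = ToXml(BuildCatalog());
        DataSet received = FromXml(wire);
        DataTable products = received.Tables["Products"];
        Console.WriteLine("{0} rows, Price column type: {1}",
            products.Rows.Count, products.Columns["Price"].DataType);
    }
}
```

Because the schema travels inside the XML, the receiving side rebuilds typed columns without sharing any code with the sender, which is what keeps the tiers loosely coupled.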

A Windows Forms application that receives a DataSet object can take full advantage of it. Data is cached on the client and, unless you need a current copy of the data, you can sort, page, and filter the data without accessing the database server. Add to this functionality the object’s rich programming interface, and you go straight to the conclusion that DataSet is the right object in the right place at the right time. How does all this affect Web applications?
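
As a rough illustration of sorting, filtering, and paging without a trip to the database, the following C# sketch works on a table that is already in memory. The Products table, the helper names FilterAndSort and GetPageOfNames, and the page size are assumptions made for the example; DataView, RowFilter, and Sort are the standard ADO.NET members.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ClientSideViewDemo
{
    // Hypothetical product data standing in for a previously fetched table.
    public static DataTable BuildProducts()
    {
        DataTable products = new DataTable("Products");
        products.Columns.Add("Name", typeof(string));
        products.Columns.Add("Price", typeof(decimal));
        products.Rows.Add("Chai", 18m);
        products.Rows.Add("Chang", 19m);
        products.Rows.Add("Aniseed Syrup", 10m);
        products.Rows.Add("Ikura", 31m);
        return products;
    }

    // Filters and sorts the cached rows entirely in memory.
    public static DataView FilterAndSort(DataTable table, string filter, string sort)
    {
        DataView view = new DataView(table);
        view.RowFilter = filter;
        view.Sort = sort;
        return view;
    }

    // Returns the Name values for one zero-based page of the view.
    public static List<string> GetPageOfNames(DataView view, int pageIndex, int pageSize)
    {
        List<string> names = new List<string>();
        int first = pageIndex * pageSize;
        int last = Math.Min(first + pageSize, view.Count);
        for (int i = first; i < last; i++)
        {
            names.Add((string)view[i]["Name"]);
        }
        return names;
    }

    public static void Main()
    {
        DataView view = FilterAndSort(BuildProducts(), "Price > 15", "Price DESC");
        foreach (string name in GetPageOfNames(view, 0, 2))
        {
            Console.WriteLine(name);
        }
    }
}
```

The view is only a window onto the cached table, so changing the filter, the sort expression, or the page index never touches the database server.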

Implications for Web Applications

The DataSet object is still great for storing the state that survives across page requests—for example, cached catalog information, the contents of a shopping cart, or a pageable sales report. However, this storage ability does not guarantee enhanced scalability and better performance. When accessing data over the Web, you have an extra layer to consider.

Which client processes disconnected data through the DataSet object? The HTML page displayed by the browser is one possible client. The middle-tier component (in the simplest case, an ASP.NET page) that connects to the database is another possible client. If the presentation layer is based on a browser, you can’t really send a DataSet object embedded with HTML code. I’m sure you could come up with some workarounds, but they won’t be exciting prospects until .NET languages are natively supported by Microsoft Internet Explorer.

Disconnection for Web applications means that you cache data on the Web server and keep your middle tier working offline and disconnected from the database server. The advantages of this technique are more relevant if your system architecture provides two distinct server machines for the database and the Web server.

Caching on the Web server is critical, and you have several options to choose from: using the Session object (in-process, out-of-process, or SQL Server–based); the Application object; the powerful Cache object (new to ASP.NET); or the same DataSet object, by performing brute-force serialization of it into the page view state. You can also cache to XML files to avoid taxing the Web server’s memory on a per-user basis. No solution works in every case, and each is more of a compromise than the perfect workaround. The characteristics and requirements of your application are the only way to determine which approach you should take.

Let me sum up your options. If you plan to use server-side caching, start by choosing a caching mechanism and pay careful attention to the advanced services of the built-in ASP.NET caching service. With regard to the data container object, you probably need to look at only the DataSet object. Whatever caching mechanism you implement, use DataSet to collect data. It is unique in its ability to group related information.

DataSet and the DataGrid Control

As mentioned in previous chapters, the use of DataSet can be critical when you use it with certain controls such as the DataGrid control. Using it with the DataGrid control can be dangerous to the health of your application because the DataGrid control does not cache the DataSet object it is bound to. When you return to the Web server for a postback event, you have to reload DataSet and rebind with the DataGrid control. At that point, you have two mutually exclusive but equally safe options:

  • If you need fresh data, use the more lightweight DataReader object and use custom pagination to minimize the number of records loaded.

  • If you don’t care about fresh data (because your database is not volatile or your users are not interested in last-minute changes), cache the DataSet object. For the majority of applications, the Cache object is the best place to park data. To make it user-specific, name the cache slots you use after the session ID.
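
The second option, a user-specific slot in the Cache object, might be sketched as follows in C#. This is only one possible shape, not a definitive implementation: the class name SessionScopedCache, the helper names SlotName and GetOrLoad, the key format, and the 20-minute sliding expiration are all assumptions made for the example. It must be compiled inside an ASP.NET application, because it relies on System.Web and needs a live HttpContext with session state enabled.

```csharp
using System;
using System.Data;
using System.Web;
using System.Web.Caching;

public static class SessionScopedCache
{
    // Hypothetical naming scheme: prefixing the slot with the session ID
    // makes an application-wide cache entry effectively user-specific.
    public static string SlotName(string sessionId, string name)
    {
        return sessionId + ":" + name;
    }

    // Returns the cached DataSet for the current user, calling the
    // caller-supplied loader only when the slot is empty or has expired.
    public static DataSet GetOrLoad(HttpContext context, string name, Func<DataSet> loader)
    {
        string key = SlotName(context.Session.SessionID, name);
        DataSet data = context.Cache[key] as DataSet;
        if (data == null)
        {
            data = loader();
            // Sliding expiration (an arbitrary 20 minutes here) lets the
            // entry fall out of memory once the user stops paging.
            context.Cache.Insert(key, data, null,
                Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(20));
        }
        return data;
    }
}
```

On a postback, the page would call GetOrLoad and rebind the DataGrid control to the result, so the database is queried only when the slot is empty.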

Like the Application object, the Cache object does not share its state across a Web farm, so the Web farm scenario is the hardest one for application-wide caching. Relying on the built-in database caching mechanism is probably the most efficient approach. Consider that databases, and SQL Server 2000 in particular, feature very advanced and super-optimized query engines. Queries that execute frequently, such as those used to page back and forth, are recognized and cached internally, so getting fresh results for a small number of rows takes little time. No matter how smart you are, I have a strong feeling that outperforming the cache system of SQL Server 2000 wouldn’t be that easy for you!

Towards Disconnected Applications

Applications called to manage data that is not particularly volatile or that has a low degree of decay can happily cache the DataSet object on the Web server. By doing so, these applications can use a DataGrid control to sort and page through the data.

In this chapter, I’ll go beyond the basics of data disconnection and focus on a couple of additional aspects that address more general techniques and cover a broad range of applications. One of these techniques is transparently loading data either from files or by using a data adapter object. The other is modifying the cached data in memory and submitting all the changes to the database in a single batch. This is known as batch update and is an old acquaintance of ADO developers.

The application we’ll examine uses a double level of caching. One level is in-memory caching, used to populate the DataGrid control that is showing the data. This kind of caching is implemented by using the Cache object and is tied to the application’s lifetime. A second level of caching keeps the application disconnected from the data source across multiple executions of the application. Basically, you hibernate the state of the application and resume it on the next execution. Thanks to the DiffGram structure of the DataSet object, you can persist and reload all the changes you have made to the DataSet object since the last connection to the database or the last time you committed your work.
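
The hibernate-and-resume idea can be sketched in C# as follows. The Cart data set, its Items table, and the helper names Hibernate and Resume are hypothetical, and this is not the code of the application described in this chapter. The sketch also assumes that the schema is available when the data is reloaded, because a DiffGram carries row data and change information but no schema; here the schema is cloned from an existing DataSet, whereas a real application would more likely persist it separately.

```csharp
using System;
using System.Data;
using System.IO;

public static class DiffGramDemo
{
    // Hypothetical data standing in for rows fetched from the database.
    public static DataSet BuildCart()
    {
        DataSet ds = new DataSet("Cart");
        DataTable items = ds.Tables.Add("Items");
        items.Columns.Add("ItemID", typeof(int));
        items.Columns.Add("Quantity", typeof(int));
        items.PrimaryKey = new DataColumn[] { items.Columns["ItemID"] };
        items.Rows.Add(1, 2);
        items.Rows.Add(2, 1);
        ds.AcceptChanges(); // mark the rows as unchanged, as after a fill
        return ds;
    }

    // Persists current values, original values, and row states to a file.
    public static void Hibernate(DataSet ds, string path)
    {
        ds.WriteXml(path, XmlWriteMode.DiffGram);
    }

    // Reloads the DiffGram into an empty DataSet that has the same schema.
    public static DataSet Resume(DataSet schemaSource, string path)
    {
        DataSet ds = schemaSource.Clone(); // copies structure, not rows
        ds.ReadXml(path, XmlReadMode.DiffGram);
        return ds;
    }

    public static void Main()
    {
        DataSet cart = BuildCart();
        DataTable items = cart.Tables["Items"];
        items.Rows.Find(1)["Quantity"] = 5; // pending modification
        items.Rows.Add(3, 4);               // pending insertion

        string path = Path.GetTempFileName();
        Hibernate(cart, path);
        DataSet resumed = Resume(cart, path);
        File.Delete(path);

        foreach (DataRow row in resumed.Tables["Items"].Rows)
        {
            Console.WriteLine("{0}: {1}", row["ItemID"], row.RowState);
        }
    }
}
```

Because the reloaded rows keep their row state and their original values, the resumed DataSet can later be handed to a data adapter's Update method so that all pending changes are submitted in one batch.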