An Offline Buffer for Data

The sample application uses two levels of cache. The topmost cache is represented by the ASP.NET Cache object, which contains the data needed for pagination and all the changes entered at a given time. The second level of cache is an XML file that is expected to persist across two consecutive invocations of the application. (Because the file contains a serialized DataSet object, it is an XML file.) Like the Cache slot, the file is named for the connected user.

When you must serialize a DataSet object to XML, you have three basic options: save the object with schema information, save it without schema information, or save it in DiffGram format. (By default, the DataSet object is saved without schema information.) The available options are grouped in the XmlWriteMode enumeration. The following code shows how to save a DataSet object to a given disk file as a DiffGram:

ds.WriteXml(strFile, XmlWriteMode.DiffGram);

You can save directly to a path name or to a stream. Reading back the XML text to rebuild the DataSet object is a bit trickier. In this case, you use the ReadXml method, which has a prototype very similar to WriteXml. The only difference between them is the range and the semantics of the values you use to control how the DataSet object is set up. ReadXml accepts only the flags defined in the XmlReadMode enumeration.
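The round trip described above can be sketched as follows. This is a minimal illustration rather than the sample application's code: the two-row Employees table, the DataSet name, and the temporary file name are assumptions made for the example. Because a DiffGram carries no schema, the sketch prepares the target DataSet with Clone before calling ReadXml.

```csharp
using System;
using System.Data;
using System.IO;

class DiffGramRoundTrip
{
    // Builds a tiny stand-in for the Employees table (hypothetical data).
    static DataSet CreateSample()
    {
        DataSet ds = new DataSet("MyDataSet");
        DataTable dt = ds.Tables.Add("Employees");
        dt.Columns.Add("EmployeeID", typeof(int));
        dt.Columns.Add("LastName", typeof(string));
        dt.PrimaryKey = new DataColumn[] { dt.Columns["EmployeeID"] };
        dt.Rows.Add(1, "Davolio");
        dt.Rows.Add(2, "Fuller");
        ds.AcceptChanges();   // mark the rows as Unchanged
        return ds;
    }

    static void Main()
    {
        DataSet ds = CreateSample();
        ds.Tables["Employees"].Rows[1]["LastName"] = "Smith";   // pending change

        // Hypothetical file name; the sample application names the file for the user.
        string strFile = Path.Combine(Path.GetTempPath(), "sample_user.xml");
        ds.WriteXml(strFile, XmlWriteMode.DiffGram);

        // A DiffGram contains no schema, so the target DataSet needs one first.
        DataSet restored = ds.Clone();   // copies the structure only, no rows
        restored.ReadXml(strFile, XmlReadMode.DiffGram);

        // Print each row together with the row state rebuilt from the file.
        foreach (DataRow row in restored.Tables["Employees"].Rows)
            Console.WriteLine("{0}: {1}", row["LastName"], row.RowState);

        File.Delete(strFile);
    }
}
```

In this sketch the edited row should come back with its pending change intact, which is the reason for choosing the DiffGram format over plain XML for an offline buffer.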

Loading a DataSet from XML

The ReadXml method reads the contents of the XML document and fills the DataSet object with data. In doing so, it also creates the relational schema of the DataSet object according to two conditions: whether the reading mode has been specified, and whether a relational schema is already described in the XML source. The XmlReadMode argument defaults to Auto, which means that the ReadXml method examines the XML data and determines the most appropriate option to follow.

When the XML data is a DiffGram, the DiffGram format (described in the next section) is used to extract the data. When the XML document contains an inline schema, that schema determines the structure of the DataSet object’s child elements. This behavior is equivalent to the ReadSchema option. If no schema is found, the schema information is inferred using the InferXmlSchema method on the DataSet object. Because the Auto mode must examine the document before it can choose, setting an explicit XmlReadMode value while loading a DataSet object can only improve performance.
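To illustrate the difference between reading with a schema and inferring one, here is a small sketch that works entirely in memory. The table layout and names are assumptions made for the example. It saves the same DataSet once with an inline schema and once without, then reads each copy back with an explicit XmlReadMode value.

```csharp
using System;
using System.Data;
using System.IO;

class ExplicitReadMode
{
    static void Main()
    {
        // Hypothetical one-row table used only for this illustration.
        DataSet ds = new DataSet("MyDataSet");
        DataTable dt = ds.Tables.Add("Employees");
        dt.Columns.Add("EmployeeID", typeof(int));
        dt.Columns.Add("LastName", typeof(string));
        dt.Rows.Add(1, "Davolio");

        // Save the data together with an inline XSD schema.
        StringWriter withSchemaText = new StringWriter();
        ds.WriteXml(withSchemaText, XmlWriteMode.WriteSchema);

        // Name the mode explicitly so ReadXml need not inspect the document.
        DataSet withSchema = new DataSet();
        withSchema.ReadXml(new StringReader(withSchemaText.ToString()),
            XmlReadMode.ReadSchema);
        Console.WriteLine(
            withSchema.Tables["Employees"].Columns["EmployeeID"].DataType);

        // Save the data only, then let ReadXml infer the structure.
        StringWriter plainText = new StringWriter();
        ds.WriteXml(plainText, XmlWriteMode.IgnoreSchema);

        DataSet inferred = new DataSet();
        inferred.ReadXml(new StringReader(plainText.ToString()),
            XmlReadMode.InferSchema);
        Console.WriteLine(
            inferred.Tables["Employees"].Columns["EmployeeID"].DataType);
    }
}
```

With the inline schema, the EmployeeID column keeps its integer type. With InferSchema, the column is rebuilt as a string, because inference has only the XML text to go on. That loss of type information is one reason to persist the schema, or to load the file into a DataSet whose schema is already in place.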

The DiffGram Format

A DiffGram is an XML serialization format that includes both the original value and the current value of each row. In particular, it contains the list of rows with their current values plus a final section where the original values of all changed rows are grouped. Each row is given a unique identifier that is used to track changes between the two sections of the DiffGram. This relationship looks a lot like a foreign key relationship. The next listing outlines the structure of a DiffGram in which the row with rowOrder 1 (Employees2) has been deleted, the row with rowOrder 3 (Employees4) has been modified, and a new row (Employees10) has been inserted.

<diffgr:diffgram>
  <DataSetName>
    <Employees diffgr:id="Employees1" msdata:rowOrder="0">...</Employees>
    <Employees diffgr:id="Employees3" msdata:rowOrder="2">...</Employees>
    <Employees diffgr:id="Employees4" msdata:rowOrder="3" 
        diffgr:hasChanges="modified">...</Employees>
    <Employees diffgr:id="Employees5" msdata:rowOrder="4">...</Employees>
    <Employees diffgr:id="Employees6" msdata:rowOrder="5">...</Employees>
    <Employees diffgr:id="Employees7" msdata:rowOrder="6">...</Employees>
    <Employees diffgr:id="Employees8" msdata:rowOrder="7">...</Employees>
    <Employees diffgr:id="Employees9" msdata:rowOrder="8">...</Employees>
    <Employees diffgr:id="Employees10" msdata:rowOrder="9" 
        diffgr:hasChanges="inserted">...</Employees>
  </DataSetName>
  <diffgr:before>
    <Employees diffgr:id="Employees2" msdata:rowOrder="1">...</Employees>
    <Employees diffgr:id="Employees4" msdata:rowOrder="3">...</Employees>
  </diffgr:before>
</diffgr:diffgram>

The <diffgr:diffgram> root node has two children. The first is the DataSet object with its current contents, including newly added rows and modified rows but not deleted rows. This data tree takes its name from the DataSet object. You get and set the DataSet object’s name by using the DataSetName property.

The second child is the tree rooted in the <diffgr:before> node. This tree contains enough information to restore the original state of the DataSet object. For example, it still contains any row that has been deleted as well as the original contents of any modified row. All columns affected by the change are tracked in the <diffgr:before> subtree. Although this approach is quite verbose and redundant, it enables slightly better performance when restoring original values in the in-memory DataSet object. To overwrite an entire row, the previously mentioned ItemArray property on the DataRow object is a faster approach than locating a column and then updating it.
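The two sections are easier to recognize in a DiffGram you generate yourself. The following sketch is an illustration under assumed data, a hypothetical three-row Employees table. It deletes one row, modifies another, inserts a third, and writes the result to the console as a DiffGram. It then shows that the same original values are available in memory through DataRowVersion.Original and RejectChanges.

```csharp
using System;
using System.Data;
using System.IO;

class DiffGramSections
{
    static void Main()
    {
        // Hypothetical data standing in for the Employees table.
        DataSet ds = new DataSet("MyDataSet");
        DataTable dt = ds.Tables.Add("Employees");
        dt.Columns.Add("EmployeeID", typeof(int));
        dt.Columns.Add("LastName", typeof(string));
        dt.Rows.Add(1, "Davolio");
        dt.Rows.Add(2, "Fuller");
        dt.Rows.Add(3, "Leverling");
        ds.AcceptChanges();                 // all rows start as Unchanged

        dt.Rows[0].Delete();                // survives only under <diffgr:before>
        dt.Rows[1]["LastName"] = "Smith";   // current value in the data tree,
                                            // original under <diffgr:before>
        dt.Rows.Add(4, "Peacock");          // newly inserted row

        // Serialize the pending changes as a DiffGram and display the text.
        StringWriter writer = new StringWriter();
        ds.WriteXml(writer, XmlWriteMode.DiffGram);
        Console.WriteLine(writer.ToString());

        // The original value of the modified row is still held in memory.
        Console.WriteLine(dt.Rows[1]["LastName", DataRowVersion.Original]);

        // Rolling back restores the deleted row, reverts the edit,
        // and discards the inserted row.
        ds.RejectChanges();
        Console.WriteLine(dt.Rows.Count);
    }
}
```

Note that a deleted row stays in the Rows collection until the changes are accepted or rejected, which is why index 1 still refers to the second row after the call to Delete.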

Identifying Rows

The diffgr:id attribute is used to establish a link between current and original rows. The diffgr:hasChanges attribute helps you find out quickly which records were inserted or modified. Deleted records carry no such attribute because they appear only in the <diffgr:before> section. The structure of each row node depends on a few factors. In most cases it will look like the next code listing. Each column is represented by a node element, with the contents of the column being the node text.

<Employees diffgr:id="Employees1" msdata:rowOrder="0">
  <EmployeeID>1</EmployeeID> 
  <LastName>Davolio</LastName> 
  <FirstName>Nancy</FirstName> 
  <Title>Sales Representative</Title> 
  <TitleOfCourtesy>Ms.</TitleOfCourtesy> 
  <BirthDate>1948-12-08T00:00:00.0000000+01:00</BirthDate> 
  <HireDate>1992-05-01T00:00:00.0000000+02:00</HireDate> 
  <Address>507 - 20th Ave. E. Apt. 2A</Address> 
  <City>Seattle</City> 
  <Region>WA</Region> 
  <PostalCode>98122</PostalCode> 
  <Country>USA</Country> 
  <HomePhone>(206) 555-9857</HomePhone> 
  <Extension>5467</Extension> 
  <Notes>...</Notes> 
  <ReportsTo>2</ReportsTo> 
</Employees>

You can control the way in which the contents of each column are written to XML by using the ColumnMapping property of the DataColumn object. The property takes values from the MappingType enumeration. The default setting is Element, which means that, as shown in the preceding code, each column has its own node. You can also require the column’s value to be persisted as an attribute on the main row node (the <Employees> node in the preceding example). The value to pick from the MappingType enumeration is Attribute. Finally, you can hide the column by using the Hidden setting, as shown in the next bit of code:

DataTable dt = ds.Tables["Employees"];
dt.Columns["Notes"].ColumnMapping = MappingType.Hidden;
dt.Columns["ReportsTo"].ColumnMapping = MappingType.Attribute;

These settings have a slightly different effect according to whether you apply the DiffGram format or plain XML serialization. If plain XML serialization is applied, the hidden column is left out of the resulting XML document. If you write DiffGrams, the hidden text is included in the document even though it is marked as hidden. The column mappings in the previous code change the <Employees> node, as shown here:

<Employees diffgr:id="Employees1" 
    msdata:rowOrder="0" 
    msdata:hiddenNotes="Education includes a BA in psychology from Colorado 
    State University in 1970. She also completed &quot;The Art of the Cold 
    Call.&quot; Nancy is a member of Toastmasters International." 
    ReportsTo="2">

The Notes column is marked as hidden using a dynamically generated attribute named hiddenNotes. The ReportsTo column instead is stored as an attribute on the parent row node.
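To compare the two serialization modes side by side, you might run a sketch like the following. The one-row table and its values are assumptions made for the example. It applies the same column mappings and then writes the DataSet twice, first as plain XML and then as a DiffGram, so that you can check how the Notes and ReportsTo columns are rendered in each case.

```csharp
using System;
using System.Data;
using System.IO;

class ColumnMappingDemo
{
    static void Main()
    {
        // Hypothetical, reduced version of the Employees table.
        DataSet ds = new DataSet("MyDataSet");
        DataTable dt = ds.Tables.Add("Employees");
        dt.Columns.Add("EmployeeID", typeof(int));
        dt.Columns.Add("Notes", typeof(string));
        dt.Columns.Add("ReportsTo", typeof(int));

        // Same mappings as in the preceding listing.
        dt.Columns["Notes"].ColumnMapping = MappingType.Hidden;
        dt.Columns["ReportsTo"].ColumnMapping = MappingType.Attribute;

        dt.Rows.Add(1, "Sample note", 2);
        ds.AcceptChanges();

        // Plain XML serialization: the hidden column should be left out.
        StringWriter plain = new StringWriter();
        ds.WriteXml(plain, XmlWriteMode.IgnoreSchema);
        Console.WriteLine(plain.ToString());

        // DiffGram serialization: the hidden column should still be present.
        StringWriter diff = new StringWriter();
        ds.WriteXml(diff, XmlWriteMode.DiffGram);
        Console.WriteLine(diff.ToString());
    }
}
```

Keeping hidden columns in the DiffGram makes sense for an offline buffer, because the file must be able to rebuild the complete row, not only the part intended for display.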

About Null Values

To keep the output compact, the DataSet object drops column fields with null values during the serialization process. Null values in a column are not a problem per se, and dropping the null column fields doesn’t affect the usability of the DataSet object. You can successfully rebuild the DataSet object from XML, and data-bound controls can easily manage null values. This behavior is hard-coded in the DataSet class XML serializer and cannot be configured. If this behavior happens to be a problem for you, consider the following workaround. When making the query, use the T-SQL ISNULL function to automatically turn any null value into a blank but usable value. For example, you replace this query:

SELECT employeeid, lastname, firstname, region FROM Employees

with the following one, which automatically converts any null values in the region column to empty strings:

SELECT employeeid, lastname, firstname, ISNULL(region, '') AS region 
    FROM Employees

The example code nullvalues.aspx demonstrates XML serialization with and without null values and is available on the companion CD.
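Independent of the companion sample, the effect can be reproduced with a few lines of code. The sketch below is an illustration under assumed data, not the content of nullvalues.aspx. It fills a table by hand with one regular value, one null value, and one empty string of the kind ISNULL(region, '') would return, and then serializes the table to the console.

```csharp
using System;
using System.Data;
using System.IO;

class NullValuesDemo
{
    static void Main()
    {
        // Hypothetical rows standing in for the result of the query.
        DataSet ds = new DataSet("MyDataSet");
        DataTable dt = ds.Tables.Add("Employees");
        dt.Columns.Add("EmployeeID", typeof(int));
        dt.Columns.Add("LastName", typeof(string));
        dt.Columns.Add("Region", typeof(string));
        dt.Rows.Add(1, "Davolio", "WA");
        dt.Rows.Add(5, "Buchanan", DBNull.Value);   // null region
        dt.Rows.Add(6, "Suyama", "");               // blank region, as ISNULL would return

        // Default serialization: data only, no schema information.
        StringWriter writer = new StringWriter();
        ds.WriteXml(writer);
        Console.WriteLine(writer.ToString());
    }
}
```

In the resulting text, the row with the null region has no Region node at all, whereas the row with the empty string keeps an empty Region node.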

Part III: Interoperability