17.5 Using the Data Objects to Edit XML

Editing relational tables through XML objects is rarely a good idea. You immediately introduce synchronization problems and lose the ability to enforce referential and identity integrity. There's a more useful side of the ADO.NET and XML, however: using the ADO.NET data objects to work with an ordinary XML file.

For example, your program might need to store user-specific settings. One choice for a location to store these settings is the registry, but this isn't always appropriate, particularly if users are regularly logging in to the program from different workstations. Another approach might be to use a central database server. However, you may not have the necessary database server in place, the program may need to support local (disconnected use), or you may not want to introduce additional network traffic. In these cases, you can adopt a different approach and assign a small XML configuration file to each user. To read and write the configuration file, use the DataSet.ReadXml( ) and DataSet.WriteXml( ) methods.

To evaluate this technique, here are a few considerations:

  • This approach is best if you want to interact with the data using a table and row-based syntax. You don't interact natively with the XML.

  • This approach has the same limitations as any other file-based storage. If you need to write large amounts of data, it's slow, and there is no way to manage concurrent updates. If you need these features, you need a full-fledged RDBMS.

  • This approach works best if you create an XML schema to describe your data format. Otherwise, type conversion errors and other schema inference problems are possible.

  • This approach works only if your XML document follows a DataSet-like organization of elements. This leads to problems for XML documents with deeply nested structures or XML documents that duplicate some elements to represent many-to-many relationships.

17.5.1 Inferring XML Structure

When using the ADO.NET objects to process an ordinary XML file, you should always create an associated schema. Otherwise, ADO.NET will attempt to infer a schema based on the structure of the XML document. This schema may not always be appropriate.

If you load an XML document that has additional information (attributes or elements) beyond what the schema allows, ADO.NET will not throw an exception. However, ADO.NET will ignore the extra information. When you save the file later, it will not contain the additional information.

If you don't provide a schema, you can still load XML data into a DataSet. However, you're likely to discover a number of ADO.NET idiosyncrasies. The process ADO.NET uses to create a "best guess" schema by reading an XML document is called schema inference.

The DataSet follows a rigid set of inference rules:

  • Elements with attributes become tables. Attributes are inferred as columns.

  • Elements with child elements become tables. Elements that have no attributes or child elements and don't repeat are inferred as columns.

  • Elements that repeat are inferred as a single table.

  • If the document or root element has no attributes and no child elements that would be inferred as columns, it is inferred as a DataSet. Otherwise, the document element is inferred as a table.

  • For elements that are inferred as tables and contain text, but have no child elements, a new column named TableName_Text is created. The text of each of the elements is placed in this column. If an element is inferred as a table and has text and child elements, any contained text is ignored.

  • If ADO.NET infers that there is a table element nested with another table element, it automatically creates a nested DataRelation between the two tables. A new, primary key column named TableName_Id is added to both tables and used by the DataRelation. A ForeignKeyConstraint is created between the two tables using the TableName_Id column.

The last point is particularly interesting. With a nested table, ADO.NET always generates a new primary key column in the parent and a new foreign key column in the child, even if you already have elements that could serve this purpose.

Finally, note that schema inference has another major limitation: it doesn't detect data types. Even if you have numeric ID values, they are interpreted as strings.

One easy way to create a schema for your custom XML files is to write a small utility that defines a new DataSet. You can then use the WriteXml( ) method and the WriteXmlSchema( ) method to generate the matching schema.

    Part I: ADO.NET Tutorial
    Part II: ADO.NET Core Classes
    Part III: API Quick Reference