There are two approaches to modeling the columns of an RDBMS table in an XML DTD:
Element approach: Define the table as the root element and nest its columns inside it as child elements. For example:
<!ELEMENT Person (Name, Sex, Age)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Sex (#PCDATA)>
<!ELEMENT Age (#PCDATA)>
Attribute approach: Define the columns as attributes of the root element. The previous example becomes:
<!ELEMENT Person EMPTY>
<!ATTLIST Person
  Name CDATA #REQUIRED
  Sex CDATA #REQUIRED
  Age CDATA #REQUIRED>
A relational database defines both data and structure: columns hold the data, while tables and relationships form the structure. This separation serves both searching for data and navigating the database. The attribute approach preserves it in XML: attributes carry the data, and elements and subelements build the structure. In addition, attributes are unordered, just like columns in a table: no matter how a column is repositioned within a table, the data in the table does not change.
In the element approach, tables and columns are both defined as element types, so a parser may be unable to tell which role a given element plays. Searching child elements is also less flexible than searching attributes, because the order of child elements is significant. Hence, the element approach cannot fully represent the location independence of data in the RDBMS model.
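The ordering argument can be checked with a small sketch. It uses Python's standard `xml.dom.minidom` as a stand-in for a DOM parser; the `Person` values and the helper names `attribute_values` and `child_sequence` are illustrative assumptions, not part of the original design.

```python
from xml.dom.minidom import parseString

def attribute_values(xml_text):
    """Return the root element's attributes as an unordered name -> value mapping."""
    root = parseString(xml_text).documentElement
    return dict(root.attributes.items())

def child_sequence(xml_text):
    """Return the root element's child elements as an ordered list of (tag, text)."""
    root = parseString(xml_text).documentElement
    return [
        (node.tagName, node.firstChild.data)
        for node in root.childNodes
        if node.nodeType == node.ELEMENT_NODE
    ]

# The same hypothetical row, with the columns written in two different orders.
attrs_a = '<Person Name="Ann" Sex="F" Age="30"/>'
attrs_b = '<Person Age="30" Name="Ann" Sex="F"/>'
elems_a = '<Person><Name>Ann</Name><Sex>F</Sex><Age>30</Age></Person>'
elems_b = '<Person><Age>30</Age><Name>Ann</Name><Sex>F</Sex></Person>'

same_attributes = attribute_values(attrs_a) == attribute_values(attrs_b)
same_children = child_sequence(elems_a) == child_sequence(elems_b)
```

Reordering the attributes leaves the mapping unchanged, so `same_attributes` is true, whereas reordering the child elements yields a different sequence, so `same_children` is false.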
Performance is another critical issue. There are two main technologies for parsing XML documents: DOM and SAX. Our application can only use the Java DOM API of the native XML database. DOM first loads the whole document into memory and then presents it as a tree, and building that tree requires a traversal of the document. For example, with the attribute approach, the steps for retrieving the invoice price of the second item from an invoice are: (1) go to the parent element Invoice; (2) go to the second Invoice_item child of Invoice; (3) get the price value from this Invoice_item.
If the element approach is used for the sample database, more steps are involved: (1) go to the parent element Invoice; (2) go to the second Invoice_item child; (3) go to the invoice price child of the second Invoice_item; (4) get the text value of the invoice price.
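The two sets of steps can be sketched as follows. This is a minimal illustration using Python's `xml.dom.minidom` rather than the Java DOM API the application relies on; the name `Invoice_price`, the price values, and both function names are assumptions made for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical invoice data; only Invoice and Invoice_item come from the text.
ATTRIBUTE_DOC = (
    '<Invoice>'
    '<Invoice_item Invoice_price="10.00"/>'
    '<Invoice_item Invoice_price="25.50"/>'
    '</Invoice>'
)
ELEMENT_DOC = (
    '<Invoice>'
    '<Invoice_item><Invoice_price>10.00</Invoice_price></Invoice_item>'
    '<Invoice_item><Invoice_price>25.50</Invoice_price></Invoice_item>'
    '</Invoice>'
)

def second_price_attribute(xml_text):
    invoice = parseString(xml_text).documentElement         # (1) parent element Invoice
    item = invoice.getElementsByTagName("Invoice_item")[1]  # (2) second Invoice_item
    return item.getAttribute("Invoice_price")               # (3) read the attribute value

def second_price_element(xml_text):
    invoice = parseString(xml_text).documentElement         # (1) parent element Invoice
    item = invoice.getElementsByTagName("Invoice_item")[1]  # (2) second Invoice_item
    price = item.getElementsByTagName("Invoice_price")[0]   # (3) price child element
    return price.firstChild.data                            # (4) value of its text node
```

The element version needs one more node lookup plus a text-node access, which is the extra work described above.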
Coding may therefore be simpler with the attribute approach. Attributes also allow enumerated types, so that the value of a column can be restricted to a defined set of values, for example by declaring Sex as (M | F) instead of CDATA.
Document size is another issue. In the element approach, every value needs a start tag, followed by the content, followed by an end tag. The attribute approach needs only attribute_name="attribute_value". As the database grows, the difference can become significant. The extra tags also take time to parse, so performance suffers when a large number of records must be processed. In addition, more disk space is required to store the tags.
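A rough way to see the size difference is to serialize the same row both ways and compare byte counts. The sketch below is only illustrative: the row values and the helpers `as_elements` and `as_attributes` are assumptions, and real documents would also carry whitespace and declarations.

```python
from xml.sax.saxutils import escape, quoteattr

def as_elements(table, row):
    """Serialize one row with each column as a child element."""
    cells = "".join(f"<{col}>{escape(val)}</{col}>" for col, val in row.items())
    return f"<{table}>{cells}</{table}>"

def as_attributes(table, row):
    """Serialize one row with each column as an attribute of the table element."""
    attrs = " ".join(f"{col}={quoteattr(val)}" for col, val in row.items())
    return f"<{table} {attrs}/>"

# Hypothetical row of the Person table.
row = {"Name": "Ann", "Sex": "F", "Age": "30"}

element_size = len(as_elements("Person", row).encode("utf-8"))
attribute_size = len(as_attributes("Person", row).encode("utf-8"))
```

Each column name appears twice in the element form and once in the attribute form, so the gap grows with the number of columns and rows.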
To define relationships between elements, containment is used for the one-to-one and one-to-many cases. The ID/IDREF pointer approach is not recommended, because XML is designed around the concept of containment. Pointers cost more processing time, because DOM and SAX do not provide efficient methods for handling ID/IDREF relationships. Furthermore, navigating from an ID field to an IDREF field may not be easy. It is harder still for IDREFS, since every IDREFS value must be tokenized and each token checked against all possible ID fields. Hence, containment is the first choice for building relationships, and the pointer approach is reserved for relationships that can go either way.
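The extra work behind IDREFS can be sketched as follows, again with Python's `xml.dom.minidom`. The document, the attribute names `id` and `customers`, and both helper functions are hypothetical. Because no DTD is loaded here, the sketch treats `id` as the ID attribute by convention and builds the lookup index by hand.

```python
from xml.dom.minidom import parseString

# Hypothetical document in which an invoice points at customers through IDREFS.
DOC = (
    '<Orders>'
    '<Customer id="c1" Name="Ann"/>'
    '<Customer id="c2" Name="Bob"/>'
    '<Invoice id="i1" customers="c2 c1"/>'
    '</Orders>'
)

def build_id_index(root, id_attr="id"):
    """Scan every descendant element once and map each ID value to its element."""
    index = {}
    for node in root.getElementsByTagName("*"):
        if node.hasAttribute(id_attr):
            index[node.getAttribute(id_attr)] = node
    return index

def resolve_idrefs(element, idrefs_attr, index):
    """Tokenize an IDREFS value and look up each token among the known IDs."""
    return [index[token] for token in element.getAttribute(idrefs_attr).split()]

root = parseString(DOC).documentElement
invoice = root.getElementsByTagName("Invoice")[0]
customers = resolve_idrefs(invoice, "customers", build_id_index(root))
customer_names = [customer.getAttribute("Name") for customer in customers]
```

With containment, the related elements would simply be children of `Invoice`, and neither the full scan nor the tokenizing step would be needed.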