One of the first attempts to develop a benchmark for XML databases was XMach-1 (Böhme and Rahm 2001), a multiuser benchmark for evaluating the performance of XML data management systems. The benchmark application is Web based, and its XML data comprises text documents, schema-less data, and structured data. In their paper, the authors first specify the benchmark database and then define a set of XML queries whose execution determines an application's performance. Their most important performance metric is throughput, measured in Xqps (XML queries per second). Having identified the performance factors for XML data storage and query processing, they claim that XMach-1 can be used to evaluate both native and XML-enabled database products.
Another benchmark, the "XML Benchmark Project," was recently completed (Schmidt, Waas, Kersten, Florescu, Manolescu et al. 2001). According to the authors, their set of queries captures the essential factors of managing XML data: path expressions, query processing with NULL values, full-text search, hierarchical data processing, reconstruction, ordering, aggregation, joins, indexing, and coercion between data types. They conclude that there is no single best generic mapping from text documents to databases.
Other research papers have mainly concentrated on mapping schemes for storing XML data in relational databases and on the performance of relational databases in processing XML data (e.g., Florescu and Kossmann 1999a). Tian et al. (2000) compare both the design and the performance of different strategies for storing XML data. Their evaluation covers loading databases and reconstructing XML documents, selection query processing, indexing, scan selection, join query processing, and containment query processing. They conclude that the use of a DTD is necessary for achieving good performance and for the presentation of complex-structured data. Zhang et al. (2000) compare the performance of two relational database products using XML for an information retrieval system.