The Michigan benchmark described in this chapter is a micro-benchmark that can be used to tune the performance of XML query-processing systems. In formulating this benchmark, we paid careful attention to the techniques used in generating the data and specifying the queries, so that a benchmark user can easily identify performance problems. The data-generation process uses random numbers sparingly, yet still captures key characteristics of XML data sets, such as varying fanout and depth of the data tree. The queries are carefully chosen to focus on individual query operations and to expose any performance problems in the algorithms used to evaluate each operation. With careful analysis of the benchmark results, engineers can diagnose the strengths and weaknesses of their XML databases and quantitatively examine the impact of different implementation techniques, such as data storage structures, indexing methods, and query evaluation algorithms. The benchmark can also be used to examine the effect of scaling up the database size on the performance of individual queries, and to compare the performance of primitive query operations across different systems. Thus, this benchmark is a simple and effective tool to help engineers improve the performance of XML query-processing engines.
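The idea of using random numbers sparingly while still controlling fanout and depth can be illustrated with a small sketch. The fanout schedule, seed, and field names below are hypothetical illustrations, not the actual MBench schema: the tree's shape is fixed by a per-level fanout schedule, so it is fully reproducible, and randomness is confined to a single attribute value.

```python
import random

# Hypothetical per-level fanout schedule: level 0 nodes have 2 children,
# level 1 nodes have 3, level 2 nodes have 2; depth is len(FANOUTS) + 1.
FANOUTS = [2, 3, 2]

def build_tree(level=0, rng=None):
    """Build a tree whose shape is deterministic; only 'value' is random."""
    if rng is None:
        rng = random.Random(42)  # fixed seed: sparing, reproducible randomness
    node = {
        "level": level,
        "value": rng.randrange(64),  # the only randomized field
        "children": [],
    }
    if level < len(FANOUTS):
        node["children"] = [
            build_tree(level + 1, rng) for _ in range(FANOUTS[level])
        ]
    return node

def count_nodes(node):
    """Total node count: 1 + 2 + 6 + 12 = 21 for the schedule above."""
    return 1 + sum(count_nodes(c) for c in node["children"])
```

Because the structure depends only on the fanout schedule, any two runs produce trees of identical shape, which makes benchmark results comparable across runs and systems; the seeded generator keeps even the attribute values reproducible.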
In designing the benchmark, we paid careful attention to the key criteria for a successful domain-specific benchmark proposed by J. Gray (Gray 1993): relevance, portability, scalability, and simplicity. The Michigan benchmark is relevant to testing the performance of XML engines because the proposed queries are the basic components of typical application-level operations in XML applications. It is portable because it is easy to implement on many different systems; in fact, the data generator for the benchmark data set is freely available for download from the Michigan benchmark's Web site (http://www.eecs.umich.edu/db/mbench). It is scalable through the use of a scaling parameter. It is simple because it contains only one data set and a set of simple queries, each with a distinct functionality-testing purpose.
We are continuing to use the benchmark to evaluate a number of native XML data management systems and traditional (object-)relational database systems. We plan to publish the most up-to-date benchmark results at the benchmark's Web site.