Data Warehousing and Large Database Query Strategies

In a data warehousing environment, your data is more than likely contained in a star or snowflake schema. These schemas are characterized by having a few fact tables with many rows that contain your detailed transactional data. Several smaller dimension tables might exist that are related to the fact table via foreign keys. The dimension tables often have few rows.

SQL Server has introduced several optimizer enhancements to improve query performance in a star or snowflake schema environment.

Cartesian Product of Dimension Tables

One way in which the SQL Server query optimizer has been enhanced is to support performing a cross join on dimension tables (creating a Cartesian product) before joining to the fact table. This means that the fact table has to participate in fewer joins, which will reduce the overall cost of the operation. This might sound strange, but it makes sense.

Consider the following example:

SELECT count(*) 
   FROM fact f, dimension1 d1, dimension2 d2
   WHERE f.key1 = d1.key1
   AND f.key2 = d2.key2

In this example, the fact table is being joined with both the dimension1 and dimension2 tables. Assume that the fact table contains 1,000,000 rows and that each dimension table contains 10 rows. The SQL Server optimizer will cross join dimension1 and dimension2 first. This creates a Cartesian product of 100 rows. This resultset is then joined with the fact table to produce the final result. As a result, the fact table has to be scanned only once. The logic and reasoning behind this approach is easy to see.

Semijoin Reduction

Another optimizer enhancement is a concept known as semijoin reduction. In this scenario, the optimizer will use index intersection on multiple indexes from the fact table. The fact table contains many foreign keys. If indexes exist on each of the foreign keys, SQL Server uses those indexes effectively when processing a query.

The optimizer joins each dimension table with an appropriate index of the fact table. Next, the optimizer intersects the results of those joins. Finally, the optimizer fetches rows from the fact table. By doing this, the optimizer avoids performing any full row fetch of the fact table until it is absolutely necessary.

    Part III: SQL Server Administration
    Part IV: Transact-SQL
    Part V: SQL Server Internals and Performance Tuning
    Part VI: Additional SQL Server Features