10.3 Tuned Queries that Return Few Rows, Slowly

Rarely, a fully tuned query that returns few rows, even without aggregation, runs slowly. Since tuned queries drive from the best filter, this implies that such a query must reach many rows at some point in the execution plan before later filters reduce the rowcount to something moderate. The rules of thumb this book describes are designed to make this a rare event; in my own experience, it has happened just once or twice per year of tuning, on average.

10.3.1 Why Queries Sometimes Read Many Rows to Return Few

Modest-rowcount queries that are slow even after tuning can be traced to important filters that cannot all be applied before at least one large table is reached. Consider, for example, Figure 10-1. If the root detail table, M, has 50,000,000 rows, then the detail join ratios show that A1 and A2 have 10,000,000 rows each, and B1 and B2 have 100,000 rows each. Robust, index-driven execution plans with join order (B1, A1, M, A2, B2) or (B2, A2, M, A1, B1) are equally attractive, by symmetry. For concreteness, consider just the first join order. In that plan, you reach 100 rows in B1, 10,000 rows in A1, and 50,000 rows in M, A2, and B2 before you discard all but 50 rows for failing to meet the filter condition on B2.
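The rowcounts above can be tallied step by step. This sketch uses the table sizes stated in the text; the join and filter ratios (filters of 0.001 on both B1 and B2) are assumptions read off Figure 10-1:

```python
# Rows touched at each step of the robust plan (B1, A1, M, A2, B2),
# using the rowcounts stated in the text; the join and filter ratios
# are assumptions taken from Figure 10-1.

B1_ROWS = 100_000
DETAIL_RATIO_B1_A1 = 100   # A1 rows per B1 row: 10,000,000 / 100,000
DETAIL_RATIO_A1_M = 5      # M rows per A1 row: 50,000,000 / 10,000,000

rows_b1 = B1_ROWS // 1000               # 100 rows pass the 0.001 filter on B1
rows_a1 = rows_b1 * DETAIL_RATIO_B1_A1  # 10,000 rows reached in A1
rows_m = rows_a1 * DETAIL_RATIO_A1_M    # 50,000 rows reached in M
rows_a2 = rows_m                        # master join: one A2 row per M row
rows_b2 = rows_a2                       # master join: one B2 row per A2 row
rows_kept = rows_b2 // 1000             # 50 rows pass the 0.001 filter on B2

print(rows_b1, rows_a1, rows_m, rows_kept)  # 100 10000 50000 50
```

The tally makes the imbalance plain: the plan touches 50,000 rows in each of the last three tables to return just 50.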

Figure 10-1. A slow query that returns few rows

If you relax the robustness requirement, you can preread the 100 rows that meet the filter in B2 and join those to the earlier-reached rows by a hash join. You could even prejoin those 100 rows to A2 with nested loops and perform a hash join between rows reached with nested loops to (B1, A1, M) and rows reached with nested loops that join the combination (B2, A2) (assuming you could get your optimizer to produce such an exotic plan). This would reduce the rowcount you would need to reach A2 to 10,000, while keeping the rowcount to reach B2 at 100. However, none of these strategies eliminates the need to read 50,000 rows from the largest, least well-cached table, M, and that will surely take longer than you would like for a query that returns just 50 rows.
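The three strategies differ only in how many rows they reach in A2 and B2; the reads of M are fixed. This sketch tallies each alternative, again treating the join ratios from Figure 10-1 as assumptions:

```python
# Rows reached in A2 and B2 under the three strategies described in
# the text; rowcounts follow the text, and the join ratios are
# assumptions taken from Figure 10-1.

M_ROWS_REACHED = 50_000    # unavoidable reads of M in every strategy
B2_ROWS = 100_000
DETAIL_RATIO_B2_A2 = 100   # A2 rows per B2 row: 10,000,000 / 100,000

filtered_b2 = B2_ROWS // 1000  # 100 B2 rows pass the 0.001 filter

# Robust nested-loops plan: A2 and B2 are each probed once per M row.
robust = {"A2": M_ROWS_REACHED, "B2": M_ROWS_REACHED}

# Hash-join variant: preread only the filtered B2 rows.
preread = {"A2": M_ROWS_REACHED, "B2": filtered_b2}

# Exotic variant: prejoin the filtered B2 rows to A2 with nested
# loops, then hash-join that result to the (B1, A1, M) branch.
exotic = {"A2": filtered_b2 * DETAIL_RATIO_B2_A2, "B2": filtered_b2}

print(robust, preread, exotic)
```

Even the exotic variant cuts the A2 rowcount only to 10,000 and leaves the 50,000 reads of M untouched, which is why none of these plans rescues the query.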

The root of this problem is a join diagram that has selective filters distributed across two or more branches under some large table. The result is that the ultimate level of filtration (a net filter ratio equal to the product of the individual filter ratios) cannot even be approached until after the database has already reached the large table. The highly selective combination of filters is reached too late in the execution plan to avoid excessive reads.

10.3.2 Optimizing Queries with Distributed Filters

To make optimum use of distributed filters, you need to somehow bring them closer together in the query diagram, preferably into a single table. Consider Figure 10-1 again. Assume that each of the filters on B1 and B2 is an equality condition on some column of that filter's table. A normalized database design places the filter column for B1 where it is because it encodes a property that you need to specify only once per B1 entity. Furthermore, this is a property that you could not infer from knowledge of entities for any master tables that join one-to-many to B1. If you placed this column in A1, that would be a denormalization: A1 would store a property that you could infer from A1's matching master entity in B1. Such a denormalization would require redundant data to be stored in the A1 table, since all 100 A1 entities (on average) that join to any particular B1 row must have the same value for this inherited column. However, in principle, all properties of master-table entities are inherited properties of the detail entities that match those master table entities.

The form of denormalization I describe is not the only form of denormalization there is. Whole books are written on the subject of normalizing databases, but this simplest form of denormalization is the only form pertinent to this discussion.

For example, if Customer_ID is a property of Orders, it is also an inherited property of the Order_Details that match those orders. With denormalization, you can always push properties up the join tree to nodes above the nodes that own the properties under normalized design. Such inheritance of filter properties need not stop at the first level. For example, Customer_Name, a normalized property of Customers, could be inherited up two levels, through Orders, all the way to Order_Details.

This possibility of pushing filter columns up the join tree as high as you wish points to the ultimate solution to performance problems of distributed filters. To avoid distributed-filter problems, keep pushing the most selective filter conditions upward until they come together in a single node, which will inherit a combined filter that has the product of the original filter selectivities. In the extreme case, all filters rise all the way to the root detail node, and the query will read only the few rows it ultimately returns from that table and join downward to the same number of rows from the downstream master tables. In the problem shown in Figure 10-1, table M acquires two denormalized columns, from B1 and B2, respectively. The combined filter on these two columns has a selectivity of 0.000001 (0.001 x 0.001), or one row out of 1,000,000, as shown in Figure 10-2. The optimum execution plan for that query diagram reads 50 rows from M and does nested loops, through primary-key indexes, to the 50 rows from each of the other tables, A1, A2, B1, and B2. This is a very fast plan.
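The selectivity arithmetic behind Figure 10-2 can be checked directly. Exact fractions are used here so the rowcounts come out cleanly; the 0.001 filter ratios are the ones stated in the text:

```python
# Selectivity of the combined filter once both denormalized columns
# land on M, following the arithmetic in the text.

from fractions import Fraction

M_ROWS = 50_000_000
FILTER_B1 = Fraction(1, 1000)  # 0.001 filter inherited from B1
FILTER_B2 = Fraction(1, 1000)  # 0.001 filter inherited from B2

combined = FILTER_B1 * FILTER_B2  # 1/1,000,000: product of the filter ratios
rows_from_m = M_ROWS * combined   # 50 rows read from M

print(combined, rows_from_m)  # 1/1000000 50
```

With the combined filter applied at M itself, the plan reads only the 50 rows it ultimately returns and joins downward to 50 rows in each remaining table.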

Figure 10-2. Promoting filters to higher tables, using denormalization

It might seem odd to mention this powerful technique for combining filters so late in the book. Many texts assume that denormalization is a ubiquitous necessity, but I have not found this to be the case at all. On most applications I have tuned, I have not needed to add a single denormalization for tuning purposes, and tuning purposes are the only good reason for denormalization. If you follow the techniques of the rest of this book, you too should have to use denormalization only rarely, only when you identify a specific important query that cannot otherwise be made fast enough.

Many queries can be made faster through denormalization, but that is not the point. The point is that, even without denormalization, almost all queries can be made fast enough, with correct tuning. An improvement of a few milliseconds does not justify the costs of denormalization. Most performance-driven denormalizations deliver little more than a tiny improvement versus the best optimization possible without denormalization.

If done perfectly, with rigorously designed database triggers that automatically keep the denormalized data perfectly in sync, denormalization can be functionally safe. However, it is not free; it takes up disk space and slows inserts, updates, and deletes that must fire the triggers that keep the data in sync. Given the unavoidable costs of denormalization, I recommend it only when you find a specific, high-priority SQL statement that cannot be made fast enough by other means, which is a very rare case.

Time goes, you say? Ah no! Alas, Time stays, we go.

Henry Austin Dobson, The Paradox of Time