5.4 Simplified Query Diagrams

You will see that much of the detail on a full query diagram is unnecessary for all but the rarest problems. When you focus on the essential elements, you usually need only the skeleton of the diagram and the approximate filter ratios. You occasionally need join ratios, but usually only when either the detail join ratio is less than about 1.5 or the master join ratio is less than 0.9. Unless you have reason to suspect these uncommon values for the master-detail relationship, you can save yourself the trouble of even measuring these values. This, in turn, means that less data is required to produce the simplified join diagrams. You won't need table rowcounts for tables without filters. In practice, many-way joins usually have filters only on at most 3-5 tables, so this makes even the most complex query easy to diagram, without requiring many statistics-gathering queries.

Stripping away the usually unnecessary detail that I've just described, you can simplify Figure 5-5 to Figure 5-7.

Figure 5-7. Query diagram for Figure 5-5, simplified

Note that the detail join ratio from C to O in Figure 5-5 is less than 1.5, so continue to show it even in the simplified diagram in Figure 5-7.

When it comes to filters, even approximate numbers are often unnecessary if you know which filter is best and if the other competing filters do not share the same parent detail node. In this case, you can simply indicate the best filter with a capital F and lesser filters with a lowercase f. Further simplify Figure 5-7 to Figure 5-8.

Figure 5-8. Query diagram for Figure 5-7, fully simplified

Note that the detail join ratio from C to O is less than 1.5, so continue to show it even in the fully simplified diagram in Figure 5-8.

Although I've removed the filter ratios from Figure 5-8, you should continue to place an asterisk next to any unique filters (filters guaranteed to return no more than one row). You should also indicate actual filter values for lesser filters that share the same parent detail node. For example, if you have lesser filters on nodes B and C in Figure 5-9, show their actual filter ratios, as illustrated, since they share the parent detail node A.

Figure 5-9. Fully simplified query diagram, showing filter ratios for a shared parent

In practice, you can usually start with a simplified query diagram and add detail only as necessary. If the execution plan (which I explain how to derive in Chapter 6) you find from the simplified problem is so fast that further improvement is pointless, you are finished. (You might be surprised how often this happens.) For example, a batch query that runs in just a few seconds a few times per day is fast enough that further improvement is pointless. Likewise, you need not further tune any online query that runs in under 100 milliseconds that the end user community, as a whole, runs fewer than 1,000 times per day. If after this first round of tuning you think further improvement would still be worth the trouble, you can quickly check the feasibility of more improvement by checking whether you missed important join ratios. The fastest way to do this is to ask whether the single best filter accounts for almost all of the overall reduction in rowcount versus a wholly unfiltered query of just the most detailed table. Assuming that the join diagram maps out as an upside-down tree, the default expectation is that the whole query, without filters, would return the same number of rows as the most detailed table at the root of the join tree (at the top, that is).

With filters, you expect that each filter reduces that rowcount returned from the most detailed table (at the root of the join tree) by the filter ratio. If the best filter ratio times the rowcount of the most detailed table accounts for close to the number of rows that the whole query returns, you know you have not missed any important filter, and the simplified diagram suffices. On the other hand, if the most detailed table's rowcount times the best filter (or what you thought was the best filter!) would yield far more rows than the actual query yields, then you might have missed an important source of row reduction and you should gather more statistics. If the product of all filter ratios (calculated or guessed) times the rowcount of the most detailed table does not come close to the whole-query rowcount, you should suspect that you need further information. In particular, you might have hidden join filters, which are join ratios that unexpectedly turn out to be much less than 1.0; recognizing these and using them for a better plan can yield important further gains.