6.2 Standard Heuristic Join Order

Here is the heuristic for finding the best robust execution plan, including join order:

  1. Drive first to the table with the best (nearest to zero) filter ratio.

  2. With nested loopsdriving through the full, unique, primary-key indexesdrive down as long as possible, first to the best (nearest to zero) remaining filters.

  3. Only when necessary, follow nested loops up diagram links (against the direction of the arrow) through full, nonunique, foreign-key indexes.

These steps might not be perfectly clear now. Don't worry. The rest of this chapter explains each of these steps in detail. The heuristic is almost easier to demonstrate than to describe.

When the driving table turns out to be several levels below the top detail table (the root table, so-called because it lies at the root of the join tree), you will have to return to Step 2 after every move up the tree in Step 3. I describe some rare refinements for special cases, but by and large, finding the optimum plan is that simple, once you have the query diagram!

After tuning thousands of queries from real-world applications that included tens of thousands of queries, I can state with high confidence that these rules are just complex enough. Any significant simplification of these rules will leave major, common classes of queries poorly tuned, and any addition of complexity will yield significant improvement only for relatively rare classes of queries.

Later in the book, I discuss these rarer cases and what to do with them, but you should first understand the basics as thoroughly as possible.

There is one subtlety to consider when Steps 2 and 3 mention following join links up or down: the tables reached in the plan so far are consolidated into a single virtual node, for purposes of choosing the next step in the plan. Alternatively, it might be easier to visualize the tables reached so far in the plan as one cloud of nodes. From the cloud of already-reached nodes, it makes no difference to the rest of the plan how already-reached table nodes are arranged within that cloud, or in what order you reached them. The answer to the question "Which table comes next?" is completely independent of the order or method you used to join any earlier tables. Which tables are in the cloud affects the boundaries of the cloud and matters, but how they got there is ancient history, so to speak, and no longer relevant to your next decision.

When you put together an execution plan following the rules, you might find yourself focused on the most-recently-joined table, but this is a mistake. Tables are equally joined upward or downward from the cloud if they are joined upward or downward from any member of the set of already-joined tables, not necessarily the most-recently-joined table. You might even find it useful to draw the expanding boundaries of the cloud of so-far-reached tables as you proceed through the steps, to clarify in your mind which tables lie just outside the cloud. The relationship of remaining tables to the cloud clarifies whether they join to the cloud from above or from below, or do not even join to the cloud directly, being ineligible to join until you join further intermediate tables. I further illustrate this point later in the chapter.