9.4 Unfiltered Joins

The method so far generally assumes that you are tuning queries against large tables, since these dominate the set of SQL that needs tuning. For such queries, especially when they are online queries, you can generally count on finding at least one selective filter that leads to an attractive driving table. Occasionally, especially for large batch queries and online queries of small tables, you will tune unfiltered joinsjoins of whole tables without restriction. For example, consider Figure 9-5.

Figure 9-5. An unfiltered three-way join

How do you optimize such a query, with no guidance based on filter selectivities? For nested-loops plans, it hardly matters which join order you choose, as long as you follow the join tree. However, these are precisely the queries that most reward hash joins, or sort-merge joins when hash joins are unavailable. Assuming hash joins are available, the database should read all three tables with full table scans and should hash the smaller tables, A1 and A2, caching these hashed tables in memory if possible. Then, during the single pass through the largest table, M, each hashed row is rapidly matched with the in-memory matching hashed rows of A1 and A2. The cost of the query, ideally, is roughly the cost of the three full table scans. The database can't do better than that, even theoretically, given that you need all the rows from all three tables. Cost-based optimizers are generally good at finding optimum plans for queries such as these, without manual help.

When either A1 or A2 is too large to cache in memory, consider the more robust nested-loops plan driving from table M, and check how much slower it turns out to be. Indexed lookups, one row at a time, will likely be much slower, but they will eventually succeed, while the hash join runs the risk of exhausting temporary disk space if A1 or A2 is too big to hold in memory.