6.4 A Special Case

Usually, following the tuning-process steps without deviation works fine, but the problem shown in Figure 6-4 turns out to offer, potentially, one last trick you could apply, especially with Oracle.

This section first describes an especially efficient trick that sometimes helps on Oracle. However, take heart if you need something like this outside of Oracle: at the end of the section, I describe a less efficient version of the trick for other databases.


6.4.1 The Oracle Solution

Return to Figure 6-4 and consider a special case: all tables other than M are relatively small and well cached, but M is very large and therefore poorly cached and much more expensive to access than the other tables. Furthermore, M is a special combinations table to express a many-to-many relationship between A1 and A2. An example of such a combinations table would be a table of actor/movie combinations for a movie-history database, linking Movies and Actors, when the relationship between these is many-to-many. In such a combinations table, it is natural to use a two-part primary key made up of the IDs of the linked tablesin this case, Movie_ID and Actor_ID. As is commonly the case, this combinations table has an index on the combination of foreign keys pointing to the tables A2 and A1. For this example, assume the order of the keys within that index has the foreign key pointing to A2 first, followed by the foreign key pointing to A1 .

Consider the costs of accessing each table in the plan I found earlier as the original solution to Figure 6-4. You find low costs for the tables up to M, then a much higher cost for M because you get many more rows from that table than the previous tables and accessing those rows leads to physical I/O. Following M, the database joins to just as many rows in A1 (since M has no filter), but these are much cheaper per row, because they are fully cached. Then, the filter in A1 drops the rowcount back down to a much lower number for the remaining tables in the plan. Therefore, you find a cost almost wholly dominated by the cost of accessing M, and it would be useful to reduce that cost.

As it happens, in this unusual case, you find an opportunity to get from the foreign key in M pointing to A2 to the foreign key pointing to A1, stored in the same multicolumn index in M, without ever touching the table M! The database will later need to read rows from the table itself, to get the foreign key pointing to A3 and probably to get columns in the query SELECT list. However, you can postpone going to the table until after the database reaches filtered tables A1 and C1. Therefore, the database will need to go only to 18% as many rows in M (0.3 / 0.6, picking up the filter ratios 0.3 on A1 and 0.6 on C1) as it would need to read if the database went to the table M as soon as it went to the index for M. This greatly reduces the query cost, since the cost of reading rows in table M, itself, dominates in this particular case.

No database makes it particularly easy to decouple index reads from table reads; a table read normally follows an index read immediately, automatically, even when this is not optimal. However, Oracle does allow for a trick that solves the problem, since Oracle SQL can explicitly reference rowids. In this case, the best join order is (B4, C5, C4, A2, B3, C2, C3, D1, D2, MI, A1, B1, C1, MT, A3, B5, C6, B2). Here, MI is the index on M(FkeyToA2,FkeyToA1), inserted into the join order where M was originally. MT is the table M, accessed later in the plan through the ROWID from MI and inserted into the join order after picking up the filters on A1 and C1. The trick is to refer to M twice in the FROM clause, once for the index-only access and once for a direct ROWID join, as follows, assuming that the name of the index on M(FkeyToA2,FkeyToA1) is M_DoubleKeyInd:

Select /*+ ORDERED INDEX(MI M_DoubleKeyInd) */ MT.Col1, MT.Col2,...
...
FROM B4, C5, C4, A2, B3, C2, C3, D1, D2, 
M MI, A1, B1, C1, M MT, A3, B5, C6, B2
WHERE ...
AND A2.Pkey=MI.FKeyToA2
AND A1.Pkey=MI.FKeyToA1
AND MI.ROWID=MT.ROWID
AND...

So, two joins to M are really cheaper than one in this unusual case! Note the two hints in this version of the query:


ORDERED

Specifies that the tables are to be joined in the order they occur in the FROM clause


INDEX(MI M_DoubleKeyInd)

Guarantees use of the correct index at the point in the order where you want index-only access for MI

Other hints might be necessary to get the rest of the plan just right. Also note the unusual rowid-to-rowid join between MI and MT, and note that the only references to MI are in the FROM clause and in the WHERE clause conditions shown. These references require only data (the two foreign keys and the rowid) stored in the index. This is crucial: Oracle avoids going to the table M as soon as it reaches the index on the primary key to M only because MI refers solely to the indexed columns and the rowids that are also stored in the index. Columns in the SELECT clause and elsewhere in the WHERE-clause conditions (such as a join to A3, not shown) all come from MT. Because of all this, the optimizer finds that the only columns required for MI are already available from the index. It counts that join as index-only. The direct-ROWID join to MT occurs later in the join order, and any columns from the M table are selected from MT.

However, the technique I've just described is not usually needed, for several reasons:

  • Combination indexes of the two foreign keys you need, in the right order you need, happen rarely.

  • Usually, the root detail table does not add enough cost, in both relative and absolute terms, to justify the trouble.

  • The benefits rarely justify creating a whole new multicolumn index if one is not already there.

6.4.2 Solving the Special Case Outside of Oracle

If you cannot refer directly to rowids in your WHERE clause, can this trick still work in some form? The only part of the trick that depends on rowids is the join between MI and MT. You could also join those table aliases on the full primary key. The cost of the early join to MI would be unchanged, but the later join to MT would require looking up the very same index entry you already reached in the index-only join to MI for those rows that remain. This is not as efficient, but note that these redundant hits on the index will surely be cached, since the execution plan touches the same index entries moments before, leaving only the cost of the extra logical I/O. Since the database throws most of the rows out before it even reaches MT, this extra logical I/O is probably much less than the savings on access to the table itself.