For this exercise, imagine that the base development is performed on Oracle, with later testing to check that the same SQL functions correctly and performs well on DB2 and SQL Server. You learned of this SQL because it performed more slowly than expected on Oracle, so you already suspect it leads to a poor execution plan on at least that database. You will need to check execution plans on the other databases, which have not yet been tested.
Place the SQL in a file named tmp.sql and run the script ex.sql, as described in Chapter 3. The result is as follows:
PLAN -------------------------------------------------------------------------------- SELECT STATEMENT SORT ORDER BY NESTED LOOPS NESTED LOOPS NESTED LOOPS NESTED LOOPS NESTED LOOPS TABLE ACCESS FULL 4*CUSTOMERS TABLE ACCESS BY INDEX ROWID 1*ORDERS INDEX RANGE SCAN ORDER_CUSTOMER_ID TABLE ACCESS BY INDEX ROWID 2*ORDER_DETAILS INDEX RANGE SCAN ORDER_DETAIL_ORDER_ID TABLE ACCESS BY INDEX ROWID 5*SHIPMENTS INDEX UNIQUE SCAN SHIPMENT_PKEY TABLE ACCESS BY INDEX ROWID 6*ADDRESSES INDEX UNIQUE SCAN ADDRESS_PKEY TABLE ACCESS BY INDEX ROWID 3*PRODUCTS INDEX UNIQUE SCAN PRODUCT_PKEY
You notice that your database is set up to use the rule-based optimizer, so you switch to cost-based optimization, check that you have statistics on the tables and indexes, and check the plan again, finding a new result:
PLAN -------------------------------------------------------------------------------- SELECT STATEMENT SORT ORDER BY HASH JOIN TABLE ACCESS FULL 3*PRODUCTS HASH JOIN HASH JOIN HASH JOIN HASH JOIN TABLE ACCESS FULL 4*CUSTOMERS TABLE ACCESS FULL 1*ORDERS TABLE ACCESS FULL 2*ORDER_DETAILS TABLE ACCESS FULL 5*SHIPMENTS TABLE ACCESS FULL 6*ADDRESSES
Neither execution plan is close to the optimum plan. Instead, both the rule-based and the cost-based optimization plans drive from full table scans of large tables. The database ought to reach the driving table on a highly selective index, so you know that an improvement is certainly both necessary and possible.
Place the SQL in tmp.sql and run the following command according to the process described in Chapter 3:
cat head.sql tmp.sql tail.sql | db2 +c +p -t
The result is an error; DB2 complains that it sees inconsistent column types in the condition on Phone_Number. You discover that the Phone_Number column is of the VARCHAR type, which is incompatible with the number type of the constant 6505551212.
Unlike Oracle, DB2 does not implicitly convert character-type columns to numbers when SQL compares inconsistent datatypes. This is just as well, in this case, since such a conversion might deactivate an index on Phone_Number, if there is one. You might even suspect, already, that this is precisely what has caused poor performance in the Oracle baseline development environment.
You fix the problem in the most obvious way, placing quotes around the phone number constant to make it a character type:
SELECT C.Phone_Number, C.Honorific, C.First_Name, C.Last_Name, C.Suffix, C.Address_ID, A.Address_ID, A.Street_Addr_Line1, A.Street_Addr_Line2, A.City_Name, A.State_Abbreviation, A.ZIP_Code, OD.Deferred_Ship_Date, OD.Item_Count, P.Prod_Description, S.Shipment_Date FROM Orders O, Order_Details OD, Products P, Customers C, Shipments S, Addresses A WHERE OD.Order_ID = O.Order_ID AND O.Customer_ID = C.Customer_ID AND OD.Product_ID = P.Product_ID AND OD.Shipment_ID = S.Shipment_ID AND S.Address_ID = A.Address_ID AND C.Phone_Number = '6505551212' AND O.Business_Unit_ID = 10 ORDER BY C.Customer_ID, O.Order_ID Desc, S.Shipment_ID, OD.Order_Detail_ID;
Placing this new version of the SQL in tmp.sql, you again attempt to get the execution plan:
$ cat head.sql tmp.sql tail.sql | db2 +c +p -t DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. OPERATOR_ID TARGET_ID OPERATOR_TYPE OBJECT_NAME COST ----------- --------- ------------- ------------------ ----------- 1 - RETURN - 260 2 1 NLJOIN - 260 3 2 NLJOIN - 235 4 3 NLJOIN - 210 5 4 TBSCAN - 185 6 5 SORT - 185 7 6 NLJOIN - 185 8 7 NLJOIN - 135 9 8 FETCH CUSTOMERS 75 10 9 IXSCAN CUST_PH_NUMBER 50 11 8 FETCH ORDERS 70 12 11 IXSCAN ORDER_CUST_ID 50 13 7 FETCH ORDER_DETAILS 75 14 13 IXSCAN ORDER_DTL_ORD_ID 50 15 4 FETCH PRODUCTS 50 16 15 IXSCAN PRODUCT_PKEY 25 17 3 FETCH SHIPMENTS 75 18 17 IXSCAN SHIPMENT_PKEY 50 19 2 FETCH ADDRESSES 75 20 19 IXSCAN ADDRESS_PKEY 50 20 record(s) selected. DB20000I The SQL command completed successfully. $
That's more like it, just the execution plan you chose when you analyzed the SQL top-down, except for the minor issue of reaching Products before Shipments, which will have virtually no effect on the runtime. Since the type inconsistency involving Phone_Number might require correcting on SQL Server and Oracle, you need to try this modified version immediately on the other databases.
Suspecting that you already have the solution to slow performance for this query, you fire up SQL Server's Query Analyzer and use set showplan_text on to see a concise view of the execution plan of the statement modified with C.Phone_Number = '6505551212' to correct the type inconsistency. A click on the Query Analyzer's Execute-Query button results in the following output:
StmtText ------------------------------------------------------------------------------- |--Bookmark Lookup(...(...[Products] AS [P])) |--Nested Loops(Inner Join) |--Bookmark Lookup(...(...[Addresses] AS [A])) | |--Nested Loops(Inner Join) | |--Sort(ORDER BY:([O].[Customer_ID] ASC, [O].[Order_ID] DESC, (wrapped line) [OD].[Shipment_ID] ASC, [OD].[Order_Detail_ID] ASC)) | | |--Bookmark Lookup(...(...[Shipments] AS [S])) | | |--Nested Loops(Inner Join) | | |--Bookmark Lookup(...(...[Order_Details] AS [OD])) | | | |--Nested Loops(Inner Join) | | | |--Filter(WHERE:([O].[Business_Unit_ ID]=10)) | | | | |--Bookmark Lookup(...(... [Orders] AS [O])) | | | | |--Nested Loops(Inner Join) | | | | |--Bookmark Lookup(... (... (wrapped line) [Customers] AS [C])) | | | | | |--Index Seek(... (... (wrapped line) [Customers].[Customer_Phone_Number] (wrapped line) AS [C]), SEEK:([C].[Phone_Number]='6505551212') ORDERED) | | | | |--Index Seek(...(... (wrapped line) [Orders].[Order_Customer_ID] AS [O]), (wrapped line) SEEK:([O].[Customer_ID]=[C].[Customer_ID]) ORDERED) | | | |--Index Seek(...(... (wrapped line) [Order_Details].[Order_Detail_Order_ID] (wrapped line) AS [OD]), SEEK:([OD].[Order_ID]=[O].[Order_ID]) ORDERED) | | |--Index Seek(...(...[Shipments].[Shipment_PKey] (wrapped line) AS [S]), SEEK:([S].[Shipment_ID]=[OD].[Shipment_ID]) ORDERED) | |--Index Seek(...(...[Addresses].[Address_PKey] (wrapped line) AS [A]), SEEK:([A].[Address_ID]=[S].[Address_ID]) ORDERED) |--Index Seek(...(...[Products].[Product_PKey] (wrapped line) AS [P]), SEEK:([P].[Product_ID]=[OD].[Product_ID]) ORDERED) (19 row(s) affected)
Good news! The corrected SQL leads to exactly the optimum plan here. Just out of curiosity, you check the execution plan for the original SQL, and you find the same result! Evidently, SQL Server is doing the data conversion on the constant, avoiding disabling the index.