Diagram the following query:
SELECT ... FROM Customers C, ZIP_Codes Z, ZIP_Demographics D, Regions R
WHERE C.ZIP_Code=Z.ZIP_Code
AND Z.Demographic_ID=D.Demographic_ID
AND Z.Region_ID=R.Region_ID
AND C.Active_Flag='Y'
AND C.Profiled_Flag='N'
AND R.Name='SOUTHWEST'
AND D.Name IN ('YUPPIE', 'OLDMONEY');
Make the usual assumptions about primary-key names, except that the primary key of ZIP_Codes is simply ZIP_Code, and note that the Name columns of both Regions and ZIP_Demographics are also uniquely indexed. You have 5,000,000 rows in Customers, 250,000 rows in ZIP_Codes, 20 rows in ZIP_Demographics, and 5 rows in Regions. Assume all foreign keys are never null and always point to valid primary keys. The following query returns 2,000,000 rows:
SELECT COUNT(*) FROM Customers C WHERE Active_Flag='Y' AND Profiled_Flag='N';
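If you want to double-check the statistics before diagramming, the two counts you need from Customers can be gathered in a single pass. The sketch below is only a suggested cross-check, not part of the exercise; the column aliases are my own, and it assumes the usual convention that a filter ratio is the filtered rowcount divided by the table's total rowcount.

SELECT COUNT(*) AS total_customers,
       COUNT(CASE WHEN Active_Flag='Y' AND Profiled_Flag='N' THEN 1 END) AS filtered_customers
FROM Customers C;
-- With the statistics given, this returns 5,000,000 and 2,000,000,
-- for a Customers filter ratio of 2,000,000/5,000,000 = 0.4.

The same arithmetic gives at most 1/5 = 0.2 for the Regions filter and 2/20 = 0.1 for the ZIP_Demographics filter, since Name is uniquely indexed in both of those tables.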
Diagram the following query:
SELECT ... FROM Regions R, ZIP_Codes Z, Customers C, Customer_Mailings CM, Mailings M, Catalogs Cat, Brands B
WHERE R.Region_ID(+)=Z.Region_ID
AND Z.ZIP_Code(+)=C.ZIP_Code
AND C.Customer_ID=CM.Customer_ID
AND CM.Mailing_ID=M.Mailing_ID
AND M.Catalog_ID=Cat.Catalog_ID
AND Cat.Brand_ID=B.Brand_ID
AND B.Name='OhSoGreen'
AND M.Mailing_Date >= SYSDATE-365
GROUP BY ... ORDER BY ...
Start with the same assumptions and statistics as in Exercise 1. Customer_Mailings contains 30,000,000 rows. Mailings contains 40,000 rows. Catalogs contains 200 rows. Brands contains 12 rows and has an alternate unique key on Name. The following query returns 16,000 rows:
SELECT COUNT(*) FROM Mailings M WHERE Mailing_Date >= SYSDATE-365;
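The same single-pass check works for Mailings; again, this is only a sketch for verifying the numbers (with my own column aliases), not part of the exercise.

SELECT COUNT(*) AS total_mailings,
       COUNT(CASE WHEN Mailing_Date >= SYSDATE-365 THEN 1 END) AS recent_mailings
FROM Mailings M;
-- With the statistics given, this returns 40,000 and 16,000,
-- for a Mailings filter ratio of 16,000/40,000 = 0.4.

The only other filter, B.Name='OhSoGreen', reaches at most 1 of the 12 Brands rows through the alternate unique key on Name, for a filter ratio of roughly 0.08.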
Diagram the following query:
SELECT ... FROM Code_Translations SPCT, Code_Translations TRCT, Code_Translations CTCT, Products P, Product_Lines PL, Inventory_Values IV, Brands B, Product_Locations Loc, Warehouses W, Regions R, Inventory_Taxing_Entities ITx, Inventory_Tax_Rates ITxR, Consignees C
WHERE W.Region_ID=R.Region_ID
AND Loc.Warehouse_ID=W.Warehouse_ID
AND W.Inventory_Taxing_Entity_ID=ITx.Inventory_Taxing_Entity_ID
AND ITx.Inventory_Taxing_Entity_ID=ITxR.Inventory_Taxing_Entity_ID
AND ITxR.Effective_Start_Date <= SYSDATE
AND ITxR.Effective_End_Date > SYSDATE
AND ITxR.Rate>0
AND P.Product_ID=Loc.Product_ID
AND Loc.Quantity>0
AND P.Product_Line_ID=PL.Product_Line_ID(+)
AND P.Product_ID=IV.Product_ID
AND P.Taxable_Inventory_Flag='Y'
AND P.Consignee_ID=C.Consignee_ID(+)
AND P.Strategic_Product_Code=SPCT.Code
AND SPCT.Code_Type='STRATEGIC_PRODUCT'
AND P.Turnover_Rate_Code=TRCT.Code
AND TRCT.Code_Type='TURNOVER_RATE'
AND P.Consignment_Type_Code=CTCT.Code
AND CTCT.Code_Type='CONSIGNMENT_TYPE'
AND IV.Effective_Start_Date <= SYSDATE
AND IV.Effective_End_Date > SYSDATE
AND IV.Unit_Value>0
AND P.Brand_ID=B.Brand_ID
AND B.Name='2Much$'
AND ITx.Tax_Day_Of_Year='DEC31'
GROUP BY ... ORDER BY ...
Start with the same assumptions and statistics as in Exercises 1 and 2, except that W.Inventory_Taxing_Entity_ID points to a valid taxing entity only when it is not null, which is just 5% of the time. The counts for table rows are as follows:
Code_Translations has a two-part primary key: Code_Type, Code.
Inventory_Values and Inventory_Tax_Rates have a time-dependent primary key consisting of an ID and an effective date range, such that any given date falls in a single date range for any value of the key ID. Specifically, the joins to each of these tables are guaranteed to be unique by the Effective_Start_Date and Effective_End_Date conditions, which are part of the joins, not separate filters. (Unfortunately, there is no convenient way to enforce that uniqueness through an index; it is a condition created by the application.) The following queries return the rowcounts shown in the lines that follow each query:
Q1: SELECT COUNT(*) A1 FROM Inventory_Taxing_Entities ITx WHERE ITx.Tax_Day_Of_Year='DEC31'
A1: 2
Q2: SELECT COUNT(*) A2 FROM Inventory_Values IV WHERE IV.Unit_Value>0 AND IV.Effective_Start_Date <= SYSDATE AND IV.Effective_End_Date > SYSDATE
A2: 7,400
Q3: SELECT COUNT(*) A3 FROM Products P WHERE P.Taxable_Inventory_Flag='Y'
A3: 8,300
Q4: SELECT COUNT(*) A4 FROM Product_Locations Loc WHERE Loc.Quantity>0
A4: 123,000
Q5: SELECT COUNT(*) A5 FROM Inventory_Tax_Rates ITxR WHERE ITxR.Rate>0 AND ITxR.Effective_Start_Date <= SYSDATE AND ITxR.Effective_End_Date > SYSDATE
A5: 4
Q6: SELECT COUNT(*) A6 FROM Inventory_Values IV WHERE IV.Effective_Start_Date <= SYSDATE AND IV.Effective_End_Date > SYSDATE
A6: 8,500
Q7: SELECT COUNT(*) A7 FROM Inventory_Tax_Rates ITxR WHERE ITxR.Effective_Start_Date <= SYSDATE AND ITxR.Effective_End_Date > SYSDATE
A7: 4
Q8: SELECT COUNT(*) A8 FROM Code_Translations SPCT WHERE Code_Type='STRATEGIC_PRODUCT'
A8: 3
Q9: SELECT COUNT(*) A9 FROM Code_Translations TRCT WHERE Code_Type='TURNOVER_RATE'
A9: 2
Q10: SELECT COUNT(*) A10 FROM Code_Translations CTCT WHERE Code_Type='CONSIGNMENT_TYPE'
A10: 3
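As an aside on the time-dependent joins described above: the point is that the date-range predicates travel with the join, so each Products row matches at most one current Inventory_Values row even though no index enforces that uniqueness. Here is a minimal sketch of just that join, written in ANSI join syntax rather than the comma-list style of the exercises; it is illustrative only, not part of the query to be diagrammed.

SELECT P.Product_ID, IV.Unit_Value
FROM Products P
JOIN Inventory_Values IV
  ON IV.Product_ID=P.Product_ID
 AND IV.Effective_Start_Date <= SYSDATE
 AND IV.Effective_End_Date > SYSDATE;
-- At most one IV row per product qualifies, by the application's guarantee
-- that any given date falls in a single effective date range per key ID.

Note also that Q2 and Q6 (and likewise Q5 and Q7) differ only by the Unit_Value>0 and Rate>0 conditions, so each pair could be gathered in one pass with the same COUNT(CASE ...) pattern sketched after Exercise 1.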
Fully simplify the query diagram for Exercise 1. Try starting from the query and its statistics, rather than from the full query diagram. Then compare your result with what you get when you start from the full query diagram you already created.
Fully simplify the query diagram for Exercise 2, following the guidelines in Exercise 4.
Fully simplify the query diagram for Exercise 3, following the guidelines in Exercise 4.