Depending on the circumstances, certain SQL constructs are preferable to others. For example, use of the EXISTS predicate is often preferable to DISTINCT. The next sections discuss the usage of such constructs.
The DISTINCT keyword used in a SELECT clause eliminates duplicate rows in the result set. To eliminate those duplicates, Oracle performs a sort, and that sort requires time and disk space. Therefore, avoid using DISTINCT if you can tolerate having duplicate rows returned by a query. If you can't tolerate the duplicate rows, or your application can't handle them, use EXISTS in place of DISTINCT.
For example, assume you are trying to find the names of customers who have orders. Your query has to be based on two tables: customer and cust_order. Using DISTINCT, your query would be written as follows:
SELECT DISTINCT c.cust_nbr, c.name FROM customer c JOIN cust_order o ON c.cust_nbr = o.cust_nbr;
The corresponding execution plan for this query is as follows. Note the SORT operation, which is a result of DISTINCT being used.
Query Plan ----------------------------------------- SELECT STATEMENT Cost = 3056 SORT UNIQUE MERGE JOIN INDEX FULL SCAN IND_ORD_CUST_NBR SORT JOIN TABLE ACCESS FULL CUSTOMER
To use EXISTS, the query needs to be rewritten as follows:
SELECT c.cust_nbr, c.name FROM customer c WHERE EXISTS (SELECT 1 FROM cust_order o WHERE c.cust_nbr = o.cust_nbr);
Here is the execution plan for the EXISTS version of the queries:
Query Plan --------------------------------------- SELECT STATEMENT Cost = 320 FILTER TABLE ACCESS FULL CUSTOMER INDEX RANGE SCAN IND_ORD_CUST_NBR
Notice that the second query eliminates the overhead of the sort operation, and therefore runs faster.
We discussed the GROUP BY and HAVING clauses in Chapter 4. Sometimes, when writing a GROUP BY query, you have a condition that you can specify in either the WHERE or HAVING clause. In situations where you have a choice, you'll always get better performance if you specify the condition in the WHERE clause. The reason is that it's less expensive to eliminate rows before they are summarized than it is to eliminate results after summarization.
Let's look at an example illustrating the advantage of WHERE over HAVING. Here's a query with the HAVING clause that reports the number of orders in the year 2000:
SELECT year, COUNT(*) FROM orders GROUP BY year HAVING year = 2001; YEAR COUNT(*) ---------- ---------- 2001 1440
The execution plan for this query is as follows:
Query Plan ------------------------------------------- SELECT STATEMENT Cost = 6 FILTER SORT GROUP BY INDEX FAST FULL SCAN ORDERS_PK
Now, look at that same query, but with the year restriction in the WHERE clause:
SELECT year, COUNT(*) FROM orders WHERE year = 2001 GROUP BY year; YEAR COUNT(*) --------- ---------- 2001 1440
The execution plan for this version of the query is:
Query Plan ------------------------------------- SELECT STATEMENT Cost = 2 SORT GROUP BY NOSORT INDEX FAST FULL SCAN ORDERS_PK
With the HAVING clause, the query performs the group operation first, and then filters the groups for the condition specified. The WHERE clause version of the query filters the rows before performing the group operation. The result of filtering with the WHERE clause is that there are fewer rows to summarize, and consequently, the query performs better.
However, you should note that not all types of filtering can be achieved using the WHERE clause. Sometimes, you may need to summarize the data first and then filter the summarized data based on the summarized values. In such situations, you have to filter using the HAVING clause, because only the HAVING clause can "see" summarized values. Moreover, there are situations when you may need to use the WHERE clause and the HAVING clause together in a query to filter the results the way you want. For details, see Chapter 4.
We discussed UNION and UNION ALL in Chapter 7. UNION ALL combines the results of two SELECT statements. UNION combines the results of two SELECT statements, and then returns only distinct rows from the combination; duplicates are eliminated. It is, therefore, obvious that to remove the duplicates, UNION performs one extra step than UNION ALL. This extra step is a sort, which is costly in terms of performance. Therefore, whenever your application can handle duplicates or you are certain that no duplicates will result, consider using UNION ALL instead of UNION.
Let's look an example to understand this issue better. The following query uses UNION to return a list of orders where the sale price exceeds $50.00 or where the customer is located in region 5:
SELECT order_nbr, cust_nbr FROM cust_order WHERE sale_price > 50 UNION SELECT order_nbr, cust_nbr FROM cust_order WHERE cust_nbr IN (SELECT cust_nbr FROM customer WHERE region_id = 5); ORDER_NBR CUST_NBR ---------- ---------- 1000 1 1001 1 1002 5 1003 4 1004 4 1005 8 1006 1 1007 5 1008 5 1009 1 1011 1 1012 1 1015 5 1017 4 1019 4 1021 8 1023 1 1025 5 1027 5 1029 1 20 rows selected.
The execution plan for this UNION query is:
Query Plan --------------------------------------------------------------------------- SELECT STATEMENT Cost = 8 SORT UNIQUE UNION-ALL TABLE ACCESS FULL CUST_ORDER HASH JOIN TABLE ACCESS FULL CUSTOMER TABLE ACCESS FULL CUST_ORDER
The following query uses UNION ALL instead of UNION to get the same information:
SELECT order_nbr, cust_nbr FROM cust_order WHERE sale_price > 50 UNION ALL SELECT order_nbr, cust_nbr FROM cust_order WHERE cust_nbr IN (SELECT cust_nbr FROM customer WHERE region_id = 5); ORDER_NBR CUST_NBR ---------- ---------- 1001 1 1003 4 1005 8 1009 1 1012 1 1017 4 1021 8 1029 1 1001 1 1000 1 1002 5 1003 4 1004 4 1006 1 1007 5 1008 5 1009 1 1012 1 1011 1 1015 5 1017 4 1019 4 1023 1 1025 5 1027 5 1029 1 26 rows selected.
Note the duplicate rows in the output. However, note also that UNION ALL performs better than UNION, as you can see from the following execution plan:
Query Plan --------------------------------------------------------------------------- SELECT STATEMENT Cost = 4 UNION-ALL TABLE ACCESS FULL CUST_ORDER HASH JOIN TABLE ACCESS FULL CUSTOMER TABLE ACCESS FULL CUST_ORDER
You can see that the extra operation (SORT UNIQUE) in the UNION makes it run slower than UNION ALL.
As you have seen in Chapter 3, outer joins can be of type LEFT, RIGHT, or FULL. LEFT and RIGHT are really two ways of looking at the same operation. Mixing LEFT and RIGHT outer joins in the same application can cause confusion, as you and other programmers must constantly shift your point-of-view from one approach to the other. Use both LEFT and RIGHT outer joins in the same query, and you'll find your confusion greatly magnified. For example:
SELECT e.lname, j.function, d.name FROM job j LEFT OUTER JOIN employee e ON e.job_id = j.job_id RIGHT OUTER JOIN department d ON e.dept_id = d.dept_id; LNAME FUNCTION NAME -------------------- ------------------------------ ------------ MILLER CLERK ACCOUNTING CLARK MANAGER ACCOUNTING KING PRESIDENT ACCOUNTING SMITH CLERK RESEARCH FORD ANALYST RESEARCH JONES MANAGER RESEARCH SCOTT ANALYST RESEARCH JAMES CLERK SALES BLAKE MANAGER SALES MARTIN SALESPERSON SALES TURNER SALESPERSON SALES ALLEN SALESPERSON SALES OPERATIONS
Such confusion is unnecessary. Since both LEFT and RIGHT outer joins represent the same operation, but from differing points of view, you can simply pick one point of view and use it consistently. For example, many programmers write all outer joins as either FULL or LEFT, ignoring RIGHT.
The preceding query uses a LEFT and then a RIGHT outer join to do the following:
Connect an outer join from employee to job, with employee as the required table
Connect another outer join from department to the results from Step 1, with department as the required table
Using parentheses to explicitly state the above order of operations, you can rewrite the query using all LEFT outer joins, as follows:
SELECT e.lname, j.function, d.name FROM department d LEFT OUTER JOIN (job j LEFT OUTER JOIN employee e ON e.job_id = j.job_id) ON e.dept_id = d.dept_id; LNAME FUNCTION NAME -------------------- ------------------------------ ------------- MILLER CLERK ACCOUNTING CLARK MANAGER ACCOUNTING KING PRESIDENT ACCOUNTING SMITH CLERK RESEARCH FORD ANALYST RESEARCH JONES MANAGER RESEARCH SCOTT ANALYST RESEARCH JAMES CLERK SALES BLAKE MANAGER SALES MARTIN SALESPERSON SALES TURNER SALESPERSON SALES ALLEN SALESPERSON SALES OPERATIONS
The tradeoff here is between using parentheses and mixing RIGHT and LEFT outer joins. This second version of the query still joins employee to job, and then joins department to that result. The operations are exactly the same as in the previous version. This time, the parentheses make the order of operations clearer, and we personally find the second version of the query a bit easier to understand.