15.1 Know When to Use Specific Constructs

Depending on the circumstances, certain SQL constructs are preferable to others. For example, use of the EXISTS predicate is often preferable to DISTINCT. The next sections discuss the usage of such constructs.

15.1.1 EXISTS Is Preferable to DISTINCT

The DISTINCT keyword used in a SELECT clause eliminates duplicate rows in the result set. To eliminate those duplicates, Oracle performs a sort, and that sort requires time and disk space. Therefore, avoid using DISTINCT if you can tolerate having duplicate rows returned by a query. If you can't tolerate the duplicate rows, or your application can't handle them, use EXISTS in place of DISTINCT.

For example, assume you are trying to find the names of customers who have orders. Your query has to be based on two tables: customer and cust_order. Using DISTINCT, your query would be written as follows:

SELECT DISTINCT c.cust_nbr, c.name

FROM customer c JOIN cust_order o

ON c.cust_nbr = o.cust_nbr;

The corresponding execution plan for this query is as follows. Note the SORT operation, which is a result of DISTINCT being used.

Query Plan

-----------------------------------------

SELECT STATEMENT   Cost = 3056

  SORT UNIQUE

    MERGE JOIN

      INDEX FULL SCAN IND_ORD_CUST_NBR

      SORT JOIN

        TABLE ACCESS FULL CUSTOMER

To use EXISTS, the query needs to be rewritten as follows:

SELECT c.cust_nbr, c.name

FROM customer c

WHERE EXISTS (SELECT 1 FROM cust_order o WHERE c.cust_nbr = o.cust_nbr);

Here is the execution plan for the EXISTS version of the queries:

Query Plan

---------------------------------------

SELECT STATEMENT   Cost = 320

  FILTER

    TABLE ACCESS FULL CUSTOMER

    INDEX RANGE SCAN IND_ORD_CUST_NBR

Notice that the second query eliminates the overhead of the sort operation, and therefore runs faster.

15.1.2 WHERE Versus HAVING

We discussed the GROUP BY and HAVING clauses in Chapter 4. Sometimes, when writing a GROUP BY query, you have a condition that you can specify in either the WHERE or HAVING clause. In situations where you have a choice, you'll always get better performance if you specify the condition in the WHERE clause. The reason is that it's less expensive to eliminate rows before they are summarized than it is to eliminate results after summarization.

Let's look at an example illustrating the advantage of WHERE over HAVING. Here's a query with the HAVING clause that reports the number of orders in the year 2000:

SELECT year, COUNT(*)

FROM orders

GROUP BY year

HAVING year = 2001;



      YEAR   COUNT(*)

---------- ----------

      2001       1440

The execution plan for this query is as follows:

Query Plan

-------------------------------------------

SELECT STATEMENT   Cost = 6

  FILTER

    SORT GROUP BY

      INDEX FAST FULL SCAN ORDERS_PK

Now, look at that same query, but with the year restriction in the WHERE clause:

SELECT year, COUNT(*)

FROM orders

WHERE year = 2001

GROUP BY year;



     YEAR   COUNT(*)

--------- ----------

     2001       1440

The execution plan for this version of the query is:

Query Plan

-------------------------------------

SELECT STATEMENT   Cost = 2

  SORT GROUP BY NOSORT

    INDEX FAST FULL SCAN ORDERS_PK

With the HAVING clause, the query performs the group operation first, and then filters the groups for the condition specified. The WHERE clause version of the query filters the rows before performing the group operation. The result of filtering with the WHERE clause is that there are fewer rows to summarize, and consequently, the query performs better.

However, you should note that not all types of filtering can be achieved using the WHERE clause. Sometimes, you may need to summarize the data first and then filter the summarized data based on the summarized values. In such situations, you have to filter using the HAVING clause, because only the HAVING clause can "see" summarized values. Moreover, there are situations when you may need to use the WHERE clause and the HAVING clause together in a query to filter the results the way you want. For details, see Chapter 4.

15.1.3 UNION Versus UNION ALL

We discussed UNION and UNION ALL in Chapter 7. UNION ALL combines the results of two SELECT statements. UNION combines the results of two SELECT statements, and then returns only distinct rows from the combination; duplicates are eliminated. It is, therefore, obvious that to remove the duplicates, UNION performs one extra step than UNION ALL. This extra step is a sort, which is costly in terms of performance. Therefore, whenever your application can handle duplicates or you are certain that no duplicates will result, consider using UNION ALL instead of UNION.

Let's look an example to understand this issue better. The following query uses UNION to return a list of orders where the sale price exceeds $50.00 or where the customer is located in region 5:

SELECT order_nbr, cust_nbr 

FROM cust_order 

WHERE sale_price > 50

UNION

SELECT order_nbr, cust_nbr 

FROM cust_order

WHERE cust_nbr IN 

(SELECT cust_nbr FROM customer WHERE region_id = 5);



 ORDER_NBR   CUST_NBR

---------- ----------

      1000          1

      1001          1

      1002          5

      1003          4

      1004          4

      1005          8

      1006          1

      1007          5

      1008          5

      1009          1

      1011          1

      1012          1

      1015          5

      1017          4

      1019          4

      1021          8

      1023          1

      1025          5

      1027          5

      1029          1



20 rows selected.

The execution plan for this UNION query is:

Query Plan

---------------------------------------------------------------------------

SELECT STATEMENT   Cost = 8

  SORT UNIQUE

    UNION-ALL

      TABLE ACCESS FULL CUST_ORDER

      HASH JOIN

        TABLE ACCESS FULL CUSTOMER

        TABLE ACCESS FULL CUST_ORDER

The following query uses UNION ALL instead of UNION to get the same information:

SELECT order_nbr, cust_nbr 

FROM cust_order 

WHERE sale_price > 50

UNION ALL

SELECT order_nbr, cust_nbr 

FROM cust_order

WHERE cust_nbr IN 

(SELECT cust_nbr FROM customer WHERE region_id = 5);



 ORDER_NBR   CUST_NBR

---------- ----------

      1001          1

      1003          4

      1005          8

      1009          1

      1012          1

      1017          4

      1021          8

      1029          1

      1001          1

      1000          1

      1002          5

      1003          4

      1004          4

      1006          1

      1007          5

      1008          5

      1009          1

      1012          1

      1011          1

      1015          5

      1017          4

      1019          4

      1023          1

      1025          5

      1027          5

      1029          1



26 rows selected.

Note the duplicate rows in the output. However, note also that UNION ALL performs better than UNION, as you can see from the following execution plan:

Query Plan

---------------------------------------------------------------------------

SELECT STATEMENT   Cost = 4

  UNION-ALL

    TABLE ACCESS FULL CUST_ORDER

    HASH JOIN

      TABLE ACCESS FULL CUSTOMER

      TABLE ACCESS FULL CUST_ORDER

You can see that the extra operation (SORT UNIQUE) in the UNION makes it run slower than UNION ALL.

15.1.4 LEFT Versus RIGHT OUTER JOIN

As you have seen in Chapter 3, outer joins can be of type LEFT, RIGHT, or FULL. LEFT and RIGHT are really two ways of looking at the same operation. Mixing LEFT and RIGHT outer joins in the same application can cause confusion, as you and other programmers must constantly shift your point-of-view from one approach to the other. Use both LEFT and RIGHT outer joins in the same query, and you'll find your confusion greatly magnified. For example:

SELECT e.lname, j.function, d.name

FROM job j LEFT OUTER JOIN employee e ON e.job_id = j.job_id

           RIGHT OUTER JOIN department d ON e.dept_id = d.dept_id;



LNAME                FUNCTION                       NAME

-------------------- ------------------------------ ------------

MILLER               CLERK                          ACCOUNTING

CLARK                MANAGER                        ACCOUNTING

KING                 PRESIDENT                      ACCOUNTING

SMITH                CLERK                          RESEARCH

FORD                 ANALYST                        RESEARCH

JONES                MANAGER                        RESEARCH

SCOTT                ANALYST                        RESEARCH

JAMES                CLERK                          SALES

BLAKE                MANAGER                        SALES

MARTIN               SALESPERSON                    SALES

TURNER               SALESPERSON                    SALES

ALLEN                SALESPERSON                    SALES

                                                    OPERATIONS

Such confusion is unnecessary. Since both LEFT and RIGHT outer joins represent the same operation, but from differing points of view, you can simply pick one point of view and use it consistently. For example, many programmers write all outer joins as either FULL or LEFT, ignoring RIGHT.

The preceding query uses a LEFT and then a RIGHT outer join to do the following:

  1. Connect an outer join from employee to job, with employee as the required table

  2. Connect another outer join from department to the results from Step 1, with department as the required table

Using parentheses to explicitly state the above order of operations, you can rewrite the query using all LEFT outer joins, as follows:

SELECT e.lname, j.function, d.name

FROM department d LEFT OUTER JOIN

        (job j LEFT OUTER JOIN employee e

         ON e.job_id = j.job_id)

     ON e.dept_id = d.dept_id;



LNAME                FUNCTION                       NAME

-------------------- ------------------------------ -------------

MILLER               CLERK                          ACCOUNTING

CLARK                MANAGER                        ACCOUNTING

KING                 PRESIDENT                      ACCOUNTING

SMITH                CLERK                          RESEARCH

FORD                 ANALYST                        RESEARCH

JONES                MANAGER                        RESEARCH

SCOTT                ANALYST                        RESEARCH

JAMES                CLERK                          SALES

BLAKE                MANAGER                        SALES

MARTIN               SALESPERSON                    SALES

TURNER               SALESPERSON                    SALES

ALLEN                SALESPERSON                    SALES

                                                    OPERATIONS

The tradeoff here is between using parentheses and mixing RIGHT and LEFT outer joins. This second version of the query still joins employee to job, and then joins department to that result. The operations are exactly the same as in the previous version. This time, the parentheses make the order of operations clearer, and we personally find the second version of the query a bit easier to understand.