5.2 Noncorrelated Subqueries

Noncorrelated subqueries allow each row from the containing SQL statement to be compared to a set of values. You can divide noncorrelated subqueries into the following three categories, depending on the number of rows and columns returned in their result set:

  • Single-row, single-column subqueries

  • Multiple-row, single-column subqueries

  • Multiple-column subqueries

Depending on the category, different sets of operators may be employed by the containing SQL statement to interact with the subquery.

5.2.1 Single-Row, Single-Column Subqueries

A subquery that returns a single row with a single column is treated like a scalar by the containing statement; not surprisingly, these types of subqueries are known as scalar subqueries. Such a subquery may appear on either side of a condition, and the usual comparison operators (=, <, >, !=, <=, >=) are employed. The following query illustrates the utility of single-row, single-column subqueries by finding all employees earning an above-average salary. The subquery returns the average salary, and the containing query then returns all employees who earn more than that amount:

SELECT lname 

FROM employee

WHERE salary > (SELECT AVG(salary) 

                FROM employee);



LNAME

--------------------

BLAKE

CLARK

SCOTT

KING

FORD

As this query demonstrates, it can be perfectly reasonable for a subquery to reference the same tables as the containing query. In fact, subqueries are frequently used to isolate a subset of records within a table. For example, many applications include maintenance routines that clean up operational data, such as exception or load logs. Every week, a script might delete all but the latest day's activity. For example:

DELETE FROM load_log 

WHERE load_dt < (SELECT MAX(TRUNC(load_dt)) 

                 FROM load_log);

Noncorrelated subqueries are also commonly found outside the WHERE clause, as illustrated by the following query, which identifies the salesperson responsible for the most orders:

SELECT sales_emp_id, COUNT(*) 

FROM cust_order

GROUP BY sales_emp_id

HAVING COUNT(*) = (SELECT MAX(COUNT(*)) 

                   FROM cust_order 

                   GROUP BY sales_emp_id);



SALES_EMP_ID   COUNT(*)

------------ ----------

        7354          8

This subquery calculates the number of orders attributable to each salesperson, and then applies the MAX function to return only the highest number of orders. The containing query performs the same aggregation as the subquery and then keeps only those salespeople whose total sales count matches the maximum value returned by the subquery. Interestingly, the containing query can return more than one row if multiple salespeople tie for the maximum sales count, while the subquery is guaranteed to return a single row and column. If it seems wasteful that the subquery and containing query both perform the same aggregation, it is; see Chapter 14 for more efficient ways to handle these types of queries.

So far, you have seen scalar subqueries in the WHERE and HAVING clauses of SELECT statements, along with the WHERE clause of a DELETE statement. Before delving deeper into the different types of subqueries, let's explore where else subqueries can and can't be utilized in SQL statements:

  • The FROM clause may contain any type of noncorrelated subquery.

  • The SELECT and ORDER BY clauses may contain scalar subqueries.

  • The GROUP BY clause may not contain subqueries.

  • The START WITH and CONNECT BY clauses, used for querying hierarchical data, may contain subqueries and will be examined in detail in Chapter 8.

  • The WITH clause contains a named noncorrelated subquery that can be referenced multiple times within the containing query but executes only once (see the examples later in this chapter).

  • The USING clause of a MERGE statement may contain noncorrelated subqueries.

  • The SET clause of UPDATE statements may contain scalar or single-row, multiple-column subqueries.

  • INSERT statements may contain scalar subqueries in the VALUES clause.

5.2.2 Multiple-Row, Single-Column Subqueries

Now that you know how to use single-row, single-column subqueries, let's explore how to use subqueries that return multiple rows. When a subquery returns more than one row, it is not possible to use only comparison operators, since a single value cannot be directly compared to a set of values. However, a single value can be compared to each value in a set. To accomplish this, the special keywords ANY and ALL are used with comparison operators to determine if a value is equal to (or less than, greater than, etc.) any member of the set or all members of the set. Consider the following query:

SELECT fname, lname

FROM employee

WHERE dept_id = 30 AND salary >= ALL

 (SELECT salary

  FROM employee 

  WHERE dept_id = 30);



FNAME                LNAME

-------------------- --------------------

MARION               BLAKE

The subquery returns the set of salaries for department 30, and the containing query checks each employee in the department to see if her salary is greater or equal to every salary returned by the subquery. Thus, this query retrieves the name of the highest paid person in department 30. While every employee has a salary >= any of the salaries in the department, only the highest paid employee has a salary >= all of the salaries in the department. If multiple employees tie for the highest salary in the department, multiple names will be returned.

Another way to phrase the previous query is to find the employee whose salary is not less than any other salary in her department. You can do this using the ANY operator:

SELECT fname, lname

FROM employee

WHERE dept_id = 30 AND NOT salary < ANY

 (SELECT salary

  FROM employee 

  WHERE dept_id = 30);

There are almost always multiple ways to phrase the same query. One of the challenges of writing SQL is striking the right balance between efficiency and readability. In this case, we might prefer using AND salary >= ALL over AND NOT salary < ANY because the first variation is easier to understand; however, the latter form might prove more efficient, since each evaluation of the subquery results requires from 1 to N comparisons when using ANY versus exactly N comparisons when using ALL.

If there are 100 people in the department, each of the 100 salaries needs to be compared to the entire set of 100. When using ANY, the comparison can be suspended as soon as a larger salary is identified in the set, whereas using ALL requires 100 comparisons to ensure that there are no smaller salaries in the set.


The next query uses the ANY operator to find all employees who have been with the company longer than any top-level manager:

SELECT fname, lname

FROM employee

WHERE manager_emp_id IS NOT NULL

  AND hire_date < ANY

 (SELECT hire_date

  FROM employee

  WHERE manager_emp_id IS NULL);



FNAME                LNAME

-------------------- --------------------

JOHN                 SMITH

KEVIN                ALLEN

CYNTHIA              WARD

TERRY                JONES

KENNETH              MARTIN

MARION               BLAKE

CAROL                CLARK

MARY                 TURNER

The subquery returns the set of hire dates for all top-level managers, and the containing query returns the names of non-top-level managers whose hire date is previous to any returned by the subquery.

For the previous three queries, failure to include either the ANY or ALL operators may result in the following error:

ORA-01427: single-row subquery returns more than one row

The wording of this error message is a bit confusing. After all, how can a single-row subquery return multiple rows? What the error message is trying to convey is that a multiple-row subquery has been identified where only a single-row subquery is allowed. If you are not absolutely certain that your subquery will return exactly one row, you must include ANY or ALL to ensure your code doesn't fail in the future.

Along with ANY and ALL, you may also use the IN operator for working with multi-row subqueries. Using IN with a subquery is functionally equivalent to using = ANY, and returns TRUE if a match is found in the set returned by the subquery. The following query uses IN to postpone shipment of all orders containing parts that are not currently in stock:

UPDATE cust_order 

SET expected_ship_dt = TRUNC(SYSDATE) + 1

WHERE ship_dt IS NULL 

  AND order_nbr IN

  (SELECT l.order_nbr 

   FROM line_item l INNER JOIN part p

   ON l.part_nbr = p.part_nbr 

   WHERE p.inventory_qty = 0);

The subquery returns the set of orders requesting out-of-stock parts, and the containing UPDATE statement modifies the expected ship date of all orders in the set. We think you will agree that IN is more intuitive than = ANY, which is why IN is almost always used in such situations. Similarly, you can use NOT IN instead of using != ALL as demonstrated by the next query, which deletes all customers who haven't placed an order in the past five years:

DELETE FROM customer

WHERE cust_nbr NOT IN 

  (SELECT cust_nbr 

   FROM cust_order

   WHERE order_dt >= TRUNC(SYSDATE) -- (365 * 5));

The subquery returns the set of customers that have placed an order in the past five years, and the containing DELETE statement removes all customers that are not in the set returned by the subquery.

Finding members of one set that do not exist in another set is referred to as an anti-join. As the name implies, an anti-join is the opposite of a join; rows from table A are returned if the specified data is not found in table B. The Oracle optimizer can employ multiple strategies for formulating execution plans for such queries, including a merge anti-join or a hash anti-join.

Since this is not explicitly a tuning book (in our opinion, mastering the SQL implementation is the best tuning tool available), we will refrain from delving into the inner workings of the Oracle optimizer and how the optimizer can be influenced via hints. For more information, see Oracle SQL Tuning Pocket Reference (O'Reilly).


5.2.3 Multiple-Column Subqueries

While all of the previous examples compare a single column from the containing SQL statement to the result set returned by the subquery, it is also possible to issue a subquery against multiple columns. Consider the following UPDATE statement, which rolls up data from an operational table into an aggregate table:

UPDATE monthly_orders SET

  tot_orders = (SELECT COUNT(*) 

    FROM cust_order 

    WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') 

      AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY')

      AND cancelled_dt IS NULL),

  max_order_amt = (SELECT MAX(sale_price) 

    FROM cust_order 

    WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') 

      AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY')

      AND cancelled_dt IS NULL),

  min_order_amt = (SELECT MIN(sale_price) 

    FROM cust_order 

    WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') 

      AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY') 

      AND cancelled_dt IS NULL),

  tot_amt = (SELECT SUM(sale_price) 

    FROM cust_order 

    WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') 

      AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY')

      AND cancelled_dt IS NULL)

WHERE month = 7 and year = 2001;

The UPDATE statement modifies four columns in the monthly_orders table, and values for each of the four columns are calculated by aggregating data in the cust_order table. Looking closely, you can see that the WHERE clauses for all four subqueries are identical; only the aggregation function differs in the four queries. The next query demonstrates how all four columns can be populated with a single trip through the cust_order table by using a single subquery that returns four columns:

UPDATE monthly_orders 

SET (tot_orders, max_order_amt, min_order_amt, tot_amt) =

  (SELECT COUNT(*), MAX(sale_price), MIN(sale_price), SUM(sale_price)

   FROM cust_order

   WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') 

     AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY') 

     AND cancelled_dt IS NULL)

WHERE month = 7 and year = 2001;

This second statement achieves the same result more efficiently than the first by performing four aggregations during one trip through the cust_order table, rather than one aggregation during each of four separate trips.

Whereas the previous example demonstrates the use of a multiple-column subquery in the SET clause of an UPDATE statement, such subqueries may also be utilized in the WHERE clause of a SELECT, UPDATE, or DELETE statement. The next statement deletes all items from open orders that include discontinued parts:

DELETE FROM line_item

WHERE (order_nbr, part_nbr) IN

 (SELECT c.order_nbr, p.part_nbr

  FROM cust_order c INNER JOIN line_item li

  ON c.order_nbr = li.order_nbr

  INNER JOIN part p 

  ON li.part_nbr = p.part_nbr

  WHERE c.ship_dt IS NULL AND c.cancelled_dt IS NULL

    AND p.status = 'DISCONTINUED');

Note the use of the IN operator in the WHERE clause. Two columns are listed together in parentheses prior to the IN keyword. Values in these two columns are compared to the set of two values returned by each row of the subquery. If a match is found, the row is removed from the line_item table.

5.2.4 The WITH Clause

You may find yourself in a situation where you need to reference the same noncorrelated subquery multiple times in the same query. For example, let's say you want to show all employees making more than the average salary, and you want to show how much above the average each employee makes. You can formulate this using the same subquery in both the FROM and WHERE clauses:

SELECT e.emp_id, e.lname, e.fname, 

  ROUND(e.salary - (SELECT AVG(salary) FROM employee)) above_avg

FROM employee e

WHERE e.salary > (SELECT AVG(salary) FROM employee);



    EMP_ID LNAME                FNAME                 ABOVE_AVG

---------- -------------------- -------------------- ----------

      7698 BLAKE                MARION                      846

      7782 CLARK                CAROL                       446

      7788 SCOTT                DONALD                      996

      7839 KING                 FRANCIS                    2996

      7902 FORD                 JENNIFER                    996

To eliminate the inefficiency of executing the same subquery multiple times, Oracle introduced the WITH clause in the Oracle9i release. Using the WITH clause, you can place the subquery that calculates the average salary at the top of the query and reference it throughout the query:

WITH avg_sal AS (SELECT AVG(salary) val FROM employee)

SELECT e.emp_id, e.lname, e.fname, 

  (SELECT ROUND(e.salary - val) FROM avg_sal) above_avg

FROM employee e

WHERE e.salary > (SELECT val FROM avg_sal);



    EMP_ID LNAME                FNAME                 ABOVE_AVG

---------- -------------------- -------------------- ----------

      7698 BLAKE                MARION                      846

      7782 CLARK                CAROL                       446

      7788 SCOTT                DONALD                      996

      7839 KING                 FRANCIS                    2996

      7902 FORD                 JENNIFER                    996

The WITH clause creates a temporary data set called, in this case, avg_sal, which, in this case, consists of a single row of data having a single column, val. This data set is generated once and can be referenced throughout the containing query. To access the data in avg_sal, you query it as if it were a table. In this regard, it acts like a temporary table with statement scope. As you will see shortly, the WITH clause acts in many ways like another type of subquery called the inline view.