Noncorrelated subqueries allow each row from the containing SQL statement to be compared to a set of values. You can divide noncorrelated subqueries into the following three categories, depending on the number of rows and columns returned in their result set:
Single-row, single-column subqueries
Multiple-row, single-column subqueries
Multiple-column subqueries
Depending on the category, different sets of operators may be employed by the containing SQL statement to interact with the subquery.
A subquery that returns a single row with a single column is treated like a scalar by the containing statement; not surprisingly, these types of subqueries are known as scalar subqueries. Such a subquery may appear on either side of a condition, and the usual comparison operators (=, <, >, !=, <=, >=) are employed. The following query illustrates the utility of single-row, single-column subqueries by finding all employees earning an above-average salary. The subquery returns the average salary, and the containing query then returns all employees who earn more than that amount:
SELECT lname FROM employee WHERE salary > (SELECT AVG(salary) FROM employee); LNAME -------------------- BLAKE CLARK SCOTT KING FORD
As this query demonstrates, it can be perfectly reasonable for a subquery to reference the same tables as the containing query. In fact, subqueries are frequently used to isolate a subset of records within a table. For example, many applications include maintenance routines that clean up operational data, such as exception or load logs. Every week, a script might delete all but the latest day's activity. For example:
DELETE FROM load_log WHERE load_dt < (SELECT MAX(TRUNC(load_dt)) FROM load_log);
Noncorrelated subqueries are also commonly found outside the WHERE clause, as illustrated by the following query, which identifies the salesperson responsible for the most orders:
SELECT sales_emp_id, COUNT(*) FROM cust_order GROUP BY sales_emp_id HAVING COUNT(*) = (SELECT MAX(COUNT(*)) FROM cust_order GROUP BY sales_emp_id); SALES_EMP_ID COUNT(*) ------------ ---------- 7354 8
This subquery calculates the number of orders attributable to each salesperson, and then applies the MAX function to return only the highest number of orders. The containing query performs the same aggregation as the subquery and then keeps only those salespeople whose total sales count matches the maximum value returned by the subquery. Interestingly, the containing query can return more than one row if multiple salespeople tie for the maximum sales count, while the subquery is guaranteed to return a single row and column. If it seems wasteful that the subquery and containing query both perform the same aggregation, it is; see Chapter 14 for more efficient ways to handle these types of queries.
So far, you have seen scalar subqueries in the WHERE and HAVING clauses of SELECT statements, along with the WHERE clause of a DELETE statement. Before delving deeper into the different types of subqueries, let's explore where else subqueries can and can't be utilized in SQL statements:
The FROM clause may contain any type of noncorrelated subquery.
The SELECT and ORDER BY clauses may contain scalar subqueries.
The GROUP BY clause may not contain subqueries.
The START WITH and CONNECT BY clauses, used for querying hierarchical data, may contain subqueries and will be examined in detail in Chapter 8.
The WITH clause contains a named noncorrelated subquery that can be referenced multiple times within the containing query but executes only once (see the examples later in this chapter).
The USING clause of a MERGE statement may contain noncorrelated subqueries.
The SET clause of UPDATE statements may contain scalar or single-row, multiple-column subqueries.
INSERT statements may contain scalar subqueries in the VALUES clause.
Now that you know how to use single-row, single-column subqueries, let's explore how to use subqueries that return multiple rows. When a subquery returns more than one row, it is not possible to use only comparison operators, since a single value cannot be directly compared to a set of values. However, a single value can be compared to each value in a set. To accomplish this, the special keywords ANY and ALL are used with comparison operators to determine if a value is equal to (or less than, greater than, etc.) any member of the set or all members of the set. Consider the following query:
SELECT fname, lname FROM employee WHERE dept_id = 30 AND salary >= ALL (SELECT salary FROM employee WHERE dept_id = 30); FNAME LNAME -------------------- -------------------- MARION BLAKE
The subquery returns the set of salaries for department 30, and the containing query checks each employee in the department to see if her salary is greater or equal to every salary returned by the subquery. Thus, this query retrieves the name of the highest paid person in department 30. While every employee has a salary >= any of the salaries in the department, only the highest paid employee has a salary >= all of the salaries in the department. If multiple employees tie for the highest salary in the department, multiple names will be returned.
Another way to phrase the previous query is to find the employee whose salary is not less than any other salary in her department. You can do this using the ANY operator:
SELECT fname, lname FROM employee WHERE dept_id = 30 AND NOT salary < ANY (SELECT salary FROM employee WHERE dept_id = 30);
There are almost always multiple ways to phrase the same query. One of the challenges of writing SQL is striking the right balance between efficiency and readability. In this case, we might prefer using AND salary >= ALL over AND NOT salary < ANY because the first variation is easier to understand; however, the latter form might prove more efficient, since each evaluation of the subquery results requires from 1 to N comparisons when using ANY versus exactly N comparisons when using ALL.
|
The next query uses the ANY operator to find all employees who have been with the company longer than any top-level manager:
SELECT fname, lname FROM employee WHERE manager_emp_id IS NOT NULL AND hire_date < ANY (SELECT hire_date FROM employee WHERE manager_emp_id IS NULL); FNAME LNAME -------------------- -------------------- JOHN SMITH KEVIN ALLEN CYNTHIA WARD TERRY JONES KENNETH MARTIN MARION BLAKE CAROL CLARK MARY TURNER
The subquery returns the set of hire dates for all top-level managers, and the containing query returns the names of non-top-level managers whose hire date is previous to any returned by the subquery.
For the previous three queries, failure to include either the ANY or ALL operators may result in the following error:
ORA-01427: single-row subquery returns more than one row
The wording of this error message is a bit confusing. After all, how can a single-row subquery return multiple rows? What the error message is trying to convey is that a multiple-row subquery has been identified where only a single-row subquery is allowed. If you are not absolutely certain that your subquery will return exactly one row, you must include ANY or ALL to ensure your code doesn't fail in the future.
Along with ANY and ALL, you may also use the IN operator for working with multi-row subqueries. Using IN with a subquery is functionally equivalent to using = ANY, and returns TRUE if a match is found in the set returned by the subquery. The following query uses IN to postpone shipment of all orders containing parts that are not currently in stock:
UPDATE cust_order SET expected_ship_dt = TRUNC(SYSDATE) + 1 WHERE ship_dt IS NULL AND order_nbr IN (SELECT l.order_nbr FROM line_item l INNER JOIN part p ON l.part_nbr = p.part_nbr WHERE p.inventory_qty = 0);
The subquery returns the set of orders requesting out-of-stock parts, and the containing UPDATE statement modifies the expected ship date of all orders in the set. We think you will agree that IN is more intuitive than = ANY, which is why IN is almost always used in such situations. Similarly, you can use NOT IN instead of using != ALL as demonstrated by the next query, which deletes all customers who haven't placed an order in the past five years:
DELETE FROM customer WHERE cust_nbr NOT IN (SELECT cust_nbr FROM cust_order WHERE order_dt >= TRUNC(SYSDATE) -- (365 * 5));
The subquery returns the set of customers that have placed an order in the past five years, and the containing DELETE statement removes all customers that are not in the set returned by the subquery.
Finding members of one set that do not exist in another set is referred to as an anti-join. As the name implies, an anti-join is the opposite of a join; rows from table A are returned if the specified data is not found in table B. The Oracle optimizer can employ multiple strategies for formulating execution plans for such queries, including a merge anti-join or a hash anti-join.
|
While all of the previous examples compare a single column from the containing SQL statement to the result set returned by the subquery, it is also possible to issue a subquery against multiple columns. Consider the following UPDATE statement, which rolls up data from an operational table into an aggregate table:
UPDATE monthly_orders SET tot_orders = (SELECT COUNT(*) FROM cust_order WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY') AND cancelled_dt IS NULL), max_order_amt = (SELECT MAX(sale_price) FROM cust_order WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY') AND cancelled_dt IS NULL), min_order_amt = (SELECT MIN(sale_price) FROM cust_order WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY') AND cancelled_dt IS NULL), tot_amt = (SELECT SUM(sale_price) FROM cust_order WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY') AND cancelled_dt IS NULL) WHERE month = 7 and year = 2001;
The UPDATE statement modifies four columns in the monthly_orders table, and values for each of the four columns are calculated by aggregating data in the cust_order table. Looking closely, you can see that the WHERE clauses for all four subqueries are identical; only the aggregation function differs in the four queries. The next query demonstrates how all four columns can be populated with a single trip through the cust_order table by using a single subquery that returns four columns:
UPDATE monthly_orders SET (tot_orders, max_order_amt, min_order_amt, tot_amt) = (SELECT COUNT(*), MAX(sale_price), MIN(sale_price), SUM(sale_price) FROM cust_order WHERE order_dt >= TO_DATE('01-JUL-2001','DD-MON-YYYY') AND order_dt < TO_DATE('01-AUG-2001','DD-MON-YYYY') AND cancelled_dt IS NULL) WHERE month = 7 and year = 2001;
This second statement achieves the same result more efficiently than the first by performing four aggregations during one trip through the cust_order table, rather than one aggregation during each of four separate trips.
Whereas the previous example demonstrates the use of a multiple-column subquery in the SET clause of an UPDATE statement, such subqueries may also be utilized in the WHERE clause of a SELECT, UPDATE, or DELETE statement. The next statement deletes all items from open orders that include discontinued parts:
DELETE FROM line_item WHERE (order_nbr, part_nbr) IN (SELECT c.order_nbr, p.part_nbr FROM cust_order c INNER JOIN line_item li ON c.order_nbr = li.order_nbr INNER JOIN part p ON li.part_nbr = p.part_nbr WHERE c.ship_dt IS NULL AND c.cancelled_dt IS NULL AND p.status = 'DISCONTINUED');
Note the use of the IN operator in the WHERE clause. Two columns are listed together in parentheses prior to the IN keyword. Values in these two columns are compared to the set of two values returned by each row of the subquery. If a match is found, the row is removed from the line_item table.
You may find yourself in a situation where you need to reference the same noncorrelated subquery multiple times in the same query. For example, let's say you want to show all employees making more than the average salary, and you want to show how much above the average each employee makes. You can formulate this using the same subquery in both the FROM and WHERE clauses:
SELECT e.emp_id, e.lname, e.fname, ROUND(e.salary - (SELECT AVG(salary) FROM employee)) above_avg FROM employee e WHERE e.salary > (SELECT AVG(salary) FROM employee); EMP_ID LNAME FNAME ABOVE_AVG ---------- -------------------- -------------------- ---------- 7698 BLAKE MARION 846 7782 CLARK CAROL 446 7788 SCOTT DONALD 996 7839 KING FRANCIS 2996 7902 FORD JENNIFER 996
To eliminate the inefficiency of executing the same subquery multiple times, Oracle introduced the WITH clause in the Oracle9i release. Using the WITH clause, you can place the subquery that calculates the average salary at the top of the query and reference it throughout the query:
WITH avg_sal AS (SELECT AVG(salary) val FROM employee) SELECT e.emp_id, e.lname, e.fname, (SELECT ROUND(e.salary - val) FROM avg_sal) above_avg FROM employee e WHERE e.salary > (SELECT val FROM avg_sal); EMP_ID LNAME FNAME ABOVE_AVG ---------- -------------------- -------------------- ---------- 7698 BLAKE MARION 846 7782 CLARK CAROL 446 7788 SCOTT DONALD 996 7839 KING FRANCIS 2996 7902 FORD JENNIFER 996
The WITH clause creates a temporary data set called, in this case, avg_sal, which, in this case, consists of a single row of data having a single column, val. This data set is generated once and can be referenced throughout the containing query. To access the data in avg_sal, you query it as if it were a table. In this regard, it acts like a temporary table with statement scope. As you will see shortly, the WITH clause acts in many ways like another type of subquery called the inline view.