It is possible to produce a single result combining the results of two or more queries. The combined resultset might be a simple aggregation of all records from the queries; or some operation related to the theory of sets (see Appendix L) could be performed before the final resultset was returned.
The SQL99 standard supports UNION, INTERSECT, and EXCEPT clauses that could be used to combine the results of two or more queries.
The following query returns all the records containing some information about customers that do not yet have an assigned salesman:
SELECT phone_custid_fn OWNER_ID, 'CUSTOMER PHONE' PHONE_TYPE, phone_phonenum_s FROM phone WHERE phone_type_s = 'PHONE' AND phone_salesmanid_fn IS NULL OWNER_ID PHONE_TYPE phone_phonenum_s ----------- -------------- -------------------- 51 CUSTOMER PHONE (817) 555-5524 5 CUSTOMER PHONE (847) 555-2924 12 CUSTOMER PHONE (508) 555-5224 61 CUSTOMER PHONE (541) 555-3091 . . . . . . . . . 16 CUSTOMER PHONE (201) 555-9302 89 CUSTOMER PHONE (908) 555-3779 85 CUSTOMER PHONE (281) 555-2835 152 CUSTOMER PHONE (541) 555-8979 37 records selected
This query returns a total of 37 records. Now, assume that you also would like to include in the resultset the list of salesmen's phones who do not have a customer assigned to them yet. Here is the query to find these salesmen; it returns six records:
SELECT phone_salesmanid_fn, 'SALESMAN PHONE', phone_phonenum_s FROM phone WHERE phone_type_s = 'PHONE' AND phone_custid_fn IS NULL phone_salesmanid_fn phone_phonenum_s ------------------- -------------- -------------------- 23 SALESMAN PHONE (305) 555-8502 24 SALESMAN PHONE (626) 555-4435 25 SALESMAN PHONE (717) 555-5479 26 SALESMAN PHONE (718) 555-7879 27 SALESMAN PHONE (718) 555-5091 28 SALESMAN PHONE (814) 555-0324 6 records selected
To combine these records into a single resultset, you would use the UNION statement:
SELECT phone_custid_fn OWNER_ID, 'CUSTOMER PHONE' PHONE_TYPE, phone_phonenum_s FROM phone WHERE phone_type_s = 'PHONE' AND phone_salesmanid_fn IS NULL UNION SELECT phone_salesmanid_fn, 'SALESMAN PHONE', phone_phonenum_s FROM phone WHERE phone_type_s = 'PHONE' AND phone_custid_fn IS NULL ORDER BY 2, 1 OWNER_ID PHONE_TYPE PHONE_PHONENUM_S ---------- -------------- -------------------- 1 CUSTOMER PHONE (909) 555-9957 2 CUSTOMER PHONE (501) 555-5702 3 CUSTOMER PHONE (520) 555-5513 4 CUSTOMER PHONE (802) 555-2091 ... 26 SALESMAN PHONE (718) 555-7879 27 SALESMAN PHONE (718) 555-5091 28 SALESMAN PHONE (814) 555-0324 43 rows selected.
Now you have a full list that includes all records from the query about customers, combined with the results brought by the query about salesmen. You may visualize this as two resultsets glued together. All queries in an SQL statement containing a UNION operator must have an equal number of expressions in their lists. In addition, these expressions (which could be columns, literals, results of functions, etc.) must be of compatible data types: For example, if the expression evaluates to a character string in one query, it cannot be a number in the second query that is joined to the first by the UNION operator.
The results of UNION could be ordered (as we can see in the UNION query above) but the ORDER BY clause could be used only with the final resultset — that is, it can refer to the result of the UNION, not to particular queries used in it.
If the queries potentially could bring duplicate records, you may want to filter the duplicates, or, conversely, make sure that they all are present. By default, the UNION operator excludes duplicate records; specifying UNION ALL makes sure that your final resultset has all the records returned by all the queries participating in the UNION.
Here is a simple query returning status code and description from the STATUS table of the ACME database:
SELECT status_code_s, status_desc_s FROM status ST STATUS_DESC_S -- ------------------------------ 60 SHIPPED 20 COMPLETE 70 INVOICED 80 CANCELLED 4 rows selected.
You can use exactly the same query to produce a resultset containing the duplicate rows, and combine the resultsets together using UNION:
SELECT status_code_s, status_desc_s FROM status UNION SELECT status_code_s, status_desc_s FROM status ST STATUS_DESC_S -- ------------------------------ 20 COMPLETE 60 SHIPPED 70 INVOICED 80 CANCELLED 4 rows selected.
As you can see, the duplicates (i.e., the record where each corresponding column matched in both queries) were excluded, and the final record count is still 4, which is what you could have gotten from the one query only. This behaviour is in full compliance with the Set Theory. Running the same queries combined with the UNION ALL operator (which overrides this behaviour) returns all records from both queries, no duplicates are excluded:
SELECT status_code_s, status_desc_s FROM status UNION ALL SELECT status_code_s, status_desc_s FROM status ST STATUS_DESC_S -- ------------------------------ 60 SHIPPED 20 COMPLETE 70 INVOICED 80 CANCELLED 60 SHIPPED 20 COMPLETE 70 INVOICED 80 CANCELLED 8 records selected.
The ALL keyword in the UNION operator brings the business world into our discussion: UNION ALL does not comply with set theory, where Idempotent Law A U A = A (see Appendix L) simply states that union of a set with itself is the same union. It is more intuitive from a business point of view to expect all records from constituent queries combined in the final resultset.
The INTERSECT operator is used to evaluate results returned by two queries but includes only the records produced by the first query that have matching ones in the second. This operator is implemented in Oracle and IBM DB2 UDB but not in Microsoft SQL Server 2000, which uses EXISTS operator for this purpose.
Consider the query that selects customer IDs (field CUST_ID_N) from the CUSTOMER table of the ACME database and intersects them with results returned by a second query, producing a resultset of customer's IDs who placed an order:
SELECT cust_id_n FROM customer INTERSECT SELECT ordhdr_custid_fn FROM order_header CUST_ID_N ---------- 1 7 55 63 152 5 records selected
The same results are achievable in a variety of ways. Here is an example, using a subquery and an IN operator:
SELECT cust_id_n FROM customer WHERE cust_id_n IN (SELECT ordhdr_custid_fn FROM order_header) CUST_ID_N ---------- 1 7 55 63 152 5 records selected
MS SQL Server does not have INTERSECT, offering the EXISTS operator instead. The following query could be rewritten in SQL Sever syntax using this operator and a correlated query:
SELECT cust_id_n FROM customer WHERE EXISTS (SELECT ordhdr_custid_fn FROM order_header WHERE ordhdr_custid_fn = cust_id_n) cust_id_n ----------- 1 7 55 63 152 (5 row(s) affected)
The EXISTS keyword is common for all our "big three" databases, so this query would work for all of them.
While INTERSECT could be replaced with a combination of other SQL methods, it contributes to the clarity of the SQL code and speeds up its execution because it is more optimized than a subquery.
When combining the results of two or more queries into a single resultset, you may want to exclude some records from the first query based on what was returned by the second. This keyword is implemented in IBM DB2 UDB only, whereas Oracle uses the MINUS keyword and Microsoft SQL Server 2000 uses EXISTS for the same purpose. The operation's functionality directly follows the rules of set theory, covered in Appendix L.
Consider the following two queries; both select some information about a customer. Here is the resultset returned by the first query:
SELECT cust_id_n FROM customer cust_id_n ----------- 61 64 14 65 . . . 152 . . . 6 7 1 37 rows selected
The first query returns 37 rows. And here are the results of the second query:
SELECT ordhdr_custid_fn FROM order_header ordhdr_custid_fn ---------------- NULL NULL 1 1 . . . 1 7 7 . . . 152 (51 row(s) affected)
The EXCEPT result will be all the records from the first minus those returned by the second.
SELECT cust_id_n FROM customer MINUS SELECT ordhdr_custid_fn FROM order_header; cust_id_n ----------- 61 . . 68 69 15 . . . 9 12 . . . 2 3 (32 row(s) affected)
And the MINUS operation brings 32 records. Five of the records from the second query were excepted from the records returned by the first query. We've shortened the output of all three queries to fit the page, but you may run the query yourself on the RDBMS of your choice; with any luck the results would match these described in the paragraph.
Oracle's name for this operator is MINUS, while DB2 UDB retains the SQL99 standard — EXCEPT.
The DB2 UDB syntax for this query would be identical save for the use of the EXCEPT operator:
SELECT cust_id_n FROM customer EXCEPT SELECT ordhdr_custid_fn FROM order_header
Microsoft SQL Server 2000 offers the NOT EXISTS operator instead; surprisingly this syntax, while not part of the SQL99 standard, is common to all "big-three" databases. To get the results with the NOT EXISTS operator you need to use a correlated subquery. A correlated subquery differs from a regular subquery in that it accepts a parameter from the outer query as a criterion in the WHERE clause:
SELECT cust_name_s, cust_alias_s, cust_status_s FROM customer cust WHERE NOT EXISTS (SELECT * FROM order_header WHERE ordhdr_custid_fn = cust.cust_id_n) cust_name_s cust_alias_s cust_status_s --------------------------------- --------------- ------------- MAGNETICS USA INC. MNGA71398 Y MAGNETOMETRIC DEVICES INC. MNGA71400 Y FAIR PARK GARDENS NULL Y FAIR AND SONS AIR CONDTNG FRIA71406 Y . . . . . . ... INSULECTRO INC. IUSN71521 Y INTEGRATED POWER DESIGNS IETN71523 Y EASTERN SATELLITE COMPANY ETSA71626 Y (32 row(s) affected)
In this case, we used the CUST_ID_N field of the CUSTOMER table as a criterion for the subquery to limit the number of the potential records. If there is a record for a customer ID (ORDHDR_CUSTID_FN = CUST.CUST_ID_N) in the ORDER_HEADER table, the record will be excluded from the outer query final resultset; only records for customers that have not placed orders will be returned.