Combining the Results of Multiple Queries

Combining the Results of Multiple Queries

It is possible to produce a single result combining the results of two or more queries. The combined resultset might be a simple aggregation of all records from the queries; or some operation related to the theory of sets (see Appendix L) could be performed before the final resultset was returned.

The SQL99 standard supports UNION, INTERSECT, and EXCEPT clauses that could be used to combine the results of two or more queries.

UNION

The following query returns all the records containing some information about customers that do not yet have an assigned salesman:

SELECT   phone_custid_fn OWNER_ID,
         'CUSTOMER PHONE' PHONE_TYPE,
         phone_phonenum_s
FROM     phone
WHERE    phone_type_s = 'PHONE'
AND      phone_salesmanid_fn IS NULL
   
OWNER_ID    PHONE_TYPE     phone_phonenum_s     
----------- -------------- -------------------- 
51          CUSTOMER PHONE (817) 555-5524
5           CUSTOMER PHONE (847) 555-2924
12          CUSTOMER PHONE (508) 555-5224
61          CUSTOMER PHONE (541) 555-3091
. . .       . . .          . . .
16          CUSTOMER PHONE (201) 555-9302
89          CUSTOMER PHONE (908) 555-3779
85          CUSTOMER PHONE (281) 555-2835
152         CUSTOMER PHONE (541) 555-8979
   
37 records selected

This query returns a total of 37 records. Now, assume that you also would like to include in the resultset the list of salesmen's phones who do not have a customer assigned to them yet. Here is the query to find these salesmen; it returns six records:

SELECT      phone_salesmanid_fn,
            'SALESMAN PHONE',
            phone_phonenum_s
FROM        phone
WHERE       phone_type_s = 'PHONE'
AND         phone_custid_fn IS NULL
   
phone_salesmanid_fn                phone_phonenum_s     
------------------- -------------- -------------------- 
23                  SALESMAN PHONE (305) 555-8502
24                  SALESMAN PHONE (626) 555-4435
25                  SALESMAN PHONE (717) 555-5479
26                  SALESMAN PHONE (718) 555-7879
27                  SALESMAN PHONE (718) 555-5091
28                  SALESMAN PHONE (814) 555-0324
   
6 records selected

To combine these records into a single resultset, you would use the UNION statement:

SELECT             phone_custid_fn OWNER_ID,
                   'CUSTOMER PHONE' PHONE_TYPE,
                   phone_phonenum_s
FROM               phone
WHERE              phone_type_s = 'PHONE'
AND                phone_salesmanid_fn IS NULL
UNION
SELECT             phone_salesmanid_fn,
                   'SALESMAN PHONE',
                   phone_phonenum_s
FROM               phone
WHERE              phone_type_s = 'PHONE'
AND                phone_custid_fn IS NULL
ORDER BY   2, 1
   
  OWNER_ID PHONE_TYPE     PHONE_PHONENUM_S
---------- -------------- --------------------
         1 CUSTOMER PHONE (909) 555-9957
         2 CUSTOMER PHONE (501) 555-5702
         3 CUSTOMER PHONE (520) 555-5513
         4 CUSTOMER PHONE (802) 555-2091
        ...
        26 SALESMAN PHONE (718) 555-7879
        27 SALESMAN PHONE (718) 555-5091
        28 SALESMAN PHONE (814) 555-0324
   
43 rows selected.

Now you have a full list that includes all records from the query about customers, combined with the results brought by the query about salesmen. You may visualize this as two resultsets glued together. All queries in an SQL statement containing a UNION operator must have an equal number of expressions in their lists. In addition, these expressions (which could be columns, literals, results of functions, etc.) must be of compatible data types: For example, if the expression evaluates to a character string in one query, it cannot be a number in the second query that is joined to the first by the UNION operator.

The results of UNION could be ordered (as we can see in the UNION query above) but the ORDER BY clause could be used only with the final resultset — that is, it can refer to the result of the UNION, not to particular queries used in it.

If the queries potentially could bring duplicate records, you may want to filter the duplicates, or, conversely, make sure that they all are present. By default, the UNION operator excludes duplicate records; specifying UNION ALL makes sure that your final resultset has all the records returned by all the queries participating in the UNION.

Here is a simple query returning status code and description from the STATUS table of the ACME database:

SELECT status_code_s, 
       status_desc_s
FROM   status
   
ST STATUS_DESC_S
-- ------------------------------
60 SHIPPED
20 COMPLETE
70 INVOICED
80 CANCELLED
   
4 rows selected.

You can use exactly the same query to produce a resultset containing the duplicate rows, and combine the resultsets together using UNION:

SELECT status_code_s, 
       status_desc_s
FROM   status
UNION
SELECT status_code_s, 
       status_desc_s
FROM   status
   
ST STATUS_DESC_S
-- ------------------------------
20 COMPLETE
60 SHIPPED
70 INVOICED
80 CANCELLED
   
4 rows selected.

As you can see, the duplicates (i.e., the record where each corresponding column matched in both queries) were excluded, and the final record count is still 4, which is what you could have gotten from the one query only. This behaviour is in full compliance with the Set Theory. Running the same queries combined with the UNION ALL operator (which overrides this behaviour) returns all records from both queries, no duplicates are excluded:

SELECT   status_code_s, 
         status_desc_s
FROM     status
UNION ALL
SELECT   status_code_s, 
         status_desc_s
FROM     status
   
ST STATUS_DESC_S
-- ------------------------------
60 SHIPPED
20 COMPLETE
70 INVOICED
80 CANCELLED
60 SHIPPED
20 COMPLETE
70 INVOICED
80 CANCELLED
   
8 records selected.
Note 

The ALL keyword in the UNION operator brings the business world into our discussion: UNION ALL does not comply with set theory, where Idempotent Law A U A = A (see Appendix L) simply states that union of a set with itself is the same union. It is more intuitive from a business point of view to expect all records from constituent queries combined in the final resultset.

INTERSECT

The INTERSECT operator is used to evaluate results returned by two queries but includes only the records produced by the first query that have matching ones in the second. This operator is implemented in Oracle and IBM DB2 UDB but not in Microsoft SQL Server 2000, which uses EXISTS operator for this purpose.

Consider the query that selects customer IDs (field CUST_ID_N) from the CUSTOMER table of the ACME database and intersects them with results returned by a second query, producing a resultset of customer's IDs who placed an order:

SELECT  cust_id_n
FROM    customer
INTERSECT
SELECT  ordhdr_custid_fn 
FROM    order_header
 
CUST_ID_N
----------
         1
         7
        55
        63
       152
   
5 records selected

The same results are achievable in a variety of ways. Here is an example, using a subquery and an IN operator:

SELECT cust_id_n
FROM   customer
WHERE  cust_id_n IN
 (SELECT ordhdr_custid_fn
  FROM   order_header)
   
CUST_ID_N
----------
         1
         7
        55
        63
       152
   
5 records selected

MS SQL Server does not have INTERSECT, offering the EXISTS operator instead. The following query could be rewritten in SQL Sever syntax using this operator and a correlated query:

SELECT  cust_id_n
FROM    customer
WHERE EXISTS
   (SELECT ordhdr_custid_fn 
    FROM   order_header
    WHERE  ordhdr_custid_fn = cust_id_n)
   
cust_id_n
----------- 
1
7
55
63
152
(5 row(s) affected)
Note 

The EXISTS keyword is common for all our "big three" databases, so this query would work for all of them.

While INTERSECT could be replaced with a combination of other SQL methods, it contributes to the clarity of the SQL code and speeds up its execution because it is more optimized than a subquery.

EXCEPT (MINUS)

When combining the results of two or more queries into a single resultset, you may want to exclude some records from the first query based on what was returned by the second. This keyword is implemented in IBM DB2 UDB only, whereas Oracle uses the MINUS keyword and Microsoft SQL Server 2000 uses EXISTS for the same purpose. The operation's functionality directly follows the rules of set theory, covered in Appendix L.

Consider the following two queries; both select some information about a customer. Here is the resultset returned by the first query:

SELECT cust_id_n
FROM   customer
   
cust_id_n   
----------- 
61
64
14
65
. . .
152
. . .
6
7
1
   
37 rows selected

The first query returns 37 rows. And here are the results of the second query:

SELECT ordhdr_custid_fn 
FROM   order_header
   
ordhdr_custid_fn 
---------------- 
NULL
NULL
1
1
. . .
1
7
7
. . .
152
   
(51 row(s) affected)

The EXCEPT result will be all the records from the first minus those returned by the second.

SELECT  cust_id_n
FROM    customer
MINUS
SELECT  ordhdr_custid_fn 
FROM    order_header;
   
cust_id_n   
----------- 
61
. . 
68
69
15
. . .
9
12
. . .
2
3
   
(32 row(s) affected)

And the MINUS operation brings 32 records. Five of the records from the second query were excepted from the records returned by the first query. We've shortened the output of all three queries to fit the page, but you may run the query yourself on the RDBMS of your choice; with any luck the results would match these described in the paragraph.

Oracle's name for this operator is MINUS, while DB2 UDB retains the SQL99 standard — EXCEPT.

The DB2 UDB syntax for this query would be identical save for the use of the EXCEPT operator:

SELECT cust_id_n
FROM   customer
EXCEPT
SELECT ordhdr_custid_fn 
FROM   order_header

Microsoft SQL Server 2000 offers the NOT EXISTS operator instead; surprisingly this syntax, while not part of the SQL99 standard, is common to all "big-three" databases. To get the results with the NOT EXISTS operator you need to use a correlated subquery. A correlated subquery differs from a regular subquery in that it accepts a parameter from the outer query as a criterion in the WHERE clause:

SELECT  cust_name_s, 
        cust_alias_s, 
        cust_status_s
FROM    customer cust
WHERE NOT EXISTS (SELECT *
                  FROM order_header
                  WHERE ordhdr_custid_fn = cust.cust_id_n)
   
cust_name_s                       cust_alias_s   cust_status_s 
--------------------------------- --------------- ------------- 
MAGNETICS USA INC.                                 MNGA71398       Y
MAGNETOMETRIC DEVICES INC.                         MNGA71400       Y
FAIR PARK GARDENS                                  NULL            Y
FAIR AND SONS AIR CONDTNG                          FRIA71406       Y
. . .                                              . . .         ...
INSULECTRO INC.                                    IUSN71521       Y
INTEGRATED POWER DESIGNS                           IETN71523       Y
EASTERN SATELLITE COMPANY                          ETSA71626       Y
   
(32 row(s) affected)

In this case, we used the CUST_ID_N field of the CUSTOMER table as a criterion for the subquery to limit the number of the potential records. If there is a record for a customer ID (ORDHDR_CUSTID_FN = CUST.CUST_ID_N) in the ORDER_HEADER table, the record will be excluded from the outer query final resultset; only records for customers that have not placed orders will be returned.