7.4 Queries with Set Operations

Occasionally, you must tune multipart queries that use set operations like UNION, UNION ALL, INTERSECT, and EXCEPT to combine results of two or more simple queries. The extension of the SQL-diagramming tuning method to these multipart queries is usually straightforward: diagram and tune each part independently, as if it were a standalone query. When the parts are fast, combining the results with set operations generally works well.

EXCEPT is the keyword specified by the ANSI SQL standard for the set operation to find the difference between two sets. DB2 and SQL Server follow the standard by supporting EXCEPT. Oracle, however, uses MINUS for the same operation, most likely because it supported the operation before the standard existed.


However, some of these set operations deserve a little extra discussion. The UNION operation, in addition to combining the parts, also must sort them and discard duplicates. This last step is often unnecessary, especially if you design the parts to avoid duplicates in the first place. In Oracle, you can replace the UNION operation with UNION ALL when you determine that duplicates are either impossible or need not be discarded. In databases that do not support UNION ALL, you can skip the duplicate-eliminating step by replacing the single UNION query with two or more simple queries, combining the results in the application layer, rather than in the database.

The INTERSECT operation can generally be profitably replaced with an EXISTS-type subquery that looks for the matching row that the second part would produce. For example, if you had two Employees tables, you might look for shared employee records with this:

SELECT Employee_ID FROM Employees1
INTERSECT
SELECT Employee_ID FROM Employees2

You could always replace this INTERSECT query with this:

SELECT DISTINCT Employee_ID 
FROM Employees1 E1
WHERE EXISTS (SELECT null 
              FROM Employees2 E2
              WHERE E1.Employee_ID=E2.Employee_ID)

Using the methods of Section 7.3, you would then determine whether this EXISTS subquery should be expressed in the EXISTS or IN form, or converted to a simple join. Note that the correlating join conditions become numerous if the SELECT list contains many items. Also note that INTERSECT will match column lists with nulls, but a correlation join will not, unless you use join conditions designed for that purpose. For example, if the positive-valued foreign key Manager_ID is allowed to be null (but Employee_ID is not), the Oracle equivalent of this query:

SELECT Employee_ID, Manager_ID FROM Employees1
INTERSECT
SELECT Employee_ID, Manager_ID FROM Employees2

is this query:

SELECT DISTINCT Employee_ID, Manager_ID 
FROM Employees1 E1
WHERE EXISTS (SELECT null 
              FROM Employees2 E2
              WHERE E1.Employee_ID=E2.Employee_ID
              AND NVL(E1.Manager_ID,-1)=NVL(E2.Manager_ID,-1))

The expression NVL(...,-1) in the second correlation join condition converts null values on the nullable column so that they join successfully when null is matched with null.

The EXCEPT (or MINUS) operation can generally be profitably replaced with a NOT EXISTS-type subquery. Searching for employee records in the first table but not in the second table, you might have used this:

SELECT Employee_ID FROM Employees1
MINUS
SELECT Employee_ID FROM Employees2

You could always replace that with this:

SELECT DISTINCT Employee_ID 
FROM Employees1 E1
WHERE NOT EXISTS (SELECT null 
                  FROM Employees2 E2
                  WHERE E1.Employee_ID=E2.Employee_ID)

You would then solve this query using the methods described in Section 7.2.