While selecting everything a table or view could contain might be of value for some operations, most of the time you will be looking for specific information — a person with a particular phone number, data falling into a certain date range, and so on. The table might contain several million rows, and you simply have no time to search for the information all by yourself. The SQL WHERE clause provides a mechanism for setting horizontal limits; specifically, it allows you to limit the number of rows in resultsets returned by a query through specifying some condition or set of conditions. Depending on what conditions you have specified with your query, there might be zero, one, or more records (rows) returned. The search criteria specified in the WHERE clause evaluate to TRUE or FALSE, and all the rules of Boolean algebra are fully applicable there.
Cross-References |
See Appendix L for information on Boolean algebra. |
To specify conditions in the WHERE clause, SQL employs a number of operators. These are discussed in detail in Chapter 11. Here, we are going to touch them only briefly.
Consider the following query run against the ACME database in Microsoft SQL Server (the syntax and results would be identical in all "big three" databases). It returns some information about a particular customer, uniquely identified by the customer ID field CUST_ID_N. The uniqueness of the customer ID (not that of the address record) value is enforced by the primary key constraint as well as a UNIQUE constraint placed onto the column in the table CUSTOMER. (There could be only one customer associated with any given ID.)
SELECT cust_id_n, cust_name_s, cust_status_s FROM customer WHERE cust_id_n = 7 CUST_ID_N CUST_NAME_S CUST_STATUS_S ---------- ---------------------- ------------- 7 WILE ELECTRONICS INC. Y (1 row(s) affected)
You were able to pinpoint the record because only one record satisfied your condition.
Now, nothing prevents a customer from having one or more addresses, which is illustrated by the following query requesting all address records for a customer with a customer ID equal to 7:
SELECT addr_address_s, addr_city_s, addr_state_s, addr_zip_s FROM address WHERE addr_custid_fn = 7 addr_address_s addr_city_s addr_state_s addr_zip_s ----------------- ------------------ ------------ ----------
411 S LONDON AVE EGG HARBOR CITY NJ 08215 232 EEL ST. EGG HARBOR CITY NJ 08215 454 OAK ST. EGG HARBOR CITY NJ 08215 456 WILLOW ST. EGG HARBOR CITY NJ 08215 678 MAPLE AVE. EGG HARBOR CITY NJ 08215 . . . . . . . . . . . . 865 CEDAR AVE. EGG HARBOR CITY NJ 08215 911 MYRTLE ST. EGG HARBOR CITY NJ 08215 777 SITKA AVE. EGG HARBOR CITY NJ 08215 999 ELK AVE. EGG HARBOR CITY NJ 08215 (11 row(s) affected)
This query yields 11 records, containing all the addresses under which customer number 7 conducts its business.
Comparison operators could also determine a range of values. When you want to know what products in your database are selling for more than $20, use the following query:
SELECT prod_description_s, prod_price_n FROM product WHERE prod_price_n > 20 PROD_DESCRIPTION_S PROD_PRICE_N -------------------------------------------- ------------ STEEL NAILS 6'' 33.28 STOOL CAPS 5'' 26.82 BAR RAILS 24X48X128 23.10 BAR RAILS 30X45X60 27.00 BASES 30X45X60 23.10 (5 rows affected)
This query returns information for all products whose price is over $20.
There could be more than one criterion specified with the query. For example, you may want to retrieve all the phone salespersons in your company that are not assigned to a customer. In the table, there are phone numbers and fax numbers; to eliminate the latter, the following query could be used:
SQL> SELECT phone_salesmanid_fn, phone_phonenum_s, phone_type_s FROM phone WHERE phone_custid_fn IS NULL AND phone_type_s = 'PHONE' PHONE_SALESMANID_FN PHONE_PHONENUM_S PHONE_TYPE_S ------------------- -------------------- -------------------- 23 (305) 555-8502 PHONE 24 (626) 555-4435 PHONE 25 (717) 555-5479 PHONE 26 (718) 555-7879 PHONE 27 (718) 555-5091 PHONE 28 (814) 555-0324 PHONE 6 record(s) selected.
Only records where column PHONE_CUSTID_FN contains NULL and the type of the numbers is PHONE made it to the final resultset.
The records could be selected using the OR compound operator. In the following example, only records for the orders that were placed by customer 63 or that have an order ID equal to 30661 are taken into consideration.
SELECT ordhdr_id_n, ordhdr_custid_fn FROM order_header WHERE ordhdr_id_n = 30661 OR ordhdr_custid_fn = 63 ORDHDR_ID_N ORDHDR_CUSTID_FN ----------- ---------------- 30613 63 30614 63 30615 63 30661 1 4 records(s) selected.
This provides you with the functionality to specify disparate selection criteria: the customer ID of the customer who had placed order 30661 is not 7, nevertheless it is present on the list of the records because you've specified that you are interested in the orders placed by customer 63 OR the order #30661, regardless of the customer ID. Using AND in this case would yield no results since order 30661 was not placed by customer 7, and no record satisfying both criteria at the same time would be found.
While it is possible to use a combination of => (greater than or equal to) and <= (less than or equal to) operators to achieve exactly the same results, the BETWEEN operator provides a more convenient (and often more efficient) way for selecting a range of values.
SELECT prod_description_s, prod_price_n FROM product WHERE prod_price_n BETWEEN 23.10 AND 30 PROD_DESCRIPTION_S PROD_PRICE_N -------------------------------------------- ------------ STOOL CAPS 5'' 26.82 BAR RAILS 24X48X128 23.10 BAR RAILS 30X45X60 27.00 BASES 30X45X60 23.10 4 record(s) selected.
Note that the range of the values is inclusive, that is, the values used in specifying the range are included into the final resultset. This behavior is number-specific, if other database types (e.g., characters, dates, etc.) are used for criteria, results might differ. The operator could be used with virtually any data type: dates, characters, numbers, and so on. (See Chapter 11 for more information.)
When there is more than one exact criterion for the WHERE clause, and these criteria do not fit any range of values, you may use an OR statement. Consider the following query:
SELECT cust_name_s, cust_credhold_s FROM customer WHERE cust_alias_s = 'MNGA71396' OR cust_alias_s = 'MNGA71398' OR cust_alias_s = 'MNGA71400' CUST_NAME_S CUST_CREDHOLD_S ------------------------------------- --------------- WILE SEAL CORP. Y MAGNETICS USA INC. N MAGNETOMETRIC DEVICES INC. N 3 record(s) selected.
Any records that correspond to either of the three specified criteria make it into the final resultset. The same result is easier achieved using an IN operator:
SELECT cust_name_s, cust_credhold_s FROM customer WHERE cust_alias_s IN ('MNGA71396', 'MNGA71398', 'MNGA71400') CUST_NAME_S CUST_CREDHOLD_S ---------------------------------- --------------- WILE SEAL CORP. Y MAGNETICS USA INC. N MAGNETOMETRIC DEVICES INC. N 3 record(s) selected.
The IN operator makes your life easier by replacing numerous OR statements and speeding up the query execution along the way. All values specified within an IN operator must be of the same data type as they refer to one column.
The NOT operator negates results of the operator by making it perform a search for the results exactly opposite to those specified. Any of the operators and queries discussed to this point could have produced opposite results if NOT was used. The following example returns all the results that do not match the specified criteria — having the name with the second letter I, third L, and fourth E; only records that do not have such a sequence starting from the second position within the company name are selected:
SQL> SELECT cust_name_s FROM customer WHERE cust_name_s NOT LIKE '_ILE%' cust_name_s -------------------------------------------------- MAGNETICS USA INC. MAGNETOMETRIC DEVICES INC. FAIR PARK GARDENS FAIR AND SONS AIR CONDTNG . . . KILBURN GLASS INDUSTRIES CARLTONBATES COMPANY DABAH BROS INC. . . . INSULECTRO INC. INTEGRATED POWER DESIGNS EASTERN SATELLITE COMPANY 32 record(s) selected
We have mentioned before that relational databases are using a special value to signify the absence of the data in the database table column — NULL. Since this value does not comply with the rules that all the other values follow (e.g., comparison, operations, etc.), they cannot be detected with the equation/comparison operator =; i.e., the syntax WHERE <column_name> = NULL, while being technically valid in Oracle or DB2 UDB (and valid in Microsoft SQL Server 2000 under certain circumstances), would never yield any data because the equation will always evaluate to FALSE.
The test for NULL is performed with the IS keyword, as in the example below, which retrieves information about salesmen that have customers without a PHONE_CUSTID_FN number.
SELECT phone_salesmanid_fn, phone_phonenum_s, phone_type_s FROM phone WHERE phone_custid_fn IS NULL PHONE_SALESMANID_FN PHONE_PHONENUM_S PHONE_TYPE_S ------------------- -------------------- -------------------- 23 (305) 555-8502 PHONE 24 (626) 555-4435 PHONE 25 (717) 555-5479 PHONE 26 (718) 555-7879 PHONE 27 (718) 555-5091 PHONE 28 (814) 555-0324 PHONE 23 (305) 555-8501 FAX 24 (626) 555-4434 FAX 25 (717) 555-5478 FAX 26 (718) 555-7878 FAX 27 (718) 555-5091 FAX 28 (814) 555-0323 FAX 12 record(s) selected.
Note |
In Microsoft SQL Server 2000 Transact-SQL, a NULL is never equal to another NULL unless you specifically instruct SQL Server to do so by issuing command SET ANSI_NULLS OFF; setting this parameter OFF within the session would allow you to compare a NULL value with another NULL value, setting it back ON (default) brings back the SQL99 standard behavior. |
Just as easily, you may test for the absence of the NULL value in the specified column using the NOT operator. The IS NULL test returns FALSE or TRUE, depending on whether the value in the column is or is not a NULL. The syntax follows that of the English language: you cannot use NOT IS NULL (produces error), but IS NOT NULL yields the desired results. The following query produces the records for the salesmen whose customers do have phones (therefore the value in the field PHONE_CUSTID_FN is NOT a NULL):
SELECT phone_custid_fn, phone_phonenum_s, phone_type_s FROM phone WHERE phone_custid_fn IS NOT NULL PHONE_CUSTID_FN PHONE_PHONENUM_S PHONE_TYPE_S --------------- -------------------- -------------------- 1 (909) 555-9957 PHONE 1 (909) 555-9956 FAX 2 (501) 555-5702 PHONE 2 (501) 555-5701 FAX ... 152 (541) 555-8979 PHONE 152 (541) 555-8978 FAX 74 records
As in the SELECT clause, the subqueries could be used with the WHERE clause to provide missing values (or a set of values). For example, you cannot find information from the ORDER_HEADER table using a customer's name only, because the ORDER_HEADER table contains customer IDs, not the names; thus, the customer ID could be found in the table CUSTOMER using the customer name as a criterion, and then used to select values from the ORDER_HEADER table:
SELECT ordhdr_nbr_s, ordhdr_orderdate_d FROM order_header WHERE ordhdr_custid_fn = (SELECT cust_id_n FROM customer WHERE cust_name_s = 'WILE ELECTRONICS INC.') ordhdr_nbr_s ordhdr_orderdate_d ------------- --------------------------- 523733 2002-08-15 00:00:00.000 523734 2002-08-15 00:00:00.000 523735 2002-08-15 00:00:00.000 (3 row(s) affected)
The subquery executes first, thus providing the outer query with a set of values to select from. In this case, a single matching value is expected. If for some reason the subquery returns more than one row (and therefore more than one value could match the ORDHDR_CUSTID field), an error will be generated.
Any of the operators discussed earlier in this chapter could be used with the subqueries, though one has to pay attention to the kind of data required by them: the IN operator would accept a range of values, while LIKE requires a single value to function. For example, the following query uses the IN operator for several values returned by the subquery:
SELECT phone_phonenum_s, phone_type_s FROM phone WHERE phone_salesmanid_fn IN (SELECT salesman_id_n FROM salesman WHERE salesman_code_s BETWEEN '07' and '10') PHONE_PHONENUM_S PHONE_TYPE_S -------------------- -------------------- (718) 555-7879 PHONE (718) 555-5091 PHONE (814) 555-0324 PHONE (718) 555-7878 FAX (718) 555-5091 FAX (814) 555-0323 FAX 6 record(s) selected.
When the subquery returns a set of possible values, and the outer query employs some operator that compares them, how would this query distinguish between these values? SQL introduces two operators ANY and ALL to accomplish this distinction.
There is a view in the ACME database that contains order totals for a single company — WILE BESS COMPANY. The view was created to illustrate the usage of these two operators. Consider the following query:
SELECT v_total.customer_name, v_total.total_price FROM v_customer_totals v_total WHERE v_total.total_price > ANY (SELECT vctw.total_price FROM v_customer_totals_wilebess) ORDER BY total_price ASC CUSTOMER_NAME TOTAL_PRICE --------------------------------------- ----------- WILE ELECTROMATIC INC. 7511.00 WILE BESS COMPANY 7511.00 WILE ELECTROMATIC INC. 7799.20 WILE BESS COMPANY 8390.00 WILE BESS COMPANY 8390.00 . . . . . . WILE SEAL CORP. 15456.80 WILE ELECTRONICS INC. 15456.80 29 rows selected.
It is looking for orders with a total greater than ANY order by WILE BESS COMPANY (i.e., greater than any amount present in the set returned by the subquery). That means that records selected by the outer query should be greater than any of the values — effectively meaning that they should be greater than the smallest of the values returned by the subquery (which is 6608.00). The resultset was sorted in ascending order to help you visualize the situation. Here is a query that would produce identical results:
SELECT v_total.customer_name, v_total.total_price FROM v_customer_totals v_total WHERE v_total.total_price > (SELECT MIN(vctw.total_price) FROM v_customer_totals_wilebess) ORDER BY total_price ASC);
The subquery here returns a single value — the minimal TOTAL_PRICE, which could be compared with a single value from the outer query.
Deploying the ALL operator brings up completely different records. It will instruct the outer query to select records whose TOTAL_PRICE is greater than all the values — meaning greater than the greatest value returned by the subquery. The results are arranged in ascending order, to underscore the point that the lowest TOTAL_PRICE column from the view V_CUSTOMER_TOTAL value returned by the outer query is bigger than the greatest:
SELECT v_total.customer_name, v_total.total_price FROM v_customer_totals v_total WHERE v_total.total_price > ALL (SELECT vctw.total_price FROM v_customer_totals_wilebess vctw) ORDER BY total_price DESC CUSTOMER_NAME TOTAL_PRICE ----------------------------------- ----------- WILE ELECTRONICS INC. 15456.8 WILE SEAL CORP. 15456.8 2 rows selected.
The same results could be achieved employing MAX function in the subquery, which ensures that only a single value (i.e., 15402.20) would be returned for comparison purposes:
SELECT customer_name, total_price FROM v_customer_totals WHERE total_price > (SELECT MAX(total_price) FROM v_customer_totals_wilebess) ORDER BY total_price DESC;
The subquery could host a subquery in its turn. This is called nested subqueries. There is no theoretical limit on the nesting level — i.e., how many times there could be a query within a query — though some vendors limit it. Subquery is an expensive way (in computer resources terms) to find out information and should be used judiciously.
For example, to find all customers that ever ordered a product called CRATING MATERIAL 12X48X72, you could use the following query:
SELECT cust_name_s, cust_alias_s FROM customer WHERE cust_id_n IN (SELECT ordhdr_custid_fn FROM order_header WHERE ordhdr_id_n in (SELECT ordline_ordhdrid_fn FROM order_line WHERE ordline_prodid_fn = (SELECT prod_id_n FROM product WHERE prod_description_s = 'CRATING MATERIAL 12X48X72' ) ) ) CUST_NAME_S CUST_ALIAS_S ------------------------------------ --------------- WILE ELECTROMATIC INC. ECEL71460 WILE BESS COMPANY BSSE71641 2 rows selected.
The first query to be executed would be that with the highest nesting level — three in this case, then the second, then the first, and finally, when all necessary information is gathered, the main query would be executed.
There are also special cases of subquery called correlated subquery. The difference in regular and correlated subqueries is that the values from the outer subquery (host query) can be used as a parameter in the inner subquery. The correlated subqueries will be explained further in the chapter, in the section dealing with combining resultsets of the queries — namely, dealing with INTERSECT and MINUS keywords.