2.1 Life Without WHERE

Before delving into the WHERE clause, let's imagine life without it. Say that you are interested in doing some maintenance on the data in the part table. To inspect the data in the table, you issue the following query:

SELECT part_nbr, name, supplier_id, status, inventory_qty

FROM part;

If the part table contains 10,000 items, the result set returned by the query would consist of 10,000 rows, each with 5 columns. You would then load the 10,000 rows into memory and make your modifications.

Once you have made the required modifications to your data in memory, it is time to apply the changes to the part table. Without the ability to specify the rows to modify, you have no choice but to delete all rows in the table and re-insert all 10,000 rows:

DELETE FROM part;



INSERT INTO part (part_nbr, name, supplier_id, status, inventory_qty)

VALUES ('XY5-1002', 'Wonder Widget', 1, 'IN-STOCK', 1);



/* 9,999 more INSERTs on the wall, 9,999 more INSERTS . . .  */

While this approach works in theory, it wreaks havoc on performance, concurrency (the ability for more than one user to modify data simultaneously), and scalability (the ability to perform predictably as load increases).

Now imagine that you want to modify data in the part table only for those parts supplied by Acme Industries. Since the supplier's name is stored in the supplier table, you must include both the part and supplier tables in the FROM clause:

SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty,

  s.supplier_id, s.name

FROM part p, supplier s;

If 100 companies supply the 10,000 parts in the part table, this query will return 1,000,000 rows. Known as the Cartesian product, this number equates to every possible combination of all rows from the two tables. As you sift through the million rows, you would keep only those where the values of p.supplier_id and s.supplier_id are identical and where the s.name column matches 'Acme Industries'. If Acme Industries supplies only 50 of the 10,000 parts in your database, you will end up discarding 999,950 of the 1,000,000 rows returned by your query.