Multi-Table Joins

So far, all the queries that you've seen involve a single table. Most databases contain multiple tables and there are relationships between these tables. This sample database has an example:


movies=# \d rentals



            Table "rentals"

  Attribute  |     Type     | Modifier

-------------+--------------+----------

 tape_id     | character(8) | not null

 rental_date | date         | not null

 customer_id | integer      | not null

Here's a description of the rentals table from earlier in this chapter:

"When a customer comes in to rent a tape, we will add a row to the rentals table to record the transaction. There are three pieces of information that we need to record for each rental: the tape_id, the customer_id, and the date that the rental occurred. Notice that each row in the rentals table refers to a customer (customer_id) and a tape (tape_id)."

You can see that each row in the rentals table refers to a tape (tape_id) and to a customer (customer_id). If you SELECT from the rentals table, you can see the tape ID and customer ID, but you can't see the movie title or customer name. What you need here is a join. When you need to retrieve data from multiple tables, you join those tables.

PostgreSQL (and all relational databases) supports a number of join types. The most basic join type is a cross-join (or Cartesian product). In a cross join, PostgreSQL joins each row in the first table to each row in the second table to produce a result table. If you are joining against a third table, PostgreSQL joins each row in the intermediate result with each row in the third table.

Let's look at an example. We'll cross-join the rentals and customers tables. First, I'll show you each table:


movies=# SELECT * FROM rentals;

 tape_id  | rental_date | customer_id

----------+-------------+-------------

 AB-12345 | 2001-11-25  |           1

 AB-67472 | 2001-11-25  |           3

 OW-41221 | 2001-11-25  |           1

 MC-68873 | 2001-11-20  |           3

(4 rows)



movies=# SELECT * FROM customers;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

  5 | Funkmaster, Freddy   | 555-FUNK |            |

  7 | Gull, Jonathon LC    | 555-1111 | 1984-02-05 |

  8 | Grumby, Jonas        | 555-2222 | 1984-02-21 |

(7 rows)

Now I'll join these tables. To perform a cross-join, we simply list each table in the FROM clause:


movies=# SELECT rentals.*, customers.id, customers.customer_name

movies-#   FROM rentals, customers;

 tape_id  | rental_date | customer_id | id |    customer_name

----------+-------------+-------------+----+----------------------

 AB-12345 | 2001-11-25  |           1 |  1 | Jones, Henry

 AB-12345 | 2001-11-25  |           1 |  2 | Rubin, William

 AB-12345 | 2001-11-25  |           1 |  3 | Panky, Henry

 AB-12345 | 2001-11-25  |           1 |  4 | Wonderland, Alice N.

 AB-12345 | 2001-11-25  |           1 |  5 | Funkmaster, Freddy

 AB-12345 | 2001-11-25  |           1 |  7 | Gull, Jonathon LC

 AB-12345 | 2001-11-25  |           1 |  8 | Grumby, Jonas

 AB-67472 | 2001-11-25  |           3 |  1 | Jones, Henry

 AB-67472 | 2001-11-25  |           3 |  2 | Rubin, William

 AB-67472 | 2001-11-25  |           3 |  3 | Panky, Henry

 AB-67472 | 2001-11-25  |           3 |  4 | Wonderland, Alice N.

 AB-67472 | 2001-11-25  |           3 |  5 | Funkmaster, Freddy

 AB-67472 | 2001-11-25  |           3 |  7 | Gull, Jonathon LC

 AB-67472 | 2001-11-25  |           3 |  8 | Grumby, Jonas

 OW-41221 | 2001-11-25  |           1 |  1 | Jones, Henry

 OW-41221 | 2001-11-25  |           1 |  2 | Rubin, William

 OW-41221 | 2001-11-25  |           1 |  3 | Panky, Henry

 OW-41221 | 2001-11-25  |           1 |  4 | Wonderland, Alice N.

 OW-41221 | 2001-11-25  |           1 |  5 | Funkmaster, Freddy

 OW-41221 | 2001-11-25  |           1 |  7 | Gull, Jonathon LC

 OW-41221 | 2001-11-25  |           1 |  8 | Grumby, Jonas

 MC-68873 | 2001-11-20  |           3 |  1 | Jones, Henry

 MC-68873 | 2001-11-20  |           3 |  2 | Rubin, William

 MC-68873 | 2001-11-20  |           3 |  3 | Panky, Henry

 MC-68873 | 2001-11-20  |           3 |  4 | Wonderland, Alice N.

 MC-68873 | 2001-11-20  |           3 |  5 | Funkmaster, Freddy

 MC-68873 | 2001-11-20  |           3 |  7 | Gull, Jonathon LC

 MC-68873 | 2001-11-20  |           3 |  8 | Grumby, Jonas

(28 rows)

You can see that PostgreSQL has joined each row in the rentals table to each row in the customers table. The rentals table contains four rows; the customers table contains seven rows. The result set contains 4 x 7 or 28 rows.

Cross-joins are rarely useful?they usually don't represent real-world relationships.

The second type of join, the inner-join, is very useful. An inner-join starts with a cross-join, and then throws out the rows that you don't want. Take a close look at the results of the previous query. Here are the first seven rows again:


 tape_id  | rental_date | customer_id | id |    customer_name

----------+-------------+-------------+----+----------------------

 AB-12345 | 2001-11-25  |           1 |  1 | Jones, Henry

 AB-12345 | 2001-11-25  |           1 |  2 | Rubin, William

 AB-12345 | 2001-11-25  |           1 |  3 | Panky, Henry

 AB-12345 | 2001-11-25  |           1 |  4 | Wonderland, Alice N.

 AB-12345 | 2001-11-25  |           1 |  5 | Funkmaster, Freddy

 AB-12345 | 2001-11-25  |           1 |  7 | Gull, Jonathon LC

 AB-12345 | 2001-11-25  |           1 |  8 | Grumby, Jonas

    .            .             .         .       .

    .            .             .         .       .

    .            .             .         .       .

These seven rows were produced by joining the first row in the rentals table:


 tape_id  | rental_date | customer_id

----------+-------------+-------------

 AB-12345 | 2001-11-25  |           1

with each row in the customers table. What is the real-world relationship between a rentals row and a customers row? Each row in the rentals table contains a customer ID. Each row in the customers table is uniquely identified by a customer ID. So, given a rentals row, we can find the corresponding customers row by searching for a customer where the customer ID is equal to rentals.customer_id. Looking back at the previous query, you can see that the meaningful rows are those WHERE customers.id = rentals.customer_id.

Qualifying Column Names

Notice that this WHERE clause mentions two columns with similar names (customer_id and id). You may find it helpful to qualify each column name by prefixing it with the name of the corresponding table, followed by a period. So, customers.id refers to the id column in the customers table and rentals.customer_id refers to the customer_id column in the rentals table. Adding the table qualifier is required if a command involves two columns with identical names, but is useful in other cases.

Now you can construct a query that will show us all of the rentals and the names of the corresponding customers:


movies=# SELECT rentals.*, customers.id, customers.customer_name

movies-#   FROM rentals, customers

movies-#   WHERE customers.id = rentals.customer_id;

 tape_id  | rental_date | customer_id | id | customer_name

----------+-------------+-------------+----+---------------

 AB-12345 | 2001-11-25  |           1 |  1 | Jones, Henry

 OW-41221 | 2001-11-25  |           1 |  1 | Jones, Henry

 AB-67472 | 2001-11-25  |           3 |  3 | Panky, Henry

 MC-68873 | 2001-11-20  |           3 |  3 | Panky, Henry

(4 rows)

To execute this query, PostgreSQL could start by creating the cross-join between all the tables involved, producing an intermediate result table. Next, PostgreSQL could throw out all the rows that fail to satisfy the WHERE clause. In practice, this would be a poor strategy: Cross-joins can get very large quickly. Instead, the PostgreSQL query optimizer analyzes the query and plans an execution strategy to minimize execution time. I'll cover query optimization in Chapter 4, "Query Optimization."

Join Types

We've seen two join types so far: cross-joins and inner-joins. Now we'll look at outer-joins. An outer-join is similar to an inner-join: a relationship between two tables is established by correlating a column from each table.

In an earlier section, you wrote a query that answered the question: "Which customers are currently renting movies?" How would you answer the question: "Who are my customers and which movies are they currently renting?" You might start by trying the following query:


movies=# SELECT customers.*, rentals.tape_id

movies-#   FROM customers, rentals

movies-#   WHERE rentals.customer_id = customers.id;

 id | customer_name |  phone   | birth_date | balance | tape_id

----+---------------+----------+------------+---------+----------

  1 | Jones, Henry  | 555-1212 | 1970-10-10 |    0.00 | AB-12345

  1 | Jones, Henry  | 555-1212 | 1970-10-10 |    0.00 | OW-41221

  3 | Panky, Henry  | 555-1221 | 1968-01-21 |    0.00 | AB-67472

  3 | Panky, Henry  | 555-1221 | 1968-01-21 |    0.00 | MC-68873

(4 rows)

Well, that didn't work. This query showed you which customers are currently renting movies (and the movies that they are renting). What we really want is a list of all customers and, if a customer is currently renting any movies, all the movies rented. This is an outer-join. An outer-join preserves all the rows in one table (or both tables) regardless of whether a matching row can be found in the second table.

The syntax for an outer-join is a little strange. Here is an example:


movies=# SELECT customers.customer_name, rentals.tape_id

movies-#   FROM customers LEFT OUTER JOIN rentals

movies-#   ON customers.id = rentals.customer_id;

    customer_name     | tape_id

----------------------+----------

 Jones, Henry         | AB-12345

 Jones, Henry         | OW-41221

 Rubin, William       |

 Panky, Henry         | AB-67472

 Panky, Henry         | MC-68873

 Wonderland, Alice N. |

 Funkmaster, Freddy   |

 Gull, Jonathon LC    |

 Grumby, Jonas        |

(9 rows)

This query is a left outer-join. Why left? Because you will see each row from the left table (the table to the left of the LEFT OUTER JOIN phrase). An inner-join would list only two customers ("Jones, Henry" and "Panky, Henry")?the other customers have no rentals.

A RIGHT OUTER JOIN preserves each row from the right table. A FULL OUTER JOIN preserves each row from both tables.

The following query shows a list of all customers, all tapes, and any rentals:


movies=# SELECT customers.customer_name, rentals.tape_id, tapes.title

movies-#   FROM customers FULL OUTER JOIN rentals

movies-#     ON customers.id = rentals.customer_id

movies-#   FULL OUTER JOIN tapes

movies-#     ON tapes.tape_id = rentals.tape_id;

    customer_name     | tape_id  |        title

----------------------+----------+----------------------

 Jones, Henry         | AB-12345 | The Godfather

 Panky, Henry         | AB-67472 | The Godfather

                      |          | Rear Window

                      |          | American Citizen, An

 Panky, Henry         | MC-68873 | Casablanca

 Jones, Henry         | OW-41221 | Citizen Kane

 Rubin, William       |          |

 Wonderland, Alice N. |          |

 Funkmaster, Freddy   |          |

 Gull, Jonathon LC    |          |

 Grumby, Jonas        |          |

                      |          | Sly

                      |          | Stone

(13 rows)



    Part II: Programming with PostgreSQL