Retrieving Data from the Sample Database

At this point, you should have a sample database (movies) that contains three tables (tapes, customers, and rentals) and a few rows in each table. You know how to get data into a table; now let's see how to view that data.

The SELECT statement is used to retrieve data from a database. SELECT is the most complex statement in the SQL language, and the most powerful. Using SELECT, you can retrieve entire tables, single rows, a group of rows that meet a set of constraints, combinations of multiple tables, expressions, and more. To help you understand the basics of the SELECT statement, I'll try to break it down into each of its forms and move from the simple to the more complex.

SELECT Expression

In its most simple form, you can use the SELECT statement to retrieve one or more values from a set of predefined functions. You've already seen how to retrieve your PostgreSQL user id:


movies=# select user;

 current_user

---------------

 korry

(1 row)



movies=# \q

Other values that you might want to see are


select 5;          -- returns the number 5 (whoopee)

select sqrt(2.0);  -- returns the square root of 2

select timeofday();-- returns current date/time

select now();      -- returns time of start of transaction

select version();  -- returns the version of PostgreSQL you are using



select now(), timeofday();

Commenting

The -- characters introduce a comment?any text that follows is ignored.

The previous example shows how to SELECT more than one piece of information?just list all the values that you want, separated by commas.

The PostgreSQL User's Guide contains a list of all the functions that are distributed with PostgreSQL. In Chapter 2, I'll show you how to combine columns, functions, operators, and literal values into more complex expressions.

SELECT * FROM Table

You probably won't use the first form of the SELECT statement very often?it just isn't very exciting. Moving to the next level of complexity, let's see how to retrieve data from one of the tables that you created earlier:


movies=# SELECT * FROM customers;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

 (4 rows)



When you write a SELECT statement, you have to tell PostgreSQL what information you are trying to retrieve. Let's take a closer look at the components of this SELECT statement.

Following the SELECT keyword, you specify a list of the columns that you want to retrieve. I used an asterisk (*) here to tell PostgreSQL that we want to see all the columns in the customers table.

Next, you have to tell PostgreSQL which table you want to view; in this case, you want to see the customers table.

Now let's look at the results of this query. A SELECT statement returns a result set. A result set is a table composed of all the rows and columns (or fields) that you request. A result set may be empty.

You asked PostgreSQL to return all the columns in the customers table?notice that the columns are displayed (from left to right) in the order that you specified when you created the table. You may have noticed that the rows are returned in an (apparently) arbitrary order. That's an important thing to keep in mind: Unless you specifically request that PostgreSQL return rows in a particular order, you won't be able to predict which rows will come first[5]. This is a performance feature; if you don't care about row ordering, let PostgreSQL return the rows in the fastest possible way.

[5] Okay, some people probably could predict the order in which the rows will appear. Those people have way too much free time and consider a propeller to be fashionable headwear. They are also very good at inducing sleep.

SELECT Single-Column FROM Table

If you don't want to view all of the columns from a table, you can replace the * (following the SELECT keyword) with the name of a column:


movies=# SELECT title FROM tapes;

     title

---------------

 The Godfather

 The Godfather

 Casablanca

 Citizen Kane

 Rear Window

(5 rows)

Again, the rows are presented in an arbitrary order. But this time you see only a single column. You may have noticed that "The Godfather" appears twice in this list. That happens because our imaginary video store owns two copies of that movie. I'll show you how to get rid of duplicates in a moment.

SELECT Column-List FROM Table

So far, you have seen how to select all the columns in a table and how to select a single column. Of course, there is a middle ground?you can select a list of columns:


movies=# SELECT customer_name, birth_date FROM customers;

    customer_name     | birth_date

----------------------+------------

 Jones, Henry         | 1970-10-10

 Rubin, William       | 1972-07-10

 Panky, Henry         | 1968-01-21

 Wonderland, Alice N. | 1969-03-05

(4 rows)

Instead of naming a single column after the SELECT keyword, you can provide a column-separated list of column names. Column names can appear in any order, and the results will appear in the order you specify.

SELECT Expression-List FROM Table

In addition to selecting columns, you can also select expressions. Remember, an expression is a combination of columns, functions, operators, literal values, and other expressions that will evaluate to a single value. Here is an example:


movies=# SELECT

movies-#     customer_name,

movies-#     birth_date,

movies-#     age( birth_date )

movies-# FROM customers;

    customer_name     | birth_date |             age

----------------------+------------+------------------------------

 Jones, Henry         | 1970-10-10 | 31 years 4 mons 3 days 01:00

 Rubin, William       | 1972-07-10 | 29 years 7 mons 3 days 01:00

 Panky, Henry         | 1968-01-21 | 34 years 23 days

 Wonderland, Alice N. | 1969-03-05 | 32 years 11 mons 8 days

(4 rows)

In this example, I've selected two columns and an expression. The expression age( birth_date ) is evaluated for each row in the table. The age() function subtracts the given date from the current date[6].

[6] Technically, the age() function subtracts the given timestamp (date+time) from the current date and time.

Selecting Specific Rows

The preceding few sections have shown you how to specify which columns you want to see in a result set. Now let's see how to choose only the rows that you want.

First, I'll show you to how to eliminate duplicate rows; then I'll introduce the WHERE clause.

SELECT [ALL | DISTINCT | DISTINCT ON]

In an earlier example, you selected the titles of all the videotapes owned by your video store:


movies=# SELECT title from tapes;

     title

---------------

 The Godfather

 The Godfather

 Casablanca

 Citizen Kane

 Rear Window

(5 rows)

Notice that "The Godfather" is listed twice (you own two copies of that video). You can use the DISTINCT clause to filter out duplicate rows:


movies=# SELECT DISTINCT title FROM tapes;

     title

---------------

 Casablanca

 Citizen Kane

 Rear Window

 The Godfather

(4 rows)

You now have a single row with the value "The Godfather." Let's see what happens when you add the tape_id back into the previous query:


movies=# SELECT DISTINCT title, tape_id FROM tapes;

     title     | tape_id

---------------+----------

 Casablanca    | MC-68873

 Citizen Kane  | OW-41221

 Rear Window   | AH-54706

 The Godfather | AB-12345

 The Godfather | AB-67472

(5 rows)

We're back to seeing "The Godfather" twice. What happened? The DISTINCT clause removes duplicate rows, not duplicate column values; and when the tape IDs are added to the result, the rows containing "The Godfather" are no longer identical.

If you want to filter rows that have duplicate values in one (or more) columns, use the DISTINCT ON() form:


movies=# SELECT DISTINCT ON (title) title, tape_id FROM tapes;

     title     | tape_id

---------------+----------

 Casablanca    | MC-68873

 Citizen Kane  | OW-41221

 Rear Window   | AH-54706

 The Godfather | AB-12345

(4 rows)

Notice that one of the "The Godfather" rows has been omitted from the result set. If you don't include an ORDER BY clause (I'll cover that in a moment), you can't predict which row in a set of duplicates will be included in the result set.

You can list multiple columns (or expressions) in the DISTINCT ON() clause.

The WHERE Clause

The next form of the SELECT statement includes the WHERE clause. Here is the syntax diagram for this form:


SELECT expression-list FROM table WHERE conditions

Using the WHERE clause, you can filter out rows that you don't want included in the result set. Let's see a simple example. First, here is the complete customers table:


movies=# SELECT * FROM customers;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

(4 rows)

Now pick out only those customers who owe you some money:


movies=# SELECT * FROM customers WHERE balance > 0;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

(2 rows)

In this example, I've used a single condition to restrict the rows included in the result set: balance > 0.

When PostgreSQL executes a SELECT statement, it evaluates the WHERE clause as it processes each row. If all the conditions specified by the WHERE clause are met, the row will be included in the result set (if a row meets all the conditions in the WHERE clause, the row satisfies the WHERE clause).

Here is an example that is slightly more complex:


movies=# SELECT customer_name, phone FROM customers

movies-#   WHERE

movies-#     ( balance = 0 )

movies-#     AND

movies-#     ( AGE( birth_date ) < '34 years' )

movies-# ;

 customer_name |  phone

---------------+----------

 Jones, Henry  | 555-1212

(1 row)

In this query, I've specified two conditions, separated by an AND operator. The conditions are: balance = 0 and AGE( birth_date ) < '34 years'[7]. As before, PostgreSQL reads each row in the customers table and evaluates the WHERE clause. If a given row is to be included in the result set, it must satisfy two constraints?balance must be equal to zero and the customer must be younger than 34 years of age. If either of these conditions is false for a given row, that row will not be included in the result set.

[7] I'll show you how to format various date/time related values in Chapter 2.

AND is one of the logical operators supported by PostgreSQL. A logical operator is used to combine logical expressions. A logical expression is an expression that evaluates to TRUE, FALSE, or unknown (NULL). The other two logical operators are OR and NOT.

Let's see how the OR operator works:


movies=# SELECT id, customer_name, balance, AGE(birth_date)

movies-# FROM customers

movies-#   WHERE

movies-#     ( balance = 0 )

movies-#     OR

movies-#     ( AGE( birth_date ) < '30 years' )

movies-# ;

 id | customer_name  | balance |             age

----+----------------+---------+------------------------------

  1 | Jones, Henry   |    0.00 | 31 years 4 mons 5 days 01:00

  2 | Rubin, William |   15.00 | 29 years 7 mons 5 days 01:00

  3 | Panky, Henry   |    0.00 | 34 years 25 days

(3 rows)

The OR operator evaluates to TRUE if either (or both) of the conditions is TRUE. The first row (id = 1) is included in the result set because it satisfies the first condition (balance = 0). It is included even if it does not satisfy the second condition. The second row (id = 2) is included in the result set because it satisfies the second condition, but not the first. You can see the difference between AND and OR. A row satisfies the AND operator if both conditions are TRUE. A row satisfies the OR operator if either condition is TRUE (or if both are TRUE).

The NOT operator is simple:


movies=# SELECT * FROM customers

movies-# WHERE

movies-#     NOT ( balance = 0 )

movies-# ;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

(2 rows)

NOT evaluates to TRUE if its operand is FALSE and evaluates to FALSE if its operand is TRUE. The NOT operator inverts (or reverses) a test. Without the NOT operator, the previous example would have returned all customers where the balance column was equal to zero. With the NOT operator, you get the other rows instead.

One other point that I should mention about the WHERE clause. Just because you mention a column in the WHERE clause does not mean that you have to include the column in the result set. For example:


movies=# SELECT id, customer_name FROM customers

movies-#   WHERE

movies-#     balance != 0

movies-# ;

 id |    customer_name

----+----------------------

  2 | Rubin, William

  4 | Wonderland, Alice N.

(2 rows)

This example also shows a more common alternative to the NOT operator. The != operator means "is not equal to." The != operator is not an exact replacement for NOT?it can only be used to check for inequality, whereas NOT is used to reverse the sense of any logical expression.

NULL Values

Sometimes when you add data to a table, you find that you don't know what value you should include for a column. For example, you may encounter a customer who does not want to provide you with his or her birthday. What value should be recorded in the birth_date column for that customer? You don't really want to make up an answer?you want a date value that means "unknown." This is what the NULL value is for. NULL usually means that you don't know what value should be entered into a column, but it can also mean that a column does not apply. A NULL value in the birth_date column certainly means that we don't know a customer's birth_date, not that birth_date does not apply[8]. On the other hand, you might want to include a rating column in the tapes table. A NULL value in the rating column might imply that the movie was produced before ratings were introduced and therefore the rating column does not apply.

[8] I am making the assumption that the customers for your video store have actually been born. For some of you, that may not be a valid assumption.

Some columns should not allow NULL values. In most cases, it would not make sense to add a customer to your customers table unless you know the customer's name. Therefore, the customer_name column should be mandatory (in other words, customer_name should not allow NULL values).

Let's drop and re-create the customers table so that you can tell PostgreSQL which columns should allow NULL values:


movies=# DROP TABLE customers;

DROP

movies=# CREATE TABLE customers (

movies-#        customer_id   INTEGER UNIQUE NOT NULL,

movies-#        name          VARCHAR(50)    NOT NULL,

movies-#        phone         CHAR(8),

movies-#        birth_date    DATE,

movies-#        balance       DECIMAL(7,2)

movies-#);

CREATE

The NOT NULL modifier tells PostgreSQL that the customer_id and name columns are mandatory. If you don't specify NOT NULL, PostgreSQL assumes that a column is optional. You can include the keyword NULL to make your choices more obvious:


movies=# DROP TABLE customers;

DROP

movies=# CREATE TABLE customers (

movies-#        customer_id   INTEGER UNIQUE NOT NULL,

movies-#        name          VARCHAR(50)    NOT NULL,

movies-#        phone         CHAR(8)        NULL,

movies-#        birth_date    DATE           NULL,

movies-#        balance       DECIMAL(7,2)   NULL

movies-#);

CREATE

Notice that a column of any data type can support NULL values.

The NULL value has a unique property that is often the source of much confusion. NULL is not equal to any value, not even itself. NULL is not less than any value, and NULL is not greater than any value. Let's add a customer with a NULL balance:


movies=# INSERT INTO customers movies-#   VALUES

movies-#   (

movies(#      5, 'Funkmaster, Freddy', '555-FUNK', NULL, NULL

movies(#   )

movies-# ;

Now we have five customers:


movies=# SELECT * FROM customers;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

  5 | Funkmaster, Freddy   | 555-FUNK |            |

(5 rows)

One of these customers has a NULL balance. Let's try a few queries:


movies=# SELECT * FROM customers WHERE balance > NULL;

 id | customer_name | phone | birth_date | balance

----+---------------+-------+------------+---------

(0 rows)

This query did not return any rows. You might think that it should have customer number 2 (Rubin, William); after all, 15.00 is surely greater than 0. But remember, NULL is not equal to, greater than, or less than any other value. NULL is not the same as zero. Rather than using relational operators ( '=', '!=', '<', or '>'), you should use either the IS or IS NOT operator.


movies=# SELECT * FROM customers WHERE balance IS NULL;

 id |   customer_name    |  phone   | birth_date | balance

----+--------------------+----------+------------+---------

  6 | Funkmaster, Freddy | 555-FUNK |            |

(1 row)



movies=# SELECT * FROM customers WHERE balance IS NOT NULL;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

(4 rows)

The NULL value introduces another complication. If NULL is not greater than, equal to, or less than any other value, what would 'NULL + 4' mean? Is NULL + 4 greater than NULL? It can't be because that would imply that NULL is less than NULL + 4 and, by definition, NULL can't be less than another value. What does all this mean? It means that you can't do math with a NULL value.


movies=# SELECT id, customer_name, balance, balance+4 FROM customers;

 id |    customer_name     | balance | ?column?

----+----------------------+---------+----------

  1 | Jones, Henry         |    0.00 |     4.00

  2 | Rubin, William       |   15.00 |    19.00

  3 | Panky, Henry         |    0.00 |     4.00

  4 | Wonderland, Alice N. |    3.00 |     7.00

  5 | Funkmaster, Freddy   |         |

(5 rows)

This query shows what happens when you try to perform a mathematical operation using NULL. When you try to add '4' to NULL, you end up with NULL.

The NULL value complicates logic operators as well. Most programmers are familiar with two-valued logic operators (that is, logic operators that are defined for the values TRUE and FALSE). When you add in NULL values, the logic operators become a bit more complex. Tables 1.4, 1.5, and 1.6 show the truth tables for each logical operator.

Table 1.4. Truth Table for Three-Valued AND Operator

a

b

a AND b

TRUE

TRUE

TRUE

TRUE

FALSE

FALSE

TRUE

NULL

NULL

FALSE

FALSE

FALSE

FALSE

NULL

FALSE

NULL

NULL

NULL

Source: PostgreSQL User's Guide

Table 1.5. Truth Table for Three-Valued OR Operator

a

b

a OR b

TRUE

TRUE

TRUE

TRUE

FALSE

TRUE

TRUE

NULL

TRUE

FALSE

FALSE

FALSE

FALSE

NULL

NULL

NULL

NULL

NULL

Source: PostgreSQL User's Guide

Table 1.6. Truth Table for Three-Valued NOT Operator

a

NOT a

TRUE

FALSE

FALSE

TRUE

NULL

NULL

Source: PostgreSQL User's Guide

I don't mean to scare you away from the NULL value?it's very useful and often necessary?but you do have to understand the complications that it introduces.

The ORDER BY Clause

So far, all the queries that you have seen return rows in an arbitrary order. You can add an ORDER BY clause to a SELECT command if you need to impose a predictable ordering. The general form of the ORDER BY clause is[9]

[9] PostgreSQL supports another form for the ORDER BY clause: ORDER BY expression [ USING operator ] [, ...]. This might seem a little confusing at first. When you specify ASC, PostgreSQL uses the < operator to determine row ordering. When you specify DESC, PostgreSQL uses the > operator. The second form of the ORDER BY clause allows you to specify an alternative operator.


ORDER BY expression [ ASC | DESC ] [, ...]

The ASC and DESC terms mean ascending and descending, respectively. If you don't specify ASC or DESC, PostgreSQL assumes that you want to see results in ascending order. The expression following ORDER BY is called a sort key.

Let's look at a simple example:


movies=# SELECT * FROM customers ORDER BY balance;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

  3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  5 | Funkmaster, Freddy   | 555-FUNK |            |

(5 rows)

You can see that this SELECT command returns the result set in ascending order of the balance column. Here is the same query, but in descending order:


movies=# SELECT * FROM customers ORDER BY balance DESC;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

  3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

  5 | Funkmaster, Freddy   | 555-FUNK |            |

(5 rows)

This time, the largest balance is first, followed by successively smaller values.

You may have noticed something odd about how the ORDER BY clause handles the customer named Freddy Funkmaster. Recall from the previous section that NULL cannot be compared to other values. By its very nature, the ORDER BY clause must compare values. PostgreSQL resolves this issue with a simple rule: NULL values always sort last. For ascending sorts, NULL is considered greater than all other values. For descending sorts, NULL is considered less than all other values. Note that starting with PostgreSQL version 7.2, NULL is always considered larger than all other values when evaluating an ORDER BY clause.

You can include multiple sort keys in the ORDER BY clause. The following query sorts customers in ascending balance order, and then in descending birth_date order:


movies=# SELECT * FROM customers ORDER BY balance, birth_date DESC;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

  3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  5 | Funkmaster, Freddy   | 555-FUNK |            |

(5 rows)

When an ORDER BY clause contains multiple sort keys, you are telling PostgreSQL how to break ties. You can see that customers 1 and 3 have the same value (0.00) in the balance column?you have asked PostgreSQL to order rows using the balance column. What happens when PostgreSQL finds two rows with the same balance? When two sort key values are equal, PostgreSQL moves to the next sort key to break the tie. If two sort key values are not equal, sort keys with a lower precedence are ignored. So, when PostgreSQL finds that customers 1 and 3 have the same balance, it moves to the birth_date column to break the tie.

If you don't have a sort key with a lower precedence, you won't be able to predict the ordering of rows with duplicate sort key values.

You can include as many sort keys as you like.

LIMIT and OFFSET

Occasionally, you will find that you want to answer a question such as "Who are my top 10 salespeople?" In most relational databases, this is a difficult question to ask. PostgreSQL offers two extensions that make it easy to answer "Top n" or "Bottom n"-type questions. The first extension is the LIMIT clause. The following query shows the two customers who owe you the most money:


movies=# SELECT * FROM customers ORDER BY balance DESC LIMIT 2;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

(2 rows)

You can see here that I used an ORDER BY clause so that the rows are sorted such that the highest balances appear first?in most cases, you won't use a LIMIT clause without also using an ORDER BY clause. Let's change this query a little?this time we want the top five customers who have a balance over $10:


movies=# SELECT * FROM customers

movies-#   WHERE

movies-#     balance >= 10

movies-#   ORDER BY balance DESC

movies-#   LIMIT 5;

 id | customer_name  |  phone   | birth_date | balance

----+----------------+----------+------------+---------

  2 | Rubin, William | 555-2211 | 1972-07-10 |   15.00

(1 row)

This example shows that the LIMIT clause won't always return the number of rows that were specified. Instead, LIMIT returns no more than the number of rows that you request. In this sample database, you have only one customer who owes you more than $10.

The second extension is the OFFSET n clause. The OFFSET n clause tells PostgreSQL to skip the first n rows of the result set. For example:


movies=# SELECT * FROM customers ORDER BY balance DESC OFFSET 1;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

  3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

  5 | Funkmaster, Freddy   | 555-FUNK |            |

(4 rows)

In this case, we are viewing all the customers except the customer with the greatest balance. It's common to use LIMIT and OFFSET together:


movies=# SELECT * FROM customers

movies-#   ORDER BY balance DESC LIMIT 2 OFFSET 1;

 id |    customer_name     |  phone   | birth_date | balance

----+----------------------+----------+------------+---------

  4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00

  1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

(2 rows)

Formatting Column Results

So far, you have seen how to tell PostgreSQL which rows you want to view, which columns you want to view, and the order in which the rows should be returned. Let's take a short side-trip here and learn how to change the appearance of the values that you select.

Take a look at the following query:


movies=# SELECT id, customer_name, balance, balance+4 FROM customers;

 id |    customer_name     | balance | ?column?

----+----------------------+---------+----------

  1 | Jones, Henry         |    0.00 |     4.00

  2 | Rubin, William       |   15.00 |    19.00

  3 | Panky, Henry         |    0.00 |     4.00

  4 | Wonderland, Alice N. |    3.00 |     7.00

  5 | Funkmaster, Freddy   |         |

(5 rows)

PostgreSQL inserts two lines of text between your query and the result set. These two lines are (obviously) column headings. You can see that the header for each of the first three columns contains the name of the column. What about the last column? When you SELECT an expression, PostgreSQL uses "?column?" for the field header[10].

[10] Actually, if you SELECT a function (such as AGE() or SQRT()), PostgreSQL will use the name of the function for the field header.

You can change field headers using the AS clause:


movies=# SELECT id, customer_name,

movies-#        balance AS "Old balance",

movies-#        balance + 4 AS "New balance"

movies-#   FROM customers;

 id |    customer_name     | Old balance | New balance

----+----------------------+-------------+-------------

  1 | Jones, Henry         |        0.00 |        4.00

  2 | Rubin, William       |       15.00 |       19.00

  3 | Panky, Henry         |        0.00 |        4.00

  4 | Wonderland, Alice N. |        3.00 |        7.00

  5 | Funkmaster, Freddy   |             |

(5 rows)

Notice that you can provide a field header for table columns as well as for expressions. If you rename a field and the query includes an ORDER BY clause that refers to the field, the ORDER BY should use the new name, not the original one:


movies=# SELECT id, customer_name,

movies-#        balance AS "Old balance",

movies-#        balance + 4 AS "New balance"

movies-#   FROM customers

movies-#   ORDER BY "Old balance";

 id |    customer_name     | Old balance | New balance

----+----------------------+-------------+-------------

  1 | Jones, Henry         |        0.00 |        4.00

  3 | Panky, Henry         |        0.00 |        4.00

  4 | Wonderland, Alice N. |        3.00 |        7.00

  2 | Rubin, William       |       15.00 |       19.00

  5 | Funkmaster, Freddy   |             |

(5 rows)

This section explained how to change the column headers for a SELECT command. You can also change the appearance of the data values. In the next section, I'll show you a few examples using date values for illustration.

Working with Date Values

PostgreSQL supports six basic date, time, and date/time data types, as shown in Table 1.7. I'll use the term temporal to cover date, time, and date/time data types.

Table 1.7. PostgreSQL Temporal Data Types

Data Type Name

Type of Data Stored

Earliest Date/Time

Latest Date/Time

TIMESTAMP

Date/Time

4713 BC

1465001 AD

TIMESTAMP WITH TIME ZONE

Date/Time

1903 AD

2037 AD

INTERVAL

Interval

?178000000 years

178000000 years

DATE

Date

4713 BC

32767 AD

TIME

Time

00:00:00.00

23:59:59.99

TIME WITH TIME ZONE

Time

00:00:00.00+12

23:59:59.99?12

I'll cover the details of the date/time data types in Chapter 2. You have already seen two of these temporal data types. The customers table contains a DATE column (birth_date):


movies=# \d customers

                Table "customers"

   Attribute   |         Type          | Modifier

---------------+-----------------------+----------

 id            | integer               | not null

 customer_name | character varying(50) | not null

 phone         | character(8)          |

 birth_date    | date                  |

 balance       | numeric(7,2)          |

Index: customers_id_key



movies=# SELECT customer_name, birth_date FROM customers;

    customer_name     | birth_date

----------------------+------------

 Jones, Henry         | 1970-10-10

 Rubin, William       | 1972-07-10

 Panky, Henry         | 1968-01-21

 Wonderland, Alice N. | 1969-03-05

 Funkmaster, Freddy   |

(5 rows)



You've also seen the INTERVAL data type?the AGE() function returns an INTERVAL:


movies=# SELECT customer_name, AGE( birth_date ) FROM customers;

    customer_name     |             age

----------------------+------------------------------

 Jones, Henry         | 31 years 4 mons 8 days 01:00

 Rubin, William       | 29 years 7 mons 8 days 01:00

 Panky, Henry         | 34 years 28 days

 Wonderland, Alice N. | 32 years 11 mons 13 days

 Funkmaster, Freddy   |

(5 rows)

Date/time values are usually pretty easy to work with, but there is a complication that you need to be aware of. Let's say that I need to add a new customer:


movies=# INSERT INTO customers

movies-#   VALUES

movies-#   (

movies-#     7, 'Gull, Jonathon LC', '555-1111', '02/05/1984', NULL

movies-#   );

This customer has a birth_date of '02/05/1984'?does that mean "February 5th 1984", or "May 2nd 1984"? How does PostgreSQL know which date I meant? The problem is that a date such as '02/05/1984' is ambiguous?you can't know which date this string represents without knowing something about the context in which it was entered. '02/05/1984' is ambiguous. 'May 02 1984' is unambiguous.

PostgreSQL enables you to enter and display dates in a number of formats?some date formats are ambiguous and some are unambiguous. The DATESTYLE runtime variable tells PostgreSQL how to format dates when displaying data and how to interpret ambiguous dates that you enter.

The DATESTYLE variable can be a little confusing. DATESTYLE is composed of two parts. The first part, called the convention, tells PostgreSQL how to interpret ambiguous dates. The second part, called the display format, determines how PostgreSQL displays date values. The convention controls date input and the display format controls date output.

Let's talk about the display format first. PostgreSQL supports four different display formats. Three of the display formats are unambiguous and one is ambiguous.

The default display format is named ISO. In ISO format, dates always appear in the form 'YYYY-MM-DD'. The next display format is GERMAN. In GERMAN format, dates always appear in the form 'DD.MM.YYYY'. The ISO and GERMAN formats are unambiguous because the format never changes. The POSTGRES format is also unambiguous, but the display format can vary. PostgreSQL needs a second piece of information (the convention) to decide whether the month should appear before the day (US convention) or the day should appear before the month (European convention). In POSTGRES format, date values display the day-of-the-week and month name in abbreviated text form; for example 'Wed May 02 1984' (US) or 'Wed 02 May 1984' (European).

The final display format is SQL. SQL format is ambiguous. In SQL format, the date 'May 02 1984' is displayed as '05/02/1984' (US), or as '02/05/1984'(European).

Table 1.8. DATESTYLE Display Formats

Display Format

US Convention

European Convention

ISO

1984-05-02

1984-05-02

GERMAN

02.05.1984

02.05.1984

POSTGRES

Wed May 02 1984

Wed 02 May 1984

SQL

05/02/1984

02/05/1984

As I mentioned earlier, the ISO and GERMAN display formats are unambiguous. In ISO format, the month always precedes the day. In GERMAN format, the day always precedes the month. If you choose POSTGRES or SQL format, you must also specify the order in which you want the month and day components to appear. You can specify the desired display format and month/day ordering (that is, the convention) in the DATESTYLE runtime variable:


movies=# SET DATESTYLE TO 'US,ISO';            -- 1984-05-02

movies=# SET DATESTYLE TO 'US,GERMAN';         -- 02.05.1984

movies=# SET DATESTYLE TO 'US,POSTGRES';       -- Wed May 02 1984

movies=# SET DATESTYLE TO 'US,SQL';            -- 05/02/1984



movies=# SET DATESTYLE TO 'EUROPEAN,ISO';      -- 1984-05-02

movies=# SET DATESTYLE TO 'EUROPEAN,GERMAN';   -- 02.05.1984

movies=# SET DATESTYLE TO 'EUROPEAN,POSTGRES'; -- Wed 02 May 1984

movies=# SET DATESTYLE TO 'EUROPEAN,SQL';      -- 02/05/1984

The convention part of the DATESTYLE variable determines how PostgreSQL will make sense of the date values that you enter. The convention also affects the ordering of the month and day components when displaying a POSTGRES or SQL date. Note that you are not restricted to entering date values in the format specified by DATESTYLE. For example, if you have chosen to display dates in 'US,SQL' format, you can still enter date values in any of the other formats.

Recall that the ISO and GERMAN date formats are unambiguous?the ordering of the month and day components is predefined. A date entered in POSTGRES format is unambiguous as well?you enter the name of the month so it cannot be confused with the day. If you choose to enter a date in SQL format, PostgreSQL will look to the first component of DATESTYLE (that is, the convention) to determine whether you want the value interpreted as a US or a European date. Let's look at a few examples.


movies=# SET DATESTYLE TO 'US,ISO';

movies=# SELECT CAST( '02/05/1984' AS DATE );

 1984-02-05



movies=# SET DATESTYLE TO 'EUROPEAN,ISO';

movies=# SELECT CAST( '02/05/1984' AS DATE );

 1984-05-02

In this example, I've asked PostgreSQL to display dates in ISO format, but I've entered a date in an ambiguous format. In the first case, you can see that PostgreSQL interpreted the ambiguous date using US conventions (the month precedes the day). In the second case, PostgreSQL uses European conventions to interpret the date.

Now let's see what happens when I enter an unambiguous date:


movies=# SET DATESTYLE TO 'US,ISO';

SET VARIABLE

movies=# SELECT CAST( '1984-05-02' AS DATE );

 1984-05-02

movies=# SET DATESTYLE TO 'EUROPEAN,ISO';

SET VARIABLE

movies=# SELECT CAST( '1984-05-02' AS DATE );

 1984-05-02

This time, there can be no confusion?an ISO-formatted date is always entered in 'YYYY-MM-DD' format. PostgreSQL ignores the convention.

So, you can see that I can enter date values in many formats. If I choose to enter a date in an ambiguous format, PostgreSQL uses the convention part of the current DATESTYLE to interpret the date. I can also use DATESTYLE to control the display format.

Matching Patterns

In the previous two sections, you took a short detour to learn a little about how to format results. Now let's get back to the task of producing the desired results.

The WHERE clause is used to restrict the number of rows returned by a SELECT command[11]. Sometimes, you don't know the exact value that you are searching for. For example, you may have a customer ask you for a film, but he doesn't remember the exact name, although he knows that the film has the word "Citizen" in the title. PostgreSQL provides two features that make it possible to search for partial alphanumeric values.

[11] Technically, the WHERE clause constrains the set of rows affected by a SELECT, UPDATE, or DELETE command. I'll show you the UPDATE and DELETE commands a little later.

LIKE and NOT LIKE

The LIKE operator provides simple pattern-matching capabilities. LIKE uses two special characters that indicate the unknown part of a pattern. The underscore (_) character matches any single character. The percent sign (%) matches any sequence of zero or more characters. Table 1.9 shows a few examples.

Table 1.9. Pattern Matching with the LIKE Operator

String

Pattern

Result

The Godfather

%Godfather%

Matches

The Godfather

%Godfather

Matches

The Godfather

%Godfathe_

Matches

The Godfather

___ Godfather

Matches

The Godfather

Godfather%

Does not match

The Godfather

_Godfather

Does not match

The Godfather: Part II

%Godfather

Does not match

Now let's see how to use the LIKE operator in a SELECT command:


movies=# SELECT * FROM tapes WHERE title LIKE '%Citizen%';

 tape_id  |        title         | duration

----------+----------------------+----------

 OW-41221 | Citizen Kane         |

 KJ-03335 | American Citizen, An |

(2 rows)



The LIKE operator is case-sensitive:


movies=# SELECT * FROM tapes WHERE title LIKE '%citizen%';

 tape_id | title | duration

---------+-------+----------

(0 rows)

If you want to perform case-insensitive pattern matching, use the ILIKE operator:


movies=# SELECT * FROM tapes WHERE title ILIKE '%citizen%';

 tape_id  |        title         | duration

----------+----------------------+----------

 OW-41221 | Citizen Kane         |

 KJ-03335 | American Citizen, An |

(2 rows)

You can, of course, combine LIKE and ILIKE with the NOT operator to return rows that do not match a pattern:


movies=# SELECT * FROM tapes WHERE title NOT ILIKE '%citizen%';

 tape_id  |     title     |   duration

----------+---------------+--------------

 AB-12345 | The Godfather |

 AB-67472 | The Godfather |

 MC-68873 | Casablanca    |

 AH-54706 | Rear Window   |

 OW-42200 | Sly           | 01:36

 OW-42201 | Stone         | 4 days 01:36

(6 rows)

Pattern Matching with Regular Expressions

The LIKE and ILIKE operators are easy to use, but they aren't very powerful. Fortunately, PostgreSQL lets you search for data using regular expressions. A regular expression is a string that specifies a pattern. The language that you use to create regular expressions is far more powerful than the LIKE and ILIKE operators. You have probably used regular expressions before; programs such as grep, awk, and the Unix (and DOS) shells use regular expressions.

The LIKE and ILIKE operators define two pattern-matching characters; the regular expression operator defines far more. First, the character "." within a regular expression operates in the same way as the "_" character in a LIKE pattern: it matches any single character. The characters ".*" in a regular expression operate in the same way as the "%" character in a LIKE pattern: they match zero or more occurrences of any single character.

Notice that in a regular expression, you use two characters to match a sequence of characters, whereas you use a single character in a LIKE pattern. The regular expression ".*" is actually two regular expressions combined into one complex expression. As I mentioned earlier, the "." character matches any single character. The "*" character matches zero or more occurrences of the pattern that precedes it. So, ".*" means to match any single character, zero or more times. There are three other repetition operators: The "+" character matches one or more occurrences of the preceding pattern, and the "?" character matches zero or one occurrence of the preceding pattern. If you need to get really fancy (I never have), you can use the form "{x[,y]}" to match at least x and no more than y occurrences of the preceding pattern.

You can also search for things other than ".". For example, the character "^" matches the beginning of a string and "$" matches the end. The regular expression syntax even includes support for character classes. The pattern "[:upper:]*[:digit:]" will match any string that includes zero or more uppercase characters followed by a single digit.

The "|" character gives you a way to search for a string that matches either of two patterns. For example, the regular expression "(^God)|.*Donuts.*" would match a string that either starts with the string "God" or includes the word "Donuts".

Regular expressions are extremely powerful, but they can get awfully complex. If you need more information, Chapter 4 of the PostgreSQL User's Manual provides an exhaustive reference to the complete regular expression syntax.

Table 1.10 shows how to construct regular expressions that match the same strings matched by the LIKE patterns in shown in Table 1.9.

Table 1.10. Pattern Matching with Regular Expressions

String

Pattern

Result

The Godfather

.*Godfather

Matches

The Godfather

.*Godfather.*



Part II: Programming with PostgreSQL