Column Constraints

When you create a PostgreSQL table, you can define column constraints^[15]. A column constraint is a rule that must be satisfied whenever you insert or update a value in that column.

^[15] You can also define table constraints. A table constraint applies to the table as a whole, not just a single column. We'll discuss table constraints in Chapter 3.

It's very important to understand that when you define a column constraint, PostgreSQL won't ever let your table get into a state in which the constraints are not met. If you try to INSERT a value that violates a constraint, the insertion will fail. If you try to UPDATE a value in such a way that it would violate a constraint, the modification will be rejected.

You can also define constraints that establish relationships between two tables. For example, each row in the rentals table contains a tape_id (corresponding to a row in the tapes table). You could define a constraint to tell PostgreSQL that the rentals.tape_id column REFERENCES the tapes.tape_id column. I'll discuss the implications of a REFERENCES constraint in a moment.

Needless to say, column constraints are a very powerful feature.

NULL/NOT NULL

Let's start with the most basic column constraints: NULL and NOT NULL. You've already seen some examples of the NOT NULL constraint (in Chapter 1):


CREATE TABLE customers (

        customer_id   INTEGER UNIQUE NOT NULL,

        name          VARCHAR(50)    NOT NULL,

        phone         CHAR(8),

        birth_date    DATE,

        balance       DECIMAL(7,2)

);

I have specified that the customer_id and name columns are NOT NULL. The meaning of a NOT NULL constraint is pretty clear: The column is not allowed to contain a NULL value^[16]. If you try to INSERT a NULL value into the customer_id or name columns, you will receive an error:

^[16] A column that has been defined to be NOT NULL is also known as a mandatory column. A column that can accept NULL values is said to be optional.


INSERT INTO customers VALUES

(

    11,

    NULL,

    '555-1984',

    '10-MAY-1980',

    0

);



ERROR:  ExecAppend: Fail to add null value in not null

                    attribute customer_name

You'll also get an error if you try to UPDATE either column in such a way that the result would be NULL:


UPDATE customers SET customer_name = NULL WHERE customer_id = 1;



ERROR:  ExecReplace: Fail to add null value in not null

                     attribute customer_name

The opposite of NOT NULL is NULL. You can explicitly define a NULL constraint, but it really doesn't function as a constraint. A NULL constraint does not force a column to contain only NULL values (that would be pretty pointless). Instead, a NULL constraint simply tells PostgreSQL that NULL values are allowed in a particular column. If you don't specify that a column is mandatory, it is considered optional.

UNIQUE

The UNIQUE constraint ensures that a column will contain unique values; that is, there will be no duplicate values in the column. If you look back to the previous section, you'll see that you specified that the customer_id column should be UNIQUE. If you try to INSERT a duplicate value into a UNIQUE column, you will receive an error message:


movies=# SELECT * FROM customers;



 customer_id |    customer_name     |  phone   | birth_date | balance

-------------+----------------------+----------+------------+--------

           1 | Jones, Henry         | 555-1212 | 1970-10-10 |    0.00

           2 | Rubin, William       | 555-2211 | 1972-07-10 |   15.00

           3 | Panky, Henry         | 555-1221 | 1968-01-21 |    0.00

           4 | Wonderland, Alice N. | 555-1122 | 1969-03-05 |    3.00



movies=# INSERT INTO customers VALUES

movies-# (

movies-#    1,

movies-#    'John Gomez',

movies-#    '555-4272',

movies-#    '1982-06-02',

movies-#    0.00

movies-# );



ERROR:  Cannot insert a duplicate key into unique

            index customers_customer_id_key

When you create a UNIQUE column, PostgreSQL will ensure that an index exists for that column. If you don't create one yourself, PostgreSQL will create one for you. We'll talk more about indexes in Chapter 3.

PRIMARY KEY

Almost every table that you create will have one column (or possibly a set of columns) that uniquely identifies each row. For example, each tape in the tapes table is uniquely identified by its tape_id. Each customer in your customers table is identified by a UNIQUE customer_id. In relational database lingo, the set of columns that act to identify a row is called the primary key.

Quite often, you will find that a table has more than one unique column. For example, a table holding employee information might have an employee_id column and a social_security_number (SSN) column. You could argue that either of these would be a reasonable primary key. The employee_id would probably be the better choice for at least three reasons. First, you are likely to refer to an employee record in other tables (for example, withholdings and earnings)?an employee_id is (most likely) shorter than an SSN. Second, an SSN is considered private information, and you don't want to expose an employee's SSN to everyone who has access to one of the related files. Third, it is entirely possible that some of your employees may not have Social Security numbers (they may not be U.S. citizens)?you can't define a column as the PRIMARY KEY if that column allows NULL values.

PostgreSQL provides a constraint, PRIMARY KEY, that you can use to define the primary key for a table. Practically speaking, identifying a column (or a set of columns) as a PRIMARY KEY is the same as defining the column to be NOT NULL and UNIQUE. But the PRIMARY KEY constraint does offer one advantage over NULL and UNIQUE: documentation. When you create a PRIMARY KEY, you are stating that the columns that comprise the key should be used when you need to refer to a row in that table. Each row in the rentals table, for example, contains a reference to a tape (rentals.tape_id) and a reference to a customer (rentals.customer_id). You should define the customers. customer_id column as the primary key of the customers table:


CREATE TABLE customers (

        customer_id   INTEGER PRIMARY KEY,

        name          VARCHAR(50)    NOT NULL,

        phone         CHAR(8),

        birth_date    DATE,

        balance       DECIMAL(7,2)

);

You should also define the tapes.tape_id column as the primary key of the tapes table:


CREATE TABLE tapes (

        tape_id     CHARACTER(8) PRIMARY KEY,

        title       CHARACTER VARYING(80)

);

Now, let's look at the other half of the equation: the REFERENCES constraint.

REFERENCES

A foreign key is a column (or group of columns) in one table that refers to a row in another table. Usually, but not always, a foreign key refers to the primary key of another table.

The REFERENCES constraint tells PostgreSQL that one table refers to another table (or more precisely, a foreign key in one table refers to the primary key of another). Let's look at an example:


CREATE TABLE rentals (

        tape_id     CHARACTER(8) REFERENCES tapes,

        customer_id INTEGER      REFERENCES customers,

        rental_date DATE

);

I've now defined rentals.tape_id and rentals.customer_id to be foreign keys. In this example, the rentals.tape_id column is also called a reference and the tapes.tape_id column is called the referent.

There are a few implications to the REFERENCES constraint that you will need to consider. First, the REFERENCES constraint is a constraint: PostgreSQL does not allow you to change the database in such a way that the constraint would be violated. You cannot add a rentals row that refers to a nonexistent tape (or to a nonexistent customer):


movies=# SELECT * FROM tapes;

 tape_id  |     title

----------+---------------

 AB-12345 | The Godfather

 AB-67472 | The Godfather

 MC-68873 | Casablanca

 OW-41221 | Citizen Kane

 AH-54706 | Rear Window



movies=# INSERT INTO rentals VALUES

movies-# (

movies(#     'OW-00000',

movies(#     1,

movies(#     '2002-02-21'

movies(# );

ERROR:  <unnamed> referential integrity violation ?

             key referenced from rentals not found in tapes

The next thing to consider is that you cannot (normally) DELETE a referent?doing so would violate the REFERENCES constraint:


movies=# SELECT * FROM rentals;

 tape_id  | customer_id | rental_date

----------+-------------+-------------

 AB-12345 |           1 | 2001-11-25

 AB-67472 |           3 | 2001-11-25

 OW-41221 |           1 | 2001-11-25

 MC-68873 |           3 | 2001-11-20

(4 rows)



movies=# DELETE FROM tapes WHERE tape_id = 'AB-12345';

ERROR:  <unnamed> referential integrity violation ?

                  key in tapes still referenced from rentals

Sometimes, it's not appropriate for a REFERENCES constraint to block the deletion of a referent. You can specify the action that PostgreSQL should take when the referent is deleted. The default action (also known as NO ACTION and RESTRICT) is to prevent the deletion of a referent if there are still any references to it. The next alternative, CASCADE, deletes all rows that refer to a value when the referent is deleted. The final two choices break the link between the reference and the referent: SET NULL updates any references to NULL whenever a referent is deleted, whereas SET DEFAULT updates any references to their default values when a referent is deleted.

If you want to specify one of the alternatives, you would use the following syntax when you create the REFERENCES constraint:


REFERENCES table [ (column) ] ON DELETE

     NO ACTION | RESTRICT | CASCADE | SET NULL | SET DEFAULT

By default, a REFERENCES constraint also prevents you from changing data in such a way that the constraint would be violated. You can use the ON UPDATE clause to relax the constraint a little, much the same as the ON DELETE clause.

The syntax required for ON UPDATE is


REFERENCES table [ (column) ] ON UPDATE

    NO ACTION | RESTRICT | CASCADE | SET NULL | SET DEFAULT

There is a subtle difference between the ON UPDATE clause and ON DELETE clause. When you DELETE a referent, the entire row disappears, so the behavior of the ON DELETE clause is obvious. When you UPDATE a referent row, you may change values other than the referent column(s). If you UPDATE a referent row, but you don't update the referent column, you can't introduce a constraint violation, so the ON UPDATE action doesn't come into play. If you do change the referent column, the ON UPDATE action is triggered.

The NO ACTION and RESTRICT actions simply prevent a constraint violation?this is identical to the ON DELETE clause. The CASCADE action causes all references to be updated whenever a referent changes. SET NULL and SET DEFAULT actions work the same for ON UPDATE as for ON DELETE.

CHECK()

By defining a CHECK() constraint on a column, you can tell PostgreSQL that any values inserted into that column must satisfy an arbitrary Boolean expression. The syntax for a CHECK() constraint is


[CONSTRAINT constraint-name] CHECK( boolean-expression )

For example, if you want to ensure that the customer_balance column is a positive value, but less than $10,000.00, you might use the following:


CREATE TABLE customers

(

        customer_id   INTEGER UNIQUE,

        customer_name VARCHAR(50),

        phone         CHAR(8),

        birth_date    DATE,

        balance       DECIMAL(7,2)

           CONSTRAINT invalid_balance

           CHECK( balance > 0 AND balance < 10000 )

);

Now, if you try to INSERT an invalid value into the customer_balance table, you'll cause an error:


INSERT INTO customers VALUES

(

  10,

  'John Smallberries',

  '555-8426',

  '1970-JAN-02',

  20000

);

ERROR:  ExecAppend: rejected due to CHECK constraint invalid_balance