Now that you have an overview of the components of a PL/pgSQL function, let's look at the function body in greater detail. I'll start by showing you how to include documentation (that is, comments) in your PL/pgSQL functions. Next, I'll look at variable declarations. Finally, I'll finish up this section by describing the different kinds of statements that you can use inside of a PL/pgSQL function.
There are two comment styles in PL/pgSQL. The most frequently seen comment indicator is the double dash: --. A double dash introduces a comment that extends to the end of the current line. For example:
-- This line contains a comment and nothing else DECLARE customer_id INTEGER; -- This is also a comment -- due_date DATE; -- This entire line is a comment -- because it begins with a '--'
PL/pgSQL understands C-style comments as well. A C-style comment begins with the characters /* and ends with the characters */. A C-style comment can span multiple lines:
/* NAME: compute_due_date() DESCRIPTION: This function will compute the due date for a tape rental. INPUT: $1 -- Date of original rental RETURNS: A date indicating when the rental is due. */ CREATE FUNCTION compute_due_date( DATE ) RETURNS DATE ...
Choosing a comment style is purely a matter of personal preference. Of course, the person choosing the style may not be y ou?you may have to conform to coding standards imposed by your customer (and/or employer). I tend to use only the double-dash comment style in PL/pgSQL code. If I want to include a multiline comment, I start each line with a double dash:
---------------------------------------------------------------------- -- NAME: compute_due_date() -- -- DESCRIPTION: This function will compute the due date for a tape -- rental. -- -- INPUT: -- $1 -- Date of original rental -- -- RETURNS: A date indicating when the rental is due. CREATE FUNCTION compute_due_date( DATE ) RETURNS DATE ...
I find that the double-dash style looks a little cleaner.
The variable declarations that you've seen up to this point have all been pretty simple. There are actually five ways to introduce a new variable (or at least a new variable name) into a PL/pgSQL function.
Each parameter defines a new variable (the name is automatically assigned, but you declare the data type).
You can declare new variables in the DECLARE section of a block.
You can create an alternate name for a function parameter using the ALIAS statement.
You can define a new name for a variable (invalidating the old name) using the RENAME statement.
The iterator variable for an integer-based FOR loop is automatically declared to be an integer.
Let's look at these variables one at a time.
I mentioned earlier in this chapter that each parameter in a PL/pgSQL function is automatically assigned a name. The first parameter (in left-to-right order) is named $1, the second parameter is named $2, and so on. You define the data type for each parameter in the function definition?for example:
CREATE FUNCTION write_history( DATE, rentals )...
This function expects two parameters. The first parameter is named $1 and is of type DATE. The second parameter is named $2 and is of type rentals.
Notice that the write_history() function (in the preceding code line) expects an argument of type rentals. In the sample database, 'rentals' is actually the name of a table. Inside of the write_history() function, you can use the rentals parameter ($2) as if it were a row in the rentals table. That means that you can work with $2.tape_id, $2.customer_id, and $2.rental_date.
When you call this function, you need to pass a row from the rentals table as the second argument?for example:
SELECT write_history( NOW(), rentals ) FROM rentals;
If you define a function that expects a row as a parameter, I would recommend ALIASing that parameter for the sake of readability. It's less confusing to see "rentals.tape_id" than "$2.tape_id".
The second way to introduce a new variable into a PL/pgSQL function is to list the variable in the DECLARE section of a block. The name of a nonparameter variable can include alphabetic characters (A?Z), underscores, and digits. Variable names must begin with a letter (A?Z or a?z) or an underscore. Names are case-insensitive: my_variable can also be written as My_Variable, and both still refer to the same variable.
The PL/pgSQL documentation mentions that you can force a variable name to be case-sensitive by enclosing it in double quotes?for example, "pi". As of PostgreSQL 7.1.3, this does not seem to work. You can enclose a variable name within double quotes if you need to start the name with a digit.
Oddly enough, you can actually DECLARE a variable whose name starts with a '$', $3 for example, but I wouldn't recommend it; I would expect that this feature (bug?) may be removed (fixed?) at some point in the future.
The complete syntax for a variable declaration is
var-name [CONSTANT] var-type [NOT NULL] [{ DEFAULT | := } expression];
Some of the examples in this chapter have declared variables using the most basic form:
due_date DATE; rental_period INTERVAL := ''7 days'';
The first line creates a new variable named due_date. The data type of due_date is DATE. Because I haven't explicitly provided an initial value for due_date, it will be initialized to NULL.
The second line defines a new INTERVAL variable named rental_period. In this case, I have provided an initial value, so rental_period will be initialized to the INTERVAL value '7 days'. I could have written this declaration as
rental_period INTERVAL DEFAULT ''7 days'';
In the DECLARE section of a block, DEFAULT is synonymous with ':='.
The initializer expression must evaluate to a value of the correct type. If you are creating an INTEGER variable, the initializer expression must evaluate to an INTEGER value or to a type that can be coerced into an INTEGER value.
There are two things about the DECLARE section that you may find a bit surprising. First, you cannot use any of the function parameters in the initializer expression, even if you ALIAS them. The following is illegal:
CREATE FUNCTION compute_due_date(DATE) RETURNS DATE AS '
DECLARE
due_date DATE := $1 + ''7 days''::INTERVAL;
...
ERROR: Parameter $1 is out of range
The second issue is that once you create a variable in a DECLARE section, you cannot use that variable later within the same DECLARE section. This means that you can't do something like
CREATE FUNCTION do_some_geometry(REAL) RETURNS REAL AS '
DECLARE
pi CONSTANT REAL := 3.1415926535;
radius REAL := 3.0;
diameter REAL := pi * ( radius * radius );
...
ERROR: Attribute 'pi' not found
Notice in the previous example that I declared pi to be a 'CONSTANT REAL'. When you define a variable as CONSTANT, you prevent assignment to that variable. You must provide an initializer for a CONSTANT.
The final modifier for a variable declaration is NOT NULL. Defining a variable to be NOT NULL means that you will receive an error if you try to set that variable to NULL. You must provide an initializer when you create a NOT NULL variable[5].
[5] This makes perfect sense if you think about it. If you don't provide an initializer, PL/pgSQL will initialize each variable to NULL?you can't do that if you have declared the variable to be NOT NULL.
Now you can put all these pieces together. The following declarations are identical in function:
pi CONSTANT REAL NOT NULL DEFAULT 3.1415926535; pi CONSTANT REAL NOT NULL := 3.1415926535; pi CONSTANT REAL := 3.1415926535;
Each declares a REAL variable named pi, with an initial value of 3.14159265. The NOT NULL clause is superfluous here because we have declared pi to be a constant and we have given it a non-null initial value; it's not a bad idea to include NOT NULL for documentation purposes.
When you create a PL/pgSQL variable, you must declare its data type. Before moving on to the ALIAS command, there are four pseudo data types that you should know about.
%TYPE lets you define one variable to be of the same type as another. Quite often, you will find that you need to temporarily store a value that you have retrieved from a table, or you might need to make a copy of a function parameter. Let's say that you are writing a function to process a rentals record in some way:
CREATE FUNCTION process_rental( rentals ) RETURNS BOOLEAN AS ' DECLARE original_tape_id CHAR(8); original_customer_id INTEGER; original_rental_row ALIAS FOR $1; BEGIN original_tape_id := original_rental_row.tape_id; original_customer_id := original_rental_row.customer_id; ...
In this snippet, you are making a local copy of the rentals.tape_id and rentals.customer_id columns. Without %TYPE, you have to ensure that you use the correct data types when you declare the original_tape_id and original_customer_id variables. That might not sound like such a big deal now, but what about six months later when you decide that eight characters isn't enough to hold a tape ID?
Instead of doing all that maintenance work yourself, you can let PL/pgSQL do the work for you. Here is a much better version of the process_rental() function:
CREATE FUNCTION process_rental( rentals ) RETURNS BOOLEAN AS ' DECLARE original_tape_id rentals.tape_id%TYPE; original_customer_id rentals.customer_id%TYPE; original_rental_row ALIAS FOR $1; BEGIN original_tape_id := original_rental_row.tape_id; original_customer_id := original_rental_row.customer_id; ...
By using %TYPE, I've told PL/pgSQL to create the original_tape_id variable using whatever type rentals.tape_id is defined to be. I've also created original_ customer_id with the same data type as the rentals.customer_id column.
This is an extremely powerful feature. At first blush, it may appear to be just a simple timesaving trick that you can use when you first create a function. The real power behind %TYPE is that your functions become self-maintaining. If you change the data type of the rentals.tape_id column, the process_rentals() function will automatically inherit the change. You won't have to track down all the places where you have made a temporary copy of a tape_id and change the data types.
You can use the %TYPE feature to obtain the type of a column or type of another variable (as shown in the code that follows). You cannot use %TYPE to obtain the type of a parameter. Starting with PostgreSQL version 7.2, you can use %TYPE in the argument list for a function?for example:
CREATE FUNCTION process_rental( rentals, rentals.customer_id%TYPE ) RETURNS BOOLEAN AS ' DECLARE original_tape_id rentals.tape_id%TYPE; original_customer_id rentals.customer_id%TYPE; original_rental_row ALIAS FOR $1; ...
%TYPE lets you access the data type of a column (or variable). %ROWTYPE provides similar functionality. You can use %ROWTYPE to declare a variable that has the same structure as a row in the given table. For example:
CREATE FUNCTION process_rental( rentals ) RETURNS BOOLEAN AS ' DECLARE original_tape_id rentals.tape_id%TYPE; original_customer_id rentals.customer_id%TYPE; original_rental_row rentals%ROWTYPE; ...
The original_rental_row variable is defined to have the same structure as a row in the rentals table. You can access columns in original_rental_row using the normal dot syntax: original_rental_row.tape_id, original_rental_row.rental_date, and so on.
Using %ROWTYPE, you can define a variable that has the same structure as a row in a specific table. A bit later in this chapter, I'll show you how to process dynamic queries (see the section "EXECUTE"); that is, a query whose text is not known at the time you are writing your function. When you are processing dynamic queries, you won't know which table to use with %ROWTYPE. The RECORD data type is used to declare a composite variable whose structure will be determined at execution time. I'll describe the RECORD type in more detail a bit later (see the section "Loop Constructs").
The final pseudo data type is OPAQUE. The OPAQUE type can be used only to define the return type of a function[6]. You cannot declare a variable (or parameter) to be of type OPAQUE. In fact, you can use OPAQUE only to define the return type of a trigger function (and a trigger function can return only an OPAQUE value). OPAQUE is a little strange. When you return an OPAQUE value, you return a row in the trigger's table. I'll talk about trigger functions later in this chapter (see the section "Triggers").
[6] You can use OPAQUE to define the data type of a function argument, but not when you are creating a PL/pgSQL function. Remember, functions can be defined in a number of different languages.
Now, let's move on to the next method that you can use to define a new variable, or a least a new name for an existing variable. You've already seen the ALIAS statement earlier in this chapter. The ALIAS statement creates an alternative name for a function parameter. You cannot ALIAS a variable that is not a function parameter. Using ALIAS, you can define any number of names that equate to a parameter:
CREATE FUNCTION foo( INTEGER ) RETURNS INTEGER AS ' DECLARE param_1 ALIAS FOR $1; my_param ALIAS FOR $1; arg_1 ALIAS FOR $1; BEGIN $1 := 42; -- At this point, $1, param_1, my_param and arg_1 -- are all set to 42. ...
The RENAME statement is similar to ALIAS; it provides a new name for an existing variable. Unlike ALIAS, RENAME invalidates the old variable name. You can RENAME any variable, not just function parameters. The syntax for the RENAME statement is
RENAME old-name TO new-name
Here is an example of the RENAME statement:
CREATE FUNCTION foo( INTEGER ) RETURNS INTEGER AS ' DECLARE RENAME $1 TO param1; BEGIN ...
Important NoteThe RENAME statement does not work in PostgreSQL versions 7.1.2 through at least 7.2. |
RENAME and ALIAS can be used only within the DECLARE section of a block.
So far, you have seen four methods for introducing a new variable or a new variable name. In each of the preceding methods, you explicitly declare a new variable (or name) in the DECLARE section of a block and the scope of the variable is the block in which it is defined. The final method is different.
One of the control structures that you will be looking at soon is the FOR loop. The FOR loop comes in two flavors?the first flavor is used to execute a block of statements some fixed number of times; the second flavor executes a statement block for each row returned by a query. In this section, I will talk only about the first flavor.
Here is an example of a FOR loop:
FOR i IN 1 .. 12 LOOP balance := balance + customers.monthly_balances[i]; END LOOP;
In this example, you have defined a loop that will execute 12 times. Each statement within the loop (you have only a single statement) will be executed 12 times. The variable i is called the iterator for the loop (you may also see the term loop index to describe the iterator). Each time you go through this loop, the iterator (i) is incremented by 1.
The iterator for an integer FOR loop is automatically declared for you. The type of the iterator is INTEGER. It is important to remember that the iterator for an integer FOR loop is a new variable. If you have already declared a variable with the same name as the iterator, the original variable will be hidden for the remainder of the loop. For example:
... DECLARE i REAL = 0; balance NUMERIC(9,2) = 0; BEGIN -- -- At this point, i = 0 -- FOR i IN 1 .. 12 LOOP -- -- we now have a new copy of i, it will vary from 1 to 12 -- balance := balance + customers.monthly_balances[i]; END LOOP; -- -- Now, if we access i, we will find that it is -- equal to 0 again --
Notice that while you are inside the loop, there are two variables named i?the inner variable is the loop iterator, and the outer variable was declared inside of this block. If you refer to i inside the loop, you are referring to the inner variable. If you refer to i outside the loop, you are referring to the outer variable. A little later, I'll show you how to access the outer variable from within the loop.
Now that you have seen how to define new variables, it's time to move on. This next section explains each type of statement that you can use in the body of a PL/pgSQL function.
At the beginning of this chapter, I said that PL/pgSQL adds a set of procedural constructs to the basic SQL language. In this next section, I'll examine the statement types added by PL/pgSQL. PL/pgSQL includes constructs for looping, exception and error handling, simple assignment, and conditional execution (that is, IF/THEN/ELSE). Although I don't describe them here, it's important to remember that you can also include any SQL command in a PL/pgSQL function.
The most commonly seen statement in many programs is the assignment statement. Assignment lets you assign a new value to a variable. The format of an assignment statement should be familiar by now; you've already seen it in most of the examples in this chapter:
target := expression;
target should identify a variable, a function parameter, a column, or in some cases, a row. If target is declared as CONSTANT, you will receive an error. When PL/pgSQL executes an assignment statement, it starts by evaluating the expression. If expression evaluates to a value whose data type is not the same as the data type of target, PL/pgSQL will convert the value to the target type. (In cases where conversion is not possible, PostgreSQL will reward you with an error message.)
The expression is actually evaluated by the PostgreSQL server, not by PL/pgSQL. This means that expression can be any valid PostgreSQL expression. Chapter 2, "Working with Data in PostgreSQL," describes PostgreSQL expressions in more detail.
The assignment statement is one way to put data into a variable; SELECT INTO is another. The syntax for a SELECT INTO statement is
SELECT INTO destination [, ...] select-list FROM ...;
A typical SELECT INTO statement might look like this:
... DECLARE customer customers%ROWTYPE; BEGIN SELECT INTO customer * FROM customers WHERE customer_id = 10; ...
When this statement is executed, PL/pgSQL sends the query "SELECT * FROM customers WHERE customer_id = 10" to the server. This query cannot return more than one row (if it does return more than one row, an error will occur). The results of the query are placed into the customer variable. Because I specified that customer is of type customers%ROWTYPE, the query must return a row shaped exactly like a customers row; otherwise, PL/pgSQL signals an error.
I could also SELECT INTO a list of variables, rather than into a single composite variable:
DECLARE phone customers.phone%TYPE; name customers.customer_name%TYPE: BEGIN SELECT INTO name,phone customer_name, customers.phone FROM customers WHERE customer_id = 10; ...
Notice that I had to explicitly request customers.phone in this query. If I had simply requested phone, PL/pgSQL would have assumed that I really wanted to execute the query:
SELECT customer_name, NULL FROM customers where customer_id = 10;
Why? Because I have declared a local variable named phone in this function, and PL/pgSQL would substitute the current value of phone wherever it occurred in the query. Because phone (the local variable) is initialized to NULL, PL/pgSQL would have stuffed NULL into the query. You should choose variable names that don't conflict with column names, or fully qualify column name references.
Of course, you can also SELECT INTO a RECORD variable and the RECORD will adapt its shape to match the results of the query.
I mentioned earlier that the query specified in a SELECT INTO statement must return no more than one row. What happens if the query returns no data? The variables that you are selecting into are set to NULL. You can also check the value of the predefined variable FOUND (described later in this chapter) to determine whether a row was actually retrieved. A bit later in this chapter, you'll see the FOR-IN-SELECT loop that can handle an arbitrary number of rows (see the section "Loop Constructs").
Using the IF statement, you can conditionally execute a section of code. The most basic form of the IF statement is
IF expression THEN statements END IF;
The expression must evaluate to a BOOLEAN value or to a value that can be coerced into a BOOLEAN value. If expression evaluates to TRUE, the statements between THEN and END IF are executed. If expression evaluates to FALSE or NULL, the statements are not executed.
Here are some sample IF statements:
IF ( now() > rentals.rental_date + rental_period ) THEN late_fee := handle_rental_overdue(); END IF; IF ( customers.balance > maximum_balance ) THEN PERFORM customer_over_balance( customers ); RETURN( FALSE ); END IF;
In each of these statements, the condition expression is evaluated by the PostgreSQL server. If the condition evaluates to TRUE, the statements between THEN and END IF are executed; otherwise, they are skipped and execution continues with the statement following the END IF.
You can also define a new block within the IF statement:
IF ( tapes.dist_id IS NULL ) THEN DECLARE default_dist_id CONSTANT integer := 0; BEGIN ... END; END IF;
The obvious advantage to defining a new block within an IF statement is that you can declare new variables. It's usually a good idea to declare variables with the shortest possible scope; you won't pollute the function's namespace with variables that you need in only a few places, and you can assign initial values that may rely on earlier computations.
The next form of the IF statement provides a way to execute one section of code if a condition is TRUE and a different set of code if the condition is not TRUE. The syntax for an IF-THEN-ELSE statement is
IF expression THEN statements_1 ELSE statements_2 END IF;
In this form, statements_1 will execute if expression evaluates to TRUE; otherwise, statements_2 will execute. Note that statements_2 will not execute if the expression is TRUE. Here are some sample IF-THEN-ELSE statements:
IF ( now() > rentals.rental_date + rental_period ) THEN late_fee := handle_rental_overdue(); ELSE late_fee := 0; END IF; IF ( customers.balance > maximum_balance ) THEN PERFORM customer_over_balance( customers ); RETURN( FALSE ); ELSE rental_ok = TRUE; END IF;
An IF-THEN-ELSE is almost equivalent to two IF statements?for example, the following
IF ( now() > rentals.rental_date + rental_period ) THEN statements_1 ELSE statements_2 END IF;
is nearly identical to
IF ( now() > rentals.rental_date + rental_period ) THEN statements_1 END IF; IF ( now() <= rentals.rental_date + rental_period ) THEN statements_2 END IF;
The difference between these two scenarios is that using IF-THEN-ELSE, the condition expression is evaluated once; but using two IF statements, the condition expression is evaluated twice. In many cases, this distinction won't be important; but in some circumstances, the condition expression may have side effects (such as causing a trigger to execute), and evaluating the expression twice will double the side effects.
You can nest IF-THEN-ELSE statements:
IF ( today > compute_due_date( rentals )) THEN -- -- This rental is past due -- ... ELSE IF ( today = compute_due_date( rentals )) THEN -- -- This rental is due today -- ... ELSE -- -- This rental is not late and it's not due today -- ... END IF; END IF;
PostgreSQL version 7.2 supports a more convenient way to nest IF-THEN-ELSE-IF statements:
IF ( today > compute_due_date( rentals )) THEN -- -- This rental is past due -- ... ELSIF ( today = compute_due_date( rentals )) THEN -- -- This rental is due today -- ... ELSE -- -- This rental is not late and it's not due today -- ... END IF;
The ELSIF form is functionally equivalent to a nested IF-THEN-ELSE-IF but you need only a single END IF statement. Notice that the spelling is ELSIF, not ELSE IF. You can include as many ELSIF sections as you like.
Next, let's look at the loop constructs offered by PL/pgSQL. Using a loop, you can repeat a sequence of statements until a condition occurs. The most basic loop construct is the LOOP statement:
[<<label>>] LOOP statements END LOOP;
In this form, the statements between LOOP and END LOOP are repeated until an EXIT or RETURN statement exits the loop. If you don't include an EXIT or RETURN statement, your function will loop forever. I'll explain the optional <<label>> in the section that covers the EXIT statement.
You can nest loops as deeply as you need:
1 row := 0; 2 3 LOOP 4 IF( row = 100 ) THEN 5 EXIT; 6 END IF; 7 8 col := 0; 9 10 LOOP 11 IF( col = 100 ) THEN 12 EXIT; 13 END IF; 14 15 PERFORM process( row, col ); 16 17 col := col + 1; 18 19 END LOOP; 20 21 row := row + 1; 22 END LOOP; 23 24 RETURN( 0 );
In the preceding code snippet, there are two loops. Because the inner loop is completely enclosed within the outer loop, the inner loop executes each time the outer loop repeats. The statements in the outer loop execute 100 times. The statements in the inner loop (lines 10 through 19) execute 100 x 100 times.
The EXIT statement at line 5 causes the outer LOOP to terminate; when you execute that statement, execution continues at the statement following the END LOOP for the enclosing loop (at line 24). The EXIT statement at line 12 will change the point of execution to the statement following the END LOOP for the enclosing loop (at line 21).
I'll cover the EXIT statement in more detail in the next section.
The next loop construct is the WHILE loop. The syntax for a WHILE loop is
[<<label>>] WHILE expression LOOP statements END LOOP;
The WHILE loop is used more frequently than a plain LOOP. A WHILE loop is equivalent to
[<<label>>] LOOP IF( NOT ( expression )) THEN EXIT; END IF; statements END LOOP;
The condition expression must evaluate to a BOOLEAN value or to a value that can be coerced to a BOOLEAN. The expression is evaluated each time execution reaches the top of the loop. If expression evaluates to TRUE, the statements within the loop are executed. If expression evaluates to FALSE or NULL, execution continues with the statement following the END LOOP.
Here is the nested loop example again, but this time, I have replaced the IF tests with a WHILE loop:
1 row := 0; 2 3 WHILE ( row < 100 ) LOOP 4 5 col := 0; 6 7 WHILE ( col < 100 ) LOOP 8 9 PERFORM process( row, col ); 10 11 col := col + 1; 12 13 END LOOP; 14 15 row := row + 1; 16 END LOOP; 17 18 RETURN( 0 );
You can see that the WHILE loop is much neater and easier to understand than the previous form. It's also a lot easier to introduce a bug if you use a plain LOOP and have to write the IF tests yourself.
The third loop construct is the FOR loop. There are two forms of the FOR loop. In the first form, called the integer-FOR loop, the loop is controlled by an integer variable:
[<<label>>] FOR iterator IN [ REVERSE ] start-expression .. end-expression LOOP statements END LOOP;
In this form, the statements inside the loop are repeated while the iterator is less than or equal to end-expression (or greater than or equal to if the loop direction is REVERSE). Just before the first iteration of the loop, iterator is initialized to start-expression. At the bottom of the loop, iterator is incremented by 1 (or ?1 if the loop direction is REVERSE); and if within the end-expression, execution jumps back to the first statement in the loop.
An integer-FOR loop is equivalent to:
[<<label>>] DECLARE Iterator INTEGER; increment INTEGER; end_value INTEGER; BEGIN IF( loop-direction = REVERSE ) THEN increment := -1; ELSE increment := 1; END IF; iterator := start-expression; end_value := end-expression; LOOP IF( iterator >= end_value ) THEN EXIT; END IF; statements iterator := iterator + increment; END LOOP; END;
The start-expression and end-expression are evaluated once, just before the loop begins. Both expressions must evaluate to an INTEGER value or to a value that can be coerced to an INTEGER.
Here is the example code snippet again, this time written in the form of an integer-FOR loop:
1 FOR row IN 0 .. 99 LOOP 2 3 FOR col in 0 .. 99 LOOP 4 5 PERFORM process( row, col ); 6 8 END LOOP; 9 10 END LOOP; 11 12 RETURN( 0 );
This version is more readable than the version that used a WHILE loop. All the information that you need in order to understand the loop construct is in the first line of the loop. Looking at line 1, you can see that this loop uses a variable named row as the iterator; and unless something unusual happens inside the loop, row starts at 0 and increments to 99.
There are a few points to remember about the integer-FOR loop. First, the iterator variable is automatically declared??it is defined to be an INTEGER and is local to the loop. Second, you can terminate the loop early using the EXIT (or RETURN) statement. Third, you can change the value of the iterator variable inside the loop (but I don't recommend it): Doing so can affect the number of iterations through the loop.
You can use this last point to your advantage. In PL/pgSQL, there is no way to explicitly specify a loop increment other than 1 (or ?1 if the loop is REVERSEd). But you can change the effective increment by modifying the iterator within the loop. For example, let's say that you want to process only odd numbers inside a loop:
1 ... 2 FOR i IN 1 .. 100 LOOP 3 ... 4 i := i + 1; 5 ... 6 END LOOP; 7 ...
The first time you go through this loop, i will be initialized to 1. At line 4, you increment i to 2. When you reach line 6, the FOR loop will increment i to 3 and then jump back to line 3 (the first line in the loop). You can, of course, increment the loop iterator in whatever form you need. If you fiddle with the loop iterator, be sure to write yourself a comment that explains what you're doing.
The second form of the FOR loop is used to process the results of a query. The syntax for this form is
[<<label>>] FOR iterator IN query LOOP statements END LOOP;
In this form, which I'll call the FOR-IN-SELECT form, the statements within the loop are executed once for each row returned by the query. query must be a SQL SELECT command. Each time through the loop, iterator will contain the next row returned by the query. If the query does not return any rows, the statements within the loop will not execute.
The iterator variable must either be of type RECORD or of a %ROWTYPE that matches the structure of a row returned by the query. Even if the query returns a single column, the iterator must be a RECORD or a %ROWTYPE.
Here is a code snippet that shows the FOR statement:
1 DECLARE 2 rental rentals%ROWTYPE; 3 BEGIN 4 5 FOR rental IN SELECT * FROM rentals ORDER BY rental_date LOOP 6 IF( rental_is_overdue( rental )) THEN 7 PERFORM process_late_rental( rental ); 8 END IF; 9 END LOOP; 10 11 END;
A %ROWTYPE iterator is fine if the query returns an entire row. If you need to retrieve a partial row, or you want to retrieve the result of a computation, declare the iterator variable as a RECORD. Here is an example:
1 DECLARE 2 my_record RECORD; 3 BEGIN 4 5 FOR my_record IN 6 SELECT tape_id, compute_due_date(rentals) AS due_date FROM rentals 7 LOOP 8 PERFORM 9 check_for_late_rental( my_record.tape_id, my_record.due_date ); 10 END LOOP; 11 12 END;
A RECORD variable does not have a fixed structure. The fields in a RECORD variable are determined at the time that a row is assigned. In the previous example, you assign a row returned by the SELECT to the my_record RECORD. Because the query returns two columns, my_record will contain two fields: tape_id and due_date. A RECORD variable can change its shape. If you used the my_record variable as the iterator in a second FOR-IN-SELECT loop in this function, the field names within the RECORD would change. For example:
1 DECLARE 2 my_record RECORD; 3 BEGIN 4 5 FOR my_record IN SELECT * FROM rentals LOOP 6 -- my_record now holds a row from the rentals table 7 -- I can access my_record.tape_id, my_record.rental_date, etc. 8 END LOOP; 9 10 FOR my_record IN SELECT * FROM tapes LOOP 11 -- my_record now holds a row from the tapes table 12 -- I can now access my_record.tape_id, my_record.title, etc. 13 END LOOP; 12 END;
You also can process the results of a dynamic query (that is, a query not known at the time you write the function) in a FOR loop. To execute a dynamic query in a FOR loop, the syntax is a bit different:
[<<label>>] FOR iterator IN EXECUTE query-string LOOP statements END LOOP;
Notice that this is nearly identical to a FOR-IN loop. The EXECUTE keyword tells PL/pgSQL that the following string may change each time the statement is executed. The query-string can be an arbitrarily complex expression that evaluates to a string value; of course, it must evaluate to a valid SELECT statement. The following function shows the FOR-IN-EXECUTE loop:
1 CREATE OR REPLACE FUNCTION my_count( VARCHAR ) RETURNS INTEGER AS ' 2 DECLARE 3 query ALIAS FOR $1; 4 count INTEGER := 0; 5 my_record RECORD; 6 BEGIN 7 FOR my_record IN EXECUTE query LOOP 8 count := count + 1; 9 END LOOP; 10 RETURN count; 11 END; 12 ' LANGUAGE 'plpgsql';
An EXIT statement (without any operands) terminates the enclosing block, and execution continues at the statement following the end of the block.
The full syntax for the EXIT statement is
EXIT [label] [WHEN boolean-expression];
All the EXIT statements that you have seen in this chapter have been simple EXIT statements. A simple EXIT statement unconditionally terminates the most closely nested block.
If you include WHEN boolean-expression in an EXIT statement, the EXIT becomes conditional?the EXIT occurs only if boolean-expression evaluates to TRUE?for example:
1 FOR i IN 1 .. 12 LOOP 2 balance := customer.customer_balances[i]; 3 EXIT WHEN ( balance = 0 ); 4 PERFORM check_balance( customer, balance ); 5 END LOOP; 6 7 RETURN( 0 );
When execution reaches line 3, the WHEN expression is evaluated. If the expression evaluates to TRUE, the loop will be terminated and execution will continue at line 7.
This statement should really be named EXIT...IF. The EXIT...WHEN expression is not evaluated after each statement, as the name might imply.
Now let's turn our attention to the subject of labels. A label is simply a string of the form
<<label>>
You can include a label prior to any of the following:
A DECLARE section
A LOOP
A WHILE loop
An integer FOR loop
A FOR...SELECT loop
A label can perform two distinct functions. First, a label can be referenced in an EXIT statement?for example:
1 <<row_loop>> 2 FOR row IN 0 .. 99 LOOP 3 4 <<column_loop>> 5 FOR col in 0 .. 99 LOOP 6 7 IF( process( row, col ) = FALSE ) THEN 8 EXIT row_loop; 9 END IF; 10 11 END LOOP; 12 13 END LOOP; 15 15 RETURN( 0 );
Normally, an EXIT statement terminates the most closely nested block (or loop). When you refer to a label in an EXIT statement, you can terminate more than one nested block. When PL/pgSQL executes the EXIT statement at line 8, it will terminate the <<column_loop>> block and the <<row_loop>> block. You can't EXIT a block unless it is active: In other words, you can't EXIT a block that has already ended or that has not yet begun.
The second use for a label has to do with variable scoping. Remember that an integer-FOR loop creates a new copy of the iterator variable. If you have already declared the iterator variable outside of the loop, you can't directly access it within the loop. Consider the following example:
1 <<func>> 2 DECLARE 3 month_num INTEGER := 6; 4 BEGIN 5 FOR month_num IN 1 .. 12 LOOP 6 PERFORM compute_monthly_info( month_num ); 7 END LOOP; 8 END;
Line 2 declares a variable named month_num. When execution reaches line 4, PL/pgSQL will create a second variable named month_num (and this variable will vary between 1 and 12). Within the scope of the new variable (between lines 4 and 6), any reference to month_num will refer to the new variable created at line 4. If you want to refer to the outer variable, you can qualify the name as func.month_num. In general terms, you can refer to any variable in a fully qualified form. If you omit the label qualifier, a variable reference refers to the variable with the shortest lifetime (that is, the most recently created variable).
Every PL/pgSQL function must terminate with a RETURN statement. The syntax for a RETURN statement is
RETURN expression;
When a RETURN statement executes, four things happen:
The expression is evaluated and, if necessary, coerced into the appropriate data type. The RETURN type of a function is declared when you create the function. In the example "CREATE FUNCTION func() RETURNS INTEGER ...", the RETURN type is declared to be an INTEGER. If the RETURN expression does not evaluate to the declared RETURN type, PL/pgSQL will try to convert it to the required type.
The current function terminates. When a function terminates, all code blocks within that function terminate, and all variables declared within that function are destroyed.
The return value (obtained by evaluating expression) is returned to the caller. If the caller assigns the return value to a variable, the assignment completes. If the caller uses the return value in an expression, the caller uses the return value to evaluate the expression. If the function was called by a PERFORM statement, the return value is discarded.
The point of execution returns to the caller.
If you fail to return a value, you will receive an error (control reaches end of function without RETURN). You can include many RETURN statements in a function, but only one will execute: whichever RETURN statement is reached first.
A function written in PL/pgSQL can contain SQL commands intermingled with PL/pgSQL-specific statements. Remember, a SQL command is something like CREATE TABLE, INSERT, UPDATE, and so on; whereas PL/pgSQL adds procedural statements such as IF, RETURN, or WHILE. If you want to create a new table within a PL/pgSQL function, you can just include a CREATE TABLE command in the code:
CREATE FUNCTION process_month_end( ) RETURNS BOOLEAN AS ' BEGIN ... CREATE TABLE temp_data ( ... ); ... DROP TABLE temp_data; ... END; ' LANGUAGE 'plpgsql';
You can include almost any SQL command just by writing the command inline. The exception is the SELECT command. A SELECT command retrieves data from the server. If you want to execute a SELECT command in a PL/pgSQL function, you normally provide variables to hold the results:
DECLARE Customer customers%ROWTYPE; BEGIN ... SELECT INTO customer * FROM customers WHERE( customer_id = 1 ); -- -- The customer variable will now hold the results of the query -- ... END;
On rare occasions, you may need to execute a SELECT statement, but you want to ignore the data returned by the query. Most likely, the SELECT statement that you want to execute will have some side effect, such as executing a function. You can use the PERFORM statement to execute an arbitrary SELECT command without using the results. For example:
... PERFORM SELECT my_function( rentals ) FROM rentals; ...
You can also use PERFORM to evaluate an arbitrary expression, again discarding the results:
... PERFORM record_timestamp( timeofday() ); ...
The EXECUTE statement is similar to the PERFORM statement. Although the PERFORM statement evaluates a SQL expression and discards the results, the EXECUTE statement executes a dynamic SQL command, and then discards the results. The difference is subtle but important. When the PL/pgSQL processor compiles a PERFORM expression statement, the query plan required to evaluate the expression is generated and stored along with the function. This means that expression must be known at the time you write your function. The EXECUTE statement, on the other hand, executes a SQL statement that is not known at the time you write your function. You may, for example, construct the text of a SQL statement within your function, or you might accept a string value from the caller and then execute that string.
Here is a function that uses the EXECUTE command to time the execution of a SQL command:
1 CREATE FUNCTION time_command( VARCHAR ) RETURNS INTERVAL AS ' 2 DECLARE 3 beg_time TIMESTAMP; 4 end_time TIMESTAMP; 5 BEGIN 6 7 beg_time := timeofday( ); 8 EXECUTE $1; 9 end_time := timeofday( ); 10 11 RETURN( end_time - beg_time ); 12 END; 13 ' LANGUAGE 'plpgsql';
You would call the time_command() function like this:
movies=# SELECT time_command( 'SELECT * FROM rentals' ); time_command -------------- 00:00:00.82 (1 row)
With the EXECUTE statement, you can execute any SQL command (including calls to PL/pgSQL functions) and the results will be discarded, except for the side effects.
PL/pgSQL provides a catch-all statement that gives you access to various pieces of result information: GET DIAGNOSTICS. Using GET DIAGNOSTICS, you can retrieve a count of the rows affected by the most recent UPDATE or DELETE command and the object-ID of the most recently inserted row. The syntax for the GET DIAGNOSTICS statement is
GET DIAGNOSTICS variable = [ROW_COUNT|RESULT_OID], ...;
ROW_COUNT is meaningless until you have executed an UPDATE or DELETE command. Likewise, RESULT_OID is meaningless until you execute an INSERT command.
Error handling is PL/pgSQL's weak point (actually, the problem is with PostgreSQL, not specifically with PL/pgSQL). Whenever the PostgreSQL server decides that something has "gone wrong," it aborts the current transaction and reports an error. That's it. You can't intercept the error in PL/pgSQL, you can't correct it and try again, and you can't even translate the error message into a more user-friendly format.
It seems likely that the error-handling mechanism in PostgreSQL will be improved in the future. At that point, you can probably expect PL/pgSQL to offer better ways to intercept and handle error conditions.
For now, you should try to write PL/pgSQL functions so that errors are headed off before they occur. For example, if your function needs to INSERT a row into a table with a UNIQUE constraint, you might want to check for a duplicate value before performing the INSERT.
Even though PL/pgSQL doesn't offer a way to intercept errors, it does provide a way to generate an error: the RAISE statement. The syntax for a RAISE statement is
RAISE severity 'message' [, variable [...]];
The severity determines how far the error message will go and whether the error should abort the current transaction.
Valid values for severity are
DEBUG?? The message is written to the server's log file and otherwise ignored. The function runs to completion, and the current transaction is not affected.
NOTICE?? The message is written to the server's log file and sent to the client application. The function runs to completion, and the current transaction is not affected.
EXCEPTION?? The message is written to the server's log file, the function terminates, and the current transaction is aborted.
The message string must be a literal value??you can't use a PL/pgSQL variable in this slot, and you cannot include a more complex expression. If you need to include variable information in the error message, you can sneak it into the message by including a % character wherever you want the variable value to appear?for example:
rentals.tape_id := ''AH-54706''; RAISE DEBUG ''tape_id = %'', rentals.tape_id;
When these statements are executed, the message tape_id = AH-54706 will be written to the server's log file. For each (single) % character in the message string, you must include a variable. If you want to include a literal percent character in the message, write it as %%?for example:
percentage := 20; RAISE NOTICE ''Top (%)%%'', percentage;
translates to Top (20)%.
The RAISE statement is useful for debugging your PL/pgSQL code; it's even better for debugging someone else's code. I find that the DEBUG severity is perfect for leaving evidence in the server log. When you ship a PL/pgSQL function to your users, you might want to leave a few RAISE DEBUG statements in your code. This can certainly make it easier to track down an elusive bug (remember, users never write down error messages, so you might as well arrange for the messages to appear in a log file). I use the RAISE NOTICE statement for interactive debugging. When I am first building a new PL/pgSQL function, the chances are very slim that I'll get it right the first time. (Funny, it doesn't seem to matter how trivial or complex the function is…) I start out by littering my code with RAISE NOTICE statements; I'll usually print the value of each function parameter as well as key information from each record that I SELECT. As it becomes clearer that my code is working, I'll either remove or comment out (using "--") the RAISE NOTICE statements. Before I send out my code to a victim, er, user, I'll find strategic places where I can leave RAISE DEBUG statements. The RAISE DEBUG statement is perfect for reporting things that should never happen. For example, because of the referential integrity that I built into the tapes, customers, and rentals tables, I should never find a rentals record that refers to a nonexistent customer. I'll check for that condition (a missing customer) and report the error with a RAISE DEBUG statement. Of course, in some circumstances, a missing customer should really trigger a RAISE EXCEPTION?if I just happen to notice the problem in passing and really doesn't affect the current function, I'll just note it with a RAISE DEBUG. So, the rule I follow is if the condition prevents further processing, I RAISE an EXCEPTION; if the condition should never happen, I RAISE a DEBUG message; if I am still developing my code, I RAISE a NOTICE.