Direct cursor support is new in PL/pgSQL version 7.2. Processing a result set using a cursor is similar to processing a result set using a FOR loop, but cursors offer a few distinct advantages that you'll see in a moment.
You can think of a cursor as a name for a result set. You must declare a cursor variable just as you declare any other variable. The following code snippet shows how you might declare a cursor variable:
... DECLARE rental_cursor CURSOR FOR SELECT * FROM rentals; ...
rental_cursor is declared to be a cursor for the result set of the query SELECT * FROM rentals. When you declare a variable of type CURSOR, you must include a query. The cursor variable is said to be bound to this query, and the variable is a bound cursor variable.
Before you can use a bound cursor, you must open the cursor using the OPEN statement:
... DECLARE rental_cursor CURSOR FOR SELECT * FROM rentals; BEGIN OPEN rental_cursor; ...
If you try to OPEN a cursor that is already open, you will receive an error message (cursor "name" already in use). If you try to FETCH (see the section that follows) from a cursor that has not been opened, you'll receive an error message (cursor "name" is invalid). When you use a cursor, you first DECLARE it, then OPEN it, FETCH from it, and finally CLOSE it, in that order. You can repeat the OPEN, FETCH, CLOSE cycle if you want to process the cursor results again.
After a bound cursor has been opened, you can retrieve the result set (one row at a time) using the FETCH statement. When you fetch a row from a cursor, you have to provide one or more destination variables that PL/pgSQL can stuff the results into. The syntax for the FETCH statement is
FETCH cursor-name INTO destination [ , destination [...]];
The destination (or destinations) must match the shape of a row returned by the cursor. For example, if the cursor SELECTs a row from the rentals table, there are three possible destinations:
A variable of type rentals%ROWTYPE
Three variables: one of type rentals.tape_id%TYPE, one of type rentals.customer_id%TYPE, and the last of type rentals.rental_date%TYPE
A variable of type RECORD
Let's look at each of these destination types in more detail.
When you FETCH into a variable of some %ROWTYPE, you can refer to the individual columns using the usual variable.column notation. For example:
... DECLARE rental_cursor CURSOR FOR SELECT * FROM rentals; rental rentals%ROWTYPE; BEGIN OPEN rental_cursor; FETCH rental_cursor INTO rental; -- -- I can now access rental.tape_id, -- rental.customer_id, and rental.rental_date -- IF ( overdue( rental.rental_date )) THEN ...
Next, I can FETCH into a comma-separated list of variables. In the previous example, the rental_cursor cursor will return rows that each contain three columns. Rather than fetching into a %ROWTYPE variable, I can declare three separate variables (of the appropriate types) and FETCH into those instead:
... DECLARE rental_cursor CURSOR FOR SELECT * FROM rentals; tape_id rentals.tape_id%TYPE; customer_id rentals.customer_id%TYPE; rental_date rentals.rental_date%TYPE; BEGIN OPEN rental_cursor; FETCH rental_cursor INTO tape_id, customer_id, rental_date; IF ( overdue( rental_date )) THEN ...
You are not required to use variables declared with %TYPE, but this is the perfect place to do so. At the time you create a function, you usually know which columns you will be interested in, and declaring variables with %TYPE will make your functions much less fragile in cases where the referenced column types might change.
You cannot combine composite variables and scalar variables in the same FETCH statement[7]:
[7] This seems like a bug to me. You may be able to combine composite and scalar variables in a future release.
... DECLARE rental_cursor CURSOR FOR SELECT *, now() - rental_date FROM rentals; rental rentals%ROWTYPE; elapsed INTERVAL; BEGIN OPEN rental_cursor; FETCH rental_cursor INTO rental, elapsed; -- WRONG! Can't combine -- composite and scalar -- variables in the same -- FETCH IF ( overdue( rental.rental_date )) THEN ...
The third type of destination that you can use with a FETCH statement is a variable of type RECORD. You may recall from earlier in this chapter that a RECORD variable is something of a chameleon?it adjusts to whatever kind of data that you put into it. For example, the following snippet uses the same RECORD variable to hold two differently shaped rows:
... DECLARE rental_cursor CURSOR FOR SELECT * FROM rentals; customer_cursor CURSOR FOR SELECT * FROM customers; my_data RECORD; BEGIN OPEN rental_cursor; OPEN customer_cursor; FETCH rental_cursor INTO my_data; -- I can now refer to: -- my_data.tape_id -- my_data.customer_id -- my_data.rental_date FETCH customer_cursor INTO my_data; -- Now I can refer to: -- my_data.customer_id -- my_data.customer_name -- my_data.phone -- my_data.birth_date -- my_data.balance ...
After you have executed a FETCH statement, how do you know whether a row was actually retrieved? If you FETCH after retrieving the entire result, no error occurs. Instead, each PL/pgSQL function has an automatically declared variable named FOUND. FOUND is a BOOLEAN variable that is set by the PL/pgSQL interpreter to indicate various kinds of state information. Table 7.1 lists the points in time where PL/pgSQL sets the FOUND variable and the corresponding values.
Event |
Value |
---|---|
Start of each function |
FALSE |
Start of an integer-FOR loop |
FALSE |
Within an integer-FOR loop |
TRUE |
Start of a FOR...SELECT loop |
FALSE |
Within a FOR...SELECT loop |
TRUE |
Before SELECT INTO statement |
FALSE |
After SELECT INTO statement |
TRUE (if rows are returned) |
Before FETCH statement |
FALSE |
After FETCH statement |
TRUE (if a row is returned) |
So, you can see that FOUND is set to TRUE if a FETCH statement returns a row. Let's see how to put all the cursor related statements together into a single PL/pgSQL function:
... DECLARE next_rental CURSOR FOR SELECT * FROM rentals; rental rentals%ROWTYPE; BEGIN OPEN next_rental; LOOP FETCH next_rental INTO rental; EXIT WHEN NOT FOUND; PERFORM process_rental( rental ); END LOOP; CLOSE next_rental; END; ...
The first thing you do in this code snippet is OPEN the cursor. Next, you enter a LOOP that will process every row returned from the cursor. Inside of the LOOP, you FETCH a single record, EXIT the loop if the cursor is exhausted, and call another function (process_rental())if not. After the loop terminates, close the cursor using the CLOSE statement.
So far, it looks like a cursor loop is pretty much the same as a FOR-IN-SELECT loop. What else can you do with a cursor?
You've seen that you must provide a SELECT statement when you declare a CURSOR. Quite often, you'll find that you don't know the exact values involved in the query at the time you're writing a function. You can declare a parameterized cursor to solve this problem.
A parameterized cursor is similar in concept to a parameterized function. When you define a function, you can declare a set of parameters (these are called the formal parameters, or formal arguments); those parameters can be used within the function to change the results of the function. If you define a function without parameters, the function will always return the same results (unless influenced by global, external data). Each language imposes restrictions on where you can use a parameter within a function. In general, function parameters can be used anywhere that a value-yielding expression can be used. When you make a call to a parameterized function, you provide a value for each parameter: The values that you provide (these are called the actual parameters, or actual arguments) are substituted inside of the function wherever the formal parameters appear.
When you define a cursor, you can declare a set of formal parameters; those parameters can be used with the cursor to change the result set of the query. If you define a cursor without parameters, the query will always return the same result set, unless influenced by external data. PL/pgSQL restricts the places that you can use a parameter within a cursor definition. A cursor parameter can be used anywhere that a value-yielding expression can be used. When you open a cursor, you must specify values for each formal parameter. The actual parameters are substituted inside of the cursor wherever the formal parameters appear.
Let's look at an example:
1 ... 2 DECLARE 3 next_customer CURSOR (ID INTEGER) FOR 4 SELECT * FROM customers WHERE 5 customer_id = ID; 6 customer customers%ROWTYPE; 7 target_customer ALIAS FOR $1; 8 BEGIN 9 10 OPEN next_customer( target_customer ); 11 ...
Lines 3, 4, and 5 declare a parameterized cursor. This cursor has a single formal parameter; an INTEGER named ID. Notice (at the end of line 5), that I have used the formal parameter within the cursor definition. When I open this cursor, I'll provide an INTEGER value for the ID parameter. The actual parameter that I provide will be substituted into the query wherever the formal parameter is used. So, if target_customer is equal to, say, 42, the cursor opened at line 10 will read:
SELECT * FROM customers WHERE customer_id = 42;
The full syntax for a cursor declaration is
variable-name CURSOR [ (param-name param-type [, param-name param-type ...] ) ] FOR select-query;
The full syntax for an OPEN statement is
OPEN cursor-name [ ( actual-param-value [, actual-param-value...] ) ];
You would parameterize a cursor for the same reasons that you would parameterize a function: you want the results to depend on the actual arguments. When you parameterize a cursor, you are also making the cursor more reusable. For example, I might want to process all the tapes in my inventory, but I want to process the tapes one distributor at a time. If I don't use a parameterized cursor, I have to declare one cursor for each of my distributors (and I have to know the set of distributors at the time I write the function). Using a parameterized cursor, I can declare the cursor once and provide different actual arguments each time I open the cursor:
1 CREATE FUNCTION process_tapes_by_distributors( ) RETURNS INTEGER AS ' 2 DECLARE 3 next_distributor CURSOR FOR SELECT * FROM distributors; 4 next_tape CURSOR( ID ) CURSOR FOR 5 SELECT * FROM tapes WHERE dist_id = ID; 6 dist distributors%ROWTYPE; 7 tape tapes%ROWTYPE; 8 count INTEGER := 0; 9 BEGIN 10 OPEN next_distributor; 11 LOOP 12 FETCH next_distributor INTO dist; 13 EXIT WHEN NOT FOUND; 14 OPEN next_tape( dist.distributor_id ); 15 LOOP 16 FETCH next_tape INTO tape; 17 EXIT WHEN NOT FOUND; 18 PERFORM process_tape( dist, tape ); 19 count := count + 1; 20 END LOOP; 21 CLOSE next_tape; 22 END LOOP; 23 CLOSE next_distributor; 24 RETURN( count ); 25 END; 26 ' LANGUAGE 'plpgsql';
Notice that you can OPEN and CLOSE a cursor as often as you like. A cursor must be closed before it can be opened. Each time you open a parameterized cursor, you can provide new actual parameters.
Now, let's turn our attention to another aspect of cursor support in PL/pgSQL?cursor references.
When you declare a CURSOR variable, you provide a SELECT statement that is bound to the cursor. You can't change the text of the query after the cursor has been declared. Of course, you can parameterize the query to change the results, but the shape of the query remains the same: If the query returns rows from the tapes table, it will always return rows from the tapes table.
Instead of declaring a CURSOR, you can declare a variable to be of type REFCURSOR. A REFCURSOR is not actually a cursor, but a reference to a cursor. The syntax for declaring a REFCURSOR is
DECLARE
ref-name REFCURSOR;
...
Notice that you do not specify a query when creating a REFCURSOR. Instead, a cursor is bound to a REFCURSOR at runtime. Here is a simple example:
1 ... 2 DECLARE 3 next_rental CURSOR FOR SELECT * FROM rentals; 4 next_tape CURSOR FOR SELECT * FROM tapes; 5 rental rentals%ROWTYPE; 6 tape tape%ROWTYPE; 7 next_row REFCURSOR; 8 BEGIN 9 OPEN next_rental; 10 next_row := next_rental; 11 FETCH next_rental INTO rental; 12 FETCH next_row INTO rental; 13 CLOSE next_rental; 14 15 next_row := next_tape; 16 OPEN next_tape; 17 FETCH next_row INTO tape; 18 CLOSE next_row; 19 ...
In this block, I've declared two cursors and one cursor reference. One of the cursors returns rows from the rentals table, and the other returns rows from the tapes table.
At line 9, the next_rental cursor opens. At line 10, I give a value to the next_row cursor reference. We now have two ways to access the next_rental cursor: through the next_rental cursor variable and through the next_row cursor reference. At this point, next_row refers to the next_rental cursor. You can see (at lines 11 and 12) that you can FETCH a row using either variable. Both FETCH statements return a row from the rentals table.
At line 14, the next_row cursor reference points to a different cursor. Now, when you FETCH from next_row, you'll get a row from the tapes table. Notice that you can point next_row to a cursor that has not yet been opened. You can CLOSE a cursor using a cursor reference, but you can't OPEN a cursor using a cursor reference.
Actually, you can open a cursor using a REFCURSOR; you just can't open a named cursor. When you declare a CURSOR variable, you are really creating a PostgreSQL cursor whose name is the same as the name of the variable. In the previous example, you created one cursor (not just a cursor variable) named next_rental and a cursor named next_tape. PL/pgSQL allows you to create anonymous cursors using REFCURSOR variables. An anonymous cursor is a cursor that doesn't have a name[8]. You create an anonymous cursor using the OPEN statement, a REFCURSOR, and a SELECT statement:
[8] An anonymous cursor does in fact have a name, but PostgreSQL constructs the name, and it isn't very reader-friendly. An anonymous cursor has a name such as <unnamed cursor 42>.
1 ... 2 DECLARE 3 next_row REFCURSOR; 4 BEGIN 5 OPEN next_row FOR SELECT * FROM customers; 6 ...
At line 5, you are creating an anonymous cursor and binding it to the next_row cursor reference. After an anonymous cursor has been opened, you can treat it like any other cursor. You can FETCH from it, CLOSE it, and lose it. That last part might sound a little fishy, so let me explain further. Take a close look at the following code fragment:
1 CREATE FUNCTION leak_cursors( INTEGER ) RETURNS INTEGER AS ' 2 DECLARE 3 next_customer CURSOR FOR SELECT * FROM customers; 4 next_rental REFCURSOR; 5 customer customers%ROWTYPE; 6 rental rentals%ROWTYPE; 7 count INTEGER := 0; 8 BEGIN 9 10 OPEN next_customer; 11 12 LOOP 13 FETCH next_customer INTO customer; 14 EXIT WHEN NOT FOUND; 15 OPEN next_rental FOR 16 SELECT * FROM rentals 17 WHERE rentals.customer_id = customer.customer_id; 18 19 LOOP 20 FETCH next_rental INTO rental; 21 EXIT WHEN NOT FOUND; 22 23 RAISE NOTICE ''customer_id = %, rental_date = %'', 24 customer.customer_id, rental.rental_date; 25 26 count := count + 1; 27 END LOOP; 28 29 next_rental := NULL; 30 31 END LOOP; 32 CLOSE next_customer; 33 RETURN( count ); 34 END; 35 ' LANGUAGE 'plpgsql';
This function contains two loops: an outer loop that reads through the customers table and an inner loop that reads each rental for a given customer. The next_customer cursor is opened (at line 10) before the outer loop begins. The next_rental cursor is bound and opened (at lines 15, 16, and 17) just before the inner loop begins. After the inner loop completes, I set the next_rental cursor reference to NULL and continue with the outer loop. What happens to the cursor that was bound to next_rental? I didn't explicitly close the cursor, so it must remain open. After executing the assignment statement at line 29, I have no way to access the cursor again??remember, it's an anonymous cursor, so I can't refer to it by name. This situation is called a resource leak. A resource leak occurs when you create an object (in this case, a cursor) and then you lose all references to that object. If you can't find the object again, you can't free the resource. Avoid resource leaks; they're nasty and can cause performance problems. Resource leaks will also cause your code to fail if you run out of a resource (such as memory space). We can avoid the resource leak shown in this example by closing the next_rental before setting it to NULL.
You've seen what not to do with a cursor reference, but let's see what cursor references are really good for. The nice thing about a cursor reference is that you can pass the reference to another function, or you can return a reference to the caller. These are powerful features. By sharing cursor references between functions, you can factor your PL/pgSQL code into reusable pieces.
One of the more effective ways to use cursor references is to separate the code that processes a cursor from the code that creates the cursor. For example, you may find that we need a function to compute the total amount of money that we have received from a given customer over a given period of time. I might start by creating a single function that constructs a cursor and processes each row in that cursor:
... OPEN next_rental FOR SELECT * FROM rentals WHERE customer_id = $1 AND rental_date BETWEEN $2 AND $3; LOOP FETCH next_rental INTO rental -- accumulate rental values here ...
This is a good start, but it works only for a single set of conditions: a given customer and a given pair of dates. Instead, you can factor this one function into three separate functions.
The first function creates a cursor that, when opened, will return all rentals records for a given customer within a given period; the cursor is returned to the caller:
CREATE FUNCTION select_rentals_by_customer_interval( INTEGER, DATE, DATE ) RETURNS REFCURSOR AS ' DECLARE next_rental REFCURSOR; BEGIN OPEN next_rental FOR SELECT * FROM RENTALS WHERE customer_id = $1 AND rental_date BETWEEN $2 AND $3; RETURN( next_rental ); END; ' LANGUAGE 'plpgsql';
The second function, given a cursor that returns rentals records, computes the total value of the rentals accessible through that cursor:
CREATE FUNCTION compute_rental_value( REFCURSOR ) RETURNS NUMERIC AS ' DECLARE total NUMERIC(7,2) := 0; rental rentals%ROWTYPE; next_rental ALIAS FOR $1; BEGIN LOOP FETCH next_rental INTO rental; EXIT WHEN NOT FOUND; -- accumulate rental values here -- -- pretend that this is a complex -- task which requires loads of amazingly -- clever code ... END LOOP; RETURN( total ); END; ' LANGUAGE 'plpgsql';
The last function invokes the first two:
CREATE FUNCTION compute_value_by_customer_interval( INTEGER, DATE, DATE ) RETURNS NUMERIC AS ' DECLARE curs REFCURSOR; total NUMERIC(7,2); BEGIN curs := select_rentals_by_customer_interval( $1, $2, $3 ); total := compute_rental_value( curs ); CLOSE curs; RETURN( total ); END; ' LANGUAGE 'plpgsql';
The advantage to this approach is that you can construct a cursor using different selection criteria and call compute_total_value(). For example, you might want to compute the total values of all rentals of a given tape:
CREATE FUNCTION compute_tape_value( VARCHAR ) RETURNS NUMERIC AS ' DECLARE curs REFCURSOR; total NUMERIC(7,2); BEGIN OPEN curs FOR SELECT * FROM rentals WHERE tape_id = $1; total := compute_rental_value( curs ); CLOSE curs; RETURN( total ); END; ' LANGUAGE 'plpgsql';