The libpgtcl Large-Object API

The libpgtcl library provides a number of procedures that you can use to interact with PostgreSQL large-objects. A large-object is a value that is stored indirectly. When you create a column that will contain a large-object, the column should be of type OID (object-id). When you import a large-object into your database, the bits that make up the object are stored in the pg_largeobject system table and a reference is stored in your table. Large-objects are typically used to hold images, sound files, or large pieces of unstructured data.

There are two ways to create a large-object. First, you can create a large-object using the pg_lo_creat procedure. pg_lo_creat creates a new (empty) entry in the pg_largeobject table and returns the OID of that entry. After you have an empty large-object, you can write data into it using pg_lo_write.

Second, you can import an existing file (such as a JPEG-encoded photograph) into a database using pg_lo_import. The pg_lo_import manual page says that pg_lo_import requires two parameters (a connection handle and a filename) and returns nothing. That documentation is incorrect: pg_lo_import returns the OID of the new large-object.

Here is a code snippet that shows how to use the pg_lo_import procedure:


...

pg_result [pg_exec $conn "BEGIN WORK"] -clear # Start a transaction



set large_object_oid [pg_lo_import $conn "/images/happyface.jpg"]



pg_result [pg_exec $conn "COMMIT WORK"] -clear

...

Note that you must call pg_lo_import within a transaction block. In fact, all large-object operations must occur within a transaction block.

The inverse of pg_lo_import is pg_lo_export. pg_lo_export copies a large-object into a file:


...

pg_result [pg_exec $conn "BEGIN WORK"] -clear # Start a transaction



pg_log_export $conn $large_object_oid "/images/jocularface.jpg"



pg_result [pg_exec $conn "COMMIT WORK"] -clear

...

Like pg_lo_import, pg_lo_export must be called within a transaction block. You can also read the contents of a large-object using libpgtcl. To start with, you must open the desired large-object using pg_lo_open:


...

set fd [pg_lo_open $conn $large_object_oid "rw"]

...

When you call pg_lo_open, you provide a connection handle, the OID of the large-object that you want, and an access mode. libpgtcl is a little fickle when it comes to large-object access modes; pg_lo_open expects "r", "w", or "rw"; but pg_lo_create expects "INV_READ", "INV_WRITE", or "INV_READ|INV_WRITE". The value returned by pg_lo_open is a large-object handle; and after you have one of those, you can read from, write to, or move around in the large-object.

First, let's talk about positioning within a large-object. Large-objects can be, well, large. Your application may not need to read (or write) an entire large-object all at once; for really big large-objects, you may want to work with small chunks. To make this possible, libpgtcl lets you seek your large-object handle to the part that you are interested in; then, you can read or write from there.

The pg_lo_lseek procedure is modeled after the Unix lseek() function. pg_lo_lseek requires three parameters:


pg_lo_lseek connection-handle large-object-handle offset starting-point

The connection-handle and large-object-handle parameters are self-explanatory. offset specifies the number of bytes you want to move. starting-point specifies which position you want to move from. SEEK_CUR means that you want to move offset bytes relative to the current position. SEEK_SET means that you want to move offset bytes relative to the start of the object. SEEK_END will position your offset bytes from the end of the object.

If you specify a starting-point of SEEK_CUR or SEEK_END, offset can be either positive or negative (a negative offset moves you toward the beginning of the object). With SEEK_SET, offset should always be a non-negative number. A starting-point of SEEK_SET and an offset of 0 position you to the beginning of the object. A starting-point of SEEK_END and an offset of 0 position you to the end of the object. If you specify a starting-point of SEEK_CUR and an offset of 0, your position within the object remains unchanged.

The pg_lo_tell procedure returns your current position within an object. pg_lo_tell requires two parameters:


set current_offset [pg_lo_tell connection-handle large-object-handle]

You can determine the number of bytes in a large-object by seeking to the end of the object and then finding the offset:


...

pg_lo_lseek $conn $object_handle 0 SEEK_CUR

set object_size [pg_lo_tell $conn $object_handle]

...

After you have established a position within a large-object, you can read from or write to the object. To write (or modify) data in a large-object, use the pg_lo_write procedure:


pg_lo_write connection-handle large-object-handle string length

For example, if you want to append a file onto an existing large-object, you would write code similar to this:


...

pg_exec $conn "BEGIN"

set fd [open "/images/sadface.jpg"]

set object_handle [pg_lo_open $conn $large_object_oid "rw"]



pg_lo_lseek $conn $object_handle 0 SEEK_END



while { [eof $fd] != 1 } {

  set val [read $fd 1000]

  pg_lo_write $conn $object_handle val [string length $val]

}



close $fd

pg_lo_close $object_handle

pg_exec $conn "COMMIT"

...

After opening the file and the large-object, seek to the end of the large-object and then copy from the file handle to the large-object handle, 1000 bytes at a time. We've also called pg_lo_close to close the large-object handle.

When you write to a large-object, you can create holes in the data. For example, if you start out with an empty large-object and then seek 100 bytes into it before calling pg_lo_write, you are creating a 100-byte hole at the beginning of the large-object. Holes are treated as if they contain zeroes. In other words, when you read back this particular large-object, the first 100 bytes will contain nothing but zeroes^[5].

^[5] In case you are wondering, PostgreSQL stores each large-object in a collection of blocks. Each block is typically 2048 bytes long. When you create a hole in a large-object, PostgreSQL will store the minimal number of blocks required to hold the object. If a block within a large-object contains nothing but a hole, it will not take up any physical space in the pg_largeobject table.

You can also read from a large-object in a piece-by-piece manner using pg_lo_lseek and pg_lo_read:


...

pg_exec $conn "BEGIN"



set object_handle [pg_lo_open $conn $large_object_oid "r"]



pg_lo_lseek $conn $object_handle 0 SEEK_END

set len [pg_tell $conn $object_handle]

pg_lo_lseek $conn $object_handle 0 SEEK_SET



pg_lo_read $conn $object_handle img $len



image create photo my_photo



my_photo put $img -format gif



pg_lo_close $object_handle

pg_exec $conn "COMMIT"

...

As before, you must start a transaction block before using any of the large-object procedures. After opening the large-object (using pg_lo_open), compute the size of the object. The easiest way to find the size of an existing large-object is to seek to the end and then use pg_lo_tell to find the offset of the last byte. After you know the size, you can read the entire object into a string variable using pg_lo_read. In the preceding example, we read the entire large-object in one call to pg_lo_read, but that is not strictly necessary. You can use pg_lo_lseek to move around within the large-object before you read (or write).

One important point here: When you call pg_lo_read (or pg_lo_write), your position within the object is advanced by the number of bytes read (or written).

The pg_lo_read procedure requires four parameters:


pg_lo_read connection-handle object-handle varname length

The connection-handle and object-handle parameters should be familiar by now. The varname parameter should contain the name of a variable?be careful with this parameter: You don't want to pass the contents of a variable; you want to pass the name. So, the following example will usually be incorrect:


pg_lo_read $conn $object_handle $img $len

This is likely to be wrong because you are passing the contents of the $img variable, not the name. You most likely want^[6]

^[6] The only time you would want to pass the value of a variable (as the third parameter) would be when one variable holds the name of another.


pg_lo_read $conn $object_handle img $len

There is one more large-object procedure that you might need to know about. If you want to remove a large-object from your database, use the pg_lo_unlink procedure: