11.4 The Python Database API (DBAPI) 2.0

As I mentioned earlier, the Python standard library does not come with an RDBMS interface, but there are many free third-party modules that let your Python programs access specific databases. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI.

At the time of this writing, Python's DBAPI Special Interest Group (SIG) was busy preparing a new version of the DBAPI (possibly to be known as 3.0 when it is ready). Programs written against DBAPI 2.0 should work with minimal or no changes with the future DBAPI 3.0, although 3.0 will no doubt offer further enhancements that future programs will be able to take advantage of.

If your Python program runs only on Windows, you may prefer to access databases by using Microsoft's ADO package through COM. For more information on using Python on Windows, see the book Python Programming on Win32, by Mark Hammond and Andy Robinson (O'Reilly). Since ADO and COM are platform-specific, and this book focuses on cross-platform use of Python, I do not cover ADO nor COM further in this book.

After importing a DBAPI-compliant module, you call the module's connect function with suitable parameters. connect returns an instance of class Connection, which represents a connection to the database. This instance supplies commit and rollback methods to let you deal with transactions, a close method to call as soon as you're done with the database, and a cursor method that returns an instance of class Cursor. This instance supplies the methods and attributes that you'll use for all database operations. A DBAPI-compliant module also supplies exception classes, descriptive attributes, factory functions, and type-description attributes.

11.4.1 Exception Classes

A DBAPI-compliant module supplies exception classes Warning, Error, and several subclasses of Error. Warning indicates such anomalies as data truncation during insertion. Error's subclasses indicate various kinds of errors that your program can encounter when dealing with the database and the DBAPI-compliant module that interfaces to it. Generally, your code uses a statement of the form:

try: 
    ...
except module.Error, err: 
    ...

in order to trap all database-related errors that you need to handle without terminating.

11.4.2 Thread Safety

When a DBAPI-compliant module has an attribute threadsafety that is greater than 0, the module is asserting some specific level of thread safety for database interfacing. Rather than relying on this, it's safer and more portable to ensure that a single thread has exclusive access to any given external resource, such as a database, as outlined in Chapter 14.

11.4.3 Parameter Style

A DBAPI-compliant module has an attribute paramstyle that identifies the style of markers to use as placeholders for parameters. You insert such markers in SQL statement strings that you pass to methods of Cursor instances, such as method execute, in order to use runtime-determined parameter values. Say, for example, that you need to fetch the rows of database table ATABLE where field AFIELD equals the current value of Python variable x. Assuming the cursor instance is named c, you could perform this task by using Python's string formatting operator % as follows:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=%r' % x)

However, this is not the recommended approach. This approach generates a different statement string for each value of x, requiring such statements to be parsed and prepared anew each time. With parameter substitution, you pass to execute a single statement string, with a placeholder instead of the parameter value. This lets execute perform parsing and preparation just once, giving potentially better performance. For example, if a module's paramstyle attribute is 'qmark', you can express the above query as:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=?', [x])

The read-only attribute paramstyle is meant to inform your program about the way to use parameter substitution with that module. The possible values of paramstyle are:

format

The marker is %s, as in string formatting. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=%s', [x])
named

The marker is :name and parameters are named. A query look like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=:x', {'x':x})
numeric

The marker is :n, giving the parameter's number. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=:1', [x])
pyformat

The marker is %(name)s and parameters are named. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=%(x)s', {'x':x})
qmark

The marker is ?. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=?', [x])

When paramstyle does not imply named parameters, the second argument of method execute is a sequence. When parameters are named, the second argument of method execute is a dictionary.

11.4.4 Factory Functions

Parameters passed to the database via placeholders must typically be of the right type. This means Python numbers (integers or floating-point values), strings (plain or Unicode), and None to represent SQL NULL. Python has no specific types to represent dates, times, and binary large objects (BLOBs). A DBAPI-compliant module supplies factory functions to build such objects. The types used for this purpose by most DBAPI-compliant modules are those supplied by module mxDateTime, covered in Chapter 12, and strings or buffer types for BLOBs. The factory functions are as follows.

Binary

Binary(string)

Returns an object representing the given string of bytes as a BLOB.

Date

Date(year,month,day)

Returns an object representing the specified date.

DateFromTicks

DateFromTicks(s)

Returns an object representing the date that is s seconds after the epoch of module time, covered in Chapter 12. For example, DateFromTicks(time.time( )) is today's date.

Time

Time(hour,minute,second)

Returns an object representing the specified time.

TimeFromTicks

TimeFromTicks(s)

Returns an object representing the time that is s seconds after the epoch of module time, covered in Chapter 12. For example, TimeFromTicks(time.time( )) is the current time.

Timestamp

Timestamp(year,month,day,hour,minute,second)

Returns an object representing the specified date and time.

TimestampFromTicks

TimestampFromTicks(s)

Returns an object representing the date and time that is s seconds after the epoch of module time, covered in Chapter 12. For example, TimestampFromTicks(time.time( )) is the current date and time.

11.4.5 Type Description Attributes

A Cursor instance's attribute description describes the types and other characteristics of each column of a query. Each column's type (the second item of the tuple describing the column) equals one of the following attributes of the DBAPI-compliant module:

BINARY

Describes columns containing BLOBs

DATETIME

Describes columns containing dates, times, or both

NUMBER

Describes columns containing numbers of any kind

ROWID

Describes columns containing a row-identification number

STRING

Describes columns containing text of any kind

A cursor's description, and in particular each column's type, is mostly useful for introspection about the database your program is working with. Such introspection can help you write general modules that are able to work with databases that have different schemas, schemas that may not be fully known at the time you are writing your code.

11.4.6 The connect Function

A DBAPI-compliant module's connect function accepts arguments that vary depending on the kind of database and the specific module involved. The DBAPI standard recommends, but does not mandate, that connect accept named arguments. In particular, connect should at least accept optional arguments with the following names:

database

Name of the specific database to connect

dsn

Data-source name to use for the connection

host

Hostname on which the database is running

password

Password to use for the connection

user

Username for the connection

11.4.7 Connection Objects

A DBAPI-compliant module's connect function returns an object x that is an instance of class Connection. x supplies the following methods.

close

x.close(  )

Terminates the database connection and releases all related resources. Call close as soon as you're done with the database, since keeping database connections uselessly open can be a serious resource drain on the system.

commit

x
.commit(  )

Commits the current transaction in the database. If the database does not support transactions, x.commit( ) is an innocuous no-op.

cursor

x.close(  )

Returns a new instance of class Cursor, covered later in this section.

rollback

x.rollback(  )

Rolls back the current transaction in the database. If the database does not support transactions, x.rollback( ) raises an exception. The DBAPI recommends, but does not mandate, that for databases that do not support transactions class Connection supplies no rollback method, so that x.rollback( ) raises AttributeError. You can test whether transaction support is present with hasattr(x,'rollback').

11.4.8 Cursor Objects

A Connection instance provides a cursor method that returns an object c that is an instance of class Cursor. A SQL cursor represents the set of results of a query and lets you work with the records in that set, in sequence, one at a time. A cursor as modeled by the DBAPI is a richer concept, since it also represents the only way in which your program executes SQL queries in the first place. On the other hand, a DBAPI cursor allows you only to advance in the sequence of results (some relational databases, but not all, also provide richer cursors that are able to go backward as well as forward), and does not support the SQL clause WHERE CURRENT OF CURSOR. These limitations of DBAPI cursors enable DBAPI-compliant modules to provide cursors even on RDBMSes that provide no real SQL cursors at all. An instance of class Cursor c supplies many attributes and methods; the most frequently used ones are documented here.

close

c.close(  )

Closes the cursor and releases all related resources.

description

A read-only attribute that is a sequence of seven-item tuples, one per column in the last query executed:

name, typecode, displaysize, internalsize, precision, scale, nullable

c.description is None if the last operation on c was not a query or returned no usable description of the columns involved. A cursor's description is mostly useful for introspection about the database your program is working with. Such introspection can help you write general modules that are able to work with databases that have different schemas, including schemas that may not be fully known at the time you are writing your code.

execute

c.execute(statement,parameters=None)

Executes a SQL statement on the database with the given parameters. parameters is a sequence when the module's paramstyle is 'format', 'numeric', or 'qmark', and a dictionary when 'named' or 'pyformat'.

executemany

c.executemany(statement,*parameters)

Executes a SQL statement on the database, once for each item of the given parameters. parameters is a sequence of sequences when the module's paramstyle is 'format', 'numeric', or 'qmark', and a sequence of dictionaries when 'named' or 'pyformat'. For example, the statement:

c.executemany('UPDATE atable SET x=? WHERE y=?',
                     (12,23),(23,34))

that uses a module whose paramstyle is 'qmark' is equivalent to, but probably faster than, the two statements:

c.execute('UPDATE atable SET x=12 WHERE y=23')
c.execute('UPDATE atable SET x=23 WHERE y=34')
fetchall

c.fetchall(  )

Returns all remaining result rows from the last query as a sequence of tuples. Raises an exception if the last operation was not a SELECT query.

fetchmany

c.fetchmany(n)

Returns up to n remaining result rows from the last query as a sequence of tuples. Raises an exception if the last operation was not a SELECT query.

fetchone

c.fetchone(  )

Returns the next result row from the last query as a tuple. Raises an exception if the last operation was not a SELECT query.

rowcount

A read-only attribute that specifies the number of rows fetched or affected by the last operation, or -1 if the module is unable to determine this value.

11.4.9 DBAPI-Compliant Modules

Whatever relational database you want to use, there's at least one (and often more than one) DBAPI-compliant module that you can download from the Internet. All modules listed in the following sections, except mxODBC, have liberal licenses that are mostly similar to Python's license (the SAP DB, however, is licensed under GPL) and that let you use them freely in either open source or closed source programs. mxODBC can be used freely for noncommercial purposes, but you must purchase a license for any commercial use. There are so many relational databases that it's impossible to list them all, but here are some of the most popular ones:

ODBC

Open DataBase Connectivity (ODBC) is a popular standard that lets you connect to many different relational databases, including ones not otherwise supported by DBAPI-compliant modules, such as Microsoft Jet (also known as the Access database). The Windows distribution of Python contains an odbc module, but the module is unsupported and complies to an older version of the DBAPI, not to the current version 2.0. On either Unix or Windows, use mxODBC, available at http://www.lemburg.com/files/Python/mxODBC.html. mxODBC's paramstyle is 'qmark'. Its connect function accepts three optional arguments, named dsn, user, and password.

Oracle

Oracle is a widespread, commercial RDBMS. To interface to Oracle, I recommend module DCOracle2, available at http://www.zope.org/Members/matt/dco2. DCOracle2's paramstyle is 'numeric'. Its connect function accepts a single optional, unnamed argument string with the syntax:

'user/password@service'
Microsoft SQL Server

To interface to Microsoft SQL Server, I recommend module mssqldb, available at http://www.object-craft.com.au/projects/mssql/. mssqldb's paramstyle is 'qmark'. Its connect function accepts three arguments, named dsn, user, and passwd, as well as an optional database argument.

DB2

For IBM DB/2, try module DB2, available at ftp://people.linuxkorea.co.kr/pub/DB2/. DB2's paramstyle is 'format'. Its connect function accepts three optional arguments, named dsn, uid, and pwd.

MySQL

MySQL is a widespread, open source RDBMS. To interface to MySQL, try MySQLdb, available at http://sourceforge.net/projects/mysql-python. MySQLdb's paramstyle is 'format'. Its connect function accepts four optional arguments, named db, host, user, and passwd.

PostgreSQL

PostgreSQL is an excellent open source RDBMS. To interface to PostgreSQL, I recommend psycopg, available at http://initd.org/Software/psycopg. psycopg's paramstyle is 'pyformat'. Its connect function accepts a single mandatory argument, named dsn, with the syntax:

'host=host dbname=dbname user=username password=password'
SAP DB

SAP DB, once known as Adabas, is a powerful RDBMS that used to be closed source, but is now open source. SAP DB comes with sapdbapi, available at http://www.sapdb.org/sapdbapi.html, as well as other useful Python modules. sapdbapi's paramstyle is 'pyformat'. Its connect function accepts three mandatory arguments, named user, password, and database, and an optional argument named host.

11.4.10 Gadfly

Gadfly, available at http://gadfly.sf.net, is not an interface to some other RDBMS, but rather a complete RDBMS engine written in Python. Gadfly supports a large subset of standard SQL. For example, Gadfly lacks NULL, but it does support VIEW, which is a crucial SQL feature that engines such as MySQL still lack at the time of this writing. Gadfly can run as a daemon server, to which clients connect with TCP/IP. Alternatively, you can run the Gadfly engine directly in your application's process, if you don't need other processes to be able to access the same database concurrently.

The gadfly module has several discrepancies from the DBAPI 2.0 covered in this chapter because Gadfly implements a variant of the older DBAPI 1.0. The concepts are quite close, but several details differ. The main differences are:

  • gadfly does not supply custom exception classes, so Gadfly operations that fail raise normal Python exceptions, such as IOError, NameError, etc.

  • gadfly does not supply a paramstyle attribute. However, the module behaves as if it supplied a paramstyle of 'qmark'.

  • gadfly does not supply a function named connect; use the gadfly.gadfly or gadfly.client.gfclient functions instead.

  • gadfly does not supply factory functions for data types.

  • Gadfly cursors do not supply the executemany method. Instead, in the specific case in which the SQL statement is an INSERT, the execute method optionally accepts as its second argument a list of tuples and inserts all the data.

  • Gadfly cursors do not supply the rowcount method.

The gadfly module supplies the following functions.

gadfly

gadfly.gadfly(dbname,dirpath)

Returns a connection object for the database named dbname, which must have been previously created in the directory indicated by string dirpath. The database engine runs in the same process as your application.

gfclient

gadfly.client.gfclient(policyname, port, password, host)

Returns a connection object for the database served by a gfserve process on the given host and port. policyname identifies the level of access required, and is often 'admin' to specify unlimited access.



    Part III: Python Library and Extension Modules