Configuring Your PostgreSQL Runtime Environment

After you have finished installing the PostgreSQL distribution, you may want to review the runtime configuration options.

Permanent configuration options should be defined in the file $PGDATA/postgresql.conf. The postgresql.conf file is a plain text file that you can maintain with your favorite editor (vi, emacs, and so on). When you create a new database cluster, the initdb program will create a default postgresql.conf file for you. postgresql.conf is arranged as a series of option=value pairs; blank lines are ignored and any text that follows an octothorpe (#) is treated as a comment. Here is a snippet from a postgresql.conf file created by initdb:


#

#       Connection Parameters

#

#tcpip_socket = false

#ssl = false



#max_connections = 32



#port = 5432

#hostname_lookup = false

#show_source_port = false



#unix_socket_directory = ''

#unix_socket_group = ''

#unix_socket_permissions = 0777

PostgreSQL supports a large number of runtime configuration options (more than 90 at last count). In the next few sections, you'll see a description of each parameter and the parameter's default value. Default values can come from four sources: a hard-wired default value that you can't adjust without changing the source code, a symbolic value that can be changed only by editing the include/pg_config.h header file, a compile-time configuration option, or a command-line option to the postmaster.

Some of the options can be modified at runtime using the SET command; others can be defined only before starting the postmaster. The sections that follow document the modification time for each parameter.

Parameters with a Modify Time of "Postmaster startup" can be changed only by modifying the postgresql.conf file and restarting the postmaster.

Parameters labeled SIGHUP can be modified after the postmaster process has started. To modify a SIGHUP option, edit the postgresql.conf configuration file and send a SIGHUP signal to the postmaster process. You can use the pg_ctl reload command to signal the postmaster.

The parameters that you can change with the SET command are labeled with a modification time of "SET command".

Connection-Related Parameters

This section looks at the connection-related configuration parameters. Notice that most of the connection-related parameters must be defined at the time that the postmaster starts.

TCPIP_SOCKET

Default Value:

False

Modify Time:

Postmaster startup

Override:

postmaster -i

This parameter determines whether the postmaster listens for connection requests coming from a TCP/IP socket. If TCPIP_SOCKET is false, the postmaster will listen for connection requests coming only from a Unix local domain socket. If TCPIP_SOCKET is true, the postmaster will listen for connection requests coming from a TCP/IP socket, as well as listening for local connection requests. You can override this variable by invoking the postmaster with the -i flag.

SSL

Default Value:

False

Modify Time:

Postmaster startup

Override:

postmaster -l

If true, the SSL parameter tells the postmaster to negotiate with clients over the use of SSL-secured connections. SSL is a protocol that encrypts the data stream flowing between the client and the server. If SSL is true, and the client supports SSL, the data stream will be encrypted; otherwise, PostgreSQL data will be sent in clear-text form. You can override this parameter by invoking the postmaster with the -l flag.

MAX_CONNECTIONS

Default Value:

32

Modify Time:

Postmaster startup

Override:

postmaster -n connections

The MAX_CONNECTIONS parameter determines the maximum number of concurrent client connections that the postmaster will accept. You can increase (or decrease) the maximum number of connections by invoking the postmaster with the -n connections parameter. You also can change the default value for MAX_CONNECTIONS by invoking configure with the --with-maxbackends=connections option when you build PostgreSQL from source code.

PORT

Default Value:

5432

Modify Time:

Postmaster startup

Override:

postmaster -p port

This parameter determines which TCP/IP port the postmaster should listen to. When a remote client application wants to connect to a PostgreSQL server, it must connect to a TCP/IP port where a postmaster is listening for connection requests. The client and server must agree on the same port number. You can override this parameter by invoking the postmaster with the -p port parameter. You can also change the default value for PORT by invoking configure with the --with-pgport=port when you build PostgreSQL from source code.

HOSTNAME_LOOKUP

Default Value:

False

Modify Time:

SIGHUP or Postmaster startup

Override:

None

If HOSTNAME_LOOKUP is False, any connection logs that you are gathering will show the IP address of each client. If HOSTNAME_LOOKUP is True, the postmaster will try to resolve the IP address into a host name and will include the hostname in the log if the resolution succeeds. Warning: this can a real performance hog if your name-resolution mechanism is not configured correctly.

SHOW_SOURCE_PORT

Default Value:

False

Modify Time:

SIGHUP or Postmaster startup

Override:

None

If True, this parameter tells PostgreSQL to log the outgoing port number of all client connections. The PostgreSQL Administrator's Manual says that this option is "pretty useless."

UNIX_SOCKET_DIRECTORY

Default Value:

/tmp

Modify Time:

Postmaster startup

Override:

postmaster -k directory

The postmaster always listens for local connection requests using a Unix domain socket. The socket's device file is normally found in the /tmp directory. You can move the socket device file to a different directory by using the UNIX_SOCKET_DIRECTORY configuration parameter or by invoking the postmaster with the -k directory parameter. You also can change the default value for this parameter by defining the DEFAULT_PGSOCKET_DIR directory when you configure and build PostgreSQL from source code.

UNIX_SOCKET_GROUP

Default Value:

None

Modify Time:

Postmaster startup

Override:

None

This parameter determines the owning group of the Unix local domain socket (see previous entry for more information). If UNIX_SOCKET_GROUP is undefined (or empty), the socket will be created using the default group for the user that starts the postmaster. The PostgreSQL Administrator's Manual suggests that you can use this parameter, along with UNIX_SOCKET_PERMISSION, to restrict local connections to a specific group.

UNIX_SOCKET_PERMISSIONS

Default Value:

0777

Modify Time:

Postmaster startup

Override:

None

This parameter determines the permissions assigned to the Unix local domain socket. By default, the socket is created with permissions of 0777 (meaning readable and writable by anyone). By changing the socket permissions, you can restrict local connection requests by user ID or group ID. For example, if you create a group named postgresusers, set UNIX_SOCKET_GROUP to postgresusers, and set UNIX_SOCKET_PERMISSIONS to 0060. Only users in the postgresusers group will be able to connect through the local domain socket.

VIRTUAL_HOST

Default Value:

None

Modify Time:

Postmaster startup

Override:

postmaster -h host

If the postmaster is running on a host that supports multiple IP addresses (for example, has multiple network adapters), you can use the VIRTUAL_HOST parameter to tell the postmaster to listen for connection requests on a specific IP address. If you don't specify a VIRTUAL_HOST, the postmaster will listen on all network adapters.

KRB_SERVER_KEYFILE

Default Value:

/etc/srvtab or $SYSCONFDIR/krb5.keytab

Modify Time:

Postmaster startup

Override:

None

If you are using Kerberos to authenticate clients, the server keyfile is normally located in /etc/srvtab (for Kerberos 4) or $SYSCONFDIR/krb5.keytab (for Kerberos 5). You can specify an alternate (possibly more secure) location using the KRB_SERVER_KEYFILE parameter.

Operational Parameters

The next set of parameters forms a group of loosely related options that affect how the PostgreSQL server operates. Most of these options affect performance and are therefore related to the options shown in the next section.

SHARED_BUFFERS

Default Value:

64 or DEF_NBUFFERS=nbuffers

Modify Time:

Postmaster startup

Override:

postmaster -B nbuffers

When PostgreSQL reads data from (or writes data to) disk, it first transfers the data into a cache stored in shared memory. This cache is shared by all clients connected to a single cluster. Disk I/O (and cache I/O) is performed in 8KB chunks (each chunk is called a page). The SHARED_BUFFERS parameter determines how many 8KB pages will be created in the shared cache. The default value, 64, is usually sufficient for a small number of users, but should be increased as your user count grows. See Chapter 4 for more information. You can change the default value for SHARED_BUFFERS by defining the DEF_NBUFFERS environment variable when you configure and build PostgreSQL from source code. You can also override SHARED_BUFFERS by invoking the postmaster with the -B nbuffers command-line parameter.

MAX_FSM_RELATIONS

Default Value:

100

Modify Time:

Postmaster startup

Override:

None

When PostgreSQL needs to write new data into a table, it searches the table for free space. If free space cannot be found within the table, the file holding the table is enlarged. The free-space manager caches free-space information in shared memory for better performance. The MAX_FSM_RELATIONS parameter determines the maximum number of tables that the free-space manager will manage at one time. If the cache becomes full, old free-space information will be removed from the cache to make room. This parameter is related to the MAX_FSM_PAGES parameter.

MAX_FSM_PAGES

Default Value:

1000

Modify Time:

Postmaster startup

Override:

None

This parameter (along with MAX_FSM_RELATIONS) determines the size of the free-space cache used by the free-space manager. The free-space cache contains, at most, MAX_FSM_PAGES worth of data from, at most, MAX_FSM_RELATIONS different tables.

These two parameters have no effect on read operations, but can affect the performance of INSERT and UPDATE commands.

MAX_LOCKS_PER_TRANSACTION

Default Value:

64

Modify Time:

Postmaster startup

Override:

None

This parameter, along with MAX_CONNECTIONS, determines the size of PostgreSQL's shared lock table. Any given transaction can hold more than MAX_LOCKS_PER_TRANSACTION locks, but the total number of locks cannot exceed MAX_CONNECTIONS * MAX_LOCKS_PER_TRANSACTION. PostgreSQL locking is described in Chapter 9, "Multi-Version Concurrency Control," of the PostgreSQL User's Manual.

SORT_MEM

Default Value:

512 kilobytes

Modify Time:

per-command

Override:

SET SORT_MEM TO maximum_memory_size

When PostgreSQL processes a query, it transforms the query from string form into an execution plan. An execution plan is a sequence of operations that must be performed in order to satisfy the query. A typical execution plan might include steps to scan through an entire table and sort the results. If an execution plan includes a Sort or Hash operation, PostgreSQL can use two different algorithms to perform the sort. If the amount of memory required to perform the sort exceeds SORT_MEM KB, PostgreSQL will switch from an in-memory sort to a more expensive, disk-based sort algorithm. You can adjust SORT_MEM on a per-command basis using the command SET SORT_MEM TO maximum _memory.

VACUUM_MEM

Default Value:

8192 kilobytes

Modify Time:

per-command

Override:

SET VACUUM_MEM TO maximum_memory_size

This parameter determines the maximum amount of memory that will be used by the VACUUM command. You can improve the performance of the VACUUM command, particularly for tables that are frequently modified, by increasing VACUUM_MEM.

WAL_BUFFERS

Default Value:

8

Modify Time:

Postmaster startup

Override:

None

When a transaction makes a change to a PostgreSQL table, the change is applied to the heap (and/or index) pages that are cached in shared memory. All changes are also logged to a write-ahead log. The write-ahead log is also cached in shared memory. When a transaction is committed, the write-ahead log is flushed to disk, but the changes made to the actual data pages may not be transferred from shared memory to disk until some point in the future. The size of the shared write-ahead cache is determined by WAL_BUFFERS. The default value of 8 creates a shared write-ahead cache of eight 8KB pages.

CHECKPOINT_SEGMENTS

Default Value:

3

Modify Time:

SIGHUP or Postmaster startup

Override:

None

The write-ahead log files are divided into 6MB segments. Every so often, PostgreSQL will need to move all modified data (heap and index) pages from the shared-memory cache to disk. This operation is called a checkpoint. Log entries made prior to a checkpoint are obsolete and the space consumed by those stale entries can be recycled. If PostgreSQL never performed a checkpoint, the write-ahead logs would grow without bound. The interval between checkpoints is determined by the CHECKPOINT_SEGMENTS and CHECKPOINT_TIMEOUT parameters. A checkpoint will occur every CHECKPOINT_TIMEOUT seconds or when the number of newly filled segments reaches CHECKPOINT_SEGMENTS.

CHECKPOINT_TIMEOUT

Default Value:

300 seconds

Modify Time:

SIGHUP or Postmaster startup

Override:

None

This parameter determines the maximum amount of time that can elapse between checkpoints. You may see a checkpoint occur before CHECKPOINT_TIMEOUT seconds has elapsed if the CHECKPOINT_SEGMENTS threshold has been reached.

WAL_FILES

Default Value:

0

Modify Time:

Postmaster startup

Override:

None

This parameter determines how many 16MB log segments are preallocated at each checkpoint. The WAL manager preallocates space to improve performance. If you find that write-ahead log files are being deleted (instead of being recycled), you should increase the value of WAL_FILES.

COMMIT_DELAY

Default Value:

0 microseconds

Modify Time:

SET command

Override:

SET COMMIT_DELAY TO microseconds

When a transaction is committed, the WAL must be flushed from shared-memory to disk. PostgreSQL pauses for COMMIT_DELAY microseconds so that other server processes can sneak their commits into the same flush operation. The default for this parameter is 0, meaning that the WAL will be flushed to disk immediately after each COMMIT.

COMMIT_SIBLINGS

Default Value:

5 transactions

Modify Time:

SET command

Override:

SET COMMIT_SIBLINGS TO transactions

The COMMIT_DELAY (described previously) is a waste of time if there are no other transactions active at the time you COMMIT (if there are no other transactions, they can't possibly try to sneak in a COMMIT). The WAL manager will not delay for COMMIT_DELAY microseconds unless there are at least COMMIT_SIBLINGS transactions active at the time you COMMIT your changes.

WAL_SYNC_METHOD

Default Value:

Dependent on host type

Modify Time:

SIGHUP or Postmaster startup

Override:

None

When the WAL manager needs to flush cached write-ahead pages to disk, it can use a variety of system calls. The legal values for WAL_SYNC_METHOD vary by host type. It's not very likely that you will ever need to adjust this value?the default value is chosen by the configure program at the time PostgreSQL is built from source code. See the PostgreSQL Administrator's Guide for more information.

FSYNC

Default Value:

True

Modify Time:

SIGHUP or Postmaster startup

Override:

postmaster -F

When an application (such as the PostgreSQL server) writes data to disk, the operating system usually buffers the modifications to improve performance. The OS kernel flushes modified buffers to disk at some time in the future. If your host operating system (or hardware) experiences a crash, not all buffers will be written to disk. If you set the FSYNC parameter to True, PostgreSQL will occasionally force the kernel to flush modified buffers to disk. Setting FSYNC to True improves reliability with little performance penalty.

Optimizer Parameters

This section looks at the configuration options that directly influence the PostgreSQL optimizer. The first seven options can be used to enable or disable execution strategies. Some of these options affect how the optimizer estimates execution costs. The last set of options control the PostgreSQL Genetic query optimizer (GEQO).

ENABLE_SEQSCAN

Default Value:

True

Modify Time:

SET command

Override:

SET ENABLE_SEQSCAN TO [true|false]

This parameter affects the estimated cost of performing a sequential scan on a table. Setting ENABLE_SEQSCAN to False does not completely disable sequential scans; it simply raises the estimated cost so that sequential scans are not likely to appear in the execution plan. A sequential scan may still appear in the execution plan if there is no other way to satisfy the query (for example, if you have defined no indexes on a table).

This parameter is most often used to force PostgreSQL to use an index that it would not otherwise use. If you are tempted to force PostgreSQL to use an index, you probably need to VACUUM ANALYZE your table instead.

ENABLE_INDEXSCAN

Default Value:

True

Modify Time:

SET command

Override:

SET ENABLE_INDEXSCAN TO [true|false]

Setting ENABLE_INDEXSCAN to False increases the estimated cost of performing an index scan so that it is unlikely to appear in an execution plan.

ENABLE_TIDSCAN

Default Value:

True

Modify Time:

SET command

Override:

SET ENABLE_TIDSCAN TO [true|false]

Setting ENABLE_TIDSCAN to False increases the estimated cost of performing a TID scan so that it is unlikely to appear in an execution plan. Because a TID scan is generated only when you have a WHERE clause that specifically mentions the CTID pseudo-column, this parameter is seldom used.

ENABLE_SORT

Default Value:

True

Modify Time:

SET command

Override:

SET ENABLE_SORT TO [true|false]

The ENABLE_SORT parameter is used to increase the estimated cost of a sort operation so that it is unlikely to appear in an execution plan (set ENABLE_SORT to False to increase the estimated cost). Sort operations are often required (in the absence of a useful index) when intermediate results must appear in a specific order. For example, both input sets to the MergeJoin operator must appear in sorted order. Of course, an ORDER BY clause can be satisfied using a sort operation. When results are required in a specific order, the only alternative to a sort operation is to use an index scan, thus it makes little sense to disable sorts and index scans at the same time.

ENABLE_NESTLOOP

Default Value:

True

Modify Time:

SET command

Override:

SET ENABLE_NESTLOOP TO [true|false]

Setting ENABLE_NESTLOOP to False increases the estimated cost of performing a nested loop operation so that it is unlikely to appear in an execution plan. The Nested Loop operator, described in Chapter 4, is one of three algorithms that PostgreSQL can use to join two tables. Setting ENABLE_NESTLOOP to False makes it more likely that PostgreSQL will choose a MergeJoin or HashJoin operator over a Nested Loop operator.

ENABLE_MERGEJOIN

Default Value:

True

Modify Time:

SET command

Override:

SET ENABLE_MERGEJOIN TO [true|false]

Setting ENABLE_MERGEJOIN to False increases the estimated cost of performing a MergeJoin operation so that it is unlikely to appear in an execution plan. Setting ENABLE_MERGEJOIN to False makes it more likely that PostgreSQL will choose a NestedLoop or HashJoin operator over a MergeJoin operator.

ENABLE_HASHJOIN

Default Value:

True

Modify Time:

SET command

Override:

SET ENABLE_HASHJOIN TO [true|false]

Setting ENABLE_HASHJOIN to False increases the estimated cost of performing a HashJoin operation so that it is unlikely to appear in an execution plan. Setting ENABLE_HASHJOIN to False makes it more likely that PostgreSQL will choose a NestedLoop or MergeJoin operator over a HashJoin operator.

KSQO

Default Value:

False

Modify Time:

SET command

Override:

SET KSQO TO [true|false]

Setting KSQO to True (the default value for this parameter is False) gives PostgreSQL permission to rewrite certain WHERE clauses in order to optimize queries that involve many OR operators. The Key Set Query Optimizer is largely obsolete as of PostgreSQL release 7.0 so the KSQO parameter is rarely used. See Chapter 3, "Run-time Configuration," of the PostgreSQL Administrator's Guide for more information about the Key Set Query Optimizer.

EFFECTIVE_CACHE_SIZE

Default Value:

1000

Modify Time:

SET command

Override:

SET EFFECTIVE_CACHE_SIZE TO size

When estimating the cost of an execution plan, PostgreSQL needs to make an educated guess about the cost of reading a random page from disk into the shared buffer cache. To do so, it needs to know the likelihood of finding a given page in the OS cache. The EFFECTIVE_CACHE_SIZE parameter tells PostgreSQL how much of the OS disk cache is likely to be given to your server process.

This parameter is used only when estimating the cost of an IndexScan or Sort operator (when the sort will overflow SORT_MEM bytes and switch from an in-memory sort to an on-disk sort).

Increasing the EFFECTIVE_CACHE_SIZE makes the cost estimator assume that any given page is more likely to be found in the cache. Decreasing the EFFECTIVE_CACHE_SIZE tells PostgreSQL that any given page is less likely to be found in the cache (and will therefore incur more expense).

RANDOM_PAGE_COST

Default Value:

4.0

Modify Time:

SET command

Override:

SET RANDOM_PAGE_COST TO float-value

RANDOM_PAGE_COST specifies the cost of loading a random page into the shared buffer cache. A sequential page fetch is assumed to cost 1 unit; the default value for RANDOM_PAGE_COST means that PostgreSQL assumes that it is four times as expensive to load a random page than a sequentially accessed page.

CPU_TUPLE_COST

Default Value:

0.01

Modify Time:

SET command

Override:

SET CPU_TUPLE_COST TO float-value

CPU_TUPLE_COST specifies the cost of processing a single tuple within a heap (data) page. With the default value of 0.01, PostgreSQL assumes that it is 100 times more expensive to load a sequential page from disk than to process a single tuple.

CPU_INDEX_TUPLE_COST

Default Value:

0.001

Modify Time:

SET command

Override:

SET CPU_INDEX_TUPLE_COST TO float-value

CPU_INDEX_TUPLE_COST specifies the cost of processing a single index entry. With the default value of 0.001, PostgreSQL assumes that it is 1000 times more expensive to load a sequential page from disk than to process a single tuple.

CPU_OPERATOR_COST

Default Value:

0.0025

Modify Time:

SET command

Override:

SET CPU_OPERATOR_COST TO float-value

CPU_OPERATOR_COST specifies the cost of processing a single operator (such as >= or !=) in a WHERE clause. With the default value of 0.0025, PostgreSQL assumes that it is 2500 times more expensive to load a sequential page from disk than to process a single operator.

The planner/optimizer works in three phases. The first phase examines the query parse tree and builds a set of execution plans. The second phase assigns a cost to the execution plan by estimating the expense of each step of the plan. The final phase chooses the least expensive alternative and discards the other plans.

Many queries can be evaluated by two or more execution plans. For example, if you have defined an index on the tape_id column, the following query:


SELECT * FROM tapes ORDER BY tape_id;

results in at least two execution plans. One plan scans through the entire table from beginning to end and sorts the results into the desired order (this plan includes a SeqScan operator and a Sort operator). The second plan reads through the entire table using the tape_id index (this plan includes an IndexScan operator). For complex queries, especially queries involving many tables, the number of alternative plans becomes large.

The job of the Genetic Query Optimizer (or GEQO, for short) is to reduce the number of alternatives that must be evaluated by eliminating plans that are likely to be more expensive than plans already seen. The next seven parameters control the GEQO. The GEQO algorithm is too complex to try to describe in the space available, so I will include the descriptions provided in the PostgreSQL Administrator's Guide for each of the GEQO-related parameters.

GEQO

Default Value:

True

Modify Time:

SET command

Override:

SET GEQO TO [true|false]

If GEQO is set to True, PostgreSQL will use the Genetic Query Optimizer to eliminate plans that are likely to be expensive. If GEQO is set to False, the planner/optimizer will produce every possible execution plan and find the least expensive among the alternatives.

GEQO_SELECTION_BIAS

Default Value:

2.0

Modify Time:

SET command

Override:

SET GEQO_SELECTION_BIAS TO float-value

GEQO_SELECTION_BIAS is the selective pressure within the population. Values can be from 1.50 to 2.00; the latter is the default.

GEQO_THRESHOLD

Default Value:

11

Modify Time:

SET command

Override:

SET GEQO_THRESHOLD TO float-value

Use genetic query optimization to plan queries with at least GEQO_THRESHOLD FROM items involved. (Note that a JOIN construct counts as only one FROM item.) The default is 11. For simpler queries, it is usually best to use the deterministic, exhaustive planner. This parameter also controls how hard the optimizer will try to merge subquery FROM clauses into the upper query

GEQO_POOL_SIZE

Default Value:

Number of tables involved in each query

Modify Time:

SET command

Override:

SET GEQO_POOL_SIZE TO number

GEQO_POOL_SIZE is the number of individuals in one population. Valid values are between 128 and 1024. If it is set to 0 (the default), a pool size of 2^(QS+1), where QS is the number of FROM items in the query, is taken.

GEQO_EFFORT

Default Value:

40

Modify Time:

SET command

Override:

SET GEQO_EFFORT TO number

GEQO_EFFORT is used to calculate a default for generations. Valid values are between 1 and 80; 40 being the default.



Part II: Programming with PostgreSQL