Backing Up and Copying Databases

There are two ways to back up your PostgreSQL database. The first method is to create an archive containing the filesystem files that comprise your database. The second method is to create a SQL script that describes how to re-create the data in your database.

In the first method, you use an archiving tool, such as tar, cpio, or backup, to back up all the files in your database cluster. There are a number of disadvantages to this method. First, your entire database cluster must be shut down to ensure that all buffers have been flushed to disk. Second, the size of a filesystem archive will often be larger than the size of the equivalent script because the filesystem archive will contain indexes and partially filled pages that do not have to be archived. Finally, it is not possible to restore a single database or table from a filesystem archive. There are, however, two advantages to using a filesystem archive. First, you may already have a backup scheme in place that will backup a file system; including your database cluster in that scheme is probably pretty easy.

The second (and usually preferred) method is to create a SQL script that can reconstruct the contents of your database from scratch. Then, when you need to restore data from an archive, you simply run the script.

PostgreSQL provides two utilities that you can use to create archive scripts: pg_dump, and pg_dumpall.

Using pg_dump

The pg_dump program creates a SQL script that re-creates the data and metadata in your database. Before I get into too many details, it might help to see the kind of script that pg_dump will create^[7]:

^[7] I've changed the formatting of this script slightly so that it fits on a printed page.


$ pg_dump --inserts -t customers movies

--

-- Selected TOC Entries:

--

\connect - bruce



--

-- TOC Entry ID 2 (OID 518934)

--

-- Name: customers Type: TABLE Owner: bruce

--

CREATE TABLE "customers" (

        "id" integer,

        "customer_name" character varying(50),

        "phone" character(8),

        "birth_date" date

);



--

-- TOC Entry ID 3 (OID 518934)

--

-- Name: customers Type: ACL Owner:

--

REVOKE ALL on "customers" from PUBLIC;

GRANT ALL on "customers" to "bruce";

GRANT ALL on "customers" to "sheila";



--

-- Data for TOC Entry ID 4 (OID 518934)

--

-- Name: customers Type: TABLE DATA Owner: bruce

--

INSERT INTO "customers" VALUES

      (1,'Jones, Henry','555-1212','1970-10-10');

INSERT INTO "customers" VALUES

      (2,'Rubin, William','555-2211','1972-07-10');

INSERT INTO "customers" VALUES

      (3,'Panky, Henry','555-1221','1968-01-21');

INSERT INTO "customers" VALUES

      (4,'Wonderland, Alice N.','555-1122','1969-03-05');

INSERT INTO "customers" VALUES

      (7,'Grumby, Jonas',NULL,'1984-02-21');

INSERT INTO "customers" VALUES

      (8,'Haywood, Rosemary','666-1212','1965-02-03');

In this example, I've asked pg_dump to produce a script that re-creates a single table (-t customers) using INSERT commands rather than COPY commands (--inserts).

If we feed this script back into psql (or some other client application), psql will connect to the database as user bruce, CREATE the customers table, assign the proper privileges to the table, and INSERT all the rows that had been committed at the time that we started the original pg_dump command. You can see that this script contains everything that we need to re-create the customers table starting from an empty database. If we had defined triggers, sequences, or indexes for the customers table, the code necessary to re-create those objects would appear in the script as well.

Now let's look at some of the command-line options for pg_dump. Start with pg_dump --help:


$ pg_dump --help

pg_dump dumps a database as a text file or to other formats.



Usage:

  pg_dump [options] dbname



Options:

  -a, --data-only        dump only the data, not the schema

  -b, --blobs            include large objects in dump

  -c, --clean            clean (drop) schema prior to create

  -C, --create           include commands to create database in dump

  -d, --inserts          dump data as INSERT, rather than COPY, commands

  -D, --column-inserts   dump data as INSERT commands with column names

  -f, --file=FILENAME    output file name

  -F, --format {c|t|p}   output file format (custom, tar, plain text)

  -h, --host=HOSTNAME    database server host name

  -i, --ignore-version   proceed even when server version mismatches

                         pg_dump version

  -n, --no-quotes        suppress most quotes around identifiers

  -N, --quotes           enable most quotes around identifiers

  -o, --oids             include oids in dump

  -O, --no-owner         do not output \connect commands in plain

                         text format

  -p, --port=PORT        database server port number

  -R, --no-reconnect     disable ALL reconnections to the database in

                         plain text format

  -s, --schema-only      dump only the schema, no data

  -S, --superuser=NAME   specify the superuser user name to use in

                         plain text format

  -t, --table=TABLE      dump this table only (* for all)

  -U, --username=NAME    connect as specified database user

  -v, --verbose          verbose mode

  -W, --password         force password prompt

                            (should happen automatically)

  -x, --no-privileges    do not dump privileges (grant/revoke)

  -X use-set-session-authorization, --use-set-session-authorization

                           output SET SESSION AUTHORIZATION commands

                           rather than \connect commands

  -Z, --compress {0-9}     compression level for compressed formats



If no database name is not supplied, then the PGDATABASE environment

variable value is used.



Report bugs to <pgsql-bugs@postgresql.org>.

The most basic form for the pg_dump command is


pg_dump database

In this form, pg_dump archives all objects in the given database. You can see that pg_dump understands quite a number of command-line options. I'll explain the most useful options here.

If you use large-objects in your database, you may want to include the --blobs (or -b) option so that large-objects are written to the resulting script. Needless to say, archiving large-objects increases the size of your archive.

You also might want to include either --clean (-c) or --create (-C) when you are using pg_dump for backup purposes. The --clean flag tells pg_dump to DROP an object before it CREATEs the object?this reduces the number of errors you might see when you restore from the script. The second option, --create, tells pg_dump to include a CREATE DATABASE statement in the resulting archive. If you want to archive and restore an entire database, use the --create option when you create the archive and drop the database before you restore.

In the previous example, I included the --inserts flag. This flag, and the related --column-inserts flag affect how pg_dump populates each table in your database. If you don't include either flag, pg_dump will emit COPY commands to put data back into each table. If you use the --inserts flag, pg_dump will emit INSERT commands rather than COPY commands. If you use the --column-inserts flag, pg_dump will build INSERT commands, which include column lists, such as


INSERT INTO "customers" ("id","customer_name","phone","birth_date")

  VALUES (1,'Jones, Henry','555-1212','1970-10-10');

Emitting COPY commands causes the restore to execute more quickly than INSERT commands, so you should usually omit both flags (--inserts and --column-inserts). You might want to build INSERT commands if you intend to use the resulting script for some other purpose, such as copying data into an Oracle, Sybase, or SQL Server database.

Because pg_dump is a client application, you don't have to be logged in to the server to create an archive script. A few of the pg_dump options (--port, --host, and --username) control how pg_dump will connect to your database.

One of the problems that you may encounter when you run pg_dump is that it can produce scripts that are too large to store as a single file. Many operating systems impose a maximum file size of 2GB or 4GB. If you are archiving a large database, the resulting script can easily exceed the file size limit, even though no single table would (remember, each table is stored in its own file).

There are two (related) solutions to this problem. First, you can decrease the size of the archive script by compressing it. The pg_dump program writes the archive script to its standard output so you can pipe the script into a compression program:


$ pg_dump movies | gzip -9 > movies.gz


$ pg_dump movies | bzip2 -9 > movies.bz2

You also can compress the archive script by telling pg_dump to create the archive in custom format. The custom format is compressed and is organized so that the pg_restore program (described a bit later) can avoid problems caused by order of execution. To choose the custom format, include the --format c flag:


$ pg_dump --format c movies > movies.bak

Using the custom format means that your archive script will be compressed (thus taking less space and possibly fitting within the operating system imposed file size limit). However, you can't restore a custom-format script using psql; you must use pg_restore. That's not a problem per se; it's just something to be aware of.

Unfortunately, compressing the archive script is not really a solution; it simply delays the inevitable because even in compressed form, you may still exceed your OS file size limit. You may need to split the archive script into smaller pieces. Fortunately, the split command (a Unix/Linux/Cygwin utility) makes this easy. You can dump an entire database into a collection of smaller archive scripts (20MB each) with the following command:


$ pg_dump movies | split --bytes=20m movies.bak.

This command causes pg_dump to produce a single script, but when you pipe the script to split, it will split the script into 20MB chunks. The end result is a collection of one or more files with names such as movies.bak.aa, movies.bak.ab, … movies.bak.zz. When you want to restore data from these archives, you can concatenate them using the cat command:


$ cat movies.bak.* | psql -d movies

See the PostgreSQL Reference Manual for complete details on the pg_dump command.

Using pg_dumpall

The pg_dump command can archive individual tables or all the tables in a single database, but it cannot archive multiple databases. To archive an entire cluster, use the pg_dumpall command. pg_dumpall is similar to pg_dump in that it creates SQL scripts that can be used to re-create a database cluster.

pg_dumpall is actually a wrapper that invokes pg_dump for each database in your cluster. That means that pg_dumpall supports the same set of command-line options as pg_dump. Well, almost?pg_dumpall silently ignores any attempts to produce a custom or tar format script. pg_dumpall can produce archive scripts only in plain text format. This introduces two problems. First, you cannot compress the archive script by selecting the custom format; you must pipe the script to an external compression program instead. Second, you cannot archive large-objects using pg_dumpall (pg_dump can archive only large-objects using custom format, which you can't use with pg_dumpall).

Using pg_restore

When you create an archive script using pg_dump or pg_dumpall, you can restore the archive using pg_restore. The pg_restore command processes the given archive and produces a sequence of SQL commands that re-create the archived data. Note that pg_restore cannot process plain text archive scripts (such as those produced by pg_dumpall); you must produce the archive using the --format=c or --format=t options. If you want to restore a plain text archive script, simply pipe it into psql.

A typical invocation of pg_restore might look like this:


$ pg_restore --clean -d movies movies.bak

The --clean flag tells pg_restore to drop each database object before it is restored. The -d movies option tells pg_restore to connect to the movies database before processing the archive: All SQL commands built from the archive are executed within the given database. If you don't supply a database name, pg_restore writes the generated SQL commands to the standard output stream; this can be useful if you want to clone a database.

Like pg_dump, pg_restore can be used from a remote host. That means that you can provide the hostname, username, and password on the pg_restore command line.

The pg_restore program allows you to restore specific database objects (tables, functions, and so on); see the PostgreSQL Reference Manual for more details.

Part I: General PostgreSQL Use

Part II: Programming with PostgreSQL