18.3 Software for Backups

There are a number of software packages that allow you to perform backups. Some are vendor-specific, and others are quite commonly available. Each may have particular benefits in a particular environment. We'll outline a few of the more common ones here, including a few that you might not otherwise consider. You should consult your local documentation to see if there are special programs available with your system.

Beware of Backing Up Files with Holes

Standard Unix files are direct-access files; in other words, you can specify an offset from the beginning of the file, and then read and write from that location. If you have ever had experience with older mainframe systems that only allowed files to be accessed sequentially, you know how important random access is for many things, including building random-access databases.

An interesting case occurs when a program references beyond the "end" of the file and then writes. What goes into the space between the old end-of-file and the data just now written? Zero-filled bytes would seem to be appropriate, as there is really nothing there.

Now consider that the span could be millions of bytes long, and there is really nothing there. If Unix were to allocate disk blocks for all that space, it could possibly exhaust the free space available. Instead, values are set internal to the inode and file data pointers so that only blocks needed to hold written data are allocated. The remaining span represents a hole that Unix remembers. Files with holes are sometimes called sparse files. Attempts to read any of those blocks simply return zero values. Attempts to write any location in the hole result in a real disk block being allocated and written, so everything continues to appear normal. (One way to identify these files is to compare the size reported by ls -l with the size reported by ls -s.)
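The two sizes can be compared on a scratch file. The following is a minimal sketch, assuming a filesystem that supports holes; the temporary file exists only for illustration, and the block counts that ls -s prints depend on your system's block size.

```shell
# Create a scratch file, then write a single byte at offset 1048575.
# dd seeks past the empty start of the file, leaving a hole behind it.
sparse=$(mktemp)
dd if=/dev/zero of="$sparse" bs=1 count=1 seek=1048575 2>/dev/null

# Logical size in bytes: this figure includes the hole.
logical=$(wc -c < "$sparse" | tr -d ' ')
echo "logical size: $logical bytes"

# With -s, the first column shows the blocks actually allocated. On a
# filesystem that supports holes it is far smaller than the logical
# size in the -l columns would suggest.
ls -ls "$sparse"
```

If the allocated size is much smaller than the logical size, the file is sparse.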

Small files with large holes can be a serious concern to backup software, depending on how your software handles them. Simple copy programs will try to read the file sequentially, and the result is a stream with lots of zero bytes. When copied into a new file, blocks are actually allocated for the whole span, and lots of space may be wasted. More intelligent programs take the holes into account: dump bypasses the normal filesystem and reads the actual inode and set of data pointers, while GNU tar with the -S option detects the holes and records them in the archive. Such programs save and restore only the blocks actually allocated, thus saving both tape and file storage.

Keep these comments in mind if you try to copy or archive a file that appears to be larger in size than the disk it resides on. Copying a file with holes to another device can cause you to suddenly run out of disk space.
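To see the difference between a naive copy and a hole-aware archiver, the sketch below copies a sparse scratch file both ways. It assumes GNU tar (for the -S option when creating an archive) and a filesystem that supports holes; all file names are invented for the example.

```shell
work=$(mktemp -d)

# A 1 MB file in which only the final byte has been written.
dd if=/dev/zero of="$work/holey" bs=1 count=1 seek=1048575 2>/dev/null

# Naive sequential copy: every zero byte is read and written back out,
# so the copy may occupy the full 1 MB on disk.
cat "$work/holey" > "$work/naive"

# GNU tar with -S records the holes rather than the zeros, and
# recreates them when the archive is extracted.
tar -C "$work" -S -cf "$work/holey.tar" holey
mkdir "$work/restored"
tar -C "$work/restored" -xf "$work/holey.tar"

# Compare the disk space (in kilobytes) used by each copy.
du -k "$work/holey" "$work/naive" "$work/restored/holey"
```

All three files have the same contents and the same logical size; only the space they occupy on disk may differ.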

18.3.1 Simple Local Copies

The simplest form of backup is to make simple copies of your files and directories. You might make those copies to local disk, to removable disk, to tape, or to some other media. Some file copy programs will properly duplicate modification and access times, and copy owner and protection information, if you are the superuser or if the files belong to you. They seldom recreate links, however. Examples include:

cp

The standard command for copying individual files. Some versions support a -R or -r option to copy an entire directory tree.

dd

This command can be used to copy a whole disk partition at one time by specifying the names of partition device files as arguments. This process should be done with great care if the source partition is mounted: in such a case, the device should be for the block version of the disk rather than the character version. Never copy onto a mounted partition, unless you want to destroy the partition and cause an abrupt system halt!
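Both commands can be tried safely on scratch data. The following is a minimal sketch: the directory tree and the two "partition" files are stand-ins created for the example, and on a real system the dd arguments would be partition device files, whose names vary by platform.

```shell
work=$(mktemp -d)

# --- cp: copy a directory tree ---
mkdir -p "$work/project/docs"
echo "notes" > "$work/project/docs/readme.txt"

# -R copies the whole tree; -p asks cp to preserve modification times,
# access times, and permission bits (and ownership, where permitted).
cp -Rp "$work/project" "$work/project.bak"

# --- dd: copy a "partition" block for block ---
# Regular files stand in for the source and destination devices.
src_dev="$work/source.img"
dst_dev="$work/dest.img"
dd if=/dev/urandom of="$src_dev" bs=1024 count=64 2>/dev/null

# if= names the input, of= the output, bs= the transfer block size.
dd if="$src_dev" of="$dst_dev" bs=64k 2>/dev/null

# Verify that the copy is byte-for-byte identical to the source.
cmp "$src_dev" "$dst_dev" && echo "copy verified"
```

Verifying a copy with cmp (or a checksum) before relying on it is a cheap precaution.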

Be careful when backing up live filesystems! If you're not going to bring your system down to single-user mode during backups (and few users are willing to tolerate this kind of downtime), you should be aware of how your backup procedure will handle attempts to back up a file that's in use by another process, particularly a process that may lock the file, write to the file, or unlink the file during the backup process. In some cases, you may need to write a script to temporarily stop certain processes (such as relational databases) during the backup and restart them afterwards in order to be sure that the backup file is not corrupted.
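A wrapper script of this kind might be sketched as follows. The stop_database and start_database functions are hypothetical placeholders: replace their bodies with whatever stops and starts the service on your system. The data directory and archive file are likewise stand-ins so that the sketch can be run as is.

```shell
# Hypothetical placeholders for the real service control commands.
stop_database()  { echo "database stopped"; }
start_database() { echo "database started"; }

data_dir=$(mktemp -d)    # stand-in for the database's data directory
archive=$(mktemp)        # stand-in for the tape device or backup file
echo "table data" > "$data_dir/table1"

# Stop the writer so that the files are quiescent during the backup.
stop_database

# Archive the data directory and remember whether tar succeeded.
tar -C "$data_dir" -cf "$archive" .
status=$?

# Always restart the service, even if the backup failed.
start_database

if [ "$status" -ne 0 ]; then
    echo "backup failed with status $status" >&2
fi
```

The important design point is that the restart does not depend on the backup succeeding; otherwise a failed backup would also leave the service down.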

18.3.2 Simple Archives

Several programs are available to make simple archives packed into disk files or onto tape. These are usually capable of storing all directory information about a file, and restoring much of it if the correct options are used. Running these programs may result in a change of either (or both) the atime and the ctime of items archived, however (see Chapter 6).

ar

Simple file archiver. Largely obsolete for backups (although still used for creating Unix libraries).

tar

Simple tape archiver. Can create archives to files, tapes, or elsewhere. It appears to be the most widely used of the simple archive programs.

cpio

Another simple archive program. If invoked with the correct options, this program can create portable archives with plain ASCII headers, even of binary files.

pax

The portable archiver/exchange tool, which is defined in the POSIX standard. This program combines tar and cpio functionality. It uses tar as its default file format.
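As an illustration of the archive cycle with the most common of these tools, the following sketch creates, lists, and restores a tar archive. The directory and file names are invented for the example, and the archive is written to a disk file rather than a tape device.

```shell
work=$(mktemp -d)
mkdir -p "$work/home/alice"
echo "draft" > "$work/home/alice/report.txt"
chmod 600 "$work/home/alice/report.txt"

# c = create an archive, f = name of the archive file (or tape device).
# -C changes directory first so that the stored paths are relative.
tar -C "$work" -cf "$work/home.tar" home

# t = list the table of contents, v = verbose (modes, owners, times).
tar -tvf "$work/home.tar"

# x = extract, p = restore the permission bits recorded in the archive.
mkdir "$work/restore"
tar -C "$work/restore" -xpf "$work/home.tar"
```

Storing relative paths, as done here with -C, lets you restore into a scratch directory and inspect the result before overwriting anything.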

18.3.3 Specialized Backup Programs

There are several dedicated backup programs:

dump/restore

This program is the "classic" one for archiving a whole partition at once, and for the associated file restorations.[8] Many versions of this program exist; all back up from the raw disk device, thus bypassing calls that would change any of the times present in inodes for files and directories. This program can also make the backups quickly.

[8] On Linux and BSD-based systems, a "no dump" file attribute can be set on files and directories to exclude them from dump. From a security standpoint, this is probably a bad idea; it's too easy to fail to notice the file attribute until you need to restore a file and discover that you'd made it "no dump." If you are concerned about backing up confidential files, encrypt your backups.

backup

Some SVR4-based systems have a suite of programs named, collectively, backup. These are also designed specifically to do backups of files and whole filesystems.

18.3.4 Network Backup Systems

A few programs can be used to do backups across a network link. Thus, you can do backups on one machine and write the results to another. An obvious example would be using a program that can write to stdout, and then piping the output to a remote shell. Some programs provide for compression (to improve backup speed on slower networks) and/or encryption of the data stream:

rdump/rrestore

A network version of the dump and restore commands. It uses a dedicated process on a machine that has a tape drive, and sends the data to that process. Thus, it allows a tape drive to be shared by a whole network of machines.

rsync

A program designed to remotely synchronize two filesystems. One filesystem is the master; changes in that one are propagated to the slave. rsync is optimized for use with logfiles: if a 100 MB file has 1 megabyte appended, rsync can detect this and copy over only the last megabyte.

scp

Enables you to copy a file or a whole directory tree to a remote machine using the SSH protocol, which avoids sending cleartext passwords over the network and can encrypt the data stream. It is based on the older rcp command, which is insecure.

unison

Designed for two-way synchronization between two or more filesystems. When unison first runs, it creates a database that describes the current state of both filesystems. Thereafter, it can automatically propagate file additions, changes, and deletions from one filesystem to the other.
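The approach mentioned at the start of this section, writing an archive to stdout and piping it to a remote shell, can be sketched as follows. So that it runs on a single machine, the sketch uses "sh -c" as a local stand-in for the remote shell; across a network you would substitute something like "ssh backuphost", where backuphost is a hypothetical host name. The paths are also invented for the example.

```shell
# Command that runs a command line on the backup host. "sh -c" is a
# local stand-in; over a network this might be: remote="ssh backuphost"
remote="sh -c"

src=$(mktemp -d)     # stand-in for the filesystem being backed up
dest=$(mktemp -d)    # stand-in for a directory on the backup host
echo "payroll" > "$src/data.txt"

# tar writes the archive to stdout, gzip compresses the stream before
# it crosses the (possibly slow) link, and the far side stores it.
tar -C "$src" -cf - . | gzip -c | $remote "cat > '$dest/backup.tar.gz'"

# Check the stored archive by listing its contents.
gzip -dc "$dest/backup.tar.gz" | tar -tf -
```

When the transport is SSH, the stream is encrypted in transit, but the archive stored on the remote host is not.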

There are also several backup programs specifically designed to back up data from clients to a tape drive on a central server over a network. The central server is typically outfitted with a large tape drive or jukebox and is configured to back up the clients at night.

Amanda

The Advanced Maryland Automatic Network Disk Archiver (http://www.amanda.org). Amanda is a free software, client/server backup system that's over 10 years old and still actively maintained. The backup server (the host with the tape drive) connects to each backup client and instructs it to transfer data, which the server writes to tape using standard Unix utilities such as dump or tar. It is compatible with many tape drives and changers, and has its own tape management system. In conjunction with Samba, it can back up Windows hosts as well.

Commercial solutions

Like Amanda, most commercial backup systems are based on a client/server architecture to allow a backup server to perform unattended backups of Unix, Windows, and Macintosh hosts over a network. Key features in commercial offerings are:

  • Indexing files or databases of files to make backups easier.

  • Staging little-used files to slower storage (such as write-once optical media).

Unfortunately, commercial systems have drawbacks for many uses, notably a lack of portability across multiple platforms and a lack of compatibility with sites that may not have the software installed. Be sure to fully evaluate the conditions under which you'll need to use the program and decide on a backup strategy before purchasing the software.

18.3.5 Encrypting Your Backups

You can improvise your own backup encryption if you have an encryption program that can be used as a filter and you use a backup program that can write to standard output, such as the dump, cpio, or tar commands. For example, to make an encrypted tape archive using the tar command and the OpenSSL encryption program, you might use the following command:

# tar cf - dirs and files | openssl enc -des3 -salt | dd bs=10240 of=/dev/rmt8

Although software encryption is not foolproof (for example, the software encryption program can be compromised to record all passwords), this method is certainly preferable to storing sensitive information on unencrypted backups.

Here is a more complete example: suppose that you have the OpenSSL encryption program, which can prompt the user for a passphrase and then encrypt its standard input to standard output. You could use this program with the dump (called ufsdump under Solaris) program to back up the filesystem /u to the device /dev/rmt8 with the command:

# dump f - /u | openssl enc -des3 -salt | dd bs=10240 of=/dev/rmt8
enter des-ede3-cbc encryption password:

If you wanted to back up the filesystem with tar, you would instead use the command:

# tar cf - /u | openssl enc -des3 -salt | dd bs=10240 of=/dev/rmt8
enter des-ede3-cbc encryption password:

To read these files back, you would use the following command sequences:

# dd bs=10240 if=/dev/rmt8 | openssl enc -d -des3 -salt | restore fi -
enter des-ede3-cbc decryption password:

and:

# dd bs=10240 if=/dev/rmt8 | openssl enc -d -des3 -salt | tar xpBfv -
enter des-ede3-cbc decryption password:

In both of these examples, the backup programs are instructed to send the backup of the filesystems to standard output. The output is then encrypted and written to the tape drive.
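Before trusting a pipeline like this with real data, you may want to try a full round trip on scratch files. The following is a minimal sketch under several assumptions: it writes to a disk file instead of /dev/rmt8, the directory names are invented, and the passphrase is taken from an environment variable (BACKUP_PASS, a name chosen here) through OpenSSL's -pass option so that the commands run unattended. The interactive prompt shown above is preferable where practical, because a passphrase in the environment may be visible to other processes. Recent OpenSSL releases may also print a warning about the key derivation used by these options.

```shell
src=$(mktemp -d)     # stand-in for the filesystem being backed up
out=$(mktemp -d)     # holds the encrypted archive and the restored copy
echo "confidential" > "$src/secret.txt"

# Assumption: passphrase supplied via the environment for unattended use.
BACKUP_PASS='example passphrase'
export BACKUP_PASS

# Archive to stdout, encrypt the stream, and write it in 10 KB blocks.
tar -C "$src" -cf - . \
    | openssl enc -des3 -salt -pass env:BACKUP_PASS \
    | dd bs=10240 of="$out/backup.tar.enc" 2>/dev/null

# Read the blocks back, decrypt, and extract. B tells tar to reblock
# its input, since reads from a pipe may return short records.
mkdir "$out/restored"
dd bs=10240 if="$out/backup.tar.enc" 2>/dev/null \
    | openssl enc -d -des3 -pass env:BACKUP_PASS \
    | tar -C "$out/restored" -xpBf -

# Confirm that the restored file matches the original.
cmp "$src/secret.txt" "$out/restored/secret.txt" && echo "round trip verified"
```

A restore test of this kind also confirms that you still know the passphrase that protects the backup.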

If you encrypt the backup of a filesystem and you forget the key, the information stored on the backup will be unusable. Also, note that many systems do not encrypt individual files separately; you may have to decrypt (and in some cases restore) the entire partition that you backed up in order to restore a single file.


