Section 27.1. Making Backups

Making backups of your system is an important way to protect yourself from data corruption or loss in case you have problems with your hardware, or you make a mistake such as deleting important files inadvertently. During your experiences with Linux, you're likely to make quite a few customizations to the system that can't be restored by simply reinstalling from your original installation media. However, if you happen to have your original Linux CD-ROM or DVD-ROM handy, it may not be necessary to back up your entire system. Your original installation media already serve as an excellent backup.

Under Linux, as with any Unix-like system, you can make mistakes while logged in as root that would make it impossible to boot the system or log in later. Many newcomers approach such a problem by reinstalling the system entirely from backup, or worse, from scratch. This is seldom, if ever, necessary. In "What to Do in an Emergency," later in this chapter, we talk about what to do in these cases.

If you do experience data loss, it is sometimes possible to recover that data using the filesystem maintenance tools described in "Checking and Repairing Filesystems" in Chapter 10. Unlike some other operating systems, however, it's generally not possible to "undelete" a file that has been removed by rm or overwritten by a careless cp or mv command (for example, copying one file over another destroys the file to which you're copying). In these extreme cases, backups are key to recovering from problems.

Backups are usually made to tape, floppy, CD-R(W), or DVD-R(W). None of these media is 100% reliable, although tape, CD-R(W), and DVD-R(W) are more dependable than floppies in the long term. These days, with the cost of hard disks plummeting and the capacity increasing, backing up to a hard disk is also an option.

Many tools are available to help you make backups. In the simplest case, you can use a combination of gzip (or bzip2) and tar to back up files from your hard drive to removable media. This is the best method to use when you make only occasional backupsno more often than, say, once a month.

If you have numerous users on your system or you make frequent changes to the system configuration, it makes more sense to employ an incremental backup scheme. Under such a scheme, you would make a "full backup" of the system only about once a month. Then, every week, you would back up only those files that changed in the last week. Likewise, each night, you could back up just those files that changed over the previous 24 hours. There are several tools to aid you in this type of backup.

The idea behind an incremental backup is that it is more efficient to make backups in small steps; you use fewer tapes or CDs, and the weekly and nightly backups are shorter and easier to run. With this method, you have a backup that is at most a day old. If you were to, say, accidentally delete your entire system, you would restore it from backup in the following manner:

  1. Restore from the most recent monthly backup. For instance, if you wiped the system on July 17, you would restore the July 1 full backup. Your system now reflects the state of files when the July 1 backup was made.

  2. Restore from each weekly backup made so far this month. In our case, we could restore from the two weekly backups from July 7 and 14. Restoring each weekly backup updates all the files that changed during that week.

  3. Restore from each daily backup during the last weekthat is, since the last weekly backup. In this case, we would restore the daily backups from July 15 and 16. The system now looks as it did when the daily backup was taken on July 16; no more than a day's worth of files have been lost.

Depending on the size of your system, the full monthly backup might require 4 GB or more of backup storageoften not more than one DVD-R(W) or tape, but quite a few Zip disks. However, the weekly and daily backups would generally require much less storage space . Depending on how your system is used, you might decide to make the weekly backup on Sunday night and not bother with daily backups for the weekend.

One important characteristic that backups should (usually) have is the ability to select individual files from the backup for restoration. This way, if you accidentally delete a single file or group of files, you can simply restore those files without having to do a full system restoration. Depending on how you make backups , however, this task will be either very easy or painfully difficult.

It's also highly desirable to keep the backup media physically separate from the computer. If you choose to back up to hard disk, consider an external USB, FireWire, or SCSI drive. USB and FireWire are particularly nice because they can be easily plugged into or removed from the system as needed. If you choose to back up to a second, internal hard disk, it would be wise to at least keep the disk unmounted when it's not in use, so that if you were to accidentally delete one or more filesystems, your backup would be spared. An important decision you need to make is evaluating the relative importance of your data's safety and recoverability versus the cost and convenience of the backup media you choose, as well as how you use it.

In this section, we talk about the use of tar, gzip, and a few related tools for making backups to CD and tape. We even cover the use of tape drives, as well as CD-R in the bargain. These tools allow you to make backups more or less "by hand"; you can automate the process by writing shell scripts and even schedule your backups to run automatically during the night using cron. All you have to do is flip tapes. Other software packages provide a nice menu-driven interface for creating backups, restoring specific files from backup, and so forth. Many of these packages are, in fact, nice frontends to tar and gzip. You can decide for yourself what kind of backup system suits you best.

27.1.1. Simple Backups

The simplest means of making a backup is to use tar to archive all the files on the system or only those files in a set of specific directories. Before you do this, however, you need to decide what files to back up. Do you need to back up every file on the system? This is rarely necessary, especially if you have your original installation disks or CD-ROM. If you have made important changes to the system, but everything else is just the way it was found on your installation media, you could get by with only archiving those files you have made changes to. Over time, however, it is difficult to keep track of such changes.

In general, you will be making changes to the system configuration files in /etc. There are other configuration files as well, and it can't hurt to archive directories such as /usr/lib and /etc/X11 (which contains the XFree86 configuration files, as we saw in "Installing X.org" in Chapter 16).

You should also back up your kernel sources (if you have upgraded or built your own kernel); these are found in /usr/src/linux.

During your Linux adventures it's a good idea to keep notes on what features of the system you've made changes to so that you can make intelligent choices when making backups . If you're truly paranoid, go ahead and back up the whole system; that can't hurt, but the cost of backup media might.

Of course, you should also back up the home directories for each user on the system; these are generally found in /home. If you have your system configured to receive electronic mail (see "The Postfix MTA" in Chapter 23), you might want to back up the incoming mail files for each user. Many people tend to keep old and "important" electronic mail in their incoming mail spool, and it's not difficult to accidentally corrupt one of these files through a mailer error or other mistake. These files are usually found in /var/spool/mail. Of course, this applies only if you are using the local mail system, not if you access mail directly via POP3 or IMAP.

27.1.1.1. Backing up to tape

Assuming you know what files or directories to back up, you're ready to roll. You can use the tar command directly, as we saw in "Using tar" in Chapter 12, to make a backup. For example, the command:

tar cvf /dev/qft0 /usr/src /etc /home

archives all the files from /usr/src, /etc, and /home to /dev/qft0. /dev/qft0 is the first "floppy-tape" devicethat is, a tape drive that hangs off of the floppy controller. Many popular tape drives for the PC use this interface. If you have a SCSI tape drive, the device names are /dev/st0, /dev/st1, and so on, based on the drive number. Those tape drives with another type of interface have their own device names; you can determine these by looking at the documentation for the device driver in the kernel.

You can then read the archive back from the tape using a command such as:

tar xvf /dev/qft0

This is exactly as if you were dealing with a tar file on disk, as discussed in "Archive and Compression Utilities" in Chapter 12.

When you use the tape drive, the tape is seen as a stream that may be read from or written to in one direction only. Once tar is done, the tape device will be closed, and the tape will rewind. You don't create a filesystem on a tape, nor do you mount it or attempt to access the data on it as files. You simply treat the tape device itself as a single "file" from which to create or extract archives.

Be sure your tapes are formatted before you use them. This ensures that the beginning-of-tape marker and bad-blocks information have been written to the tape. For formatting QIC-80 tapes (those used with floppy-tape drivers), you can use a tool called ftformat that is either already included with your distribution or can be downloaded from ftp://sunsite.unc.edu/pub/Linux/kernel/tapes as part of the ftape package.

Creating one tar file per tape might be wasteful if the archive requires only a fraction of the capacity of the tape. To place more than one file on a tape, you must first prevent the tape from rewinding after each use, and you must have a way to position the tape to the next file marker, for both tar file creation and extraction.

The way to do this is to use the nonrewinding tape devices , which are named /dev/nqft0, /dev/nqft1, and so on for floppy-tape drivers, and /dev/nst0, /dev/nst1, and so on for SCSI tapes. When this device is used for reading or writing, the tape will not be rewound when the device is closed (that is, once tar has completed). You can then use tar again to add another archive to the tape. The two tar files on the tape won't have anything to do with each other. Of course, if you later overwrite the first tar file, you may overwrite the second file or leave an undesirable gap between the first and second files (which may be interpreted as garbage). In general, don't attempt to replace just one file on a tape that has multiple files on it.

Using the nonrewinding tape device, you can add as many files to the tape as space permits. To rewind the tape after use, use the mt command. mt is a general-purpose command that performs a number of functions with the tape drive. The mt command, although very useful and powerful, is also fairly complicated. There's a lot to keep track of to locate a particular record on the tape, and it's easy to get confused. If you're particularly motivated to use your tapes as efficiently as possible, read up on mt; the manpage is quite concise. We include a few examples here. The command:

mt /dev/nqft0 rewind

rewinds the tape in the first floppy-tape device.

Similarly, the command:

mt /dev/nqft0 reten

retensions the tape by winding it to the end and then rewinding it.

When reading files on a multiple-file tape, you must use the nonrewinding tape device with tar and the mt command to position the tape to the appropriate file.

For example, to skip to the next file on the tape, use the command:

mt /dev/nqft0 fsf 1

This skips over one file on the tape. Similarly, to skip over two files, use:

mt /dev/nqft0 fsf 2

or

mt device fsf 1

to move to the next file.

Be sure to use the appropriate nonrewinding tape device with mt. Note that this command does not move to "file number two" on the tape; it skips over the next two files based on the current tape position. Just use mt to rewind the tape if you're not sure where the tape is currently positioned. You can also skip back; see the mt(1) manual page for a complete list of options.

27.1.1.2. Backing up to CD-R

You can back up your files to recordable CD perhaps more easily than to tape. Blank CDs are very inexpensive, widely available, readable on just about any system, and much easier to transport and store than tapes. In this section, we show you the basics of writing backups to CD-R, as well as a couple of tricks. Almost all of the techniques that work for CD-Rs work equally well for the various flavors of recordable DVDs that are available.

By far the most common way to write data to a CD is to create a CD image file on your hard disk, then burn that to CD. This is easy to do, but has one slight disadvantage: you need at least 650 or 700 MB of free disk space to create a full-sized CD image. On modern systems that generally shouldn't be a problem.

CD-ROMs use the ISO 9660 filesystem standard, which can be mounted and read on just about any operating system in common use today. The program mkisofs is a full-featured and robust tool for creating such filesystems, which can be used in a number of ways, including burning them to CD-R. The actual burning can be done with cdrecord. Both of these programs are usually included with most Linux systems.

Here's how to create an ISO 9660 image and burn it to CD. Let's say you have a directory called /data that you want to put on CD. Enter:

# mkisofs  -T -r -o /tmp/mycd.iso /data
# cdrecord -v -eject -fs=4M speed=8 dev=0,0,0 /tmp/mycd.iso

Some of the parameters of cdrecord are system-specific. You can run cdrecord -scanbus to search for the CD burner on your machine. On the machine used for testing the material in this section, the CD burner shows up as device 0,0,0. Even though the author has a 52 x burner, he still chooses to record CDs at only 8 x, to make sure he doesn't underflow the drive and make a bad disk. Experiment with your hardware and determine what worksyou may or may not be able to burn reliably at higher speeds.

A slick, if somewhat less reliable, way to create a CD without writing an image file first is to simply pipe the output of mkisofs directly to cdrecord:

mkisofs  -T -r  /data  |  cdrecord -v -eject -fs=4M speed=8 dev=0,0,0 -

That's not the only possible optimization. If, for some reason, you wanted to treat a CD like a tape, you could skip creating an ISO 9660 filesystem, and just write a tar file directly to a CD. This won't be mountable as a normal CD, and you won't be able to put it in a Windows system, but if you prefer this, it works:

tar -czf - /data | cdrecord -v -eject -fs=4M speed=8 dev=0,0,0 -

It is important to note that although CD burners are much better than they were a few years ago, it's still quite possible to produce a useless disk if there is anything more than a brief interruption in the flow of data from the source to the burner. These problems are even more apparent when burning from a pipeline as in the last two examples. For this reason, we urge you to check your backups after making them!

In addition to cdrecord, tar, and mkisofs, there are a large number of other programs available on the Web that provide an easy-to-use frontend for creating backups. Some of them are able to span multiple CDs, or manage a rotation of CD-RW disks. If you find that creating CD backups as we've described here doesn't fit your needs, chances are someone has created another program that will work for you.

27.1.1.3. Backing up to hard disks

As hard disks get bigger and cheaper, one problem is that backup media often don't keep up. Now that 500-GB hard disks are available and affordable by normal people, it may have occurred to you that the only thing you can back a disk that big up to is...another disk that big!

Just about all of the techniques you can use for media in general can apply to hard disks, but there are a few special considerations as well.

If you have a disk mounted at /data, and you want to back it up to a second hard disk (of equal or greater size) mounted at /backup, you could do a tar and un-tar pipeline, like this:

cd / ; tar -cvf - /data | (cd /backup ; tar -xf -)

If you have room on your backup disk, you can use the remaining space to store incremental backups using the techniques described elsewhere in this chapter. The nice thing about hard disk backups is that you can create any kind of directory structure that makes sense to you. You could have a disk mounted at /backup, with subdirectories /backup/full and /backup/incremental, or any other scheme you choose.

With a combination of find, cron, tar, and gzip, you could create a fairly small but powerful script that would mount your backup hard disk, tar up the files that have changed since the last time your backup ran, delete the backups older than the last full backup, and unmount the backup disk.

27.1.1.4. To compress or not to compress?

There are good arguments both for and against compression of tar archives when making backups. The overall problem is that neither tar nor the compression tools gzip and bzip2 are particularly fault-tolerant, no matter how convenient they are. Although compression using gzip or bzip2 can greatly reduce the amount of backup media required to store an archive, compressing entire tar files as they are written to CD-R or tape makes the backup prone to complete loss if one block of the archive is corrupted, say, through a media error (not uncommon in the case of CD-Rs and tapes). Most compression algorithms, gzip and bzip2 included, depend on the coherency of data across many bytes in order to achieve compression. If any data within a compressed archive is corrupt, gunzip may not be able to uncompress the file from that point on, making it completely unreadable to tar.

This is much worse than if the tar file were uncompressed on the tape. Although tar doesn't provide much protection against data corruption within an archive, if there is minimal corruption within a tar file, you can usually recover most of the archived files with little trouble, or at least those files up until the corruption occurs. Although far from perfect, it's better than losing your entire backup.

A better solution is to use an archiving tool other than tar to make backups. Several options are available. cpio is an archiving utility that packs files together, similar in fashion to tar. However, because of the simpler storage method used by cpio, it recovers cleanly from data corruption in an archive. (It still doesn't handle errors well on gzipped files.)

The best solution may be to use a tool such as afio. afio supports multivolume backups and is similar in some respects to cpio. However, afio includes compression and is more reliable because each individual file is compressed. This means that if data in an archive is corrupted, the damage can be isolated to individual files, instead of to the entire backup.

These tools should be available with your Linux distribution, as well as from all the Internet-based Linux archives. A number of other backup utilities, with varying degrees of popularity and usability, have been developed or ported for Linux. If you're serious about backups, you should look into them.[*] Among these programs are the freely available taper, tob, and Amanda, as well as commercial programs such as ARKEIA (free for use with up to two computers), BRU, and Arcserve. Lots of free backup tools can also be found at http://www.tucows.com/downloads/Linux/IS-IT/FileManagement/BackupRestore.

[*] Of course, this section was written after the author made the first backup of his Linux system in nearly four years of use!

27.1.2. Incremental Backups

Incremental backups, as described earlier in this chapter, are a good way to keep your system backups up-to-date. For example, you can make nightly backups of only those files that changed in the last 24 hours, weekly backups of all files that changed in the last week, and monthly backups of the entire system.

You can create incremental backups using the tools mentioned previously: tar, gzip, cpio, and so on. The first step in creating an incremental backup is to produce a list of files that have changed since a certain amount of time ago. You can do this easily with the find command.[*] If you use a special backup program, you will most likely not have to do this, but can set some option somewhere that you want to do an incremental backup.

[*] If you're not familiar with find, become so soon. find is a great way to locate files across many directories that have certain filenames, permissions, or modification times. find can even execute a program for each file that it locates. In short, find is your friend, and all good system administrators know how to use it well.

For example, to produce a list of all files that were modified in the last 24 hours, we can use the command:

find / -mtime -1 \! -type d -print > /tmp/filelist.daily

The first argument to find is the directory to start fromhere, /, the root directory. The -mtime -1 option tells find to locate all files that changed in the last 24 hours.

The \! -type d bit is complicated (and optional), but it cuts some unnecessary stuff from your output. It tells find to exclude directories from the resulting file list. The ! is a negation operator (meaning here, "exclude files of type d"), but put a backslash in front of it because otherwise the shell interprets it as a special character.

The -print option causes all filenames matching the search to be printed to standard output. We redirect standard output to a file for later use. Likewise, to locate all files that changed in the last week, use:

find / -mtime -7 -print > /tmp/filelist.weekly

Note that if you use find in this way, it traverses all mounted filesystems. If you have a CD-ROM mounted, for example, find attempts to locate all files on the CD-ROM as well (which you probably do not wish to backup). The -xdev option can be used to limit find's traversal to the local filesystem. Another approach would be to use find multiple times with a first argument other than /. See the manual page for find(1) for details.

Now you have produced a list of files to back up. Previously, when using tar, we specified the files to archive on the command line. However, this list of files may be too long for a single command line (which is usually limited to around 2048 characters), and the list itself is contained within a file.

You can use the -T option with tar to specify a file containing a list of files for tar to back up. In order to use this option, you have to use an alternate syntax to tar in which all options are specified explicitly with dashes. For example, to back up the files listed in /tmp/filelist.daily to the device /dev/qft0, use the following command:

tar -cv -T /tmp/filelist.daily -f /dev/qft0

You can now write a short shell script that automatically produces the list of files and backs them up using tar. You can use cron to execute the script nightly at a certain time; all you have to do is make sure there's a tape in the drive. You can write similar scripts for your weekly and monthly backups. cron is covered in Chapter 10.




Part I: Enjoying and Being Productive on Linux
Part II: System Administration