This is a hardware book, so we don't spend much time on software. But, in our experience, many people who buy a tape drive have no idea how to use it effectively. We won't try to explain how to use your backup software because the specifics vary and nearly any software bundled with a tape drive is sufficient for the task, but we will devote some space to explaining how to get the most from your tape drive and backup software.
If you have a tape drive large enough to back up your entire hard disk and the time necessary to use only complete backups, the status of any particular file doesn't matter. Every file gets backed up every time, whether it was created that day or has been sitting unchanged for a year. But if you need to use some combination of complete and partial backups, the status of each file becomes critical. If a file is unchanged since the last complete backup, you want to ignore it when doing partial backups. If the file was created or changed since the last complete backup, it needs to be copied to the partial backup tape.
Windows maintains a file attribute for each file called the archive bit. When a file is created or changed, Windows turns the archive bit on, indicating that the file is a candidate for backup. Backup software can manipulate the archive bit, either turning it off after it backs up the file, or leaving it on so that the file will again be backed up the next time you do a partial backup.
The archive bit exists to provide a definite indication that a file requires archiving. Early Windows versions stored one timestamp for a file. In theory, that timestamp was changed when the file was created or modified. In practice, it was possible for an application to modify a file without changing the timestamp, so a backup application that depended on the timestamp could fail to back up a file whose contents had changed. The archive bit was therefore the only reliable indicator of whether a file required archiving.
Linux stores more timestamp information about each file, including when it was last accessed, last modified, and last changed, as does the Windows NT/2000/XP NTFS filesystem, which also records the date the file was created. In theory, that means such systems can be backed up reliably based on timestamp information. In practice, we still prefer using the archive bit as a flag because that bit always indicates the archive status of a file. If a backup is done based on timestamp, no indication remains with the file itself as to when (or whether) it was last backed up.
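To make the timestamp approach concrete, here is a minimal Python sketch of timestamp-based selection. The function name `changed_since` and its `cutoff` parameter are our own illustrative choices, not part of any backup product. Note that the cutoff (the time of the last backup) has to be recorded somewhere outside the files themselves, which is the drawback just described.

```python
import os
import tempfile
import time
from pathlib import Path


def changed_since(root: Path, cutoff: float) -> list:
    """Return files under root whose modification time is later than cutoff."""
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > cutoff:
                selected.append(path)
    return sorted(selected)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "old.txt").write_text("unchanged since the last backup")
        (root / "new.txt").write_text("edited after the last backup")
        last_backup = time.time() - 3600   # pretend the backup ran an hour ago
        # Push old.txt's timestamps back to before the pretend backup.
        os.utime(root / "old.txt", (last_backup - 60, last_backup - 60))
        print([p.name for p in changed_since(root, last_backup)])  # ['new.txt']
```

A file whose timestamp is not updated when its contents change would be skipped by this selection, which is exactly the failure the archive bit was introduced to avoid.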
Backup software can use or ignore the archive bit in determining which files to back up, and can either turn the archive bit off or leave it unchanged when the backup is complete. How the archive bit is used and manipulated determines what type of backup is done, as follows:
A full backup, which Microsoft calls a normal backup, backs up every selected file, regardless of the status of the archive bit. When the backup completes, the backup software turns off the archive bit for every file that was backed up. Note that "full" is a misnomer because a full backup backs up only the files you have selected, which may be as little as one directory or even a single file, so in that sense Microsoft's terminology is actually more accurate. Given the choice, full backup is the method to use because all files are on one tape, which makes it much easier to retrieve files from tape when necessary. Relative to partial backups, full backups also increase redundancy because all files are on all tapes. That means that if one tape fails, you may still be able to retrieve a given file from another tape.
A differential backup is a partial backup that copies a selected file to tape only if the archive bit for that file is turned on, indicating that it has changed since the last full backup. A differential backup leaves the archive bits unchanged on the files it copies. Accordingly, any differential backup set contains all files that have changed since the last full backup. A differential backup set run soon after a full backup will contain relatively few files. One run soon before the next full backup is due will contain many files, including those contained on all previous differential backup sets since the last full backup. When you use differential backup, a complete backup set comprises only two tapes or tape sets: the tape that contains the last full backup and the tape that contains the most recent differential backup.
An incremental backup is another form of partial backup. Like differential backups, incremental backups copy a selected file to tape only if the archive bit for that file is turned on. Unlike the differential backup, however, the incremental backup clears the archive bits for the files it backs up. An incremental backup set therefore contains only files that have changed since the last full backup or the last incremental backup. If you run an incremental backup daily, files changed on Monday are on the Monday tape, files changed on Tuesday are on the Tuesday tape, and so forth. When you use an incremental backup scheme, a complete backup set comprises the tape that contains the last full backup and all of the tapes that contain every incremental backup done since the last full backup. The only advantages of incremental backups are that they minimize backup time and keep multiple versions of files that change frequently. The disadvantages are that backed-up files are scattered across multiple tapes, making it difficult to locate any particular file you need to restore, and that there is no redundancy. That is, each file is stored on only one tape.
A full copy backup (which Microsoft calls a copy backup) is identical to a full backup except for the last step. The full backup finishes by turning off the archive bit on all files that have been backed up. The full copy backup instead leaves the archive bits unchanged. The full copy backup is useful only if you are using a combination of full backups and incremental or differential partial backups. The full copy backup allows you to make a duplicate "full" backup (e.g., for storage offsite) without altering the state of the hard drive you are backing up, which would destroy the integrity of the partial backup rotation.
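The four backup types differ only in whether they consult the archive bit and whether they clear it afterward. The following Python sketch models that behavior on an imaginary volume. The `Volume` class and the `BACKUP_TYPES` table are hypothetical names we made up for illustration; real backup software works on the filesystem's own attribute, not on a dictionary.

```python
from dataclasses import dataclass, field

# For each backup type: (selects by archive bit?, clears archive bit afterward?)
BACKUP_TYPES = {
    "full":         (False, True),
    "copy":         (False, False),
    "differential": (True,  False),
    "incremental":  (True,  True),
}


@dataclass
class Volume:
    """Toy model of a disk: maps each file name to its archive bit."""
    archive: dict = field(default_factory=dict)

    def write(self, name: str) -> None:
        """Creating or changing a file turns its archive bit on."""
        self.archive[name] = True

    def backup(self, kind: str) -> list:
        """Return the files copied to tape, updating archive bits as needed."""
        uses_bit, clears_bit = BACKUP_TYPES[kind]
        copied = sorted(n for n, bit in self.archive.items() if bit or not uses_bit)
        if clears_bit:
            for name in copied:
                self.archive[name] = False
        return copied


if __name__ == "__main__":
    vol = Volume()
    for name in ("a.doc", "b.doc", "c.doc"):
        vol.write(name)
    print(vol.backup("full"))          # ['a.doc', 'b.doc', 'c.doc']
    vol.write("a.doc")
    print(vol.backup("differential"))  # ['a.doc']
    vol.write("b.doc")
    print(vol.backup("differential"))  # ['a.doc', 'b.doc']
    print(vol.backup("incremental"))   # ['a.doc', 'b.doc']
    vol.write("c.doc")
    print(vol.backup("incremental"))   # ['c.doc']
    print(vol.backup("copy"))          # ['a.doc', 'b.doc', 'c.doc']
```

Notice that the second differential run still includes a.doc, because the first differential left its archive bit on, whereas the second incremental run contains only c.doc.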
A tape rotation method is a procedure that specifies when each particular tape will be used, and what will be backed up to it. For example, for a simple tape rotation scheme, you might label five tapes Monday through Friday and then do a full backup to the corresponding tape each day. Some tape rotation methods are simple and use only a few tapes. Others are immensely complex and use many tapes. Choosing the most appropriate tape rotation method is a critical step in developing and implementing your backup plan.
On one extreme, you could use the same tape every day, but that has obvious dangers, including the risk of that one tape being lost or damaged, the inability to retrieve a file that was deleted or corrupted more than a day earlier, and the inability to keep an offsite copy. On the other extreme, Robert once did some consulting for a law firm that never reuses a backup tape. Every evening they do a complete backup and compare of their "active" volumes to a new tape, which is then stored indefinitely in their vault. They regard the small daily cost of a new backup tape as trivial relative to the benefit of being able to reconstruct their data exactly for any specified day.
Chances are, the best tape rotation method for you falls somewhere between those extremes. Here are some issues to think about when you choose a tape rotation method:
When you need to do a restore, whether of a single file accidentally deleted or of an entire volume whose hard drive crashed, time is often important. A proper tape rotation scheme ensures that the most recent backup data is immediately available to restore.
The most recent version of your backup data may not be good enough. Perhaps a file was accidentally deleted or a database improperly modified some time ago, but that was only recently discovered. The most recent backup may, for various reasons, be missing the file you need. An ideal tape rotation method allows you to retrieve a version of a file from days, weeks, or months previous, before the file had been deleted or improperly modified. Tape sets created with the best and most powerful tape rotation methods allow you to select from multiple versions of the file so that you can retrieve the most recent good version. A good tape rotation method also makes provision for periodically removing a tape from the rotation and archiving it for historical reasons.
Tapes can break or be misplaced. Someone may overwrite the wrong tape. A good tape rotation scheme recognizes these facts, and uses redundancy to minimize the effect of such problems. If the file can't be retrieved from one tape, it should be retrievable from another.
Ideally, you'd like all tapes in the set to be used equally often to distribute wear evenly across the set. The simpler tape rotation methods usually fall down in this regard. For example, the popular Grandfather-Father-Son rotation, described later in this section, requires writing to some tapes in the set once a week, to others once a month, and to still others only once a year. Although equalizing tape wear is a less important consideration for most users than the others described, doing so is desirable in that it minimizes the chance that a tape will break, stretch, or otherwise become unusable because it has been used too frequently.
Many standard tape rotation methods exist. Some are simple and use few tapes, but fail to meet some of the goals described earlier. Others meet each goal, or nearly so, but are difficult to manage and require many tapes. Some methods use only full backups, others use both full and partial backups, and still others may be modified to use either only full backups or a combination of full and partial backups.
Here are the most common backup rotations:
The simplest rotation is to do a full backup each day, assuming you have both adequate tape drive capacity and a long enough backup window. Most sites that use this method use 10 tapes, labeled "Monday A" through "Friday A" and "Monday B" through "Friday B." Using this method offers the considerable advantages of simple administration and extreme data redundancy. It's always obvious which tape you should be backing up to. If you start a restore and your most recent backup tape breaks, you simply use the next most recent tape. All tapes receive equal wear, and can be replaced periodically as a set. You can cycle each backup tape offsite as it is replaced by today's backup, leaving your most recent backup available onsite for easy restores, while having an offsite tape that is only one day old. The sole disadvantage of this rotation is that it limits you to retrieving historical data from no more than two weeks back, assuming that you use 10 tapes. This problem is easily addressed. Simply add four Quarterly tapes or 12 Monthly tapes to the rotation, and do a duplicate backup to the appropriate archive tape at the end of each quarter or month.
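As a rough illustration of the 10-tape daily rotation, the Python sketch below picks the tape label for a given date. The function name `daily_full_tape` is our own, and which calendar weeks count as "A" weeks and which as "B" weeks is an arbitrary assumption. All that matters is that the two sets alternate.

```python
from datetime import date, timedelta
from typing import Optional

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def daily_full_tape(d: date) -> Optional[str]:
    """Return the tape label for a daily full backup, or None on weekends."""
    if d.weekday() > 4:                  # Saturday or Sunday: no backup
        return None
    # Ordinal day 1 is a Monday, so this counts whole Monday-based weeks.
    week = (d.toordinal() - 1) // 7
    return f"{DAYS[d.weekday()]} {'AB'[week % 2]}"


if __name__ == "__main__":
    start = date(2024, 1, 1)             # a Monday
    for offset in (0, 4, 7, 14):
        day = start + timedelta(days=offset)
        print(day, daily_full_tape(day))
    # Each tape is reused 14 days after it was last written,
    # which is why history reaches back only two weeks.
```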
A weekly full backup combined with daily differential backups is probably the most commonly used rotation on PC-class systems. In its simplest form, it requires only three tapes: "Weekly A," "Weekly B," and "Daily." On "odd" Fridays, you do a full backup to Weekly A. On "even" Fridays, you do a full backup to Weekly B. Monday through Thursday, you do a differential backup to the Daily tape. This rotation is simple to manage and requires few tapes, but has the following disadvantages:
Historical data can be retrieved for a period of at most two weeks. If you accidentally delete a file and don't realize it for a couple of weeks, that file is gone for good.
If the Daily tape fails during a restore, your next most recent tape is the last Weekly tape, which means you may lose as much as four days' worth of data.
Only one current copy of the full backup exists, so you must either keep it onsite for easy retrieval or offsite for safety.
Tape wear is very uneven, since the Daily tape is written four times a week and each Weekly tape only once every two weeks, so the Daily tape is used eight times as often as either Weekly tape.
Simply adding more tapes and making minor changes to the rotation solves most of these problems. For example, add a tape to do a second full backup each Friday, and store that tape offsite. Add a second Daily tape and alternate using them, or simply use a tape for each workday. To extend historical data, add four Quarterly tapes (or 12 Monthly tapes), do a full backup to the appropriate tape on the final day of the corresponding quarter (or month), and then store the tape.
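The basic three-tape version of this rotation can be sketched in Python as follows. The names `week_index` and `plan_for` are hypothetical, and the rule for deciding which Fridays are "odd" is our own assumption. The short tally at the end reproduces the uneven wear noted above.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Optional, Tuple


def week_index(d: date) -> int:
    """Count Monday-based weeks from a fixed origin (ordinal day 1 is a Monday)."""
    return (d.toordinal() - 1) // 7


def plan_for(d: date) -> Optional[Tuple[str, str]]:
    """Return (tape label, backup type) for a date, or None on weekends."""
    weekday = d.weekday()                # Monday == 0 ... Sunday == 6
    if weekday > 4:
        return None
    if weekday == 4:                     # Friday: full backup, alternating tapes
        tape = "Weekly A" if week_index(d) % 2 else "Weekly B"
        return tape, "full"
    return "Daily", "differential"       # Monday through Thursday


if __name__ == "__main__":
    start = date(2024, 1, 1)             # a Monday
    uses = Counter()
    for offset in range(28):             # four full weeks
        plan = plan_for(start + timedelta(days=offset))
        if plan is not None:
            uses[plan[0]] += 1
    print(dict(uses))  # {'Daily': 16, 'Weekly A': 2, 'Weekly B': 2}
```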
The Grandfather-Father-Son (GFS) tape rotation method is more commonly used on servers than on personal systems, but it's worth considering if your data is very valuable and you think it's worth going to some trouble and expense to secure it. GFS is the easiest to manage of any of the "complex" tape rotations, requires relatively few tapes, and is supported directly by every backup program on the market. A typical GFS rotation tape set requires 21 tapes, as follows:
Daily tapes. Label four tapes Monday through Thursday. Back up each day to the tape for the corresponding day, overwriting each tape once a week.
Weekly tapes. Label five tapes Friday-1 through Friday-5. Back up each Friday to the corresponding weekly tape, using the Friday-5 tape only in months that have five Fridays. Weekly tapes 1 through 4 are overwritten once a month, with Friday-5 being overwritten less frequently.
Monthly tapes. Label 12 tapes January through December. Back up the first (or last) of each month to the corresponding monthly tape. Monthly tapes are overwritten only once per year.
GFS meets most of the goals of an ideal tape rotation method. You can keep recent tape sets onsite, and migrate others offsite. GFS provides daily granularity for the preceding week, weekly granularity for the preceding month, and monthly granularity for the preceding year. GFS provides numerous copies of both recent and older data. The disadvantage of GFS is that tape wear is uneven. Daily tapes are written once a week, weekly tapes once a month, and monthly tapes only once a year. Uneven tape wear is a small price to pay for the other advantages of GFS, however. Most GFS rotations use differential backup for daily tapes and full backup for weekly and monthly tapes, but nothing prevents you from using full backup for all tapes.
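A GFS schedule is easy to express in code. The Python sketch below chooses a tape for any date from the 21-tape set described above. It assumes, purely for illustration, that the monthly backup is done on the first weekday of each month and replaces that day's regular tape. Your backup software may handle the monthly backup differently, and the function names here are our own.

```python
from datetime import date, timedelta
from typing import Optional

DAILY = ["Monday", "Tuesday", "Wednesday", "Thursday"]
WEEKLY = [f"Friday-{n}" for n in range(1, 6)]
MONTHLY = ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"]


def first_weekday_of_month(d: date) -> date:
    """Return the first Monday-to-Friday date in d's month."""
    first = d.replace(day=1)
    while first.weekday() > 4:
        first += timedelta(days=1)
    return first


def gfs_tape(d: date) -> Optional[str]:
    """Return the GFS tape label for a date, or None on weekends."""
    if d.weekday() > 4:
        return None
    if d == first_weekday_of_month(d):   # assumption: monthly backup day
        return MONTHLY[d.month - 1]
    if d.weekday() == 4:                 # nth Friday of the month
        return WEEKLY[(d.day - 1) // 7]
    return DAILY[d.weekday()]


if __name__ == "__main__":
    print(len(DAILY) + len(WEEKLY) + len(MONTHLY))  # 21
    print(gfs_tape(date(2024, 1, 2)))    # Tuesday
    print(gfs_tape(date(2024, 3, 1)))    # March
    print(gfs_tape(date(2024, 3, 29)))   # Friday-5
```

With this assumption, a Friday that is also the first weekday of the month goes to the monthly tape, so the Friday-1 tape is skipped in that month.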