In many companies, valuable data is stored on Solaris server systems, in user files and database tables. The variety of information stored is endless: personnel files, supplier invoices, receipts, and all kinds of intellectual property. In addition, many organizations provide some kind of service that relies on server uptime and information availability to generate income or maintain prestige. For example, if a major business-to-consumer web site or business-to-business hub experiences any downtime, every minute that the system is unavailable costs money in lost sales, frustrated consumers, and reduced customer confidence. Alternatively, a government site like the Government Accountability Office (http://www.gao.gov/) provides valuable advice to government, business, and consumers, and is expected to be available continuously.
The reputation of online service providers suffers greatly if servers go down.
On a smaller scale, but just as significant, is the departmental server, which might provide file serving, authentication services, and print access for several hundred PC systems or Sun Rays. If the server hard disk crashes, the affected users who can’t read their mail or retrieve their files are going to be very angry at 9:00 A.M. if system data cannot be restored in a timely fashion. In this section, we will examine the background and rationale for providing a reliable backup and restore service, which will in turn ensure a high level of service provision, even in the event of hardware failure.
The first requirement of a backup service is the ability to rapidly restore a dysfunctional system to a functional state. The relationship between time of restoration and user satisfaction is inverse, as shown in Figure 14-1—the longer a restore takes, the faster users will become angry, while the rapid restoration of service will give users confidence in the service they are using. For this reason, many sites will take incremental backups of their complete file systems each night, but may take a weekly “full dump” snapshot that can be used to rapidly rebuild an entire system from a single tape or disk.
The second requirement for a backup service is data integrity: it is not sufficient just to restore some data and hope that it’s close enough to the original. It is essential that all restored data can actually be used by applications, as if no break in service had occurred. This is particularly important for database applications, which may have several different kinds of files associated with them. Table indices, data files, and rollback segments must all be synchronized if the database is to operate correctly, and user data must be consistent with the internal structure and table ownership rights. If files are simply backed up onto disk while the database is open, these files can be restored, but the database system may not be able to use the files. It is essential to understand the restoration and data integrity requirements for all key applications on your system, and identify any risks to service provision associated with data corruption.
A comprehensive backup and restore plan should include provision for regular cold and warm dumps of databases to a file system that is regularly backed up.
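One small part of the integrity requirement can be checked mechanically: that every file comes back from the backup medium byte-for-byte identical to what was written. The following Python sketch is an illustration only, not a feature of any particular backup tool. It records a SHA-256 digest for each file at backup time and compares the digests after a restore. The function names (`build_manifest`, `verify_restore`) and the directory layout in the usage example are hypothetical. Note that matching digests show only that the files were copied faithfully; they do not show that a database backed up while open is internally consistent, which is why cold or warm database dumps are still needed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest at backup time."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> list[str]:
    """Return the relative paths that are missing or differ after a restore."""
    problems = []
    for rel, digest in manifest.items():
        candidate = restored_root / rel
        if not candidate.is_file() or sha256_of(candidate) != digest:
            problems.append(rel)
    return problems
```

A typical use would be to call `build_manifest(Path("/export/dumps"))` just before the backup runs, store the result alongside the backup, and call `verify_restore(manifest, Path("/restore/dumps"))` after a trial restore; an empty list means every file matched. Both paths here are placeholders.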
A third requirement for a backup and restore service is flexibility: data should be recorded and compressed on media that can potentially be read on a different machine, using a different operating system. In addition, backing up to more than one type of medium helps ensure availability if a backup device suffers a hardware failure. For example, you may use a DDS-3 DAT tape drive as your main backup device for nightly incremental backups, but you may also decide to burn a weekly CD-R containing a full dump of the database. If your server was affected by a power surge, and the DAT drive was damaged, and a replacement would take one week to arrive, then the CD-R dump can be used as a fallback even though it may not be completely up-to-date.
Typical backup and restore strategies employ two related methods for recording data to any medium: incremental and full dumps. A full dump involves taking a copy of an entire file system or set of file systems and copying it to a backup medium. Historically, large file systems have taken a long time to back up because of slow tape speeds and poor I/O performance, leading to the development of the incremental method. An incremental dump is an iterative method that involves taking a baseline full dump on a regular basis (usually once every week), and then taking a further dump every day of only those files that have changed since the previous dump. Although this approach can require the maintenance of complex lists of files and file sizes, it reduces the overall time to back up a file system because on most file systems only a small proportion of the total number of files changes from week to week. This reduces the overall load on the backup server and, because less data is written, reduces wear on tape drive heads. However, using incremental backups can increase the time to restore a system, as up to seven backup tapes (the weekly full dump plus each daily incremental) must be processed in order to restore data files fully. Therefore, a balance must be struck between convenience and the requirement for a speedy restore in the event of an emergency.
Many sites use a combination of incremental and full daily dumps on multiple media to ensure that full restores can be performed rapidly, and to ensure redundant recording of key data.
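The trade-off between backup time and restore time can be made concrete by counting how many dumps must be read back to rebuild a file system. The Python sketch below is a simplified model, not the behavior of any specific backup product. It assumes numbered dump levels, where level 0 is a full dump and a level-N dump holds the files changed since the most recent earlier dump with a lower level. The `Dump` class and `restore_chain` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dump:
    day: int    # day index within the backup cycle (0 = day of the full dump)
    level: int  # 0 = full dump; higher numbers = incremental dumps


def restore_chain(dumps: list[Dump]) -> list[Dump]:
    """Return the dumps to read, oldest first, to restore the latest state.

    Assumed rule: a level-N dump holds files changed since the most recent
    earlier dump with a level lower than N.
    """
    chain = []
    needed_below = None  # the next older dump in the chain must be below this level
    for dump in sorted(dumps, key=lambda d: d.day, reverse=True):
        if needed_below is None or dump.level < needed_below:
            chain.append(dump)
            needed_below = dump.level
            if dump.level == 0:
                break  # a full dump ends the chain
    return list(reversed(chain))


# Schedule A: each day's dump records changes since the previous day's dump.
chained = [Dump(day=d, level=d) for d in range(7)]

# Schedule B: each day's dump records all changes since the weekly full dump.
cumulative = [Dump(day=0, level=0)] + [Dump(day=d, level=1) for d in range(1, 7)]

print(len(restore_chain(chained)))     # 7
print(len(restore_chain(cumulative)))  # 2
```

Under these assumptions, schedule A writes the least data each night but needs all seven dumps for a full restore, while schedule B needs only two dumps at the cost of larger nightly backups as the week goes on.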
After deciding on an incremental or full dump backup strategy, it is important to then plan how backups can be integrated into an existing network. There are four possible configurations that can be considered. The simplest approach is to attach a single backup device to each server, so that it acts as its own backup host. A possible configuration is shown in Figure 14-2.
This approach is appealing because it allows data to be backed up and restored using the same device, without any requirement for network connectivity. However, it does not provide for redundancy through the use of multiple backup devices. This can be rectified by including multiple backup devices for a single host. This configuration is shown in Figure 14-3.
The cost of maintaining single or multiple backup devices for each server in an organization can be very high. In order to reduce cost, many organizations have moved to centralize the management and storage of data for entire departments or sites on a single server. This approach is detailed in Figure 14-4. Multiple client machines can have their local hard drives backed up to a central Solaris server, whether those clients are PCs running Windows or other Solaris servers. The central backup server can also be attached to multiple backup devices, providing different levels of redundancy for more or less significant data. For example, data from user PCs may not require double or triple redundancy, whereas financial records might well deserve it.
There is also an increasing trend towards developing “storage area networks” (SANs), where backup management and data storage are distributed across multiple backup hosts and multiple devices. Thus, a client’s data could potentially be stored on many different backup servers, and management of that data could be performed from a remote manager running on the client. This configuration is shown in Figure 14-5. For example, there is a Veritas client for Windows called Backup Exec, which can connect to many different Solaris servers through an SMB service, backing up data to multiple media. Other server-side packages, such as Legato Networker, offer distributed management of all backup services. New to the game is Sun’s own Java-based Jiro technology, which implements the proposed Federated Management Architecture (FMA) standard. FMA is one proposal for implementing distributed storage across networks in a standard way, and is receiving support from major vendors such as Hitachi, Quantum, Veritas, and Fujitsu for future integration with their products.
More information on Jiro and FMA can be found at http://www.jiro.com/.
You should be able to identify the different backup and restore strategies, and their strengths and weaknesses.