VLDB Maintenance Issues

The maintenance plan for your VLDB must be carefully considered. As with all databases, maintenance should be planned with the least impact on users. When a database reaches a point in which maintenance tasks can no longer run in their allotted window, adjustments must be made to the scheduling of tasks, perhaps running them weekly or monthly rather than daily, or to the methods used to perform tasks, such as adjusting the backup methodology.

Backing Up and Restoring Databases

The key to SQL Server's support for the VLDB is its ability to back it up. As the length of time to back up a database is proportional to its size, this can be a major issue with large databases. It wouldn't do to run a daily full backup if the backup ran for 25 hours! Fortunately, other options are available. A full backup can be scheduled less frequently on a VLDB by combining it with log, differential, and file or filegroup backups. When considering which combination of backup methods to use, don't forget to consider restore time as well; if quick recovery is imperative, designing your database so you can leverage file or filegroup restores can vastly improve your time to recover. For example, if a database has 80GB of data spread over four disks in 20GB files, and one of the disks fails, you can restore only the file that is on the disk that failed, which should take roughly one quarter of the time that it would take to restore the entire database. In addition to this, SQL Server, when restoring the logs to bring the file up-to-date, will only process transactions from the log, which recorded data on the affected file, thus speeding the log file restores to a similar degree.

Another time saver for VLDB backups is to back up to multiple devices. Using multiple devices can help improve backup time by almost the reciprocal of the number of devices used; simply put, a backup to three devices could be complete in one third the time it takes to a single device. Results will vary, of course, and they are dependent on hardware limitations such as disk, tape, and bus configuration. You will also see diminishing returns as the number of devices are increased. If three devices are three times as fast, it doesn't always follow that ten devices will be proportionately faster. You will have to test the configuration to find the optimal balance between cost and performance.

All the SQL Server backup methods are dynamic, so the database can be accessed during backup; however, there will be a performance hit during the backup operation. To service the needs of high availability databases with no tolerance for backup-associated performance loss, SQL Server 2000 provides support for snapshot backups as well.

Snapshot backups are implemented in conjunction with independent hardware and software vendors to provide almost instantaneous backups. This is done by implementing a third-party Virtual Device Interface (VDI) that supports split mirror or copy on write operations. Through the VDI, Microsoft has allowed third-party vendors to fully leverage SQL Server 2000's native backup utilities. This means that the vendor can leverage advanced hardware and software backup solutions while still retaining full SQL Server recovery functionality. For example, a vendor could implement a three-way mirror data storage solution. In this solution, as the name implies, three mirror images of the data are maintained. When the backup application performs a backup, it issues commands to SQL Server through the VDI to prepare the database(s) or datafile(s)for backup and issues the backup database with snapshot command. At this point the database or files being backed up are momentarily frozen, so a consistent image of the data can be preserved. What is actually being backed up is not the data, but metadata containing information about the current state of the data. Once this is recorded, the backup application can "split off" one of the mirrored copies containing the actual data files, and the database or files are unfrozen. As the only data that is being backed up is a small amount of metadata, the backup time is reduced to seconds or minutes, rather than hours. Once complete the data is still protected by a two-way mirror, and the third copy that was split off is available to be backed up in a conventional manner such as to tape, used as a snapshot to initiate replication or create a warm standby, or brought online as a copy of the database, perhaps for development purposes. Once the process is complete the third disk can be resynced with the mirror set, and the process is ready to start over.

The specification for the VDI can be downloaded from Microsoft at http://www.microsoft.com/SQL/downloads/virtualbackup.asp; however, this spec is really intended for independent software vendors who are developing SQL Server backup solutions. Microsoft recommends you contact your backup or storage solution provider if you are interested in implementing snapshot backups.

Performing Consistency Checks on Databases

Consistency checks on data are performed with the Database Consistency Checker, or DBCC. Running DBCCs is both disk and memory intensive, as each data page to be checked must be read into memory if it is not already there. Therefore, adding more memory will speed the DBCC, as data is more likely to be found in cache. This solution is not always feasible in VLDBs in which the ratio of data to memory is high. In this case, running DBCCs causes spooling to tempdb, which resides on disk and causes an I/O bottleneck, slowing the DBCC process. By placing tempdb on a separate disk system, preferably RAID 0, you can optimize the tempdb throughput and speed up your DBCC.

The Enterprise and Developer editions of SQL Server can take advantage of multiple processors, and will perform parallel execution of DBCC statements. The DBCC CHECKDB statement performs a serial scan of the database, but parallel scans of multiple objects as it proceeds to the limit of the system "max degree of parallelism" option. This speeds up the operation substantially over previous versions, but in the case of extremely large tables, it may still be advantageous to schedule DBCC CHECKTABLE commands against individual tables or groups of tables on a rotating basis, if DBCC CHECKDB doesn't complete in your allotted time frame.

Another option is to run DBCC CHECKDB or DBCC CHECKTABLE with the NOINDEX option. This speeds the operation of the DBCC statement as it doesn't check the non-clustered indexes on user tables (system table indexes are always checked). Index corruption is not as critical an issue as data corruption, as no data is lost, and the index can always be dropped and re-created if necessary.

Another method would be to not perform DBCCs on your production database. If you are running a standby server, the DBCCs can be run against the standby database. The theory here is that if the consistency checks run okay on the standby database, the production database should be okay as it is the source of the standby. If the standby database reports corruption, then DBCCs or other tests for corruption can be run against the production database.

Updating Statistics

Statistics are used by SQL Server to determine the optimum way to access data. By default, SQL Server maintains statistics automatically, using samples of the data to estimate statistics. The frequency at which statistics are updated is based on the amount of data in the column or index and the amount of changes to the data. SQL Server optimizes the statistics updates to run with the least amount of overhead possible, so unless you find your statistics aren't correctly populated by the automatic update or you can't afford the overhead it incurs, it is not recommended that you disable the automatic updates.

If you opt to manually update statistics for your VLDB, you must plan how to optimize the operation. As updating statistics can cause excessive overhead, you should determine whether statistics actually need to be updated, rather than arbitrarily scheduling updates. For example, in a large data warehouse populated with monthly data loads, the statistics updates should be scheduled after the completion of the data load. The DBCC SHOW_STATISTICS, as well as Query Analyzer and SQL Profiler, are all useful in determining statistics usage. When updating statistics, if your data is evenly distributed, you might also find that you can get away with running UPDATE STATISTICS and specifying a percentage to sample rather than all the rows.

Rebuilding Indexes

Rebuilding indexes on large tables can be extremely time-consuming and I/O intensive. Consideration must be taken for all indexes on the table, as, in the case of a clustered index, dropping and re-creating the index would cause all non-clustered indexes to be rebuilt as well. Fortunately, SQL Server provides options to avoid such a situation. If you are rebuilding an index to reduce fragmentation, use DBCC INDEXDEFRAG instead, as this will compact the pages of the index, removing any empty pages it creates without having to drop and rebuild the full index. DBCC INDEXDEFRAG is fully logged, doesn't hold locks for extended periods, and doesn't block running queries or updates.

The CREATE INDEX statement now has a DROP EXISTING clause that allows the rebuilding of existing indexes without dropping and re-creating them. This can save time on VLDBs, as it accelerates the rebuild by eliminating the sorting process. If you are rebuilding a clustered index, the non-clustered indexes are not dropped and re-created provided the index is rebuilt on the original column(s). Normally, because non-clustered indexes contain the clustered keys, the non-clustered indexes are rebuilt when the clustered index is dropped via DROP INDEX, and then rebuilt again when the clustered index is re-created.

The DBCC DBREINDEX is still available as well, but it doesn't provide as much functionality as DROP EXISTING; its only advantage is the ability to rebuild multiple indexes on a single table simultaneously.

Another option to consider when creating or rebuilding indexes is to specify WITH SORT_IN_TEMPDB in the statement. By default, when indexes are created or rebuilt, they are sorted in the destination filegroup for the index. If tempdb is on a separate file system (a dedicated RAID 0 array for example), SORT_IN_TEMPDB forces the index build to occur in tempdb, and when it is complete, it is then written to its destination file group. This is an often overlooked option that can provide huge reductions in index build times.

Purging and Archiving Data

One of the most important aspects of managing a VLDB is knowing when to let go of your pack-rat instincts and purge unneeded data. Examine your legal and business requirements to store data, and purge data that falls outside these requirements. For example, I once managed a database for a utility company. The data was monitored for 30-and 90-day service commitments, and we were required to keep records for a year. By daily purging completed records into storage as history, the number of active records was reduced, and query time was improved. History records older than 30 days required storage of only a subset of data, so these reduced records were transferred to 90-day storage. This sped up queries against the 30-day history and reduced storage requirements drastically as only about 10% of the original data was still required at this point. Once a week, the data that had aged past 90 days was backed up to tape and deleted from the database. The tape backups were kept for one year. Performed religiously on a daily, weekly, monthly, or yearly basis, these types of purge operations can keep your VLDB from growing into a problematic monster.