Backup and Recovery

This is probably the least understood and often most heatedly debated DBA topic in data warehousing. Without intentionally bashing other books regarding Oracle data warehousing, let me say, that in general, the advice is short-sighted, covering only Oracle methods for backup and recovery. I genuinely mean no disrespect to these other authors, but I've never used and never advise DBAs to use Oracle tools for backup and recovery when dealing with databases this big. There are better methods out there. To simply ask whether it's hot or cold backups and then use Oracle's RMAN to do it is a disservice to your customers. You may have other options that are far superior?if you just look.

Ask yourself what your backup and recovery needs really are. Remember that this is a data warehouse, which is really nothing more than a glorified reporting system. Is point-in-time recovery really a necessity? What time limits do you have to perform backups? What time limits do you have to perform recoveries? And finally, what budget do you have to accomplish these tasks? These are the real and only questions of importance.

Far too often, DBAs think only in Oracle terms. So the questions become more Oracle-centric. Will the database be run in ARCHIVELOG mode? How many and how big must the online redo logs be? Will the database be backed up hot or cold? And will the backups be complete or incremental? Finally, how many tapes will all this take? These are the same questions that are asked in the traditional OLTP database world. But that does not make them the right questions.

If you'll forgive an absurd analogy, it's like planning a family vacation by saying that we'll drive the family sedan from New York to Los Angeles, take the scenic route, drive no more than 500 miles per day, and stay at Holiday Inns along the route. That may be a fine plan, but the first question should be: Do we have sufficient time to drive there and back? The second question should be: Can we fly for about the same money? If so, then the family sedan is neither necessary nor desirable. The key point is that too many DBAs blindly choose the family sedan (i.e., Oracle backup and recovery) when clearly better alternatives exist. You must be very creative and think outside the box.

So what does this mean? If you're using a journalized file system, such as one from Veritas, then you may be able to do hot versus cold and complete versus incremental backup and recovery at the file system level. In other words, you can use one technique for both your database and non-database files. This offers simplicity due to standardization. And in some cases, it may be superior technically as well. For example, Oracle 8.0's RMAN is not very efficient with regard to time (and I'm not convinced that 8i or 9i is any better). Yes, it saves tape space, but it scans entire data files for changes, which takes a long time. A journalized file system maintains log files of the changes, so it saves both space and time. I've used this technique without hitch. It just takes DBAs a while to digest and accept that they can do hot and incremental backups outside the database.

Another excellent option exists if you have the budget: hardware backups. How would you like to perform an online full backup of a multi-terabyte data warehouse in less than a minute? With today's RAID disk arrays, that option sometimes exists. The disk array can split a mirror off for doing the backup so the database remains open. It only takes a moment to separate the mirror. Then after the backup, the mirror is reconnected and also resynchronized for the changes that occurred during the backup. Of course, you may want your RAID 1 or 0+1 to contain two mirrors so that you always maintain data redundancy, even during a backup. Yes, this costs more money for more disks. But, disk space is very cheap and your customer may approve this. For example, I've used EMC's Time Finder for just this purpose. Moreover, I've used it for 24x7 data warehouses to load the data without interrupting production. In both cases, the hardware/software solution was so simple and straightforward that I could concentrate on the business requirements at hand rather than the Oracle implementation. So, the real-world cost was actually much cheaper than architecting something and then supporting it.

Of course, there may still be those occasions where you cannot use either journalized file systems or hardware to solve your backup and recovery issues. Then, RMAN may be your obvious and only solution. Before devising your data warehouse backup and recovery strategy, consider these facts: First, a data warehouse loads massive amounts of data at regular intervals, say nightly. During other times and the majority of the total time, it's essentially a read-only reporting database. Second, many data loading operations and aggregations will be performed in parallel and using direct mode loads (i.e., no logging). Moreover, most index rebuilds will also be done using the NOLOGGING option. Thus, running the database in ARCHIVELOG mode may actually accomplish much less than you expect in terms of actual recoverability. And third, you can keep and reapply batch loading cycle data for re-execution as simply as you can keep redo log files.

Please don't question my intentions here. I'm not pushing for any preferred solution. I'm just making sure you fully consider the data warehousing environment before making your backup and recovery design selections. If you end up running a data warehouse in ARCHIVELOG mode, please make sure to size your log files appropriately, with lots of disk space available for short-term secondary storage. It's not uncommon for a single nightly batch cycle to generate GB of redo logs. If you don't plan for this, then you can add yet another reason for being getting paged at night?redo log devices filling up?and DBAs already have far too many reasons for being paged. Why add another?