Even with the plethora of hardware choices introduced in the previous sections, picking your data warehousing hardware is actually quite easy. Typical needs include:
Scalable, multi-processor server platform
Sufficient memory for parallel processing
Highly scalable disk storage system
Sufficient disk cache for > 50% hit ratio
Striping for better I/O performance
Data redundancy for data protection
Support for both fast reads and writes
Combined with our proposed software architecture requirements for:
Single server
Single instance
Single database
Oracle 8i or 9i
RISC-based UNIX
Parallel load programs
The recommended minimum data warehousing hardware platform is:
SMP or NUMA server
8?16 64-bit RISC CPUs
True 64-bit UNIX OS
4?8 GB RAM
Mid- to large-sized disk array
2?4 GB disk cache
RAID 0+1 via hardware
18GB or 36GB disk drives
Again, let me stress that this is not based on any anti-NT sentiments or UNIX bigotry. From my experience loading and querying terabytes of data, I've found data warehouses generally consume CPU and I/O bandwidth far beyond the capacity of Intel-based SMP servers and Windows NT/2000, even when clustered. The only Intel-based solutions that seriously qualify for a data warehouse implementation are IBM's (a.k.a. Sequent's) NUMA-Q machine with 64 Pentium III Xeons or IA-64s running DYNIX/ptx or a Data General Aviion AV2500 with 64 Pentium III Xeons running DG/UX. Even though both these machines could run Windows NT/2000, they would not be able to scale to the same CPU count as under their respective UNIX OS.
Examples of acceptable SMP and NUMA servers from first-tier vendors are shown in Table 3-4.
Vendor | Server Family | OS | Cluster Software | CPUs |
---|---|---|---|---|
Compaq | AlphaServer | Tru64 UNIX | TruCluster Server | Alpha |
Data General | Aviion | DG/UX | DG/UX Clusters | Xeon / IA-64 |
HP | 9000 V-Class | HP/UX | MC/ServiceGuard | PA-RISC / IA-64 |
IBM | RS/6000 | AIX | HACMP | PowerPC RS64 III |
IBM (Sequent) | NUMA-Q | DYNIX/ptx | ptx/CLUSTERS | Xeon / IA-64 |
Silicon Graphics | SGI 2000 | IRIX | IRIS FailSafe | MIPS RISC R12000 |
Sun | E Line | Solaris | Sun Clusters | UltraSPARC |
Examples of acceptable disk arrays from first-tier vendors are shown in Table 3-5.
Vendor | Array Family | Max Cache GB | Max Drive GB | Max Terabytes |
---|---|---|---|---|
Compaq | Modular Array | 3 | 36 | 2.6 |
EMC | Symmetrix | 32 | 50 | 19.1 |
HP | SureStore | 16 | 47 | 11.0 |
IBM | Storage Server | 16 | 36 | 11.2 |
Sun | StorEdge T3 | 8.5 | 36 | 88.0 |
The stripe size and stripe set size are both a bit too subjective for any universal recommendations. In general, choose a stripe size 4?8 times your Oracle block size and a stripe set size of 4?8 disks. So, for a 16K Oracle block size, a stripe size of 128K and a stripe set size of 8 should work well for 1MB I/O requests. Of course, the following Oracle initialization parameters would have to be set to 64 to guarantee optimal striping performance:
DB_FILE_MULTIBLOCK_READ_COUNT
SORT_MULTIBLOCK_READ_COUNT
HASH_MULTIBLOCK_IO_COUNT
DB_FILE_DIRECT_IO_COUNT