Star Schema Index Design

This is the section where most DBAs' eyes roll back and they begin to question these techniques with heated fervor. This section will discuss the bitmap index design required for getting star transformation explain plans. So without further adieu, let me give the recommendations and then try to explain exactly why it must be done this way:

  • Create a separate bitmap index on each fact table's dimension table's foreign key columns

  • Create a separate bitmap index on each non-key column in the dimension tables

That does not sound too bad, until you think a little more about it. What I am saying is that you fully index your dimension table columns using bitmap indexes, and that you also create bitmap indexes on your fact table columns that refer back to your dimension tables. Let's return to our star schema data model from Chapter 4 and demonstrate what this means. Look at the star schema data model shown in Figure 5-6. I've placed an arrow next to each column that should get its own bitmap index.

Figure 5-6. Example Recommended Indexing for Star Schema Design

graphics/05fig06.jpg

To summarize Figure 5-6, there are:

  • 3 dimension tables

  • 3 fact tables

  • 77 total columns

  • 11 b-tree indexes (for pks and aks)

  • 60 bitmap indexes

That's a total of 71 indexes out of 77 columns! That's one heck of a lot of indexes.

So which of these indexes, if any, are superfluous? Well if you knew that, your ETL data loaders would never try to load duplicate records, and both primary and unique indexes would not be necessary. They are only there to keep the data clean. You never want to see any of these b-tree indexes in an explain plan or queries will run slowly. For a successful star transformation, you only want to see bitmap indexes in the explain plan, period.

I usually get three index-specific questions:

Q: What about fact table bitmap indexes and low cardinality?

This is my biggest data warehousing pet peeve question to date. Again, it is a case of people reading blanket Oracle documentation and not seeing what's being said. According to Oracle: "The advantages of using bitmap indexes are greatest for low cardinality columns in which the number of distinct values is small compared with the number of rows in the table." Most people seem to ignore the phrase "compared with the number of rows in the table." If the POS_DAY fact table has a billion rows and creates a bitmap index on PRODUCT_ID, that is low cardinality. Yes, PRODUCT has 200,000 rows. But that's small in comparison to a billion.

Q: Why index all the dimension columns?

Remember, dimension tables provide your ad-hoc end-users their WHERE clause column restriction selections. They are free to pick anything from that domain. You could try and index just those you thought were most likely to be selected, but you'd find that within a month or so, you'd have them all indexed anyhow. Why? Remember that I said you could judge your warehouse's success with one simple question: Do your users run more reports than they initially expected? If your data warehouse runs reports quickly, business users will drill deeper into "what-if" scenarios and do more than they ever planned. And if they do more, then eventually they'll utilize most if not all the dimension columns as WHERE clause criteria. So why not just index them all from Day One and thereby avoid the whole issue altogether?

Q: Why not use b-tree indexes for some dimension columns based on their data type?

Look back at Figure 5-6. The LOCATION dimension has columns MARKET_NAME and DIVISION_NAME, which are both character, rather lengthy in size, and thus can have lots of unique values. Why not create b-tree indexes on these instead of bitmap indexes? The answer is simple: Star transformation uses bitmap indexes. If you make these b-trees, then the optimizer will add a step at runtime (for each query) to convert the b-tree index to a bitmap index so that it can be used by the star transformation. So why do that runtime conversion for every query? Why not just make it a bitmap and forgo the runtime conversion?

Note that I've included a handy script (below) for creating all these bitmap indexes on your dimension tables. Run this in SQL Plus using @FILE_NAME TABLE_NAME, where FILE_NAME is the name of the script file and TABLE_NAME is the name of the dimension table that you want to fully bitmap index. It skips over columns that are parts of either primary or unique keys and reverts to b-tree indexes for numeric columns longer than NUM_SIZE or character columns greater than CHAR_SIZE. You can, of course, change the DEFINE variables at the top of the script to suit your specific tastes. Note too that this script actually writes another script (idx_dim.tmp) to accomplish the task.

set define '&'

define num_size=20
define char_size=50

set echo off
set tab off
set heading off
set verify off
set feedback off
set pagesize 0
set linesize 256
set term off

spool idx_dim.tmp

select 'create '||
       decode(tc.data_type,
              'DATE','      ',
              'NUMBER',  decode(data_scale,
                               0,decode(sign(&num_size-data_precision),
                                         -1,'      ',
                                         'bitmap'
             ),
                                '      '
                ),
              'CHAR',    decode(sign(&char_size-data_length),
                                -1,'      ',
                                'bitmap'
                               ),
              'VARCHAR2',decode(sign(&char_size-data_length),
                                -1,'      ',
                                'bitmap'
                               ),
              '      '
             )||
       ' index '||tc.table_name||
       decode(tc.data_type,
              'DATE','_N',
              'NUMBER',  decode(data_scale,
                               0,decode(sign(&num_size-data_precision),
                                         -1,'_N',
                                         '_B'
             ),
                                '_N'
                ),
              'CHAR',    decode(sign(&char_size-data_length),
                                -1,'_N',
                                '_B'
                               ),
              'VARCHAR2',decode(sign(&char_size-data_length),
                                -1,'_N',
                                '_B'
                               ),
              '_N'
             )||
       decode(length(tc.column_id),1,'0',null)||to_char(tc.column_id)||
       ' on '||tc.table_name||'('||tc.column_name||')
       pctfree 1 nologging;'
from user_tab_columns tc
where tc.table_name in upper('&1')
  and not exists (select 1
                  from user_cons_columns cc,
                       user_constraints co
                  where co.table_name = tc.table_name
                    and co.constraint_name = cc.constraint_name
                    and co.owner = cc.owner
                    and co.table_name = cc.table_name
                    and co.constraint_type in ('P','U')
                    and co.status = 'ENABLED'
                    and cc.column_name = tc.column_name
                  )
  and not exists (select 1
                  from user_ind_columns ic,
                       user_indexes ix
                  where ix.table_name = tc.table_name
                    and ix.index_name = ic.index_name
                    and ix.table_name = ic.table_name
                    and ix.status = 'VALID'
                    and ic.column_name = tc.column_name
                  )
order by tc.table_name, tc.column_id;

spool off

set term on
set heading on
set feedback on

@idx_dim.tmp

Finally, there is a little known Oracle table parameter to control the bitmap index creation algorithm. I say "little known" because you do not specify it on the create index command, but instead on the alter table command (prior to creating any bitmap indexes on that table) to control this behavior. The syntax is:

ALTER TABLE table_name MINIMIZE_RECORDS_PER_BLOCK

This tells Oracle to optimize the mapping of bitmaps to ROWIDs when creating any bitmap index on the table. You only want to do this when you know that the bitmap indexes will be fairly static, meaning not updated. Of course, in a data warehouse, this makes sense. And, setting it for your fact tables (and thus all their bitmap indexes) can save a fair amount of space.