2.4 Uncommon Database Objects

Simple tables and B-tree indexes suffice for almost all database needs. However, you should at least have a passing awareness of less common database object types, if only to argue against futile attempts to solve problems with wrong and exotic tools. This section describes the more popular special object types.

2.4.1 Index-Organized Tables

Index-organized tables are just indexes that don't point to tables. They are a nifty feature in Oracle, but you can approximate them in any database. Occasionally, a database will find all the columns it needs for a query in an index and will use that index without even touching the corresponding table. If you created an index that happened to have all the columns of a table, you might like to dispose of the table altogether. Index-organized tables handle just this scenario, saving space and the cost of maintaining a separate table. Since index-organized tables have no table to point to, they also avoid the need to include rowids, packing in more rows per block than ordinary indexes on the same columns. This better compactness makes index-organized tables easier to cache than ordinary indexes on the same columns. When you have an index just past the point at which it acquires an extra index level, replacing the index-table combination with a more compact, index-organized table will eliminate that extra level.

Consider using index-oriented organized tables under the following circumstances:

  • Rows are not much longer than their index key. Tables generally store data more compactly and efficiently than an index with the same data. If rows are long, compared to their key, a much more compact key index read followed by reads of a plain table will work better, or at least as well, and more flexibly.

  • You almost always reach rows through a single index, probably through all or part of the primary key index. You can create ordinary, secondary indexes on index-organized tables, but when you use those secondary indexes, you defeat the purpose of the index-organized table.

  • You sometimes read rows in large ranges based on the sort order of an index. Large range reads within an index are particularly efficient compared to reading the same rows scattered randomly throughout an ordinary table. This factor in favor of index-organized tables tends to be in conflict with the previous factor, since access through a primary key is usually unique access, not access over a large range. With multipart primary keys, though, you sometimes read row ranges on partial keys, so you will occasionally see both of these factors in favor of index-organized tables.

  • You add, delete, and modify rows infrequently. Ordinary tables handle frequent data changes better than index-organized tables. In Online Transaction Processing (OLTP) applications, large tables tend to have frequent data changes; otherwise, they would not grow large. This tends to argue against large, index-organized tables in the OLTP world.

If you like the idea of an index-organized table but you are not using Oracle, you can get almost the same read-time benefits by building ordinary indexes with all the columns you need in your common queries. Even on Oracle, you can follow this strategy when you prefer to leave large, rarely needed columns out of an index for greater compactness.

The biggest drawback of adding columns to ordinary indexes comes when adding columns to already-unique indexes. You can specify a leading subset of the columns in an index-organized table as the unique key, but ordinary unique indexes enforce uniqueness only over the whole combination of columns. Unique indexes usually exist both to speed performance and to enforce uniqueness over the key. However, if you add non-key columns to an ordinary unique key index, you defeat correct enforcement of key uniqueness. You can solve this problem with two indexes: a narrow one just to enforce key uniqueness and a wider one to provide index-only access to table columns. However, databases tend not to choose a wider index when a narrower one already has all the columns needed for a unique read, so you might need to put forth extra effort to force the use of the wider index.

2.4.2 Single-Table Clusters

As a feature, single-table clusters have been around longer than their closely related index-organized tables. A single-table cluster physically arranges table rows in the order of some key value. When interest in rows correlates well with that sort value, such an arrangement improves the hit ratio by keeping hot rows close together. The problem, a showstopper usually, is that it is hard to keep a table organized in sorted order unless rows arrive in sorted order. (And if they arrive in sorted order, you do not need to cluster; natural table ordering will work fine!) If rows do not arrive in sorted order, the database must leave room to shoehorn in new rows where they belong; otherwise, it must periodically reorder rows. Reordering is expensive, and leaving space for later rows wastes space and harms cache efficiency. The bottom line on single-table clusters is that I have never needed them to get good performance, and I do not expect you will either.

2.4.3 Multitable Clusters

Like single-table clusters, multitable clusters are presorted on some key, but multitable clusters have rows, in the same blocks, from multiple tables that join on that key, making the join between those tables extra fast. If you do not access the clustered tables together, having the other table's rows in the blocks you read just gets in the way, so a key question is how often you join the different tables in your application queries. If you have two or more always-joined tables that map one-to-one to each other (i.e., tables that share a unique key), multitable clusters probably work pretty well. But, in that case, why make them separate tables at all? Instead, just put the superset of all columns into a single table. More often, some master table has a one-to-many relationship with its details, such as Orders and Order_Details. Here, the problem becomes the fluidity of most one-to-many relationships. In the cluster block, you must allow for the possibility of many details and at the same time avoid wasting too much space when it turns out there is just one (or even no) detail row. As for single-table clusters, this leads to a tradeoff between wasted space and reordering costs. Just as for single-table clusters, I have never needed multitable clusters for good performance, and you probably will not either.

2.4.4 Partitioned Tables

Partitioned tables are another Oracle feature, originated largely to handle maintenance on truly huge tables. Imagine you have a trade-tracking system for a major brokerage. You probably have an immense history of individual trades, but you rarely have interest in trades older than one year. You want to keep efficient, continuous access to the recent trades, while having some chance to eventually archive really old trades without disrupting your system.

Partitioned tables look like a bunch of subtables, or partitions, that you can maintain independently. For example, you can take one partition offline without disrupting access to the others. Query statements need to refer explicitly only to the partitioned table name that represents the whole collection. The partitions are organized according to some range condition that determines which partition owns which rows. This range condition often uses a date, such as Trade_Date in our example, for which each partition covers a substantial date range, such as a month or a year. As long as the query conditions exclude some of the partitions, the query can skip access to those partitions; it can even run if those partitions are entirely offline. Partitioned tables have some great advantages for ease of administration, but in my own experience I have never needed them just to solve a performance problem. I expect, though, that they help for some very large tables, like those in the trade-tracking example, for which the rows naturally organize along a partition condition, usually a date. From the point of view of SQL tuning, you can treat partitioned tables essentially as normal large tables.

2.4.5 Bit-Mapped Indexes

Bit-mapped indexes are designed largely to address special concerns in data warehousing. The chief virtue of bit-mapped indexes is that they allow you to efficiently combine multiple, not-very-selective conditions covered by different indexes to obtain a single short list of rows that satisfy all conditions at once. However, a single, multicolumn index will allow much the same thing, but without the disadvantages that I describe in the next paragraph, so I do not expect that bit-mapped indexes are often useful; certainly, I have never needed one to tune a query. (I have done relatively little data-warehousing work, so bit-mapped indexes might be more useful to you than to me, if data warehousing is your venue.)

Each stored value of a bit-mapped index points to what amounts to a list of yes/no bits that map to the whole list of table rows, with yes bits mapping to table rows that have that value for the indexed column. These bit strings are easy to AND and OR together with other bit strings of other bit-mapped indexes to combine conditions across bit-mapped values on multiple indexes. The big catch is that such bit strings are expensive to maintain in sync with frequently changing table contents, especially updates in the middle of a table. Bit-mapped indexes work best for tables that are mostly read-only; however, large tables do not grow large by being read-only, and small tables do not require special indexes for efficient data access. The best scenario for success is precisely the data-warehouse scenario for which bit-mapped indexes were designed: a data-warehouse database that is read-only during the day and periodically, probably during nights or weekends, does whole-table refreshes from some transactional database in which the tables grow steadily.