Choosing Indexes: Query Versus Update Performance

I/O is the primary factor in determining query performance. The challenge for the database designer is to build a physical data model that provides efficient data access. Creating indexes on database tables allows SQL Server to access data with a reduced number of I/Os. Defining useful indexes during the logical and physical data modeling step is crucial. The SQL Server optimizer relies heavily on index key distribution and index density to determine which indexes to use for a query. The optimizer in SQL Server can use multiple indexes in a query (through index intersection) to reduce the number of I/Os to retrieve information. In the absence of indexes, the optimizer performs a table scan, which can be costly from an I/O standpoint.

Although indexes provide a means for faster access to data, they slow down data modification statements due to the extra overhead of having to maintain the index during inserts, updates, and deletes (for more information on index maintenance during updates, see Chapter 33).

In a Decision Support System (DSS), defining many indexes can help your queries and does not create much of a performance issue because the data is relatively static and doesn't get updated frequently. You typically load the data, create the indexes, and forget about it until the next data load. As long as you have the necessary indexes to support the user queries, and they're getting decent response time, the penalties of having too many indexes in a DSS environment is the space wasted for indexes that possibly won't be used, the additional time required to create the excessive indexes, and the additional time required to back up and run DBCC checks on the data.

In an OLTP environment, on the other hand, too many indexes can lead to significant performance degradation, especially if the number of indexes on a table exceeds four or five. Think about it for a second. Every single-row insert is at least one data page write and one or more index page writes (depending on whether a page split occurs) for every index on the table. With eight nonclustered indexes, that would be a minimum of nine writes to the database for a single-row insert. Therefore, for an OLTP environment, you want as few indexes as possible?typically only the indexes required to support the update and delete operations and your critical queries, and to enforce your uniqueness constraints.

So the natural solution, in a perfect world, would be to create a lot of indexes for a DSS environment and as few indexes as possible in an OLTP environment. Unfortunately, in the real world, you typically have an environment that must support both DSS and OLTP applications. How do you resolve the competing indexing requirements of the two environments? To meet the indexing needs of DSS and OLTP applications requires a bit of a balancing act, with no easy solution. It often involves making hard decisions as to which DSS queries might have to live with table scans, and which updates have to contend with additional overhead.

One solution is to have two separate databases?one for DSS applications and another for OLTP applications. Obviously, this method requires some method of keeping the databases in sync. The method chosen depends on how up-to-date the DSS database has to be. If you can afford some lag time, you could consider using a dump-and-load mechanism, such as Log Shipping or periodic full database restores. If the DSS system requires up-to-the-minute concurrency, you might want to consider using replication (see Chapter 22, "Data Replication").

TIP

Although replication might seem like a slick solution, do not enter into this type of decision lightly?setting up and maintaining a replicated environment can be complex and time consuming. Consider your options fully before you take this approach.

Also, I wouldn't recommend using triggers as a method to keep two databases in sync. I once saw a system in which such a design had been implemented. The performance overhead of the trigger in the OLTP environment was much greater than any overhead caused by having the additional indexes on the tables to support the DSS queries. Replication is a much cleaner, behind-the-scenes approach with minimal impact on query performance in the OLTP database.

Another possible alternative is to have only the required indexes in place during normal processing periods to support the OLTP requirements. At the end of the business day, create the indexes necessary to support the DSS queries and reports, which can run as batch jobs after normal processing hours. When the DSS reports are complete, drop the additional indexes, and you're ready for the next day's processing. Note that this solution assumes that the time required to create the additional indexes is offset by the time saved by the faster running of the DSS queries. If the additional indexes do not result in substantial time savings, they probably are not necessary and need not be created in the first place. The queries need to be more closely examined to select the appropriate indexes that will best support your queries.

It is therefore important to choose indexes carefully to provide a good balance between data search and data modification performance. The application environment usually governs the choice of indexes. For example, if the application is mainly OLTP with transactions requiring fast response time, creating too many indexes might have an adverse impact on performance. On the other hand, the application might be a decision support system (DSS) with few transactions doing data modifications. In that case, it makes sense to create a number of indexes on the columns frequently used in the queries.