Optimization for Administrators

The previous sections have described optimizations that can be performed by unprivileged MySQL users. But there are also optimizations that can be performed only by administrators who have control of the MySQL server or the machine on which it runs. Some server parameters pertain to query processing and can be tuned, and certain hardware configuration issues have a direct effect on query processing speed. In general, the primary principles to keep in mind when performing administrative optimizations are as follows:

  • Accessing data in memory is faster than accessing data from disk.

  • Keeping data in memory as long as possible reduces the amount of disk activity.

  • Retaining information from an index is more important than retaining contents of data records.

Specific ways you can apply these principles are discussed next.

Increase the size of the server's caches. The server has many parameters (variables) that you can change to affect its operation, several of which directly affect the speed of query processing. The most important parameters you can change are the sizes of the table cache and the caches used by the table handlers for indexing operations. If you have memory available, allocating it to the server's cache buffers will allow information to be held in memory longer and reduce disk activity. This is good, because it's much faster to access information from memory than to read it from disk.

  • The table cache is used to hold information about open tables. Its size is controlled by the table_cache server variable. If the server accesses lots of tables, this cache fills up and the server must close tables that haven't been used for a while to make room for opening new tables. You can assess how effective the table cache is by checking the Opened_tables status indicator:

    SHOW STATUS LIKE 'Opened_tables'; 

    Opened_tables indicates the number of times a table had to be opened because it wasn't already open. (This value is also displayed as the Opens value in the output of the mysqladmin status command.) If the number remains stable or increases slowly, it's probably set to about the right value. If the number grows at a high rate, it means the cache is full a lot and that tables have to be closed to make room to open other tables. If you have file descriptors available, increasing the table cache size will reduce the number of table opening operations.

  • The key buffer is used by the MyISAM and ISAM table handlers to hold index blocks for index-related operations. Its size is controlled by the key_buffer_size server variable. Larger values allow MySQL to hold more index blocks in memory at once, which increases the likelihood of finding key values in memory without having to read a new block from disk. The default size of the key buffer is 8MB. If you have lots of memory, that's a very conservative value and you should be able to increase it substantially and see a considerable improvement in performance for index-based retrievals and for index creation and modification operations.

  • The InnoDB and BDB handlers have their own caches for buffering data and index values. The sizes are controlled by the innodb_buffer_pool_size and bdb_cache_size variables. The InnoDB handler also maintains a log buffer, the size of which is controlled by the innodb_log_buffer_size variable.

  • Another special cache is the query cache, described later in its own section, "The Query Cache."

Instructions for setting server variables can be found in Chapter 11. When you change parameter values, adhere to the following guidelines:

  • Change one parameter at a time. Otherwise, you're varying multiple independent variables and it becomes more difficult to assess the effect of each change.

  • Increase server variable values incrementally. If you increase a variable by a huge amount on the theory that more is always better, you may run your system out of resources, causing it to thrash or slow to a crawl because you've set the value too high.

  • To get an idea of the kinds of parameter variables that are likely to be appropriate for your system, take a look at the my-small.cnf, my-medium.cnf, my-large.cnf, and my-huge.cnf option files included with MySQL distributions. (You can find them under the support-files directory in source distributions and under the share directory in binary distributions.) These files will give you some idea of which parameters are best to change for servers that receive different levels of use and also some representative values to use for those parameters.

Other strategies you can adopt to help the server operate more efficiently include the following:

Disable table handlers that you don't need. The server won't allocate any memory for disabled handlers, allowing you to devote it elsewhere. The ISAM, InnoDB, and BDB handlers can be disabled entirely if you build the server from source, and the InnoDB and BDB handlers can be disabled at server startup time. See Chapter 11 for details.

Keep grant table permissions simple. Although the server caches grant table contents in memory, if you have any rows in the tables_priv or columns_priv tables, the server must check table- and column-level privileges for every query. If those tables are empty, the server can optimize its privilege checking to skip those levels.

If you build MySQL from source, configure it to use static libraries rather than shared libraries. Dynamic binaries that use shared libraries save on disk space, but static binaries are faster. (However, you cannot use static binaries if you want to load user-defined functions because the UDF mechanism relies on dynamic linking.)

The Query Cache

As of MySQL 4.0.1, the server can use a query cache to speed up processing of SELECT statements that are executed repeatedly. The resulting performance improvement often is dramatic. The query cache works as follows:

  • The first time a given SELECT statement is executed, the server remembers the text of the query and the results that it returns.

  • The next time the server sees that query, it doesn't bother to execute it again. Instead, the server pulls the query result directly from the cache and returns it to the client.

  • Query caching is based on the literal text of query strings as they are received by the server. Queries are considered the same if the text of the queries is exactly the same. Queries are considered different if they differ in lettercase or come from clients that are using different character sets or communication protocols. They also are considered different if they are otherwise identical but do not actually refer to the same tables (for example, if they refer to identically named tables in different databases).

  • When a table is updated, any cached queries that refer to it become invalid and are discarded. This prevents the server from returning out-of-date results.

Support for the query cache is built in by default. If you don't want to use the cache and want to avoid incurring even the minimal overhead that it involves, you can build the server without it by running the configure script with the --without-query-cache option.

For servers that include query cache support, cache operation is based on the values of three variables:

  • query_cache_size determines the size of the query cache. A value of zero disables the cache, which is the default setting. (In other words, the cache is not used unless you turn it on explicitly.) To enable the cache, set query_cache_size value to the desired size of the cache, in bytes. For example, to allocate 16MB, use the following setting in an option file:

    set-variable = query_cache_size=16M
  • query_cache_limit sets the maximum result set size that will be cached; query results larger than this value are never cached.

  • query_cache_type determines the operating mode of the query cache. The possible mode values are as follows:

    Mode Meaning
    0 Don't cache
    1 Cache queries except those that begin with SELECT SQL_NO_CACHE
    2 Cache on demand only those queries that begin with SELECT SQL_CACHE

Individual clients begin with query caching behavior in the state indicated by the server's default caching mode. A client may change how its queries are cached by the server by using the following statement:


val can be 0, 1, or 2, which have the same meaning as for the query_cache_type variable. The symbolic values OFF, ON, and DEMAND are synonyms for 0, 1, and 2.

A client can also control caching of individual queries by adding a modifier following the SELECT keyword. SELECT SQL_CACHE causes the query result to be cached if the cache is operating in demand mode. SELECT SQL_NO_CACHE causes the result not to be cached.

Suppression of caching can be useful for queries that retrieve information from a constantly changing table. In that case, the cache is unlikely to be of much use. Suppose you're logging Web server requests to a table in MySQL, and also that you periodically run a set of summary queries on the table. For a reasonably busy Web server, new rows will be inserted into the table frequently and thus any query results cached for the table become invalidated quickly. The implication is that although you might issue the summary queries repeatedly, it's unlikely that the query cache will be of any value for them. Under such circumstances, it makes sense to issue the queries using the SQL_NO_CACHE modifier to tell the server not to bother caching their results.

Hardware Issues

The earlier part of this chapter discusses techniques that help improve your server's performance regardless of your hardware configuration. You can of course get better hardware to make your server run faster. But not all hardware-related changes are equally valuable. When assessing what kinds of hardware improvements you might make, the most important principles are the same as those that apply to server parameter tuning. Put as much information in fast storage as possible, and keep it there as long as possible.

Several aspects of your hardware configuration can be modified to improve server performance:

  • Install more memory into your machine. This enables you to increase the server's cache and buffer sizes, which allows it to keep data in memory longer and with less need to fetch information from disk.

  • Reconfigure your system to remove all disk swap devices if you have enough RAM to do all swapping into a memory file system. Otherwise, some systems will continue to swap to disk even if you have sufficient RAM for swapping.

  • Add faster disks to improve I/O latency. Seek time is typically the primary determinant of performance here. It's slow to move the heads laterally; when the heads are positioned, reading blocks off the track is fast by comparison. However, if you have a choice between adding more memory and getting faster disks, add more memory. Memory is always faster than your disks, and adding memory allows you to use larger caches, which reduces disk activity.

  • Take advantage of parallelism by redistributing disk activity across physical devices. If you can split reading or writing across multiple physical devices, it will be quicker than reading and writing everything from the same device. For example, if you store databases on one device and logs on another, writing to both devices at once it will be faster than if databases and logs share the same device. Note that using different partitions on the same physical device isn't sufficient. That won't help because they'll still contend for the same physical resource (disk heads). The procedure for moving logs and databases is described in Chapter 10, "The MySQL Data Directory."

    Before you relocate data to a different device, make sure you understand your system's load characteristics. If there's some other major activity already taking place on a particular physical device, putting a database there may actually make performance worse. For example, you may not realize any overall benefit if you process a lot of Web traffic and move a database onto the device where your Web server document tree is located. (If you have only a single drive, you can't perform much disk activity redistribution, of course.)

    Use of RAID devices can give you some advantages of parallelism as well.

  • Use multi-processor hardware. For a multi-threaded application like the MySQL server, multi-processor hardware can execute multiple threads at the same time.