9.5 Unsolvable Problems

So far, I've described strategies for solving what I would call pure SQL tuning problems, where the text of a slow query constitutes essentially the whole problem statement and where you can tune that query out of context with the rest of the application. The following narrowly defined performance problem statement illustrates the boundaries of the problem I've attacked so far:

Alter a given SQL statement to deliver precisely the same rows it already delivers (thus requiring no change to the rest of the application) faster than some predefined target runtime.

Assuming the predefined target runtime is set rationally according to real business needs, there are four basic ways that this problem can be unsolvable:

  • The query runs so many times that the target runtime must be set very low. This is particularly true of queries that the application repeats hundreds of times or more to resolve a single, online, end-user request.

  • The query must return too many table rows (taking into account the number of tables) to have any hope of meeting the target, even with excellent caching in the database.

  • The query must aggregate too many table rows (taking into account the number of tables) to have any hope of meeting the target, even with excellent caching in the database.

  • The layout of the data provides no potential execution plan that avoids high running rowcounts at some point in the execution plan, even while the final returned rowcount is moderate.

In this section, I describe how to recognize each type of problem. In Chapter 10, I will discuss solving these "unsolvable" problems by stepping outside the box that the too-narrow problem statement imposes.

The first problem type is self-explanatory: the target is way lower than an end user should care about. The explanation turns out to be that a given end-user action or batch process requires the query to run hundreds or even millions of times to perform what, to the end user, is a single task.

The second and third problem types are easy to recognize in a query diagram. Like Figure 9-5, the query, which reads at least one large table, turns out to have no filters or only a couple of weakly selective filters. If you have no aggregation (no GROUP BY, for example), you have the second problem. When you test the best nested-loops plan you can find, with any row-ordering steps removed so you can see rows before sorting, you discover that the query returns rows at a very high rate but returns so many rows that it runs interminably. The third problem, with a GROUP BY or other aggregation, looks just like the second problem type, but with the addition of aggregation.

The fourth problem type is the subtlest. A query diagram with at least one large table has several semiselective filters spread throughout a query diagram, often with subqueries, in such a way that large fractions of one or more large tables must be read before you can reach enough filters to reduce the rowcount to something moderate. This case might sound like it should be quite common, but it actually turns out to be rare; I've run into it probably less than once a year, on average.