3.3 Reading SQL Server Execution Plans

Microsoft SQL Server uses multiple approaches to generate and display execution plans. These approaches create execution plans sent to your screen, in either graphical or text form, rather than place execution-plan data into tables, as DB2 and Oracle do.

If you bought this book hoping to tune Sybase Adaptive Server, take heart. The two databases share a common heritage, and almost all of what I will say about Microsoft SQL Server applies to Sybase as well, except for the SQL Server Query Analyzer graphical plan facility, which Microsoft added after the two versions split.

3.3.1 Displaying Execution Plans

SQL Server has two approaches to displaying execution plans: a graphical approach, built into the SQL Server Query Analyzer, and a text-based approach, driven by the underlying database. The graphical display does not fit a whole execution plan of even a modest-sized multitable query onto a single screen. Therefore, I find it difficult to use the graphical display to answer the key questions about a long execution plan:

What is the join order?
What method is used for each join?
What method is used for each table access?

The text-based execution-plan display provides the answers to all three of these questions easily and fairly compactly.

3.3.1.1 Displaying execution plans graphically

To see execution plans graphically, you click on the Display Estimated Execution Plan button in SQL Server Query Analyzer. In the window where you usually see query results, you see a diagram of arrows connecting a series of icons that indicate the type of action (nested loops, indexed read, table access, etc.). Text goes with each icon, but the text is generally truncated so that it contains nothing useful until you point to it with your mouse, at which point you get a window that shows the missing details. Furthermore, for even a simple four-way join, the whole diagram doesn't fit on the screen, even with Query Analyzer maximized to fill the screen. I find the graphical approach less useful than the text-based approach, which tells me everything I need to know at a glance.

3.3.1.2 Displaying execution plans textually

In the alternative to graphical execution-plan display, you place the query in the Query window of the SQL Server Query Analyzer, preceded by SET SHOWPLAN_TEXT ON, then click on Query Analyzer's Execute Query button:

SET SHOWPLAN_TEXT ON
GO
SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LE, Locations LM, Employees M, Employees E
WHERE E.Last_Name = 'Stevenson'
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID
  AND UPPER(LE.Description) = 'SAN FRANCISCO'

If you're on Sybase, just replace SHOWPLAN_TEXT with SHOWPLAN. If you want an even more detailed execution plan, you can also use SHOWPLAN_ALL on Microsoft SQL Server.

When I run the preceding commands with empty tables having statistics, I find the following output in my results window (insignificant text is replaced with ..., and, to fit the output on the page, I've added (wrapped line) to show continued lines):

StmtText          
-----------------------------------------------------------------------
SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LE, Locations LM, Employees M, Employees E
WHERE E.Last_Name = 'Stevenson'
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID
  AND UPPER(LE.Description) = 'SAN FRANCISCO'

(1 row(s) affected)

StmtText             
----------------------------------------------------------------------- 
  |--Bookmark Lookup(BOOKMARK:([Bmk1001]), OBJECT:([my_acct].[dbo].[Locations] AS [LM]))
       |--Nested Loops(Inner Join)
            |--Bookmark Lookup(...(...[Employees] AS [M]))
            |    |--Nested Loops(Inner Join)
            |         |--Filter(WHERE:(upper([LE].[Description])='SAN FRANCISCO'))
            |         |    |--Bookmark Lookup(...(...[Locations] AS [LE]))
            |         |         |--Nested Loops(Inner Join)
            |         |              |--Bookmark Lookup(...(...[Employees] AS [E]))
            |         |              |    |--Index Seek(...(...[Employees].
(wrapped line) [Emp_Last_Name] AS [E]), SEEK:([E].[Last_Name]='Stevenson') ORDERED)
            |         |              |--Index Seek(...(...[Locations].[Location_PKey] 
(wrapped line) AS [LE]), SEEK:([LE].[Location_ID]=[E].[Location_ID]) ORDERED)
            |         |--Index Seek(...(...[Employees].[Employee_PKey] 
(wrapped line) AS [M]), SEEK:([M].[Employee_ID]=[E].[Manager_ID]) ORDERED)
            |--Index Seek(...(...[Locations].[Location_PKey] 
(wrapped line) AS [LM]), SEEK:([LM].[Location_ID]=[M].[Location_ID]) ORDERED)

(12 row(s) affected)

After you have executed the query analysis once, you no longer need the top two lines activating SHOWPLAN_TEXT. All future queries will show plans only, until you click on Query Analyzer's Execute Query button for:

SET SHOWPLAN_TEXT OFF
GO

3.3.2 How to Interpret the Plan

Here is how you read the execution-plan output:

All joins are nested loops, based on the series of rows that state Nested Loops(Inner Join). If you have a mix of join methods, the first join executed will be the innermost one, the last one listed. You read the order of join methods executed from the bottom up.
The order of table access is Employees (E), Locations (LE), Employees (M), Locations (LM)the reverse order that the tables appear in the execution plan output, from the innermost nesting out. When SQL references the same tables multiple times, aliases for those tables are mandatory. Since the execution plan explicitly references the aliases, there is no ambiguity regarding which alias each step refers to.
All four table reads are through some index, as shown by the phrase Bookmark Lookup in front of each table name. The indexes used come in the lower entries with matched indentation for each Bookmark Lookup. Thus, you know that the driving table E is reached through an index range scan (a read that at least potentially touches multiple rows at a time) on the index EMPLOYEE_LAST_NAME. The rest of the table accesses are unique reads through the tables' primary keys. Since all reads after the driving table are for unique joins, you know that the query will read at most the same number of rows for each of these other tables as it reads for the driving table.

When you find scans on an index, the condition following the index name makes clear how much of a range the scan will cover. If you need to see other indexes available on a table, the simplest way is to use the sp_helpindex stored procedure. Here are the command and its result:

sp_helpindex Employees

index_name                  index_description                        index_keys
--------------------------- ---------------------------------------- -----------
Employee_Manager_ID         nonclustered located on PRIMARY          Manager_ID
Employee_Last_Name          nonclustered located on PRIMARY          Last_Name
Employee_Location_ID        nonclustered located on PRIMARY          Location_ID
Employee_Department_ID      nonclustered located on PRIMARY          Department_ID
Employee_Hire_Date          nonclustered located on PRIMARY          Hire_Date
Employee_PKey               nonclustered, unique located on PRIMARY  Employee_ID
Employee_First_Name         nonclustered located on PRIMARY          First_Name
Employee_Nickname           nonclustered located on PRIMARY          Nickname

When an index covers multiple columns, they are listed in order in the index_keys column. You can also use sp_help to see a complete description of a table, which includes a list of the table's indexes.

3.3.3 Narrative Interpretation of the Execution Plan

I just explained how to find the join order, the join methods, and the table-access methods for the robust execution plan I showed earlier. If you combine that with the basics covered in Chapter 2, you should understand how SQL Server will reach the data, from end to end. To test your understanding, try constructing a narrative that explains the full execution plan in English, as a set of instructions to the database. Compare your result with what follows. If it does not match well, try again later, after you have read a few more execution plans, to see if your understanding has improved. Here is the execution plan expressed in narrative form, as instructions to the database:

Using the condition E.Last_Name = 'Stevenson', go to the index EMPLOYEE_LAST_NAME and find the list of rowids that correspond to employees with the requested last name.
For each of these rowids, go to the table Employees (E) with a single-block read (logical read, physical when necessary) according to each rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias E) from that row.
For each such row, using the join condition E.Location_ID=LE.Location_ID, go to the primary-key index LOCATION_PKEY to find a single matching rowid that corresponds to the location record that matches the employee whose record you already read. If no matching row is found, discard the result row being built.
Otherwise, for the matching rowid, go to the table Locations (LE) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias LE) from that row. Append the applicable data to the incoming row from the earlier table read to build a partial result row.
For each such row, discard the whole result row if it contains data that fails to meet the condition UPPER(LE.Description)= 'SAN FRANCISCO'.
For each remaining row, using the join condition E.Manager_ID=M.Employee_ID, go to the primary-key index EMPLOYEE_PKEY to find a single matching rowid that corresponds to the employee record of the manager for the employee whose record you already read. If no matching row is found, discard the result row being built.
Otherwise, for the matching rowid, go to the table Employees (M) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias M) from that row. Append the applicable data to the incoming row from the earlier table reads to build a partial result row.
For each such row, using the join condition M.Location_ID=LM.Location_ID, go to the primary-key index LOCATION_PKEY to find a single matching rowid that corresponds to the location record that matches the manager for the employee whose record you already read. If no matching row is found, discard the result row being built.
Otherwise, for the matching rowid, go to the table Locations (LM) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias LM) from that row. Append the applicable data to the incoming row from the earlier table reads to complete and immediately return the fully built result row.

3.3.4 Interpreting Nonrobust Execution Plans

Execution plans often use join methods other than nested loops, especially as the starting plans you will need to tune, so I next show an example that performs hash joins in place of robust nested-loops joins. If I drop all the indexes and add an OPTION(HASH JOIN) hint at the end of the query, SQL Server delivers a new execution plan:

StmtText                      
------------------------------------------------------------------------- 
SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LE, Locations LM, Employees M, Employees E
WHERE E.Last_Name = 'Stevenson'
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID
  AND UPPER(LE.Description) = 'SAN FRANCISCO'
  OPTION(HASH JOIN)

(1 row(s) affected)

StmtText
------------------------------------------------------------------------------
  |--Hash Match(Inner Join, ...([LM].[Location_ID])=([M].[Location_ID]),...)
       |--Table Scan(...(...[Locations] AS [LM]))
       |--Hash Match(Inner Join, ...([M].[Employee_ID])=([E].[Manager_ID]),...) 
            |--Table Scan(...(...[Employees] AS [M]))
            |--Hash Match(Inner ...([E].[Location_ID])=([LE].[Location_ID]),...)
                 |--Table Scan(...(...[Employees] AS [E]), 
(wrapped line) WHERE:([E].[Last_Name]='Stevenson'))
                 |--Filter(WHERE:(upper([LE].[Description])='SAN FRANCISCO'))
                      |--Table Scan(...(...[Locations] AS [LE]))

(8 row(s) affected)

This shows table scans for every table access. The query drives from LE and filters for locations with the correct description. The second table accessed is E, which is filtered for employees with the correct last name. The remaining rows from these two tables are hashed and joined. This result is next hash-joined to a full scan of M and, finally, to a full scan of LM.

3.3.5 Complex Execution Plans

There are other execution-plan features that you will see regularly, such as indicators of which joins are outer joins and steps for sorts and sort-unique operations that discard duplicates, but these are fairly self-explanatory and are not usually important to performance. The main remaining important subtlety that you will often see deals with subqueries. I'll cover this with one final example:

SELECT E.First_Name, E.Nickname, E.Last_Name,
E.Phone_Number, L.Description
FROM Employees E 
     INNER JOIN Locations L ON E.Location_ID=L.Location_ID
WHERE (E.First_Name= ? OR E.Nickname= ?)
AND EXISTS (SELECT 1 FROM Wage_Payments P
            WHERE P.Employee_ID=E.Employee_ID
            AND P.Payment_Date > CURRENT DATE - 31 DAYS);

Leave all tables empty. Place indexes on:

Employees(First_Name)
Employees(Nickname)
Locations(Location_ID)
Wage_Payments(Employee_ID)

You then find the following execution plan:

StmtText  
----------------------------------------------------------------------------- 
SELECT E.First_Name, E.Nickname, E.Last_Name,
       E.Phone_Number, L.Description
FROM Employees E, Locations L
WHERE (E.First_Name='Kathy' OR E.Nickname='Kathy')
  AND E.Location_ID=L.Location_ID
  AND EXISTS (SELECT null 
              FROM Wage_Payments P
              WHERE P.Employee_ID=E.Employee_ID
              AND P.Payment_Date > DATEADD(DAY,-31,GETDATE(  )));

(1 row(s) affected)

StmtText     
------------------------------------------------------------------------------- 
  |--Nested Loops(Left Semi Join)
       |--Filter(WHERE:([E].[First_Name]='Kathy' OR [E].[Nickname]='Kathy'))
       |    |--Bookmark Lookup(...(...[Employees] AS [E]))
       |         |--Nested Loops(Inner Join)
       |              |--Table Scan(...(...[Locations] AS [L]))
       |              |--Index Seek(...[Employees].[Employee_Location_ID] 
(wrapped line) AS [E]), SEEK:([E].[Location_ID]=[L].[Location_ID]) ORDERED)
       |--Filter(WHERE:([P].[Payment_Date]>dateadd(4, -31, getdate(  ))))
            |--Bookmark Lookup(...(...[Wage_Payments] AS [P]))
                 |--Index Seek(...(...[Wage_Payments].[Wage_Payment_Employee_ID] 
(wrapped line) AS [P]), SEEK:([P].[Employee_ID]=[E].[Employee_ID]) ORDERED)

(9 row(s) affected)

The execution plan shows a full table scan on Locations as the driving table, since it is the first data-access step at the innermost level of nesting. SQL Server then follows nested loops into the foreign-key index Employee_Location_ID to join to Employees. Once SQL Server reaches Employees, it discards rows that fail to meet the conditions on First_Name and Nickname. SQL Server then performs a special join called a semi-join to reach the correlated subquery on the join on matching Employee_IDs, with the index Wage_Payment_Employee_ID. That index provides rowids to reach Wage_Payments, following which the final filter condition on Payment_Date discards nonrecent rows that do not satisfy the EXISTS subquery. The join to the correlated EXISTS subquery is shown as the top step, which is described as a Left Semi Join. This result is not the optimum execution plan for fully populated tables, but since the test was with empty tables, I would not usually expect an execution plan that scales well to high data volumes.