SQL Statement Processing Steps

The internal mechanisms of SQL statement processing are different for each RDBMS, but generally, there are always these five steps involved to process an SQL statement.

The RDBMS parses the SQL statement by breaking it up into individual words (tokens) and validating the statement syntax. The purpose of this step is to make sure the statement does not have typos and logical errors in it (if it does, the processing stops right there, saving time and system resources). For example, the following statements would be rejected at the first step because the first one has keyword SELECT misspelled, and in the second one the order of the FROM and WHERE clauses is incorrect:

SELCT * FROM
			 CUSTOMER SELECT * WHERE CUST_ID_N = 65 FROM CUSTOMER

Note

Generally, any SQL statement consists of three types of tokens: keywords like SELECT, UPDATE, WHERE, etc.; table, view, constraint, or other database object names and column names; and values assigned to the columns or used in WHERE and HAVING clause predicates (or special variables for these values that could be substituted with real values in the runtime). In Step 1 of the process described here, only the first group of tokens is validated.

Parsing is usually the quickest step because it does not require actual database access.

The statement is then validated by the RDBMS by checking the statement against the information schema. In fact, this step validates the second group of tokens by making sure all the table and column names exist in the database. In addition, it checks the user privileges to execute the statement, ensures the column names are not ambiguous, and so on. For example, the following statement would fail at Step 2 (if you try to execute it against ACME sample database) because column CUST_ID does not exist in the CUSTOMER table:
```
UPDATE CUSTOMER SET
			 CUST_NAME_S = 'THE BEST CUSTOMER' WHERE CUST_ID = 65
```
The next step is to optimize the statement. The query optimization process differs significantly between different RDBMS vendors, but the idea is the same: to find the most efficient way to execute the SQL statement. For example, it determines in which order the tables should be joined, when search conditions are to be applied, whether use the existing indexes on certain columns, and so on. Optimization is usually very CPU-intensive; the RDBMS may need to choose from dozens of different ways to process the same query, especially if it is complex and involves multiple table joins and conditions; however, when done in the right way, optimization could improve the query performance dramatically.
The next step is to generate an execution plan for the statement based on the optimization process performed during Step 3. The execution plan is simply a binary representation of the optimized sequence of actions.

Finally, the set of binary instructions created in Step 4 is executed by the RDBMS. Most database vendors provide mechanisms that allow you to substitute the third group of token variables with the actual values at this point.

Note

In addition to the standard five steps to process any SQL statement, an additional step is necessary when processing a SELECT statement, namely returning the query results in some organized form.

We will be talking about the five steps of SQL statement processing again in this chapter when discussing dynamic SQL.