Glossary

Proper words in proper places, make the true definition of a style.
Jonathan Swift, Letter to a Young Clergyman

Aggregation: The summarization of details into a smaller number of summary datapoints (sums, counts, and averages, usually), usually using GROUP BY.

Anti-join: A correlation join applied to a NOT EXISTS-type subquery.

Apples-and-oranges tables: Tables that hold somewhat different, but related entity types within the same physical table.

B-tree index: A balanced, branched, sorted structure that allows the database to rapidly locate a row or set of rows that match conditions of the indexed column or columns.

Block: The smallest unit of physical storage or cached storage in a database. Blocks are usually just a few kilobytes in size and most often contain from less than one hundred rows to at most a few hundred rows.

Block buffer cache: The cache that stores recently used table and index blocks in shared memory for logical I/O, avoiding the need for physical I/O of the cached blocks. Any user's SQL can read blocks into the block buffer cache, and any other user's SQL automatically takes advantage of these cached blocks. See LRU caching.

Branch block: An index block that the database reaches from a root block or a higher branch block. The branch block, in turn, points to leaf blocks or lower branch blocks that contain entries in the desired range. See root block and leaf block.

Cache-hit ratio: The fraction of logical I/Os that avoid physical I/Os.

Cartesian join: A join between two tables or two combinations of tables in which the SQL fails to specify any condition or combination of conditions that relate the joined tables to each other.

Cartesian product: The set of all possible combinations of rows from two or more rowsets. A Cartesian product results when SQL contains a Cartesian join.

Cold: A database block is said to be cold when it is rarely accessed. A block might be cold in the context of a particular type of I/O. For example, a block that is hot with respect to logical I/O might be so well cached that it is cold with respect to physical I/O.

Complex query: A multitable query that is not a simple query. See simple query.

Correlation joins: The joins in subqueries that correlate the subquery rows to values from the outer query.

Correlation preference ratio: The ratio of the runtime of the IN form (which drives from the subquery out) of a query with an EXISTS-type subquery divided by the runtime of the EXISTS form (which drives from the outer query to the subquery) of the same query. A correlation preference ratio greater than 1.0 implies that the best execution plan drives from the outer query to the subquery, since the alternative takes longer.

CPU: Central Processing Unit. The computer component where in-memory software instructions execute. CPU operations are fast compared to physical I/O to and from the database, but databases can nonetheless spend long intervals simply consuming CPU (these intervals are called CPU time) to service inefficient queries against well-cached data. Logical I/O requires CPU time, while physical I/O consumes time in the disk subsystem.

Cyclic join graph: A query diagram that has links that form a closed loop.

Denormalization: Storage of denormalized data in one or more tables.

Denormalized data: Data that is redundant with other data already available in the database. For example, storing Customer_ID in Order_Lines would be denormalized data if a join from Order_Lines to Orders would yield the same Customer_ID stored as a column in Orders.

Detail join ratio: The join ratio on the nonunique end of the join. See join ratio.

Detail table: A table that potentially has more than one matching row for any given row in a another table (usually a master table) that joins to it. A table might be a master table to one table and a detail table to another, so the term detail table describes a table's position in a relationship with another table with respect to a particular join.

Distributed filters: Filter conditions spread across multiple tables that are cumulatively more selective than the filter conditions of any one of the individual tables.

Driving table: The first table reached as the database executes a query. The database must find a path into the driving table that does not depend on having data from any other table.

Execution plan: The path that the database will follow to reach the data the query requires. The execution plan consists mainly of table-access methods for each table queried; a join order, beginning with the driving table; and join methods for each table joined, following the driving table.

EXISTS -type subquery: A subquery related to the outer query with an EXISTS condition or with an IN condition capable of being converted to an EXISTS condition.

Filter condition: A condition in the WHERE clause that can be evaluated as true or false with data from only a single table, used to narrow a query to a subset of the rows of that table. Selective filter conditions are key to efficient execution plans.

Filter independence: The assumption, usually true, that you can calculate multiple-condition selectivity as the simple product of individual-condition filter ratios. For example, a condition on a person's first name and a condition on a person's Zip Code are logically independent. You can assume that the fraction of rows that have both the correct first name and the correct Zip Code will be roughly the product of the fraction of rows that have the correct name and the fraction that have the correct Zip Code. For example, if 1/100th of the rows contain the desired first name and 1/500^th of the rows contain the desired Zip Code, then the multiple-condition filter selectivity is 1/50,000 ((1/100) x (1/500)).

Filter ratio: The fraction of rows of a table for which a set of filter conditions on that single table is true. Mathematically, this is the rowcount from the table with the filter conditions applied divided by the complete table rowcount.

Filter redundancy: A relationship between filters wherein the truth of one condition guarantees the truth of another (the opposite of filter independence). For example, a condition on a Zip Code likely guarantees a single value for telephone area code, so the selectivity of conditions on both would be no better than the selectivity of the Zip Code alone. You can always test for complete or partial filter redundancy by calculating the multicondition filter selectivity based on filter independence and seeing if it yields the actual selectivity of the combination of the conditions.

Focus table: The table that is the current point from which to add further details to the query diagram, as you build the query diagram. When a query diagram has no missing elements around the current focus table, you choose a new focus table.

Foreign key: A value or tuple stored in a table row that matches a unique key in some other table row, which is identified by the matching primary key.

Full index scan: The operation of reading every index entry across all leaf blocks.

Full table scan: The operation of reading an entire table directly, without first obtaining table rowids from an index.

Hash join: A join method wherein the two rowsets being joined are reached independently, once, and matched according to the output of a randomizing hash function applied to the join columns. The smaller rowset (or at least the rowset the database expects will be smaller) is generally presorted into hash buckets in memory. Then, the database calculates the hash function on the fly while reading the larger rowset and matches rows from the larger rowset to the hashed rows in memory from the smaller rowset.

High-water mark: The block address in a table that shows the highest block that has contained table rows since the table was created or since it was last truncated. Full table scans must read from the beginning of a table up to the high-water mark, including every block in the table extents between those two points, even when deletes have emptied most of those blocks.

Hot: A database block is said to be hot when it is frequently accessed. A block might be hot in the context of a particular type of I/O. For example, a block that is hot with respect to logical I/O might be so well cached that it is cold with respect to physical I/O.

I/O: Input/output. Usually refers to physical I/O, but it can also refer to logical I/O.

Index: A database structure that helps the database efficiently reach just a desired subset of the table rows without having to read the whole table. See B-tree index (the most common type, by far).

Index range scan: The operation of reading (usually with logical I/O from cache) an index range (a set that might include pointers to multiple rows) over as many leaf blocks as necessary.

Individual-condition filter selectivity: The fraction of table rows that satisfy a single condition on that table.

Inner join: An operation that matches rows between two data sources, usually tables, and returns combinations that satisfy one or more join conditions that relate the data sources. Rows are returned only in combination. Any given source row that does not have a match in the joined table is discarded.

Join: An operation that matches rows between two data sources, usually tables. See inner join and outer join.

Join condition: A condition in the WHERE clause that requires values from two (or, rarely, more) tables for the condition to be evaluated as true or false. Conditions in a WHERE clause that are not filter conditions are join conditions. Join conditions typically provide an efficient path to other tables, once you reach rows from the best possible driving table.

Join filter: A join ratio that turns out to be less than 1.0, resulting in a loss of rows when performing the join itself. Only inner joins can have join filters.

Join ratio: For a join between table A and table B, the join ratio on the A end of the join is the rowcount of the join of A and B divided by the rowcount of B. If A is a detail table to master table B, the join ratio on the A end (the detail join ratio) conveys "how many" details A contains in its many-to-zero or many-to-one relationship to B. On the B end of the same join, the master join ratio conveys "how often it is one" on the zero-to-one end of the same many-to-zero or many-to-one relationship.

Join skeleton: Same as query skeleton, that part of the query diagram that shows how the tables join but does not include filter ratios or join ratios.

Leaf block: A lowest-level index block that the database reaches from a root block or a branch block. Leaf blocks do not point to lower-level index blocks. Instead, they point to table blocks that hold the indexed rows. A leaf block contains tuples of indexed column values and rowids that point to rows that contain those column values.

Logical I/O: Any read or write of a database block from or to the shared cache during the execution of SQL, even if the database must first read the block from disk to place it in the shared cache.

LRU caching: The usual form of caching a database uses to keep blocks in the shared cache. In LRU caching, the database overwrites the least recently used (LRU) cached block (at the tail of the cached list of blocks) whenever the database needs to read a new block off disk into cache. Subsequently, logical I/O to a cache moves blocks to the head of the list, where the most recently used blocks reside. Databases sometimes treat I/O for large full table scans as a special case, leaving blocks from full table scans at the tail of the LRU cache to avoid pushing more useful blocks off the cache.

Many-to-one join: A join from a detail table to a master table.

Master join ratio: The join ratio on the unique end of the join. See join ratio.

Master table: A table that has at most one matching row for every row in another table (usually a detail table) that joins to it. A table might be a master table to one table and a detail table to another, so the term master table describes a table's position in a relationship with another table with respect to a particular join.

Middleware: Software that moves data around within a system or systems, without sending that data to the end users. Since end users aren't part of the picture and computers have plenty of patience for large data volumes, these batch processes sometimes legitimately need to handle data volumes that are too large for human consumption.

Multiple-condition filter selectivity: The fraction of table rows that satisfy the combination of conditions that refer exclusively to that table.

Nested-loops join: A join method that uses each row of the query result reached so far to drive into the joined-to table, usually through an index on the join key.

Normalized data: Wholly nonredundant data (data that lacks denormalized data). See denormalized data.

NOT EXISTS-type subquery: A subquery related to the outer query with a NOT EXISTS condition, or a NOT IN condition capable of being converted to a NOT EXISTS condition.

Outer join: An operation that combines rows between two data sources, usually tables. An outer join returns rows that satisfy either the inner case of the outer join or the outer case of the outer join. Rows that satisfy the inner case of the outer join are identical to the result the database would return if the outer join were replaced by an inner join. In the outer case of the outer join, the database finds no matching row in the joined-to table but returns a row that appends columns from the joined-from row to artificial all-null columns wherever the SQL references the joined-to table.

Physical I/O: The subset (usually small) of logical I/O that fails to find the necessary block cached, resulting in a call for a physical read or write. Although the database views any call to the access disk as physical I/O, the operating system and the disk subsystem normally maintain caches of their own and often rapidly satisfy disk-access calls without true physical I/O that requires slow physical disk reads.

Post-read filter: A filter condition that can be evaluated for truth only after the database reads a table row in a given execution plan. The index used to reach the table does not contain the data necessary to evaluate the truth of the post-read filter condition, so the database must first read the table row to find columns that the post-read filter references.

Primary key: A value or tuple stored in a table row that uniquely identifies a row in a table. Foreign keys point to primary keys. In a one-to-one relationship, a primary key can also serve as a foreign key.

Query diagram: A diagram in the form of nodes and links, with associated numerical values, such as filter ratios and join ratios. The query diagram concisely conveys the mathematical essentials of a query tuning problem. Chapter 5 describes how to create a query diagram from a SQL query, and Chapter 7 adds some refinements for complex queries.

Query skeleton: Same as join skeleton, that part of the query diagram that shows how the tables join but does not include filter ratios or join ratios.

Referential integrity: A property of a foreign key such that every row in the table has a foreign-key value that points to a matching primary key for a row in the corresponding master table. When foreign keys fail to have referential integrity, you can usually infer a defect in the application or database design, since foreign keys become meaningless when they fail to point to primary keys that are still in existence.

Robust execution plan: An execution plan that reaches the driving table efficiently, usually through an indexed read, and that reaches the rest of the tables through nested loops to the join-key indexes.

Root detail table: A table in a query diagram that joins to other tables only through foreign keys, lying at the detail end of every join it takes part in. Most query diagrams take the form of a tree, with a single root detail table at the top. Rows returned from such a query map one-to-one with rows that the query returns from the root detail table. The root detail table is usually the largest table in the query.

Root block: The first block read for an index range scan or unique scan on a B-tree index. The root block contains pointers to the subranges covered by each index block at the level below, when the index does not fit into a single block. Occasionally (usually when the index covers fewer than 300 rows), the entire index fits within the root block. Root blocks of useful indexes are almost invariably cached, since they require logical I/O too frequently to reach the tail end of the LRU cache.

Rowid: The internal address of a physical table row, consisting of a block address that points to the table block that contains the row and a row address within the block, leading directly to the actual row.

Rowcount: The count of rows in a rowset.

Rowset: Any set of rowseither table rows or result rows from a complete or partial query. During execution of a query, the database builds return rowsets as it joins queried tables, discarding rows along the way when inner joins fail to find matches or when joined table rows fail to meet query conditions, finally returning rows that satisfy all conditions of the query.

Self-caching: Most queries find most of the blocks they need already cached by earlier queries, usually from other sessions. Self-caching happens when a query performs repeated logical I/O to the same database blocks that might not have been cached before the query started. The initial I/O to each of these blocks might be physical, but the tendency of queries to reuse the same blocks provides self-caching, in which the query itself ensures that the cache becomes loaded with blocks that are needed repeatedly by that query. The effectiveness of self-caching depends on how well-clustered the queried rows are (i.e., how closely they reside in the physical tables and indexes). Self-caching is especially effective for index blocks, especially higher-level index blocks.

Semi-join: A correlation join applied to an EXISTS-type subquery.

Simple query: A query that meets the following conditions: (1) The query maps to one tree. (2) The tree has exactly one root (one table with no join to its primary key). All nodes other than the root node have a single downward-pointing arrow that links them to a detail node above, but any node can be at the top end of any number of downward-pointing arrows. (3) All joins have downward-pointing arrows (joins that are unique on one end). (4) Outer joins are unfiltered, pointing down, with only outer joins below outer joins. (5) The question that the query answers is basically a question about the entity represented at the top (root) of the tree or about aggregations of that entity. (6) The other tables just provide reference data that is stored elsewhere for normalization.

Sort-merge join: A join method in which the two rowsets being joined are reached independently, once, sorted, and matched by a coordinated merge of the rowsets sorted on the join keys.

Subquery adjusted filter ratio: An estimated value that helps you choose the best point in the join order to test the subquery condition.

Subquery root detail table: The root detail table of the query diagram of the subquery, isolated from the outer query.

Tuple: An ordered grouping of a fixed number of values, especially column values or expressions. For example, two-column primary keys consist of two-value tuples that map to unique table rows.