'SELECT', 'INSERT', 'UPDATE', and 'DELETE'

Four basic SQL statements allow you to retrieve and modify data in tables. SELECT retrieves data from one or more tables, INSERT inserts rows into one table, UPDATE modifies rows in one table, and DELETE removes rows from one table.

NOTE

If you are already familiar with these statements, you might want to just skim this section.

You could easily fill a book with examples and explanations of these statements (in fact, many authors already have). This section covers the major parts of syntax and shows some basic examples.

The `SELECT` Statement

The SELECT statement has the following basic syntax:

SELECT [DISTINCT] [TOP N [PERCENT]] column1 [AS column_heading] 
       [, column2 [AS column_heading], ...]
 [INTO new_table_name]
 FROM table1 [ [AS] table_alias ]
 [ [INNER | { LEFT | RIGHT | FULL} [OUTER] ] JOIN table2 [ [AS] table_alias2 ]
       on ( join_conditions )] [...]
 [WHERE search_conditions]
 [GROUP BY aggregate_free_expression]
 [HAVING search_condition]
 [ORDER BY order_expression [ASC | DESC]]
 [ compute row_aggregate(column_name) [, ... ]
      [ by column_name [ , column_name ] ... ] ]

To return all the Utah authors' first and last names from the authors table and rename the column heading in your result, you would execute the following:

SELECT au_lname AS 'First', au_fname AS 'Last' 
   FROM authors
   WHERE STATE = 'UT'

By default, SELECT returns all the rows that meet the search conditions. If you specify SELECT DISTINCT, any duplicate rows will be removed. Be careful not to use DISTINCT unnecessarily because it can slow query performance having to put the resultset into a work table to remove duplicates before returning the final resultset.

The `WHERE` Clause

The columns that you base your search condition on do not have to be returned in the resultset. You can filter rows in several ways with the WHERE clause. The following expressions are available for the WHERE clause:

Operators: =, <> (not equals), <, >, >=, and >=.

BETWEEN expression1 AND expression2. Between is inclusive.

IN(element1, element2, …). Returns all rows whose values are equal to the elements specified in the list.

LIKE string_expression. Used for pattern matching. Table 26.2 lists the available wildcard characters.

Table 26.2. Wildcards and `LIKE`
Wildcard	Meaning
`%`	Any number of characters
`_`	Any single character
`[]`	Any character listed in the bracket

Logical OR and AND are used to connect multiple search arguments. AND takes precedence over OR, so use parentheses to provide the proper logical grouping. For example, you might want to write a query to return all the business books for which the price is less than $10 or ytd_sales greater than $10,000. If you don't use parentheses, notice what happens when you run the following query:

SELECT substring(title, 1, 30) as title, type, price, ytd_sales 
   from titles
   where type = 'business'
     and price < $10 or ytd_sales > 10000
go

title                             type          price     ytd_sales
--------------------------------- ------------- --------- ---------
You Can Combat Computer Stress    business      2.9900    18722
The Gourmet Microwave             mod_cook      2.9900    22246
Fifty Years in Buckingham Pala    trad_cook     11.9500   15096

You didn't get what you wanted. You only got business books with a price less than $20 and all books with ytd_sales greater than $10,000. Using parentheses, you get the result you want:

SELECT substring(title, 1, 30) as title, type, price, ytd_sales 
   from titles
   where type = 'business'
     and (price < $10 or ytd_sales > 10000)
go

title                           type        price    ytd_sales
------------------------------- ----------- -------- ----------
You Can Combat Computer Stress  business    2.9900   18722

The `ORDER BY` Clause

The ORDER BY clause sorts the resultset by the specified column or columns. Ascending sorting is the default, but you can use ORDER BY column_name DESC to specify descending ordering. You should always specify ORDER BY if you require a certain order for your results. No specific ordering of resultsets are guaranteed otherwise, even when using DISTINCT or GROUP BY. The following is an example of a query using the ORDER BY clause to sort the resultset by au_lname:

SELECT au_lname, au_fname, state 
   FROM authors
   WHERE state IN('CA', 'KS')
     AND au_lname LIKE 'S%'
   ORDER BY au_lname

As an alternative to specifying the column expression in the ORDER BY clause, you can specify the positional number of the expression in the select list or the column heading:

SELECT au_lname AS 'Last', au_fname As 'First', state 
   FROM authors
   WHERE state IN('CA', 'KS')
     AND au_lname LIKE 'S%'
   ORDER BY 1, First DESC

The `TOP` Keyword

You can use the TOP keyword to restrict the number of rows returned. You can specify a fixed number of rows or a percentage of the number of rows in the resultset. The query shown in Listing 26.1 uses TOP along with ORDER BY to retrieve the title and price for the five most expensive books.

Listing 26.1 Using `TOP` to Restrict the Number of Rows Returned

SELECT TOP 5 price, title
   FROM titles
   ORDER BY price DESC
go

price                 title
--------------------- --------------------------------------
22.9500               But Is It User Friendly?
21.5900               Computer Phobic AND Non-Phobic Indi...
20.9500               Onions, Leeks, and Garlic: Cooking ...
20.0000               Secrets of Silicon Valley
19.9900               The Busy Executive's Database Guide

NOTE

Be aware that the TOP keyword will not speed up a query if the query also contains an ORDER BY clause. This is because the entire resultset is selected into a work table and sorted before the top N rows in the ordered resultset are returned.

When using the TOP keyword, you can add the WITH TIES option to specify that additional rows should be returned from the resultset if duplicates are on the last values returned that are specified in an ORDER BY clause. The WITH TIES option can only be specified if an ORDER BY clause is specified. In Listing 26.2, you add WITH TIES so that all books that match the last price of the top five are returned.

Listing 26.2 Using `TOP WITH TIES`

SELECT TOP 5 WITH TIES price, title
   FROM titles
   ORDER BY price DESC
go

price         title
------------- ---------------------------------------------------------------
22.9500       But Is It User Friendly?
21.5900       Computer Phobic AND Non-Phobic Individuals: Behavior Variations
20.9500       Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean
20.0000       Secrets of Silicon Valley
19.9900       Prolonged Data Deprivation: Four Case Studies
19.9900       Silicon Valley Gastronomic Treats
19.9900       The Busy Executive's Database Guide
19.9900       Straight Talk About Computers

If you don't use ORDER BY with TOP, the rows that will be returned cannot be predicted?it is based on the execution plan chosen by the optimizer. You can also specify TOP n PERCENT to restrict the number of rows based on a percentage of the number of rows in the final resultset instead of specifying an absolute value.

NOTE

The TOP keyword was introduced in version 7.0. To get similar functionality, SET ROWCOUNT n was often used in previous versions. The optimizer recognizes when TOP is used in a query and may choose to optimize the query differently. This can result in better query performance than when using ROWCOUNT. Another difference between the two is that SET ROWCOUNT n specifies the number of rows to be affected by all commands. The TOP keyword applies only to the SELECT statement in which it is specified.

`SELECT INTO`

You can create a table and store the resultset into it by using SELECT column(s) INTO table_name .... The table specified will be created with the same columns and datatypes as columns in the resultset. Most SELECT options (aggregates, GROUP BY, ORDER BY) are allowed except for the COMPUTE clause. If a column is a calculated expression such as an aggregate function or string concatenation, you will need to provide a column heading that can be used as a column name. (The column heading will have to conform to the rules for SQL Server identifiers.)

If you precede the table name with one or two hash signs (#), a temporary table is created in tempdb. If you want to create a permanent table with SELECT...INTO, you must have create table permission in the destination database, and the database option select into/bulkcopy must be set to TRUE.

The following example selects the type column and the average price from the titles table into a temporary table with the resultsets ordered in descending order by average price:

select type, avg(price) AS avg_price 
   into #type_avg_prices
   from titles
   group by type
   order by avg_price DESC

`UNION`

When you use the UNION keyword, a logical union between two or more resultsets is returned. Each select list must have the same number of columns, and the corresponding columns must be of the same datatype or must allow implicit datatype conversions. The column headings specified in the first select list will be the column headings used for the entire resultset.

This query returns the city and state of each author and publisher as a single resultset:

SELECT city, state FROM authors 
UNION ALL
SELECT city, state FROM publishers

By default, SQL Server removes all the duplicate rows in a UNION. You can add the keyword ALL if you do not want the duplicates to be removed. Using the ALL keyword will speed up the return of the resultset because a work table is not required to remove the duplicate rows?the results are simply appended to the previous resultset.

To sort the resultset for a UNION, specify an ORDER BY clause in the last select statement in the union. The column names specified in the ORDER BY clause must match those in the first select statement in the union, or you can specify the column number.

`GROUP BY` and `HAVING`

GROUP BY and HAVING are used with aggregate functions (which are described in the section "SQL Server Functions," later in this chapter). GROUP BY allows you to calculate aggregates for groups within your tables. The following example calculates the average price for each book category in the titles table:

SELECT type, AVG(price) 
   FROM titles
   GROUP BY type
go

type
---------------- -----------
business         13.7300
mod_cook         11.4900
popular_comp     21.4750
psychology       13.5040
trad_cook        15.9633
UNDECIDED        NULL

If a WHERE clause is used, it is applied before the grouping takes place. The following query calculates the average price per book category for books published by the publisher with the ID 1389:

SELECT type, AVG(price) 
   FROM titles
   WHERE pub_id = 1389
   GROUP BY type
go

type
---------------- -----------
business         17.3100
popular_comp     21.4750

The HAVING lets you restrict the number of rows returned by filtering on the aggregate values calculated. The clause is applied after the grouping is applied and the aggregate values are determined. Perhaps you want to return the average price for book categories, but only the categories with an average that is higher than $14:

SELECT type, AVG(price) 
   FROM titles
   GROUP BY type
   HAVING AVG(price) > $14
go

type
---------------- -----------
popular_comp     21.4750
trad_cook        15.9633

`CUBE`, `ROLLUP`, and `GROUPING`

CUBE, ROLLUP, and GROUPING are used in conjunction with GROUP BY.

`ROLLUP`

The ROLLUP operator provides aggregates and super-aggregates for elements within a GROUP BY statement. The ROLLUP operator can be used to extract running aggregates as well as cumulative aggregates within a resultset. The ROLLUP operator creates groupings from right to left, along the list of columns in the GROUP BY clause, applying the aggregate function to each grouping superset. A rollup value for a level displays NULL for that column, as shown in Listing 26.3.

Listing 26.3 Using `ROLLUP` to Calculate Super-Aggregates

select pub_id, type, t.title_id, sum(qty) as total_qty
   from titles t, sales s
   where t.title_id = s.title_id
     and type in ('business', 'popular_comp')
   group by pub_id, type, t.title_id
   with rollup
go

pub_id type         title_id total_qty
------ ------------ -------- -----------
0736   business     BU2075   35
0736   business     NULL     35
0736   NULL         NULL     35
1389   business     BU1032   15
1389   business     BU1111   25
1389   business     BU7832   15
1389   business     NULL     55
1389   popular_comp PC1035   30
1389   popular_comp PC8888   50
1389   popular_comp NULL     80
1389   NULL         NULL     135
NULL   NULL         NULL     170

Notice that you first get the aggregates rolled up by pub_id, type, and title_id, then by pub_id and type, then by pub_id, and finally a rollup for the entire resultset.

`CUBE`

The CUBE operator is used with GROUP BY to generate a cross-referenced superset of groups to generate aggregates and super-aggregates. Unlike ROLLUP, which rolls up super-aggregate values from right to left, CUBE rolls up super-aggregates for every possible combination of the columns or expressions in the GROUP BY clause. The number of non-aggregate columns in the select list determines the number of groups in the resultset. If n columns or expressions exist, 2 (n) - 1 possible super-aggregate combinations will be present.

If you add CUBE to the query that returns the average price for book categories, you get an extra row with the average price of all books, as shown in Listing 26.4.

Listing 26.4 Using `CUBE` to Calculate Super-Aggregates

SELECT type, AVG(price) AS average
   FROM titles
   GROUP BY type
   WITH CUBE
go

type         average
------------ ---------
business     13.7300
mod_cook     11.4900
popular_comp 21.4750
psychology   13.5040
trad_cook    15.9633
UNDECIDED    NULL
NULL         14.7662

The book type is returned as NULL for the extra row.

CUBE is more useful if you group over several columns. In Listing 26.5, you want to return the average price grouped by book type and publisher.

Listing 26.5 Grouping Over Several Columns

SELECT type, pub_id, AVG(price) AS average
   FROM titles
   GROUP BY type, pub_id
go

type         pub_id average
------------ ------ --------
business     0736   2.9900
psychology   0736   11.4825
mod_cook     0877   11.4900
psychology   0877   21.5900
trad_cook    0877   15.9633
UNDECIDED    0877   NULL
business     1389   17.3100
popular_comp 1389   21.4750

In Listing 26.6, you add WITH CUBE, which provides the total average, the average for each book type, and the average for each publisher.

Listing 26.6 Grouping Over Several Columns with `CUBE`

SELECT type, pub_id, AVG(price) AS average
   FROM titles
   GROUP BY type, pub_id
   WITH CUBE
go

type         pub_id    average
------------ --------- --------
business     0736      2.9900
business     1389      17.3100
business     NULL      13.7300
mod_cook     0877      11.4900
mod_cook     NULL      11.4900
popular_comp 1389      21.4750
popular_comp NULL      21.4750
psychology   0736      11.4825
psychology   0877      21.5900
psychology   NULL      13.5040
trad_cook    0877      15.9633
trad_cook    NULL      15.9633
UNDECIDED    0877      NULL
UNDECIDED    NULL      NULL
NULL         NULL      14.7662
NULL         0736      9.7840
NULL         0877      15.4100
NULL         1389      18.9760

The `GROUPING` Function

When working with the CUBE or ROLLUP operator, you can use the GROUPING function to distinguish between real null values and null values that represent a rollup of all values for a column in the resultset.

The GROUPING function returns 1 when the value is grouped, and 0 when the column contains a null value.

In Listing 26.7, you use the GROUPING function to replace null values for the rolled up columns with ALL.

Listing 26.7 Using the `GROUPING` Function

SELECT CASE when GROUPING(type) = 1 then 'ALL'
            else isnull(type, 'Other')
            END AS type,
       CASE when (grouping(pub_id) = 1) then 'ALL'
            else isnull(pub_id, 'Unknown')
            END as pub_id,
       AVG(price) AS average
   FROM titles
   GROUP BY type, pub_id
   WITH CUBE
go

type         pub_id average
------------ ------ ------------
business     0736   2.9900
business     1389   17.3100
business     ALL    13.7300
mod_cook     0877   11.4900
mod_cook     ALL    11.4900
popular_comp 1389   21.4750
popular_comp ALL    21.4750
psychology   0736   11.4825
psychology   0877   21.5900
psychology   ALL    13.5040
trad_cook    0877   15.9633
trad_cook    ALL    15.9633
UNDECIDED    0877   NULL
UNDECIDED    ALL    NULL
ALL          ALL    14.7662
ALL          0736   9.7840
ALL          0877   15.4100
ALL          1389   18.9760

You can also use the GROUPING function to order the resultsets to move all the rollups toward the bottom, as shown in Listing 26.8.

Listing 26.8 Using the `GROUPING` Function to Order the Resultset

SELECT CASE when GROUPING(type) = 1 then 'ALL'
            else isnull(type, 'Unknown')
            END AS type,
       CASE when (grouping(pub_id) = 1) then 'ALL'
            else isnull(pub_id, 'Unknown')
            END as pub_id,
       AVG(price) AS average
   FROM titles
   GROUP BY type, pub_id
   WITH CUBE
   ORDER by GROUPING(type), GROUPING(pub_id)
go

type         pub_id average
------------ ------ ----------
business     0736   2.9900
business     1389   17.3100
mod_cook     0877   11.4900
popular_comp 1389   21.4750
psychology   0736   11.4825
psychology   0877   21.5900
trad_cook    0877   15.9633
UNDECIDED    0877   NULL
UNDECIDED    ALL    NULL
trad_cook    ALL    15.9633
psychology   ALL    13.5040
popular_comp ALL    21.4750
mod_cook     ALL    11.4900
business     ALL    13.7300
ALL          0736   9.7840
ALL          0877   15.4100
ALL          1389   18.9760
ALL          ALL    14.7662

Joining Tables

Transact-SQL allows you to correlate data between two or more tables by performing a join. Generally, you connect the tables using a common column, which is most often a column for which a foreign key and primary key relationship has been specified.

You can specify a join in two ways. First, you can specify the join condition in the WHERE clause. This is an older way of specifying a join, but it is still supported. Those of you who have been using SQL for a while are probably more familiar with this method.

You can also specify the join condition in the FROM clause. This method complies with the ANSI-92 standard.

NOTE

The ANSI-92 join syntax (or ANSI join for short) was introduced in version 6.5 and is now the preferred method of expressing joins.

One advantage with the ANSI join syntax is that the actual join operation performed is easier to identify because it is explicitly stated in the FROM clause. However, it can be harder to identify the list of tables involved in the query because you now have to wade through all the join verbiage to identify the tables involved. The old style syntax specified a more easily readable list of tables in the FROM clause.

The ANSI JOIN syntax has a few advantages over the old syntax:

The new join syntax supports double outer joins, which the old style syntax does not.
The new style syntax allows you to specify optimizer hints for how to join the tables in a query, which you cannot provide using the old style syntax.
It is harder to create a Cartesian product by leaving out a join condition because the lack of an ON statement in the JOIN clause leads to a syntax error.
Microsoft has stated that the old style join syntax might not be supported in future releases. Because Microsoft encourages use of the new join syntax, you can probably safely assume that Microsoft will be more willing to fix problems regarding ANSI joins than those of the older T-SQL join syntax.

The following example shows both ways of expressing a join. Both statements return the same resultset:

SELECT title, qty 
   FROM titles t, sales s
   WHERE t.title_id = s.title_id

SELECT title, qty
   FROM titles t INNER JOIN sales s ON t.title_id = s.title_id

A table alias was also introduced in the example. The titles table was aliased to the name t and sales to s. Aliasing is useful when you have to refer to a table in several places in the query; you don't have to type the entire table name each time. Although defined in the FROM clause, a table alias can be used anywhere in the query that you have to qualify the column name.

The different types of joins are INNER, OUTER, and CROSS. An INNER join is based on equality between the column values. Only rows with matching values on the join columns between the two tables will be returned in the resultset. The following example lists all authors and the average royalty paid to them for all matching records between authors and titleauthor:

select au_lname, au_fname, avg(royaltyper) AS avg_royalty 
   from authors a join titleauthor ta on a.au_id = ta.au_id
   group by au_lname, au_fname
   order by 3
go

au_lname                                 au_fname             avg_royalty
---------------------------------------- -------------------- -----------
Gringlesby                               Burt                 30
O'Leary                                  Michael              35
Ringer                                   Anne                 37
Yokomoto                                 Akiko                40
MacFeather                               Stearns              42
Hunter                                   Sheryl               50
Dull                                     Ann                  50
Bennet                                   Abraham              60
Green                                    Marjorie             70
DeFrance                                 Michel               75
Karsen                                   Livia                75
Ringer                                   Albert               75
Panteley                                 Sylvia               100
Locksley                                 Charlene             100
Carson                                   Cheryl               100
Blotchet-Halls                           Reginald             100
del Castillo                             Innes                100
Straight                                 Dean                 100
White                                    Johnson              100

The OUTER join returns all the rows from the specified outer table (specified with LEFT OUTER, RIGHT OUTER, or FULL OUTER), even if the other table has no match. Rows returned from the outer table that have no corresponding match in the inner table will display the NULL symbol for any columns retrieved from the inner table. For example, you might want to display the names of all authors along with the average royalty paid if available:

select au_lname, au_fname, avg(royaltyper) 
   from authors a left outer join titleauthor ta on a.au_id = ta.au_id
   group by au_lname, au_fname
   order by 3
go

au_lname                                 au_fname             avg_royalty 
---------------------------------------- -------------------- -----------
Greene                                   Morningstar          NULL
Greenfield                               Tom                  NULL
McBadden                                 Heather              NULL
Smith                                    Meander              NULL
Stringer                                 Dirk                 NULL
Gringlesby                               Burt                 30
O'Leary                                  Michael              35
Ringer                                   Anne                 37
Yokomoto                                 Akiko                40
MacFeather                               Stearns              42
Hunter                                   Sheryl               50
Dull                                     Ann                  50
Bennet                                   Abraham              60
Green                                    Marjorie             70
DeFrance                                 Michel               75
Karsen                                   Livia                75
Ringer                                   Albert               75
Panteley                                 Sylvia               100
White                                    Johnson              100
Straight                                 Dean                 100
Locksley                                 Charlene             100
Carson                                   Cheryl               100
Blotchet-Halls                           Reginald             100
del Castillo                             Innes                100

Note that no real difference exists between a left or right outer join except for specifying which table on which side of the join condition is to be the controlling, or outer table. For example, the previous query would provide the same result if you reversed tables in the join clause and made it a right outer join:

select au_lname, au_fname, avg(royaltyper) 
   from titleauthor ta right outer join authors a on ta.au_id = a.au_id
   group by au_lname, au_fname
   order by 3

A full outer join will return all matching rows from both tables along with all rows from each table without a corresponding match in the other table.

A CROSS join returns all possible combinations of rows, also called a Cartesian product. Essentially, you are joining each row from one table with all the rows from another table. The total number of rows returned is the product of the number of rows in each table. Be careful using cross joins because the resultsets can become quite large, even for relatively small tables. For example, a cross join between a table with 1,000 rows and a table with 5,000 rows would return 1,000 * 5,000, or 5,000,000 rows.

With the ANSI syntax, you specify the join type explicitly in the FROM clause, but the join type in the older join syntax is specified in the WHERE clause. The old style syntax allowed you to specify only a left outer join or a right outer join using the *= or =* operators, respectively. For example, the preceding outer join would be written as follows using the old style syntax:

select au_lname, au_fname, avg(royaltyper) 
   from authors a, titleauthor ta
   where a.au_id *= ta.au_id
   group by au_lname, au_fname
   order by 3

Subqueries

A subquery is essentially a query contained in another query. You can use a subquery in place of an expression. Depending on the context of the subquery, there might be restrictions on the data a subquery can return. Subqueries can only return a single column of data, and in some cases, they can only return a single row.

If the subquery returns only one row and one column, it can be used in place of any expression. This example returns all books published by Binnet & Hardley:

SELECT title FROM titles 
   WHERE pub_id =
        (SELECT pub_id FROM publishers
            WHERE pub_name = 'Binnet & Hardley')

An error message is returned if the subquery would have returned several rows.

A subquery must always appear in parentheses.

NOTE

You will often find that you can achieve the same result with a subquery or a join. A join is often more efficient than a subquery (with the exception of when you want to remove duplicates, in which a subquery with NOT EXISTS is more efficient).

You can use a subquery that returns one column and several rows with the IN predicate. The following example returns all publishers of business books:

SELECT pub_name FROM publishers 
   WHERE pub_id IN
        (SELECT pub_id FROM titles
            WHERE type = 'business')

You can also use a subquery that returns several rows with the EXISTS keyword. The following example returns the same resultset as the preceding example:

SELECT pub_name FROM publishers p 
   WHERE EXISTS
        (SELECT * FROM titles t
            WHERE p.pub_id = t.pub_id
              AND type = 'business')

Although the subquery specifies the * in the SELECT list, the subquery actually returns no column data. When the EXISTS keyword is used, the subquery returns a true or false condition depending on whether a matching record was found.

This type of subquery is a correlated subquery. The inner query refers to a table in the outer query in the WHERE clause of the subquery (WHERE p.pub_id = t.pub_id). SQL Server executes the inner query for each row in the outer query, testing for a match on pub_id. After a match is found, SQL Server stops looking for additional matching rows and returns a true condition, adding the outer row to the resultset.

Although this query could be represented as well with an IN predicate, you would have to use a correlated subquery like this when checking for the existence of matching rows between two tables where the common key is made up of more than one column. The IN predicate can only match against a single column in the subquery.

A subquery can also be used in a column expression in query. For example, the following query substitutes the average price in place of price for any rows where price is null:

select title, type, isnull(price, (select avg(price) from titles)) 
   from titles

Subqueries Versus Joins

SQL Server treats the subquery with the IN predicate or an EXISTS subquery in much the same way as a join, with one major exception. For example, following is a query using a join condition to return a resultset:

select pub_name 
   from publishers p, titles t
   where p.pub_id = t.pub_id
     and type = "business"
go

pub_name
----------------------------------------
New Age Books
Algodata Infosystems
Algodata Infosystems
Algodata Infosystems

Notice that duplicate rows are in the resultset. This is because the nature of a join is to find all matching rows. To remove the duplicates, you would need to specify the DISTINCT clause, which adds overhead of putting the rows into a work table to sort and remove the duplicate rows. With a large resultset, this could be expensive to do. The second method is to use the exists keyword to find only rows matching the condition, and to stop looking for results after the first match is found for a particular value. Following is the same query with an exists clause:

select pub_name 
from publishers p
where exists
    (select *
     from titles t
     where p.pub_id = t.pub_id
     and type = "business")
go

pub_name
----------------------------------------
New Age Books
Algodata Infosystems

TIP

Always consider using exists rather than joining tables and using distinct. The performance benefits can be dramatic.

However, if you must include columns from both tables in the resultset, the query will have to be written as a join.

`not exists` and `not in`

Certain results can be defined only with a subquery; for example, non-membership and non-existence can be expressed only in that way. For example, which publishers do not publish business books? This query cannot be expressed with a join, but it can be expressed using not exists or not in:

select pub_name 
   from publishers p
   where not exists
               (select *
                   from titles t
                   where t.pub_id = p.pub_id
                     and type = 'business')

select pub_name
   from publishers
   where pub_id not in
               (select pub_id
                   from titles
                   where type = 'business')

TIP

In most cases, not in and not exists are identical in their behavior. However, by using not exists, you can compare tables in which the join condition consists of multiple columns. You can't use not in to compare more than a single column.

Subqueries with Aggregates in `where` Clauses

One specific case in which a subquery is required appears when you want to use an aggregate function in a where clause. The next example returns the type and price of all books whose price is below the average price. This condition in a where clause would be illegal:

where price < avg(price)

The server requires that the average price be derived in a subquery, as follows:

select type, price 
   from titles
   where price <
         (select avg(price)
             from titles)

Subqueries as Derived Tables

SQL Server provides the ability to use derived tables in your queries. A derived table is a subquery contained in a FROM clause that can be referred to by an alias and used as a table in the query. A derived table can be thought of as sort of a dynamic view that exists only for the duration of the query. Derived tables are handy if you don't need to use a resultset in a temp table more than once.

One benefit of using a subquery as a derived table is that unlike a normal subquery, the derived table can return multiple columns that can be referenced by the outer query like normal table columns in the select list, where clause, and so on. The subquery can contain aggregates, group by, UNION, and so on if needed.

For example, perhaps you need to write a query to return the average, by publisher, of the largest sale for each book. SQL Server doesn't allow an aggregate within an aggregate, so a query like the following wouldn't work:

select pub_name, sum(max(qty)) as top_sales 
   from sales s
   join titles t on s.title_id = t.title_id
   join publishers p on t.pub_id = p.pub_id
   group by title_id, pub_name

You could break it up into two queries, putting intermediate results of the max(qty) from the titles and sales table into a temporary table, and then run a query against the temp table to calculate the avg of the max(qty). You could also write a more elegant solution using a derived table:

select pub_name, avg(top_sales) as avg_top_sales 
   from publishers p
   join titles t on p.pub_id = t.pub_id
   join (select title_id, max(qty) as top_sales
            from sales
            group by title_id) as tot_sales
     on t.title_id = tot_sales.title_id
   group by pub_name

Adding Rows with `INSERT`

You use the INSERT statement to add rows to a table. The following example adds one row to the authors table:

INSERT authors (au_id, au_lname, au_fname, phone, contract) 
   VALUES('123-65-7635', 'Johnson', 'Lisa', '408 342 7845', 1)

The number of values in the VALUES list must match the number in the column list. You can omit the column list only if you are providing values for all columns in the table, except for any identity column that might exist on the table. It is strongly recommended that you always provide a column list in your insert statement so that the mapping of values to columns is explicitly defined. Without a column list, your INSERT statement depends on the column order not changing or no columns being added or removed form the table. If the number of columns in the table no longer matches the number of values specified, the INSERT will fail with the following error message:

Server: Msg 213, Level 16, State 4, Line 1 
Insert Error: Column name or number of supplied values does not match table
definition.

If you want to omit table columns in your insert statement, you must provide a column list. In addition, you can only omit column(s) from the insert statement if the columns allow NULL, have a default value associated with them, are of the timestamp datatype, or have the identity property defined for them.

You cannot insert more than one row at a time using the VALUES clause. If you need to add multiple rows, you will need to specify a separate insert statement for each row. The only time an insert statement can insert more than one row is when the INSERT uses a SELECT statement in place of the values clause. The following query creates a table called authors_archive and inserts all authors from California into it:

if object_id('authors_archive') is null  -- create the archive table 
    create table authors_archive
                         (au_id id NOT NULL ,
                          au_lname varchar 40) NOT NULL ,
                          au_fname varchar(20) NOT NULL ,
                          phone char(12) NOT NULL,
                          address varchar(40) NULL ,
                          city varchar(20) NULL ,
                          state char(2) NULL ,
                          zip char(5) NULL)
go

INSERT authors_archive
   (au_id, au_lname, au_fname, phone, city, state, zip)
   SELECT au_id, au_lname, au_fname, phone, city, state, zip
      FROM authors
      WHERE state = 'CA'
go

Another useful feature is the ability to insert data from the resultset(s) of a stored procedure, user-defined function, or dynamic query. This can be done as long as the stored procedure, user-defined function, or dynamic query returns one or more resultsets that are compatible with the destination table structure. To be compatible, the resultset(s) must contain the same number of columns as the destination table, and the datatypes of the resultset columns must be compatible with the corresponding datatype columns.

Listing 26.9 creates a table to hold information from DBCC SQLPERF(logspace) and inserts the resultset returned by that command into the table.

Listing 26.9 Using `INSERT` with the Resultset from a Dynamic Query

CREATE TABLE log_space
(cap_date DATETIME DEFAULT GETDATE(),
 db sysname,
 log_size FLOAT,
 space_used FLOAT,
 status BIT)

INSERT log_space(db, log_size, space_used, status)
   EXEC ('DBCC SQLPERF(logspace)')

select * from log_space
go

cap_date                 db        log_size  space_used         status
------------------------ --------- --------- ------------------ ------
2002-08-24 18:13:37.010  master    3.3671875 29.698375701904297 0
2002-08-24 18:13:37.010  tempdb    0.4921875 39.186508178710938 0
2002-08-24 18:13:37.010  model     0.4921875 48.809524536132813 0
2002-08-24 18:13:37.010  msdb      2.2421875 32.186412811279297 0
2002-08-24 18:13:37.010  pubs      0.7421875 62.105262756347656 0
2002-08-24 18:13:37.010  Northwind 0.9921875 48.868110656738281 0
2002-08-24 18:13:37.010  bigpubs   0.484375  43.850807189941406 0

Modifying Rows with `UPDATE`

The update statement is straightforward. You specify the table to be updated, the columns to modify, the new values, and the rows to be updated. The following statement changes the royalty to 15% and price to $25 for a book with a title_id of 'BU1032':

UPDATE titles 
   SET royalty = 15, price = $25
   WHERE title_id = 'BU1032'

If you omit the WHERE clause, all rows will be updated.

You can use a join in an UPDATE statement (this is a T-SQL extension) to qualify the rows in a table to be modified:

/* increase prices by 5 times 
** where the author's last name is Smythe */
update titles
   set price = price * 5
   from authors a
   join titleauthor ta on (a.au_id = ta.au_id )
   join  titles t  on (t.title_id = ta.title_id )
   where au_lname = 'White'

Notice that the titles table is specified twice in the update statement, both in the UPDATE and FROM clauses. The table name is required in the UPDATE statement to specify which table is to be modified, and again in the FROM clause to specify the join.

A subquery can provide a similar result as a join:

update titles 
   set advance = advance * 1.5
        where pub_id in (select pub_id from publishers
                            where state = 'CA')

You can also use a correlated subquery in an update. The following type of query is common for generating/refreshing a rollup quantity in a separate table:

update titles 
   set ytd_sales = (select sum(qty)
                     from sales s
                     where titles.title_id = s.title_id
                     and datediff(yy, ord_date, getdate()) = 0)

Removing Rows with `DELETE`

To remove rows from a table, use the DELETE statement. To remove the rows with an author last name of "Smith", type the following:

DELETE authors WHERE au_lname = 'Smith'

If you omit the WHERE clause, all rows are removed.

If you really want to remove all rows, it is much more efficient to use the TRUNCATE TABLE statement, which does not log each deleted row to the transaction log.

T-SQL also allows for a join to be used in a delete statement to qualify the rows in a table to be deleted:

/* remove titleauthor where author last name 
** is 'White' */

delete titleauthor
   from titleauthor t join authors a on ( a.au_id = t.au_id)
   where au_lname = 'White'

A subquery will work similar to a join:

/* delete titleauthor records for authors with last name "Ringer" */ 
delete titleauthor
   where au_id  in (select au_id from authors where au_lname = 'Ringer')