In the relational databases the SELECT statement selects values in the columns, literal values, or expressions. The returned values themselves could be of any valid data types. These values can be displayed in the client application, or written into a file, used in the intermediate calculations, or entered into database tables.
Prior in this chapter, we've mentioned that the FROM clause of the SELECT statement is mandatory, whereas all other clauses are optional. This still holds true for the SQL99 standard, though the rules are somewhat more relaxed with the vendor-specific implementations.
You can select as many or as few columns as you wish from a table (or a view) for which you have SELECT privileges (see Chapter 12 on SQL security). The following example selects only a customer name (column CUST_NAME_S from the table CUSTOMER in the ACME database).
SELECT cust_name_s FROM customer CUST_NAME_S -------------------------------------------------- DLH INDUSTRIES FAIR AND SONS AIR CONDTNG KILBURN GLASS INDUSTRIES BOSWELL DESIGNS CORP. WILE ELECTROMATIC INC. FABRITEK INC. ... DALCOMP INC. INTEGRATED POWER DESIGNS GUARDIAN MANUFACTURING INC. WILE BESS COMPANY
The return result is a set of the CUST_NAME_S column values from all rows in the CUSTOMER table. The values in the set are not ordered in any way; it is possible to order the returned values though; see later in the chapter for information on ordering data in the resultset (section on GROUP BY clause).
The syntax is identical across all three RDBMS products as well as for the SQL99 standard.
A single column SELECT, while useful, is certainly not the limit of the SQL capabilities. It's very likely that you'll be selecting more than one column at a time in your queries.
The following query selects three columns at the same time from the same CUSTOMER table in the ACME database:
SELECT cust_id_n, cust_status_s, cust_name_s FROM customer CUST_ID_N C CUST_NAME_S ---------- - ----------------------------------------------- 51 Y DLH INDUSTRIES 5 Y FAIR AND SONS AIR CONDTNG 12 Y KILBURN GLASS INDUSTRIES 61 Y BOSWELL DESIGNS CORP. 55 Y WILE ELECTROMATIC INC. 6 Y FABRITEK INC. ... 16 Y DALCOMP INC. 89 Y INTEGRATED POWER DESIGNS 85 Y GUARDIAN MANUFACTURING INC. 152 Y WILE BESS COMPANY
As in the case with a single column SELECT, the result returned is a set of values, with a distinction that it is rather a set of sets — one set for each column mentioned in the SELECT statement. The sequence in which these sets appear directly corresponds to the sequence in which the column names were mentioned in the SELECT clause.
It is possible to select a column more than once within a single query. The result will simply be duplicate sets of values.
The syntax is identical across all three RDBMS products as well as for the SQL99 standard.
Selecting all columns in the table could be achieved by listing every single column from the table in the SELECT clause of the query, or using the convenient shortcut — asterisk (*) — provided by the SQL99 standard and implemented by virtually every RDBMS on the planet.
SELECT * FROM status STATUS_ID_N ST STATUS_DESC_S ----------- -- ------------------------------ 6 60 SHIPPED 2 20 COMPLETE 8 70 INVOICED 9 80 CANCELLED
As with any multirow query, the returned result comprises the sets of the selected values of all rows for each column. The sequence in which columns appear in the resultset is exactly the same as in the underlying table (view).
In a relatively rare case where you need to select all the columns from the table and a duplicate of a column(s), you may do so in all three RDBMS, but this is where vendor's implementations differ from the SQL99 standard as well as from each other.
Oracle and DB2 require the asterisk to be prefixed by a table name or a table alias (the order in which asterisk is listed in relation to other columns is not important):
SQL> SELECT status.*, status_desc_s 2 FROM status; STATUS_ID_N ST STATUS_DESC_S STATUS_DESC_S ----------- -- ------------------------------ ------------- 6 60 SHIPPED SHIPPED 2 20 COMPLETE COMPLETE 8 70 INVOICED INVOICED 9 80 CANCELLED CANCELLED 4 rows selected.
This full name qualification is not required in MS SQL Server:
1> SELECT *, status_desc_s FROM status 2> GO STATUS_ID_N STATUS_CODE_S STATUS_DESC_S STATUS_DESC_S ----------- ------------- --------------- -------------- 2 20 COMPLETE COMPLETE 6 60 SHIPPED SHIPPED 8 70 INVOICED INVOICED 9 80 CANCELLED CANCELLED (4 rows affected)
This may seem a superfluous feature, but imagine a situation when you need to see all 200 rows in the table and wish to change the default sequence in which the columns appear in the final resultset. You have an option to list all the columns in the desired order (i.e., typing in all the columns in the SELECT clause of your query) or to do it the easier way at the expense of having a duplicate set of values — but in the place where you would rather see it. Of course, you may combine all/any rows in the table into a single resultset more than once.
As you become more selective in terms of what data is expected to be returned by a query, the need may arise to eliminate duplicates. The SQL99 standard provides an easy and elegant way to eliminate any duplicate values from the final resultset.
The table PAYMENT_TERMS in the ACME database contains data about discounts, in terms of percentages, given to customers. While the particulars of each discount might differ, it is quite conceivable that the actual percentage might be the same.
The following example selects all the rows for the PAYTERMS_DISCPCT_N column from that table:
SELECT payterms_discpct_n FROM payment_terms PAYTERMS_DISCPCT_N ------------------ .00 .02 .02 .00 .00 .00 .00 .00 .00 .00 .02 (11 rows affected)
As you can see, there are quite a few duplicates among these 11 records. If the goal is to find out what are the percentage rates used within the company, the DISTINCT keyword might be used.
SELECT DISTINCT payterms_discpct_n FROM payment_terms PAYTERMS_DISCPCT_N ------------------ 0 .02 2 rows selected.
Now, all the duplicate values have been eliminated and the resultset contains only two distinct values.
It is important to understand that DISTINCT refers to the entire row, not just a single column; if DISTINCT precedes multiple columns the entire set — row values in all columns — needs to be distinct. Here is an example run in MS SQL Server 2000 (applicable to Oracle and DB2 UDB as well):
SELECT DISTINCT payterms_discpct_n, payterms_code_s FROM payment_terms PAYTERMS_DISCPCT_N PAYTERMS_CODE_S ------------------ --------------- .00 N10 .00 N120 .00 N15 .00 N20 .00 N30 .00 N45 .00 N60 .00 N90 .02 CADV .02 N21530 .02 N21531 (11 row(s) affected)
By adding the PAYTERMS_CODE_S column to PAYTERMS_DISCPCT_N, we've made the pair of the columns to be distinct. As a result, duplicate values in a single column are allowed as long as the pair of the values is distinct.
Columns are not the only things that you can select in the relational database world, and the SELECT statement does not always involve a table. Selecting columns from a table is a very straightforward concept, much more so than selecting expressions and literals.
When a value we are after does not exist up to the moment we call it, because it is returned by a function, or being calculated on the fly, you still need to SELECT from somewhere. This "somewhere" could be any table existing within the database and to which you have select privilege (see Chapter 12 for more information on the privileges), but it would be rather inconvenient, and hard to maintain, as the user must know what tables are present in the database, and change the query every time the table in use is dropped or renamed; besides, if you SELECT a literal or an expression from an actual table, the resultset will have as many rows as there are in the table.
Oracle supports the SQL99 standards to the extent of providing a special table, DUAL, when there is no physical object (view or table) to select from. Suppose you need to find out a system time of your Oracle database; the RDBMS provides you with a function SYSDATE (more about the functions in Chapter 10), which returns this information. While not being SQL99 standard compliant, this function is a legitimate Oracle 9i SQL dialect statement and must be treated as such. Since all queries must start with SELECT, you would start with it, but the information is not stored anywhere in the database — it is generated dynamically at the request's moment. That's where the table DUAL comes in handy — you can "select" virtually anything from this table.
SQL> SELECT SYSDATE FROM dual; SYSDATE --------- 21-OCT-03
The DUAL table itself has only one column (not without irony named DUMMY) and one row containing value X.
The DUAL table was introduced by Chuck Weiss of Oracle as an underlying object in the Oracle Data Dictionary. It was never meant to be seen by itself, but rather to be used in some complex JOIN operations (discussed Chapter 9); it logically fits the SELECT notion as it implies that the FROM clause is to be used with each statement.
For selecting noncolumn values, DB2 UDB 8.1 has essentially the same concept — table SYSDUMMY1, located in the SYSIBM system schema. This table has a single column called IBMREQD, and a single row containing value Y. The following example demonstrates using this table for calculating the sum of two numbers:
db2=> SELECT (5+5) FROM sysibm.sysdummy1 1 --------- 10 1 record(s) selected.
The sum of these numbers is not stored anywhere, but it has to be selected following the SQL99 standard rules. This becomes more important, when SQL functions (Chapter 10) are used in the procedural language (like PL/SQL or Transact-SQL).
Microsoft SQL Server 2000 actually allows you to forgo the FROM clause altogether in the situation like this. Here is an example:
SELECT (5+5) num_sum num_sum ----------- 10
The "dummy" tables, as well as ability to select from nothing, is not a part of the SQL standard, but rather manifests of vendors' ingenuity in customers' satisfaction quest.
In the most recent example we've used alias NUM_SUM as the name for the non-existent column. This is alias – a standard feature, supported by all three RDBMS vendors as well as mandated by the SQL standard. The idea behind alias is very simple — giving columns selected in the SQL query more descriptive names as they appear in the final result set. Every single SELECT example in this book could be rewritten with the column names substituted by aliases. For instance, the example from above that selects a distinct value of the PAYTERMS_DISCPCT_N, could be written as follows
SELECT DISTINCT payterms_discpct_n AS discount_percent FROM payment_terms discount_percent ------------------ 0 .02
The new name, while not being shorter, is significantly more descriptive for the user, just as the column name is more informative for the programmer (as it tells what data type the column contains — N — for NUMERIC). The use of the 'AS' operator is optional, and the syntax (either with or without 'AS') is valid in all three databases.
The aliasing becomes even more useful when literals and expressions are selected. An example of selecting an expression (5 + 5) was given previously in the chapter; here is the SELECT statement involving literals:
SELECT cust_name_s, 100 AS NUMERIC_CONSTANT, 'ABC' AS STRING_CONSTANT FROM customer CUST_NAME_S NUMERIC_CONSTANT STRING_CONSTANT ---------------------------- ---------------- --------------- WILE SEAL CORP. 100 ABC WILE ELECTRONICS INC. 100 ABC WILE ELECTROMUSICAL INC. 100 ABC WILE ELECTROMATIC INC. 100 ABC WILE BESS COMPANY 100 ABC MAGNETOMETRIC DEVICES INC. 100 ABC MAGNETICS USA INC. 100 ABC . . . . . . BURNETTE WILLIAM CORP. 100 ABC BOSWELL DESIGNS CORP. 100 ABC 37 record(s) selected.
In the default behaviour, if no alias is specified for the constructed column — be it a literal, expression, or a function — Oracle will use the literal itself or an expression in the place of the alias; IBM DB2 UDB will number the columns starting with 1; and Microsoft SQL Server simply leaves the name blank.
Literals can be a part of a "standard" SELECT query, where they are listed together with the column names, as in the example above.
Aliasing is very convenient when a resulting value combines different sources — a column, a function, an expression, or a literal. Here is an example in Oracle and DB2 UDB syntax that concatenates product ID (column PROD_ID_N) from the PRODUCT table of the ACME database with an empty space and product brand (column PROD_BRAND_N) into a single value:
SELECT CAST(prod_id_n AS CHAR(5)) || ' ' || prod_brand_s AS ID_AND_BRAND FROM product ID_AND_BRAND -------------------------- 990 SPRUCE LUMBER 1880 STEEL NAILS 2871 STOOL CAPS 3045 STOOL CAPS 4000 HAND RAILS ROUNDS 4055 GAZEBOS 4761 BAR RAILS 4906 BAR RAILS 4964 BASES 5786 CRATING MATERIAL 10 record(s) selected.
Concatenation operators and functions are covered in detail in Chapters 11 and 10, respectively.
To run this query in MS SQL Server, replace the concatenation (||) operator to a plus (+) operator.
See Chapter 11 for more about operators.
As a part of the SELECT statement, the functions or expressions could be used for insert or update purposes. Imagine that you would like to keep track of every insert and update with a date/time stamp, using a SELECT query as an input (see INSERT and UPDATE statements in Chapter 6); the following Oracle 9i syntax provides this functionality using SYSDATE and USER functions. (See Chapter 10 for equivalent DB2 UDB and Microsoft SQL Server 2000 syntax.)
SELECT salesman_code_s, salesman_name_s, sysdate, user FROM salesman; SA SALESMAN_NAME_S SYSDATE USER -- -------------------------------- --------- ------- 02 FAIRFIELD BUGS ASSOCIATION 15-OCT-03ACME 03 AMERICA GONZALES LIMITED 15-OCT-03 ACME 04 HUNTER COMPONENTS 15-OCT-03 ACME 07 ELMERSON INDUSTRIES INCORPORATED 15-OCT-03 ACME 09 SAM KRISTEL INC 15-OCT-03 ACME 10 HENERY INCORPORATED 15-OCT-03 ACME 6 rows selected.
The concept of a subquery is simple — it is a query within a query that supplies necessary values for the first query. A SELECT query could have an embedded subquery as a way to retrieve unknown values, and the nesting level (how many subqueries you could have within each other) is limited only by the RDBMS capability.
To illustrate the concept, some preliminary work is required (since the required table is not within the ACME database). Let's say that in full accordance with database design guidelines you have created a table that contains state tax amounts for each state, as well as each state's full name and two-letter postal abbreviation.
First you need to create a table:
CREATE TABLE sales_tax ( stax_id_n INTEGER NOT NULL, stax_amt_n DECIMAL(6,3), stax_state_s CHAR(2), CONSTRAINT pk_sales_tax PRIMARY KEY (stax_id_n ) )
The next step is to insert some meaningful data into your new table.
INSERT INTO sales_tax ( stax_id_n, stax_amt_n, stax_state_s ) VALUES (1, 8.5,'WA')
When you wish to produce a report that would contain a product ID and price, and the amount of tax applied to unit of the product in the state of Washington, the following query would be useful:
SELECT prod_num_s, prod_price_n, (SELECT stax_amt_n FROM sales_tax WHERE stax_state_s = 'WA') AS TAX_RATE, prod_price_n * (SELECT stax_amt_n FROM sales_tax WHERE stax_state_s = 'WA')/100 AS SALES_TAX FROM product prod_num_s prod_price_n TAX_RATE SALES_TAX ---------- ------------ -------- ----------------------- 990 18.32 8.500 1.557200000 1880 34.09 8.500 2.897650000 2871 26.92 8.500 2.288200000 3045 15.98 8.500 1.358300000 4000 11.84 8.500 1.006400000 4055 16.09 8.500 1.367650000 4761 23.20 8.500 1.972000000 4906 27.10 8.500 2.303500000 4964 23.20 8.500 1.972000000 5786 17.98 8.500 1.528300000 (10 row(s) affected)
This example uses the state abbreviation (WA) as a parameter to find the amount of the applicable state tax and calculates the necessary values on the fly using subquery to extract this value from the SALES_TAX table; note that the data retrieval is conducted for each and every row in the resulting set.