SELECT Clause: What Do We Select?

SELECT Clause: What Do We Select?

In the relational databases the SELECT statement selects values in the columns, literal values, or expressions. The returned values themselves could be of any valid data types. These values can be displayed in the client application, or written into a file, used in the intermediate calculations, or entered into database tables.

Prior in this chapter, we've mentioned that the FROM clause of the SELECT statement is mandatory, whereas all other clauses are optional. This still holds true for the SQL99 standard, though the rules are somewhat more relaxed with the vendor-specific implementations.

Single-column select

You can select as many or as few columns as you wish from a table (or a view) for which you have SELECT privileges (see Chapter 12 on SQL security). The following example selects only a customer name (column CUST_NAME_S from the table CUSTOMER in the ACME database).

SELECT cust_name_s
FROM   customer
CUST_NAME_S
--------------------------------------------------
DLH INDUSTRIES
FAIR AND SONS AIR CONDTNG
KILBURN GLASS INDUSTRIES
BOSWELL DESIGNS CORP.
WILE ELECTROMATIC INC.
FABRITEK INC.
...
DALCOMP INC.
INTEGRATED POWER DESIGNS
GUARDIAN MANUFACTURING INC.
WILE BESS COMPANY

The return result is a set of the CUST_NAME_S column values from all rows in the CUSTOMER table. The values in the set are not ordered in any way; it is possible to order the returned values though; see later in the chapter for information on ordering data in the resultset (section on GROUP BY clause).

The syntax is identical across all three RDBMS products as well as for the SQL99 standard.

Multicolumn SELECT

A single column SELECT, while useful, is certainly not the limit of the SQL capabilities. It's very likely that you'll be selecting more than one column at a time in your queries.

Selecting several columns

The following query selects three columns at the same time from the same CUSTOMER table in the ACME database:

SELECT   cust_id_n, 
         cust_status_s, 
         cust_name_s
FROM     customer
   
CUST_ID_N  C CUST_NAME_S
---------- - -----------------------------------------------
        51 Y DLH INDUSTRIES
         5 Y FAIR AND SONS AIR CONDTNG
        12 Y KILBURN GLASS INDUSTRIES
        61 Y BOSWELL DESIGNS CORP.
        55 Y WILE ELECTROMATIC INC.
         6 Y FABRITEK INC.
...
        16 Y DALCOMP INC.
        89 Y INTEGRATED POWER DESIGNS
        85 Y GUARDIAN MANUFACTURING INC.
       152 Y WILE BESS COMPANY

As in the case with a single column SELECT, the result returned is a set of values, with a distinction that it is rather a set of sets — one set for each column mentioned in the SELECT statement. The sequence in which these sets appear directly corresponds to the sequence in which the column names were mentioned in the SELECT clause.

Note 

It is possible to select a column more than once within a single query. The result will simply be duplicate sets of values.

The syntax is identical across all three RDBMS products as well as for the SQL99 standard.

Selecting all columns

Selecting all columns in the table could be achieved by listing every single column from the table in the SELECT clause of the query, or using the convenient shortcut — asterisk (*) — provided by the SQL99 standard and implemented by virtually every RDBMS on the planet.

SELECT *
FROM status
   
STATUS_ID_N ST STATUS_DESC_S
----------- -- ------------------------------
          6 60 SHIPPED
          2 20 COMPLETE
          8 70 INVOICED
          9 80 CANCELLED

As with any multirow query, the returned result comprises the sets of the selected values of all rows for each column. The sequence in which columns appear in the resultset is exactly the same as in the underlying table (view).

Selecting all columns plus an extra column

In a relatively rare case where you need to select all the columns from the table and a duplicate of a column(s), you may do so in all three RDBMS, but this is where vendor's implementations differ from the SQL99 standard as well as from each other.

Oracle and DB2 require the asterisk to be prefixed by a table name or a table alias (the order in which asterisk is listed in relation to other columns is not important):

SQL>  SELECT status.*, status_desc_s
2     FROM status;
STATUS_ID_N ST STATUS_DESC_S                  STATUS_DESC_S
----------- -- ------------------------------ -------------
          6 60 SHIPPED                        SHIPPED
          2 20 COMPLETE                       COMPLETE
          8 70 INVOICED                       INVOICED
          9 80 CANCELLED                      CANCELLED
   
4 rows selected.

This full name qualification is not required in MS SQL Server:

1>  SELECT *, status_desc_s  FROM status
2>  GO
 STATUS_ID_N STATUS_CODE_S STATUS_DESC_S   STATUS_DESC_S
 ----------- ------------- --------------- --------------
           2 20            COMPLETE        COMPLETE
           6 60            SHIPPED         SHIPPED
           8 70            INVOICED        INVOICED
           9 80            CANCELLED       CANCELLED
   
(4 rows affected)

This may seem a superfluous feature, but imagine a situation when you need to see all 200 rows in the table and wish to change the default sequence in which the columns appear in the final resultset. You have an option to list all the columns in the desired order (i.e., typing in all the columns in the SELECT clause of your query) or to do it the easier way at the expense of having a duplicate set of values — but in the place where you would rather see it. Of course, you may combine all/any rows in the table into a single resultset more than once.

Selecting distinct values

As you become more selective in terms of what data is expected to be returned by a query, the need may arise to eliminate duplicates. The SQL99 standard provides an easy and elegant way to eliminate any duplicate values from the final resultset.

The table PAYMENT_TERMS in the ACME database contains data about discounts, in terms of percentages, given to customers. While the particulars of each discount might differ, it is quite conceivable that the actual percentage might be the same.

The following example selects all the rows for the PAYTERMS_DISCPCT_N column from that table:

SELECT payterms_discpct_n
FROM   payment_terms
PAYTERMS_DISCPCT_N
 ------------------
                .00
                .02
                .02
                .00
                .00
                .00
                .00
                .00
                .00
                .00
                .02
   
 (11 rows affected)

As you can see, there are quite a few duplicates among these 11 records. If the goal is to find out what are the percentage rates used within the company, the DISTINCT keyword might be used.

SELECT DISTINCT payterms_discpct_n
FROM            payment_terms
PAYTERMS_DISCPCT_N
------------------
                 0
               .02
   
2 rows selected.

Now, all the duplicate values have been eliminated and the resultset contains only two distinct values.

It is important to understand that DISTINCT refers to the entire row, not just a single column; if DISTINCT precedes multiple columns the entire set — row values in all columns — needs to be distinct. Here is an example run in MS SQL Server 2000 (applicable to Oracle and DB2 UDB as well):

SELECT DISTINCT payterms_discpct_n,
                payterms_code_s
FROM            payment_terms
PAYTERMS_DISCPCT_N PAYTERMS_CODE_S 
------------------ --------------- 
.00                N10
.00                N120
.00                N15
.00                N20
.00                N30
.00                N45
.00                N60
.00                N90
.02                CADV
.02                N21530
.02                N21531
   
(11 row(s) affected)

By adding the PAYTERMS_CODE_S column to PAYTERMS_DISCPCT_N, we've made the pair of the columns to be distinct. As a result, duplicate values in a single column are allowed as long as the pair of the values is distinct.

Using literals, functions, and calculated columns

Columns are not the only things that you can select in the relational database world, and the SELECT statement does not always involve a table. Selecting columns from a table is a very straightforward concept, much more so than selecting expressions and literals.

When a value we are after does not exist up to the moment we call it, because it is returned by a function, or being calculated on the fly, you still need to SELECT from somewhere. This "somewhere" could be any table existing within the database and to which you have select privilege (see Chapter 12 for more information on the privileges), but it would be rather inconvenient, and hard to maintain, as the user must know what tables are present in the database, and change the query every time the table in use is dropped or renamed; besides, if you SELECT a literal or an expression from an actual table, the resultset will have as many rows as there are in the table.

Oracle supports the SQL99 standards to the extent of providing a special table, DUAL, when there is no physical object (view or table) to select from. Suppose you need to find out a system time of your Oracle database; the RDBMS provides you with a function SYSDATE (more about the functions in Chapter 10), which returns this information. While not being SQL99 standard compliant, this function is a legitimate Oracle 9i SQL dialect statement and must be treated as such. Since all queries must start with SELECT, you would start with it, but the information is not stored anywhere in the database — it is generated dynamically at the request's moment. That's where the table DUAL comes in handy — you can "select" virtually anything from this table.

SQL> SELECT SYSDATE FROM dual; 
   
SYSDATE
---------
21-OCT-03

The DUAL table itself has only one column (not without irony named DUMMY) and one row containing value X.

Note 

The DUAL table was introduced by Chuck Weiss of Oracle as an underlying object in the Oracle Data Dictionary. It was never meant to be seen by itself, but rather to be used in some complex JOIN operations (discussed Chapter 9); it logically fits the SELECT notion as it implies that the FROM clause is to be used with each statement.

For selecting noncolumn values, DB2 UDB 8.1 has essentially the same concept — table SYSDUMMY1, located in the SYSIBM system schema. This table has a single column called IBMREQD, and a single row containing value Y. The following example demonstrates using this table for calculating the sum of two numbers:

db2=> SELECT (5+5) FROM sysibm.sysdummy1
   
1
---------
       10
   
1 record(s) selected.

The sum of these numbers is not stored anywhere, but it has to be selected following the SQL99 standard rules. This becomes more important, when SQL functions (Chapter 10) are used in the procedural language (like PL/SQL or Transact-SQL).

Microsoft SQL Server 2000 actually allows you to forgo the FROM clause altogether in the situation like this. Here is an example:

SELECT (5+5) num_sum
num_sum     
----------- 
10

The "dummy" tables, as well as ability to select from nothing, is not a part of the SQL standard, but rather manifests of vendors' ingenuity in customers' satisfaction quest.

In the most recent example we've used alias NUM_SUM as the name for the non-existent column. This is alias – a standard feature, supported by all three RDBMS vendors as well as mandated by the SQL standard. The idea behind alias is very simple — giving columns selected in the SQL query more descriptive names as they appear in the final result set. Every single SELECT example in this book could be rewritten with the column names substituted by aliases. For instance, the example from above that selects a distinct value of the PAYTERMS_DISCPCT_N, could be written as follows

SELECT DISTINCT payterms_discpct_n AS discount_percent
FROM            payment_terms
discount_percent 
------------------
                 0
               .02

The new name, while not being shorter, is significantly more descriptive for the user, just as the column name is more informative for the programmer (as it tells what data type the column contains — N — for NUMERIC). The use of the 'AS' operator is optional, and the syntax (either with or without 'AS') is valid in all three databases.

The aliasing becomes even more useful when literals and expressions are selected. An example of selecting an expression (5 + 5) was given previously in the chapter; here is the SELECT statement involving literals:

SELECT cust_name_s, 
       100 AS NUMERIC_CONSTANT, 
       'ABC' AS STRING_CONSTANT 
FROM   customer
   
CUST_NAME_S                  NUMERIC_CONSTANT STRING_CONSTANT
---------------------------- ---------------- ---------------
WILE SEAL CORP.                           100 ABC
WILE ELECTRONICS INC.                     100 ABC
WILE ELECTROMUSICAL INC.                  100 ABC
WILE ELECTROMATIC INC.                    100 ABC
WILE BESS COMPANY                         100 ABC
MAGNETOMETRIC DEVICES INC.                100 ABC
MAGNETICS USA INC.                        100 ABC
. . .                                     . . .
BURNETTE WILLIAM CORP.                    100 ABC
BOSWELL DESIGNS CORP.                     100 ABC
   
37 record(s) selected.

In the default behaviour, if no alias is specified for the constructed column — be it a literal, expression, or a function — Oracle will use the literal itself or an expression in the place of the alias; IBM DB2 UDB will number the columns starting with 1; and Microsoft SQL Server simply leaves the name blank.

Literals can be a part of a "standard" SELECT query, where they are listed together with the column names, as in the example above.

Aliasing is very convenient when a resulting value combines different sources — a column, a function, an expression, or a literal. Here is an example in Oracle and DB2 UDB syntax that concatenates product ID (column PROD_ID_N) from the PRODUCT table of the ACME database with an empty space and product brand (column PROD_BRAND_N) into a single value:

SELECT CAST(prod_id_n AS CHAR(5)) || ' ' || prod_brand_s 
 AS ID_AND_BRAND 
FROM   product
   
ID_AND_BRAND
--------------------------
990   SPRUCE LUMBER
1880  STEEL NAILS
2871  STOOL CAPS
3045  STOOL CAPS
4000  HAND RAILS ROUNDS
4055  GAZEBOS
4761  BAR RAILS
4906  BAR RAILS
4964  BASES
5786  CRATING MATERIAL
   
10 record(s) selected.
Cross-References 

Concatenation operators and functions are covered in detail in Chapters 11 and 10, respectively.

To run this query in MS SQL Server, replace the concatenation (||) operator to a plus (+) operator.

Cross-References 

See Chapter 11 for more about operators.

As a part of the SELECT statement, the functions or expressions could be used for insert or update purposes. Imagine that you would like to keep track of every insert and update with a date/time stamp, using a SELECT query as an input (see INSERT and UPDATE statements in Chapter 6); the following Oracle 9i syntax provides this functionality using SYSDATE and USER functions. (See Chapter 10 for equivalent DB2 UDB and Microsoft SQL Server 2000 syntax.)

SELECT  salesman_code_s, 
        salesman_name_s, 
        sysdate, 
        user 
FROM    salesman;
   
SA SALESMAN_NAME_S                  SYSDATE   USER
-- -------------------------------- --------- -------
02 FAIRFIELD BUGS ASSOCIATION       15-OCT-03ACME
03 AMERICA GONZALES LIMITED         15-OCT-03 ACME
04 HUNTER COMPONENTS                15-OCT-03 ACME
07 ELMERSON INDUSTRIES INCORPORATED 15-OCT-03 ACME
09 SAM KRISTEL INC                  15-OCT-03 ACME
10 HENERY INCORPORATED              15-OCT-03 ACME
   
6 rows selected.

Using subqueries in a SELECT clause

The concept of a subquery is simple — it is a query within a query that supplies necessary values for the first query. A SELECT query could have an embedded subquery as a way to retrieve unknown values, and the nesting level (how many subqueries you could have within each other) is limited only by the RDBMS capability.

To illustrate the concept, some preliminary work is required (since the required table is not within the ACME database). Let's say that in full accordance with database design guidelines you have created a table that contains state tax amounts for each state, as well as each state's full name and two-letter postal abbreviation.

First you need to create a table:

CREATE TABLE sales_tax
(
  stax_id_n     INTEGER NOT NULL,
  stax_amt_n    DECIMAL(6,3),
  stax_state_s  CHAR(2),
  CONSTRAINT pk_sales_tax PRIMARY KEY (stax_id_n )
)

The next step is to insert some meaningful data into your new table.

INSERT INTO sales_tax
( 
 stax_id_n, 
 stax_amt_n, 
 stax_state_s 
)
VALUES (1, 8.5,'WA')

When you wish to produce a report that would contain a product ID and price, and the amount of tax applied to unit of the product in the state of Washington, the following query would be useful:

SELECT prod_num_s, 
       prod_price_n, 
       (SELECT stax_amt_n 
        FROM sales_tax 
        WHERE stax_state_s = 'WA') AS TAX_RATE,
       prod_price_n * 
       (SELECT stax_amt_n 
        FROM sales_tax 
        WHERE stax_state_s = 'WA')/100 AS SALES_TAX 
FROM    product
   
prod_num_s prod_price_n TAX_RATE SALES_TAX               
---------- ------------ -------- ----------------------- 
990        18.32        8.500    1.557200000
1880       34.09        8.500    2.897650000
2871       26.92        8.500    2.288200000
3045       15.98        8.500    1.358300000
4000       11.84        8.500    1.006400000
4055       16.09        8.500    1.367650000
4761       23.20        8.500    1.972000000
4906       27.10        8.500    2.303500000
4964       23.20        8.500    1.972000000
5786       17.98        8.500    1.528300000
   
(10 row(s) affected)

This example uses the state abbreviation (WA) as a parameter to find the amount of the applicable state tax and calculates the necessary values on the fly using subquery to extract this value from the SALES_TAX table; note that the data retrieval is conducted for each and every row in the resulting set.