4.1 Aggregate Functions

An aggregate function summarizes the results of an expression over a number of rows, returning a single value. The general syntax for most of the aggregate functions is as follows:

aggregate_function([DISTINCT | ALL] expression)

The syntax elements are:


aggregate_function

Gives the name of the function?e.g., SUM, COUNT, AVG, MAX, MIN


DISTINCT

Specifies that the aggregate function should consider only distinct values of the argument expression.


ALL

Specifies that the aggregate function should consider all values, including all duplicate values, of the argument expression. The default is ALL.


expression

Specifies a column, or any other expression, on which you want to perform the aggregation.

Let's look at a simple example. The following SQL uses the MAX function to find the maximum salary of all employees:

SELECT MAX(salary) FROM employee;



MAX(SALARY)

-----------

       5000

In subsequent sections, we use a series of slightly more involved examples that illustrate various aspects of aggregate function behavior. For those examples, we use the following cust_order table:

DESC cust_order



 Name                             Null?    Type

 -------------------------------- -------- --------------

 ORDER_NBR                        NOT NULL NUMBER(7)

 CUST_NBR                         NOT NULL NUMBER(5)

 SALES_EMP_ID                     NOT NULL NUMBER(5)

 SALE_PRICE                                NUMBER(9,2)

 ORDER_DT                         NOT NULL DATE

 EXPECTED_SHIP_DT                 NOT NULL DATE

 CANCELLED_DT                              DATE

 SHIP_DT                                   DATE

 STATUS                                    VARCHAR2(20)



SELECT order_nbr, cust_nbr, sales_emp_id, sale_price,

order_dt, expected_ship_dt

FROM cust_order;



ORDER_NBR CUST_NBR SALES_EMP_ID SALE_PRICE ORDER_DT  EXPECTED_ 

--------- -------- ------------ ---------- --------- --------- 

     1001        1         7354         99 22-JUL-01 23-JUL-01

     1000        1         7354            19-JUL-01 24-JUL-01

     1002        5         7368            12-JUL-01 25-JUL-01

     1003        4         7654         56 16-JUL-01 26-JUL-01

     1004        4         7654         34 18-JUL-01 27-JUL-01

     1005        8         7654         99 22-JUL-01 24-JUL-01

     1006        1         7354            22-JUL-01 28-JUL-01

     1007        5         7368         25 20-JUL-01 22-JUL-01

     1008        5         7368         25 21-JUL-01 23-JUL-01

     1009        1         7354         56 18-JUL-01 22-JUL-01

     1012        1         7354         99 22-JUL-01 23-JUL-01

     1011        1         7354            19-JUL-01 24-JUL-01

     1015        5         7368            12-JUL-01 25-JUL-01

     1017        4         7654         56 16-JUL-01 26-JUL-01

     1019        4         7654         34 18-JUL-01 27-JUL-01

     1021        8         7654         99 22-JUL-01 24-JUL-01

     1023        1         7354            22-JUL-01 28-JUL-01

     1025        5         7368         25 20-JUL-01 22-JUL-01

     1027        5         7368         25 21-JUL-01 23-JUL-01

     1029        1         7354         56 18-JUL-01 22-JUL-01



20 rows selected.

4.1.1 NULLs and Aggregate Functions

Notice that the column sale_price in the cust_order table is nullable, and that it contains NULL values for some rows. To examine the effect of NULLs in an aggregate function, execute the following SQL:

SELECT COUNT(*), COUNT(sale_price) FROM cust_order;



COUNT(*) COUNT(SALE_PRICE)

-------- -----------------

      20                14

Notice the difference in the output of COUNT(*) and COUNT(sale_price). This is because COUNT(sale_price) ignores NULLs, whereas COUNT(*) doesn't. The reason COUNT(*) doesn't ignore NULLs is because it counts rows, not column values. The concept of NULL doesn't apply to a row as a whole. Other than COUNT(*), there is only one other aggregate function that doesn't ignore NULLs, and that is GROUPING. All other aggregate functions ignore NULLs. We will discuss GROUPING in Chapter 13. For now, let's examine the effect of NULLs when they are ignored.

SUM, MAX, MIN, AVG, etc., all ignore NULLs. Therefore, if you are trying to find a value such as the average sale price in the cust_order table, the average will be of the 14 rows that have a value for that column. The following example shows the count of all rows, the total of all sale prices, and the average of all sale prices:

SELECT COUNT(*), SUM(sale_price), AVG(sale_price)

FROM cust_order;



       COUNT(*) SUM(SALE_PRICE) AVG(SALE_PRICE)

--------------- --------------- ---------------

             20             788      56.2857143

Note that AVG(sale_price) is not equal to SUM(sale_price) / COUNT(*). If it were, the result of AVG(sale_price) would have been 788 / 20 = 39.4. But, since the AVG function ignores NULLS, it divides the total sale price by 14, and not by 20. AVG(sale_price) is equal to SUM(sale_price) / COUNT(sale_price) (788 / 14 = 56.2857143).

There may be situations where you want an average to be taken over all the rows in a table, not just the rows with non-NULL values for the column in question. In those situations you have to use the NVL function within the AVG function call to assign 0 (or some other useful value) to the column in place of any NULL values. (DECODE, CASE, or the COALESCE function can be used in place of NVL. See Chapter 9 for details.) Here's an example:

SELECT AVG(NVL(sale_price,0)) FROM cust_order;



AVG(NVL(SALE_PRICE,0))

----------------------

                  39.4

Notice that the use of NVL causes all 20 rows to be considered for average computation, and the rows with NULL values for sale_price are assumed to have a 0 value for that column.

4.1.2 Use of DISTINCT and ALL

Most aggregate functions allow the use of DISTINCT or ALL along with the expression argument. DISTINCT allows you to disregard duplicate expression values, while ALL causes duplicate expression values to be included in the result. Notice that the column cust_nbr has duplicate values. Observe the result of the following SQL:

SELECT COUNT(cust_nbr), COUNT(DISTINCT cust_nbr), COUNT(ALL cust_nbr)

FROM cust_order;



COUNT(CUST_NBR) COUNT(DISTINCTCUST_NBR) COUNT(ALLCUST_NBR)

--------------- ----------------------- ------------------

             20                       4                 20

There are four distinct values in the cust_nbr column. Therefore, COUNT(DISTINCT cust_nbr) returns 4, whereas COUNT(cust_nbr) and COUNT(ALL cust_nbr) both return 20. ALL is the default, which means that if you don't specify either DISTINCT or ALL before the expression argument in an aggregate function, the function will consider all the rows that have a non-NULL value for the expression.

An important thing to note here is that ALL doesn't cause an aggregate function to consider NULL values. For example, COUNT(ALL SALE_PRICE) in the following example still returns 14, and not 20:

SELECT COUNT(ALL sale_price) FROM cust_order;



COUNT(ALLSALE_PRICE)

--------------------

                  14

Since ALL is the default, you can explicitly use ALL with every aggregate function. However, the aggregate functions that take more than one argument as input don't allow the use of DISTINCT. These include CORR, COVAR_POP, COVAR_SAMP, and all the linear regression functions.

In addition, some functions that take only one argument as input don't allow the use of DISTINCT. This category includes STTDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, and GROUPING.

If you try to use DISTINCT with an aggregate function that doesn't allow it, you will get an error. For example:

SELECT STDDEV_POP(DISTINCT sale_price)

FROM cust_order;



SELECT STDDEV_POP(DISTINCT sale_price)

       *

ERROR at line 1:

ORA-30482: DISTINCT option not allowed for this function

However, using ALL with such a function doesn't cause any error. For example:

SELECT STDDEV_POP(ALL sale_price)

FROM cust_order;



STDDEV_POP(ALLSALE_PRICE)

-------------------------

               29.5282639