GROUP BY and HAVING Clauses: Summarizing Results

GROUP BY and HAVING Clauses: Summarizing Results

Grouping records in the resultset based on some criteria could provide a valuable insight into data that has accumulated in the table. For example, you would like to see the final resultset of your orders (where there could be one or more order items per order) not in the random order they were entered in, but rather in groups of items that belong to the same order:

SELECT   ordline_ordhdrid_fn, 
         ordline_ordqty_n AS QTY_PER_ITEM
FROM     order_line
GROUP BY ordline_ordhdrid_fn,
         ordline_ordqty_n;
   
ordline_ordhdrid_fn QTY_PER_ITEM 
------------------- ------------ 
30607               90
30607               500
30608               275
30608               340
30609               560
. . .               . . .
30666               560
30667               560
30668               72
30668               90
. . .               . . .
30669               120
30669               480
. . .               . . .
30670               126
30670               450
   
97 records selected.

Note the repeating values (groups) in the ORDLINE_ORDHDRID_FN field, representing the order header ID for which there could be one or more order items and for which there are different quantities. This information might become more concise with the use of aggregate functions that could sum the quantity for the order or calculate the average, and so on (see the example in this paragraph using SUM function).

Note 

All columns specified in a SELECT clause list, with the exception of aggregate columns (these used as an argument for an aggregate function), must be listed in the GROUP BY clause as well.

The GROUP BY clause is mostly (but not always) used in conjunction with aggregate functions, which are introduced in Chapter 10. The aggregate functions return a single value as a result of an operation conducted on a set of values. The set is grouped to provide a series of sets for use with the aggregate functions.

Table 10-7 in Chapter 10 lists five aggregate functions alongside their descriptions and particulars of implementation across all three RDBMS. For examples of the specific function usage, please refer to Chapter 10; here we're going to discuss the concept of aggregation and how it refers to the GROUP BY clause within a SELECT query.

To calculate the total sum of an ordered quantity for all orders (and one order could have one or more items) in the table ORDER_LINE you could use the SUM aggregate function:

SELECT
         SUM(ordline_ordqty_n) AS TOT_QTY_PER_ORDER
FROM     order_line;
   
TOT_QTY_PER_ORDER 
----------------- 
31847
   
1 record selected.

The single value that summed up all ordered quantities for all the records in the table was returned. While useful, this information could be more valuable if the ordered quantity is summed up per order — you would know how many items were ordered in each and every order. Here is the query that accomplishes this task:

SELECT   ordline_ordhdrid_fn, 
         SUM(ordline_ordqty_n) AS TOT_QTY_PER_ORDER
FROM     order_line
GROUP BY ordline_ordhdrid_fn
   
ORDLINE_ORDHDRID_FN   TOT_QTY_PER_ORDER
-------------------   -----------------
              30607                 590
              30608                 615
              30609                 560
              ...                   ...
              30669                 600
              30670                 720
   
51 rows selected.

Here we have much more detailed information, as the quantities were grouped by order and then summed up for each order, producing a single value for each order (as opposed to producing it for a whole resultset).

Here is an example of another aggregate function AVG, 
which calculates the average of the values. In this case, you are 
going to calculate the average quantity per order.
SELECT   ordline_ordhdrid_fn, 
         AVG(ordline_ordqty_n) AS AVG_QTY_PER_ORDER
FROM     order_line
GROUP BY ordline_ordhdrid_fn
ORDLINE_ORDHDRID_FN AVG_QTY_PER_ORDER
------------------- -----------------
              30607               295
              30608             307.5
              30609               560
              ...                 ...
              30669               300
              30670               180
   
51 rows selected.
Note 

For order #30608, the average quantity is 307.5 in Oracle, but MS SQL Server and DB2 would round the value to 307 because the field is of INTEGER data type, whereas Oracle uses the NUMBER data type able to accommodate decimals. Refer to Chapter 3 for more information on data types.

The HAVING clause used exclusively with the GROUP BY clause provides a means of additional selectivity. Imagine that you need to select not all records in your GROUP BY query but only those that would have their grouped value greater than 750. Adding additional criterion to the WHERE clause would not help, as the value by which we could limit the records is calculated using GROUP BY and is unavailable outside it before the query has completed execution. The HAVING clause used within the GROUP BY clause allows us to add this additional criterion to the results of the GROUP BY operation. For example, to display orders with a total quantity greater than 750, the following query could be used:

SELECT    ordline_ordhdrid_fn, 
          SUM(ordline_ordqty_n) TOT_QTY_PER_ORDER
FROM      order_line
GROUP BY  ordline_ordhdrid_fn
HAVING   SUM(ordline_ordqty_n) > 750
   
ORDLINE_ORDHDRID_FN TOT_QTY_PER_ORDER
 ------------------- -----------------
               30628               789
               30668               789
   
2 records selected.

As you can see, only 2 records among 51 from the previous query had satisfied this additional restriction.

We could have used a column ORDLINE_ORDHDRID_FN, without the SUM aggregate function in the HAVING clause to restrict the returned records by some other criteria, but we cannot use just any column from the SELECT clause: It also has to be listed in the GROUP BY clause to be used with HAVING. Here is a query example that sums up order quantities grouped by order header ID only if they fall into a specified list of orders:

SELECT   ordline_ordhdrid_fn, 
         SUM(ordline_ordqty_n) TOT_QTY_PER_ORDER
FROM     order_line
GROUP BY ordline_ordhdrid_fn
HAVING   ordline_ordhdrid_fn IN (30607,30608,30611,30622)
   
ordline_ordhdrid_fn TOT_QTY_PER_ORDER 
------------------- ----------------- 
30607               590
30608               615
30611               625
30622               726
   
4 records selected.
Note 

While GROUP BY would consider the null values in the columns by which the grouping is performed a valid group, this is not the way the NULLs are treated by the aggregate functions. Aggregate functions simply exclude the NULL records — they will not make it to the final result. See Chapter 10 for more information.