Grouping records in the resultset based on some criteria could provide a valuable insight into data that has accumulated in the table. For example, you would like to see the final resultset of your orders (where there could be one or more order items per order) not in the random order they were entered in, but rather in groups of items that belong to the same order:
SELECT ordline_ordhdrid_fn, ordline_ordqty_n AS QTY_PER_ITEM FROM order_line GROUP BY ordline_ordhdrid_fn, ordline_ordqty_n; ordline_ordhdrid_fn QTY_PER_ITEM ------------------- ------------ 30607 90 30607 500 30608 275 30608 340 30609 560 . . . . . . 30666 560 30667 560 30668 72 30668 90 . . . . . . 30669 120 30669 480 . . . . . . 30670 126 30670 450 97 records selected.
Note the repeating values (groups) in the ORDLINE_ORDHDRID_FN field, representing the order header ID for which there could be one or more order items and for which there are different quantities. This information might become more concise with the use of aggregate functions that could sum the quantity for the order or calculate the average, and so on (see the example in this paragraph using SUM function).
All columns specified in a SELECT clause list, with the exception of aggregate columns (these used as an argument for an aggregate function), must be listed in the GROUP BY clause as well.
The GROUP BY clause is mostly (but not always) used in conjunction with aggregate functions, which are introduced in Chapter 10. The aggregate functions return a single value as a result of an operation conducted on a set of values. The set is grouped to provide a series of sets for use with the aggregate functions.
Table 10-7 in Chapter 10 lists five aggregate functions alongside their descriptions and particulars of implementation across all three RDBMS. For examples of the specific function usage, please refer to Chapter 10; here we're going to discuss the concept of aggregation and how it refers to the GROUP BY clause within a SELECT query.
To calculate the total sum of an ordered quantity for all orders (and one order could have one or more items) in the table ORDER_LINE you could use the SUM aggregate function:
SELECT SUM(ordline_ordqty_n) AS TOT_QTY_PER_ORDER FROM order_line; TOT_QTY_PER_ORDER ----------------- 31847 1 record selected.
The single value that summed up all ordered quantities for all the records in the table was returned. While useful, this information could be more valuable if the ordered quantity is summed up per order — you would know how many items were ordered in each and every order. Here is the query that accomplishes this task:
SELECT ordline_ordhdrid_fn, SUM(ordline_ordqty_n) AS TOT_QTY_PER_ORDER FROM order_line GROUP BY ordline_ordhdrid_fn ORDLINE_ORDHDRID_FN TOT_QTY_PER_ORDER ------------------- ----------------- 30607 590 30608 615 30609 560 ... ... 30669 600 30670 720 51 rows selected.
Here we have much more detailed information, as the quantities were grouped by order and then summed up for each order, producing a single value for each order (as opposed to producing it for a whole resultset).
Here is an example of another aggregate function AVG, which calculates the average of the values. In this case, you are going to calculate the average quantity per order. SELECT ordline_ordhdrid_fn, AVG(ordline_ordqty_n) AS AVG_QTY_PER_ORDER FROM order_line GROUP BY ordline_ordhdrid_fn ORDLINE_ORDHDRID_FN AVG_QTY_PER_ORDER ------------------- ----------------- 30607 295 30608 307.5 30609 560 ... ... 30669 300 30670 180 51 rows selected.
For order #30608, the average quantity is 307.5 in Oracle, but MS SQL Server and DB2 would round the value to 307 because the field is of INTEGER data type, whereas Oracle uses the NUMBER data type able to accommodate decimals. Refer to Chapter 3 for more information on data types.
The HAVING clause used exclusively with the GROUP BY clause provides a means of additional selectivity. Imagine that you need to select not all records in your GROUP BY query but only those that would have their grouped value greater than 750. Adding additional criterion to the WHERE clause would not help, as the value by which we could limit the records is calculated using GROUP BY and is unavailable outside it before the query has completed execution. The HAVING clause used within the GROUP BY clause allows us to add this additional criterion to the results of the GROUP BY operation. For example, to display orders with a total quantity greater than 750, the following query could be used:
SELECT ordline_ordhdrid_fn, SUM(ordline_ordqty_n) TOT_QTY_PER_ORDER FROM order_line GROUP BY ordline_ordhdrid_fn HAVING SUM(ordline_ordqty_n) > 750 ORDLINE_ORDHDRID_FN TOT_QTY_PER_ORDER ------------------- ----------------- 30628 789 30668 789 2 records selected.
As you can see, only 2 records among 51 from the previous query had satisfied this additional restriction.
We could have used a column ORDLINE_ORDHDRID_FN, without the SUM aggregate function in the HAVING clause to restrict the returned records by some other criteria, but we cannot use just any column from the SELECT clause: It also has to be listed in the GROUP BY clause to be used with HAVING. Here is a query example that sums up order quantities grouped by order header ID only if they fall into a specified list of orders:
SELECT ordline_ordhdrid_fn, SUM(ordline_ordqty_n) TOT_QTY_PER_ORDER FROM order_line GROUP BY ordline_ordhdrid_fn HAVING ordline_ordhdrid_fn IN (30607,30608,30611,30622) ordline_ordhdrid_fn TOT_QTY_PER_ORDER ------------------- ----------------- 30607 590 30608 615 30611 625 30622 726 4 records selected.
While GROUP BY would consider the null values in the columns by which the grouping is performed a valid group, this is not the way the NULLs are treated by the aggregate functions. Aggregate functions simply exclude the NULL records — they will not make it to the final result. See Chapter 10 for more information.