13.2 Pushing the GROUPING Envelope

The grouping examples you have seen so far represent simple ways of aggregating data using Oracle's extensions of the GROUP BY clause. These simple mechanisms were introduced in Oracle8i. In Oracle9i Database, Oracle enhanced this new functionality in some interesting and useful ways. Oracle now allows for:

  • Repeating column names in the GROUP BY clause

  • Grouping on composite columns

  • Concatenated groupings

13.2.1 Repeated Column Names in the GROUP BY Clause

In Oracle8i, repeating column names are not allowed in a GROUP BY clause. If the GROUP BY clause contains an extension (i.e., ROLLUP or CUBE), you cannot use the same column inside the extension as well as outside the extension. The following SQL is invalid in Oracle8i:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY o.year, ROLLUP (o.year, o.month, r.name);

                         *

ERROR at line 6:

ORA-30490: Ambiguous expression in GROUP BY ROLLUP or CUBE list

However, the same query works in Oracle9i Database and later:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY o.year, ROLLUP (o.year, o.month, r.name);



      YEAR MONTH     REGION                    TOTAL

---------- --------- -------------------- ----------

      2000 January   Mid-Atlantic            1221394

      2000 January   New England             1018430

      2000 January   Southeast US             758042

      2000 January                           2997866

      2000 February  Mid-Atlantic             857352

      2000 February  New England             1231492

      2000 February  Southeast US            1236846

      2000 February                          3325690

      2000 March     Mid-Atlantic            1274062

      2000 March     New England             1132966

      2000 March     Southeast US            1311986

      2000 March                             3719014

      2001 January   Mid-Atlantic             610697

      2001 January   New England              509215

      2001 January   Southeast US             379021

      2001 January                           1498933

      2001 February  Mid-Atlantic             428676

      2001 February  New England              615746

      2001 February  Southeast US             618423

      2001 February                          1662845

      2001 March     Mid-Atlantic             637031

      2001 March     New England              566483

      2001 March     Southeast US             655993

      2001 March                             1859507

      2000                                  10042570

      2001                                   5021285

      2000                                  10042570

      2001                                   5021285



28 rows selected.

Repetition of o.year in the GROUP BY clause as well as in the ROLLUP operation repeats the summary rows of each year in the output and suppresses the grand total. Repetition of column names in a GROUP BY clause isn't very useful, but it's worth knowing that such constructs are allowed in Oracle9i and later.

13.2.2 Grouping on Composite Columns

Oracle8i supports grouping on individual columns only. Oracle9i extends the grouping operations to include grouping on composite columns. A composite column is a collection of two or more columns, but their values are treated as one for the grouping computation. Oracle8i allows group operations of the form ROLLUP (a,b,c), while, Oracle9i allows group operations of the form ROLLUP (a,(b,c)) as well. In this case, (b,c) is treated as one column for the purpose of the grouping computation. For example:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY ROLLUP ((o.year, o.month),r.name);



      YEAR MONTH     REGION                    TOTAL

---------- --------- -------------------- ----------

      2000 January   Mid-Atlantic            1221394

      2000 January   New England             1018430

      2000 January   Southeast US             758042

      2000 January                           2997866

      2000 February  Mid-Atlantic             857352

      2000 February  New England             1231492

      2000 February  Southeast US            1236846

      2000 February                          3325690

      2000 March     Mid-Atlantic            1274062

      2000 March     New England             1132966

      2000 March     Southeast US            1311986

      2000 March                             3719014

      2001 January   Mid-Atlantic             610697

      2001 January   New England              509215

      2001 January   Southeast US             379021

      2001 January                           1498933

      2001 February  Mid-Atlantic             428676

      2001 February  New England              615746

      2001 February  Southeast US             618423

      2001 February                          1662845

      2001 March     Mid-Atlantic             637031

      2001 March     New England              566483

      2001 March     Southeast US             655993

      2001 March                             1859507

                                            15063855



25 rows selected.

In this example, two columns (o.year, o.month) are treated as one composite column. This causes Oracle to treat the combination of year and month as one dimension, and the summary rows are computed accordingly. Although this query is not allowed in Oracle8i, you can fake composite column groupings in Oracle8i by using the concatenation operator (||) to combine two columns and treat the result as one composite column. Oracle8i can then produce the same result as the previous query in Oracle 9i. For example:

SELECT TO_CHAR(o.year)||' '||TO_CHAR(TO_DATE(o.month,'MM'),'Month') 

       Year_Month, 

       r.name region, SUM(o.tot_sales) 

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY 

ROLLUP (TO_CHAR(o.year)||' '||

        TO_CHAR(TO_DATE(o.month,'MM'),'Month'), r.name);



YEAR_MONTH           REGION               SUM(O.TOT_SALES)

-------------------- -------------------- ----------------

2000 February        Mid-Atlantic                   857352

2000 February        New England                   1231492

2000 February        Southeast US                  1236846

2000 February                                      3325690

2000 January         Mid-Atlantic                  1221394

2000 January         New England                   1018430

2000 January         Southeast US                   758042

2000 January                                       2997866

2000 March           Mid-Atlantic                  1274062

2000 March           New England                   1132966

2000 March           Southeast US                  1311986

2000 March                                         3719014

2001 February        Mid-Atlantic                   428676

2001 February        New England                    615746

2001 February        Southeast US                   618423

2001 February                                      1662845

2001 January         Mid-Atlantic                   610697

2001 January         New England                    509215

2001 January         Southeast US                   379021

2001 January                                       1498933

2001 March           Mid-Atlantic                   637031

2001 March           New England                    566483

2001 March           Southeast US                   655993

2001 March                                         1859507

                                                  15063855



25 rows selected.

This query converts the numeric month into the string expression of the name of the month and concatenates it with the string representation of the year. The same expression has to be used in the SELECT list and the ROLLUP clause. The expression TO_CHAR(o.year)||' '||TO_CHAR(TO_DATE( o.month,'MM'),'Month') is treated as one composite column.

13.2.3 Concatenated Groupings

With Oracle9i and later, you can have multiple ROLLUP, CUBE, or GROUPING SETS operations, or a combination of these under the GROUP BY clause in a query. This is not allowed in Oracle8i. You will get an error message if you attempt the following query in Oracle8i:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY ROLLUP (o.year, o.month), ROLLUP(r.name);

                                   *

ERROR at line 6:

ORA-30489: Cannot have more than one rollup/cube expression list

However, the same query works in Oracle9i and later:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY ROLLUP (o.year, o.month), ROLLUP(r.name);



      YEAR MONTH     REGION                    TOTAL

---------- --------- -------------------- ----------

      2000 January   Mid-Atlantic            1221394

      2000 January   New England             1018430

      2000 January   Southeast US             758042

      2000 January                           2997866

      2000 February  Mid-Atlantic             857352

      2000 February  New England             1231492

      2000 February  Southeast US            1236846

      2000 February                          3325690

      2000 March     Mid-Atlantic            1274062

      2000 March     New England             1132966

      2000 March     Southeast US            1311986

      2000 March                             3719014

      2000           Mid-Atlantic            3352808

      2000           New England             3382888

      2000           Southeast US            3306874

      2000                                  10042570

      2001 January   Mid-Atlantic             610697

      2001 January   New England              509215

      2001 January   Southeast US             379021

      2001 January                           1498933

      2001 February  Mid-Atlantic             428676

      2001 February  New England              615746

      2001 February  Southeast US             618423

      2001 February                          1662845

      2001 March     Mid-Atlantic             637031

      2001 March     New England              566483

      2001 March     Southeast US             655993

      2001 March                             1859507

      2001           Mid-Atlantic            1676404

      2001           New England             1691444

      2001           Southeast US            1653437

      2001                                   5021285

                     Mid-Atlantic            5029212

                     New England             5074332

                     Southeast US            4960311

                                            15063855



36 rows selected.

When you have multiple grouping operations (ROLLUP, CUBE, or GROUPING SETS) in a GROUP BY clause, what you have is called a concatenated grouping. The result of the concatenated grouping is to produce a cross-product of groupings from each grouping operation. Therefore, the query:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY ROLLUP(o.year),  ROLLUP (o.month), ROLLUP (r.name);

behaves as a CUBE and produces the same result as the query:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY CUBE (o.year, o.month, r.name);

Since a CUBE contains aggregates for all possible combinations of the grouping columns, the concatenated grouping of CUBES is no different from a regular CUBE, and all the following queries return the same result as the query shown previously:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY CUBE (o.year, o.month), CUBE (r.name);



SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY CUBE (o.year), CUBE (o.month, r.name);



SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY CUBE (o.year), CUBE (o.month), CUBE (r.name);

13.2.3.1 Concatenated groupings with GROUPING SETS

Concatenated groupings come in handy while using GROUPING SETS. Since GROUPING SETS produces only the subtotal rows, you can specify just the aggregation levels you want in your output by using a concatenated grouping of GROUPING SETS. The concatenated grouping of GROUPING SETS (a,b) and GROUPING SETS (c,d) will produce aggregate rows for the aggregation levels (a,c), (a,d), (b,c), and (b,d). The concatenated grouping of GROUPING SETS (a,b) and GROUPING SETS (c) will produce aggregate rows for the aggregation levels (a,c) and (b,c). For example:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (o.year, o.month), GROUPING SETS (r.name);



      YEAR MONTH     REGION                    TOTAL

---------- --------- -------------------- ----------

      2000           Mid-Atlantic            3352808

      2000           New England             3382888

      2000           Southeast US            3306874

      2001           Mid-Atlantic            1676404

      2001           New England             1691444

      2001           Southeast US            1653437

           January   Mid-Atlantic            1832091

           January   New England             1527645

           January   Southeast US            1137063

           February  Mid-Atlantic            1286028

           February  New England             1847238

           February  Southeast US            1855269

           March     Mid-Atlantic            1911093

           March     New England             1699449

           March     Southeast US            1967979



15 rows selected.

The concatenated grouping GROUP BY GROUPING SETS (O.YEAR, O.MONTH), GROUPING SETS (R.NAME) in this example produces rows for aggregate levels (O.YEAR, R.NAME) and (O.MONTH, R.NAME). Therefore, you see aggregate rows for (Year, Region) and (Month, Region) combinations in the output. The following example extends the previous query:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (o.year, o.month), GROUPING SETS (o.year, r. name);



          YEAR MONTH     REGION                    TOTAL

    ---------- --------- -------------------- ----------

 1:      2000                                  10042570

 2:      2001                                   5021285

 3:      2000 January                           2997866

 4:      2000 February                          3325690

 5:      2000 March                             3719014

 6:      2001 January                           1498933

 7:      2001 February                          1662845

 8:      2001 March                             1859507

 9:      2000           Mid-Atlantic            3352808

10:      2000           New England             3382888

11:      2000           Southeast US            3306874

12:      2001           Mid-Atlantic            1676404

13:      2001           New England             1691444

14:      2001           Southeast US            1653437

15:           January   Mid-Atlantic            1832091

16:           January   New England             1527645

17:           January   Southeast US            1137063

18:           February  Mid-Atlantic            1286028

19:           February  New England             1847238

20:           February  Southeast US            1855269

21:           March     Mid-Atlantic            1911093

22:           March     New England             1699449

23:           March     Southeast US            1967979



23 rows selected.

This example produces four grouping combinations. Table 13-1 describes the various grouping combinations produced by this query and references their corresponding row numbers in the output.

Table 13-1. Grouping combinations

Grouping combination

Corresponding rows

(o.year, o.year)

1-2

(o.year, r.name)

9-14

(o.month, o.year)

3-8

(o.month, r.name)

15-23


The GROUPING SETS operation is independent of the order of columns. Therefore, the following two queries will produce the same results as shown previously:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (o.year, r.name), GROUPING SETS (o.year, o.month);



SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (o.month, o.year), GROUPING SETS (r.name, o.year);

It is permissible to have a combination of ROLLUP, CUBE, and GROUPING SETS in a single GROUP BY clause, as in the following example:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (o.month, o.year), ROLLUP(r.name), CUBE (o.year);

However, the output from such queries seldom makes any sense. You should carefully evaluate the need for such a query if you intend to write one.

13.2.3.2 ROLLUP and CUBE as arguments to GROUPING SETS

Unlike the ROLLUP and CUBE operations, the GROUPING SETS operation can take a ROLLUP or a CUBE as its argument. As you have seen earlier, GROUPING SETS produces only subtotal rows. However, there are times when you may need to print the grand total along with the subtotals. In such situations, you can perform the GROUPING SETS operation on ROLLUP operations, as in the following example:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (ROLLUP (o.year), 

                        ROLLUP (o.month), 

                        ROLLUP (r. name));



      YEAR MONTH     REGION                    TOTAL

---------- --------- -------------------- ----------

                     Mid-Atlantic            5029212

                     New England             5074332

                     Southeast US            4960311

           January                           4496799

           February                          4988535

           March                             5578521

      2000                                  10042570

      2001                                   5021285

                                            15063855

                                            15063855

                                            15063855



11 rows selected.

This example produces the subtotals for each dimension, as expected from the regular GROUPING SETS operations. Also, it produces the grand total across all the dimensions. However, you get three identical grand-total rows. The grand-total rows are repeated because they are produced by each ROLLUP operation inside the GROUPING SETS. If you insist on only one grand-total row, you may use the DISTINCT keyword in the SELECT clause:

SELECT Distinct o.year, 

                TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

                r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (ROLLUP (o.year), ROLLUP (o.month), 

                        ROLLUP (r. name));



 YEAR MONTH     REGION                    TOTAL

----- --------- -------------------- ----------

 2000                                  10042570

 2001                                   5021285

       February                         4988535

       January                          4496799

       March                            5578521

                Mid-Atlantic            5029212

                New England             5074332

                Southeast US            4960311

                                       15063855



9 rows selected.

In this example, the DISTINCT keyword eliminated the duplicate grand-total rows. You can also eliminate duplicate rows by using the GROUP_ID function, as discussed later in this chapter.

If you are interested in subtotals and totals on composite dimensions, you can use composite or concatenated ROLLUP operations within GROUPING SETS, as in the following example:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales) total

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (ROLLUP (o.year, o.month), ROLLUP(r.name));



     YEAR MONTH     REGION                    TOTAL

--------- --------- -------------------- ----------

                    Mid-Atlantic            5029212

                    New England             5074332

                    Southeast US            4960311

     2000 January                           2997866

     2000 February                          3325690

     2000 March                             3719014

     2000                                  10042570

     2001 January                           1498933

     2001 February                          1662845

     2001 March                             1859507

     2001                                   5021285

                                           15063855

                                           15063855



13 rows selected.

This query generates subtotals for (year, month) combinations, subtotals for the region, subtotals for the year, and the grand total. Note that there are duplicate grand-total rows because of the multiple ROLLUP operations within the GROUPING SETS operation.