13.1 Multiple Summary Levels

In Chapter 4, you saw how the GROUP BY clause, along with the aggregate functions, can be used to produce summary results. For example, if you want to print the monthly total sales for each region, you would probably execute the following query:

SELECT r.name region, 

       TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY r.name, o.month;



REGION               MONTH     SUM(O.TOT_SALES)

-------------------- --------- ----------------

New England          January            1527645

New England          February           1847238

New England          March              1699449

New England          April              1792866

New England          May                1698855

New England          June               1510062

New England          July               1678002

New England          August             1642968

New England          September          1726767

New England          October            1648944

New England          November           1384185

New England          December           1599942

Mid-Atlantic         January            1832091

Mid-Atlantic         February           1286028

Mid-Atlantic         March              1911093

Mid-Atlantic         April              1623438

Mid-Atlantic         May                1778805

Mid-Atlantic         June               1504455

Mid-Atlantic         July               1820742

Mid-Atlantic         August             1381560

Mid-Atlantic         September          1178694

Mid-Atlantic         October            1530351

Mid-Atlantic         November           1598667

Mid-Atlantic         December           1477374

Southeast US         January            1137063

Southeast US         February           1855269

Southeast US         March              1967979

Southeast US         April              1830051

Southeast US         May                1983282

Southeast US         June               1705716

Southeast US         July               1670976

Southeast US         August             1436295

Southeast US         September          1905633

Southeast US         October            1610523

Southeast US         November           1661598

Southeast US         December           1841100



36 rows selected.

As expected, this report prints the total sales for each region and month combination. However, in a more complex application, you may also want to have the subtotal for each region over all months, along with the total for all regions, or you may want the subtotal for each month over all regions, along with the total for all months. In short, you may need to generate subtotals and totals at more than one level.

13.1.1 UNION

In data warehouse applications, you frequently need to generate summary information over various dimensions, and subtotal and total across those dimensions. Generating and retrieving this type of summary information is a core goal of almost all data warehouse applications.

By this time, you have realized that a simple GROUP BY query is not sufficient to generate the subtotals and totals described in this section. To illustrate the complexity of the problem, let's attempt to write a query that would return the following in a single output:

  • Sales for each month for every region

  • Subtotals over all months for every region

  • Total sales for all regions over all months

One way to generate multiple levels of summary (the only way prior to Oracle8i) is to write a UNION query. For example, the following UNION query will give us the desired three levels of subtotals:

SELECT r.name region, 

       TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY r.name, o.month

UNION ALL

SELECT r.name region, NULL, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY r.name

UNION ALL

SELECT NULL, NULL, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id;



REGION               MONTH     SUM(O.TOT_SALES)

-------------------- --------- ----------------

New England          January            1527645

New England          February           1847238

New England          March              1699449

New England          April              1792866

New England          May                1698855

New England          June               1510062

New England          July               1678002

New England          August             1642968

New England          September          1726767

New England          October            1648944

New England          November           1384185

New England          December           1599942

Mid-Atlantic         January            1832091

Mid-Atlantic         February           1286028

Mid-Atlantic         March              1911093

Mid-Atlantic         April              1623438

Mid-Atlantic         May                1778805

Mid-Atlantic         June               1504455

Mid-Atlantic         July               1820742

Mid-Atlantic         August             1381560

Mid-Atlantic         September          1178694

Mid-Atlantic         October            1530351

Mid-Atlantic         November           1598667

Mid-Atlantic         December           1477374

Southeast US         January            1137063

Southeast US         February           1855269

Southeast US         March              1967979

Southeast US         April              1830051

Southeast US         May                1983282

Southeast US         June               1705716

Southeast US         July               1670976

Southeast US         August             1436295

Southeast US         September          1905633

Southeast US         October            1610523

Southeast US         November           1661598

Southeast US         December           1841100

Mid-Atlantic                           18923298

New England                            19756923

Southeast US                           20605485

                                       59285706



40 rows selected.

This query produced 40 rows of output, 36 of which are the sales for each month for every region. The last four rows are the subtotals and the total. The three rows with region names and NULL values for the month are the subtotals for each region over all the months, and the last row with NULL values for both the region and month is the total sales for all the regions over all the months.

Now that you have the desired result, try to analyze the query a bit. You have a very small all_orders table with only 1440 rows in this example. You wanted to have summary information over just two dimensions?region and month. You have 3 regions and 12 months. To get the desired summary information from this table, you have to write a query consisting of three SELECT statements combined together using UNION ALL. The execution plan for this query is:

PLAN_TABLE_OUTPUT

-----------------------------------------------------



-----------------------------------------------------

| Id | Operation                  | Name            |

-----------------------------------------------------

|  0 | SELECT STATEMENT           |                 |

|  1 | UNION-ALL                  |                 |

|  2 | SORT GROUP BY              |                 |

|  3 | MERGE JOIN                 |                 |

|  4 | TABLE ACCESS BY INDEX ROWID| REGION          |

|  5 | INDEX FULL SCAN            | REGION_PK       |

|* 6 | SORT JOIN                  |                 |

|  7 | TABLE ACCESS FULL          | ALL_ORDERS      |

|  8 | SORT GROUP BY              |                 |

|  9 | MERGE JOIN                 |                 |

|  10| TABLE ACCESS BY INDEX ROWID| REGION          |

|  11| INDEX FULL SCAN            | REGION_PK       |

|* 12| SORT JOIN                  |                 |

|  13| TABLE ACCESS FULL          | ALL_ORDERS      |

|  14| SORT AGGREGATE             |                 |

|  15| NESTED LOOPS               |                 |

|  16| TABLE ACCESS FULL          | ALL_ORDERS      |

|* 17| INDEX UNIQUE SCAN          | REGION_PK       |

-----------------------------------------------------

As indicated by the execution plan output, Oracle needs to perform the following operations to get the results:

Three FULL TABLE scans on all_orders

Three INDEX scan on region_pk (Primary key of table region)

Two Sort-Merge Joins

One NESTED LOOPS JOIN

Two SORT GROUP BY operations

One SORT AGGREGATE operation

One UNION ALL

In any practical application the all_orders table will consist of millions of rows, and performing all these operations would be time-consuming. Even worse, if you have more dimensions for which to prepare summary information than the two shown in this example, you have to write an even more complex query. The bottom line is that such a query badly hurts performance.

13.1.2 ROLLUP

Oracle8i introduced several new features for generating multiple levels of summary information with one query. One such feature is a set of extensions to the GROUP BY clause. In Oracle8i, the GROUP BY clause comes with two extensions: ROLLUP and CUBE. Oracle9i introduced another extension: GROUPING SETS. We discuss ROLLUP in this section. CUBE and GROUPING SETS are discussed later in this chapter.

ROLLUP is an extension to the GROUP BY clause, and therefore can only appear in a query with a GROUP BY clause. The ROLLUP operation groups the selected rows based on the expressions in the GROUP BY clause, and prepares a summary row for each group. The syntax of ROLLUP is:

SELECT  . . . 

FROM  . . . 

GROUP BY ROLLUP (ordered list of grouping columns)

Using ROLLUP, you can generate the summary information discussed in the previous section in a much easier way than in our UNION ALL query. For example:

SELECT r.name region, 

       TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY ROLLUP (r.name, o.month);



REGION               MONTH     SUM(O.TOT_SALES)

-------------------- --------- ----------------

New England          January            1527645

New England          February           1847238

New England          March              1699449

New England          April              1792866

New England          May                1698855

New England          June               1510062

New England          July               1678002

New England          August             1642968

New England          September          1726767

New England          October            1648944

New England          November           1384185

New England          December           1599942

New England                            19756923

Mid-Atlantic         January            1832091

Mid-Atlantic         February           1286028

Mid-Atlantic         March              1911093

Mid-Atlantic         April              1623438

Mid-Atlantic         May                1778805

Mid-Atlantic         June               1504455

Mid-Atlantic         July               1820742

Mid-Atlantic         August             1381560

Mid-Atlantic         September          1178694

Mid-Atlantic         October            1530351

Mid-Atlantic         November           1598667

Mid-Atlantic         December           1477374

Mid-Atlantic                           18923298

Southeast US         January            1137063

Southeast US         February           1855269

Southeast US         March              1967979

Southeast US         April              1830051

Southeast US         May                1983282

Southeast US         June               1705716

Southeast US         July               1670976

Southeast US         August             1436295

Southeast US         September          1905633

Southeast US         October            1610523

Southeast US         November           1661598

Southeast US         December           1841100

Southeast US                           20605485

                                       59285706



40 rows selected.

As you can see in this output, the ROLLUP operation produced subtotals across the specified dimensions and a grand total. The argument to the ROLLUP operation is an ordered list of grouping columns. Since the ROLLUP operation is used in conjunction with the GROUP BY clause, it first generates aggregate values based on the GROUP BY operation on the ordered list of columns. It then generates higher-level subtotals and finally a grand total. ROLLUP not only simplifies the query, it results in more efficient execution. The execution plan for this ROLLUP query is as follows:

PLAN_TABLE_OUTPUT

----------------------------------------------------



----------------------------------------------------

| Id  | Operation                     | Name       |

----------------------------------------------------

|   0 | SELECT STATEMENT              |            |

|   1 |  SORT GROUP BY ROLLUP         |            |

|   2 |   MERGE JOIN                  |            |

|   3 |    TABLE ACCESS BY INDEX ROWID| REGION     |

|   4 |     INDEX FULL SCAN           | REGION_PK  |

|*  5 |    SORT JOIN                  |            |

|   6 |     TABLE ACCESS FULL         | ALL_ORDERS |

----------------------------------------------------

Rather than the multiple table scans, joins, and other operations required by the UNION ALL version of the query, the ROLLUP query needs just one index scan on region_pk, one full table scan on all_orders, and one join to generate the required output.

If you want to generate subtotals for each month instead of for each region, all you need to do is change the order of columns in the ROLLUP operation, as in the following example:

SELECT r.name region, 

TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY ROLLUP (o.month, r.name);



REGION               MONTH     SUM(O.TOT_SALES)

-------------------- --------- ----------------

New England          January            1527645

Mid-Atlantic         January            1832091

Southeast US         January            1137063

                     January            4496799

New England          February           1847238

Mid-Atlantic         February           1286028

Southeast US         February           1855269

                     February           4988535

New England          March              1699449

Mid-Atlantic         March              1911093

Southeast US         March              1967979

                     March              5578521

New England          April              1792866

Mid-Atlantic         April              1623438

Southeast US         April              1830051

                     April              5246355

New England          May                1698855

Mid-Atlantic         May                1778805

Southeast US         May                1983282

                     May                5460942

New England          June               1510062

Mid-Atlantic         June               1504455

Southeast US         June               1705716

                     June               4720233

New England          July               1678002

Mid-Atlantic         July               1820742

Southeast US         July               1670976

                     July               5169720

New England          August             1642968

Mid-Atlantic         August             1381560

Southeast US         August             1436295

                     August             4460823

New England          September          1726767

Mid-Atlantic         September          1178694

Southeast US         September          1905633

                     September          4811094

New England          October            1648944

Mid-Atlantic         October            1530351

Southeast US         October            1610523

                     October            4789818

New England          November           1384185

Mid-Atlantic         November           1598667

Southeast US         November           1661598

                     November           4644450

New England          December           1599942

Mid-Atlantic         December           1477374

Southeast US         December           1841100

                     December           4918416

                                       59285706



49 rows selected.

Adding dimensions does not result in additional complexity. The following query rolls up subtotals for the region, the month, and the year for the first quarter:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY ROLLUP (o.year, o.month, r.name);



      YEAR MONTH     REGION               SUM(O.TOT_SALES)

---------- --------- -------------------- ----------------

      2000 January   New England                   1018430

      2000 January   Mid-Atlantic                  1221394

      2000 January   Southeast US                   758042

      2000 January                                 2997866

      2000 February  New England                   1231492

      2000 February  Mid-Atlantic                   857352

      2000 February  Southeast US                  1236846

      2000 February                                3325690

      2000 March     New England                   1132966

      2000 March     Mid-Atlantic                  1274062

      2000 March     Southeast US                  1311986

      2000 March                                   3719014

      2000                                        10042570

      2001 January   New England                    509215

      2001 January   Mid-Atlantic                   610697

      2001 January   Southeast US                   379021

      2001 January                                 1498933

      2001 February  New England                    615746

      2001 February  Mid-Atlantic                   428676

      2001 February  Southeast US                   618423

      2001 February                                1662845

      2001 March     New England                    566483

      2001 March     Mid-Atlantic                   637031

      2001 March     Southeast US                   655993

      2001 March                                   1859507

      2001                                         5021285

                                                  15063855



27 rows selected.

13.1.3 Partial ROLLUPs

In a ROLLUP query with N dimensions, the grand total is considered the top level. The various subtotal rows of N-1 dimensions constitute the next lower level, the subtotal rows of N-2 dimensions constitute yet another level down, and so on. In the most recent example, you have three dimensions (year, month, and region), and the total row is the top level. The subtotal rows for the year represent the next lower level, because these rows are subtotals across two dimensions (month and region). The subtotal rows for the year and month combination are one level lower, because these rows are subtotals across one dimension (region). The rest of the rows are the result of the regular GROUP BY operation (without ROLLUP), and form the lowest level.

If you want to exclude some subtotals and totals from the ROLLUP output, you can only move top to bottom, i.e., exclude the top-level total first, then progressively go down to the next level subtotals, and so on. To do this, you have to take out one or more columns from the ROLLUP operation, and put them in the GROUP BY clause. This is called a partial ROLLUP.

As an example of a partial ROLLUP, let's see what happens when you take out the first column, which is o.year, from the previous ROLLUP operation and move it into the GROUP BY clause.

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY o.year, ROLLUP (o.month, r.name);



      YEAR MONTH     REGION               SUM(O.TOT_SALES)

---------- --------- -------------------- ----------------

      2000 January   New England                   1018430

      2000 January   Mid-Atlantic                  1221394

      2000 January   Southeast US                   758042

      2000 January                                 2997866

      2000 February  New England                   1231492

      2000 February  Mid-Atlantic                   857352

      2000 February  Southeast US                  1236846

      2000 February                                3325690

      2000 March     New England                   1132966

      2000 March     Mid-Atlantic                  1274062

      2000 March     Southeast US                  1311986

      2000 March                                   3719014

      2000                                        10042570

      2001 January   New England                    509215

      2001 January   Mid-Atlantic                   610697

      2001 January   Southeast US                   379021

      2001 January                                 1498933

      2001 February  New England                    615746

      2001 February  Mid-Atlantic                   428676

      2001 February  Southeast US                   618423

      2001 February                                1662845

      2001 March     New England                    566483

      2001 March     Mid-Atlantic                   637031

      2001 March     Southeast US                   655993

      2001 March                                   1859507

      2001                                         5021285



26 rows selected.

The query in this example excludes the grand-total row from the output. By taking out o.year from the ROLLUP operation, you are asking the database not to roll up summary information over the years. Therefore, the database rolls up summary information on region and month. When you proceed to remove o.month from the ROLLUP operation, the query will not generate the roll up summary for the month dimension, and only the region-level subtotals will be printed in the output. For example:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY o.year, o.month, ROLLUP (r.name);



      YEAR MONTH     REGION               SUM(O.TOT_SALES)

---------- --------- -------------------- ----------------

      2000 January   New England                   1018430

      2000 January   Mid-Atlantic                  1221394

      2000 January   Southeast US                   758042

      2000 January                                 2997866

      2000 February  New England                   1231492

      2000 February  Mid-Atlantic                   857352

      2000 February  Southeast US                  1236846

      2000 February                                3325690

      2000 March     New England                   1132966

      2000 March     Mid-Atlantic                  1274062

      2000 March     Southeast US                  1311986

      2000 March                                   3719014

      2001 January   New England                    509215

      2001 January   Mid-Atlantic                   610697

      2001 January   Southeast US                   379021

      2001 January                                 1498933

      2001 February  New England                    615746

      2001 February  Mid-Atlantic                   428676

      2001 February  Southeast US                   618423

      2001 February                                1662845

      2001 March     New England                    566483

      2001 March     Mid-Atlantic                   637031

      2001 March     Southeast US                   655993

      2001 March                                   1859507



24 rows selected.

13.1.4 CUBE

The CUBE extension of the GROUP BY clause takes aggregation one step further than ROLLUP. The CUBE operation generates subtotals for all possible combinations of the grouping columns. Therefore, output of a CUBE operation will contain all subtotals produced by an equivalent ROLLUP operation and also some additional subtotals. For example, if you are performing ROLLUP on columns region and month, you will get subtotals for all months for each region, and a grand total. However, if you perform the corresponding CUBE, you will get:

  • The regular rows produced by the GROUP BY clause

  • Subtotals for all months on each region

  • A subtotal for all regions on each month

  • A grand total

Like ROLLUP, CUBE is an extension of the GROUP BY clause, and can appear in a query only along with a GROUP BY clause. The syntax of CUBE is:

SELECT  . . . 

FROM  . . . 

GROUP BY CUBE (list of grouping columns)

For example, the following query returns subtotals for all combinations of regions and months in the all_orders table:

SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 

SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY CUBE(r.name, o.month);



REGION               MONTH     SUM(O.TOT_SALES)

-------------------- --------- ----------------

                                       59285706

                     January            4496799

                     February           4988535

                     March              5578521

                     April              5246355

                     May                5460942

                     June               4720233

                     July               5169720

                     August             4460823

                     September          4811094

                     October            4789818

                     November           4644450

                     December           4918416

New England                            19756923

New England          January            1527645

New England          February           1847238

New England          March              1699449

New England          April              1792866

New England          May                1698855

New England          June               1510062

New England          July               1678002

New England          August             1642968

New England          September          1726767

New England          October            1648944

New England          November           1384185

New England          December           1599942

Mid-Atlantic                           18923298

Mid-Atlantic         January            1832091

Mid-Atlantic         February           1286028

Mid-Atlantic         March              1911093

Mid-Atlantic         April              1623438

Mid-Atlantic         May                1778805

Mid-Atlantic         June               1504455

Mid-Atlantic         July               1820742

Mid-Atlantic         August             1381560

Mid-Atlantic         September          1178694

Mid-Atlantic         October            1530351

Mid-Atlantic         November           1598667

Mid-Atlantic         December           1477374

Southeast US                           20605485

Southeast US         January            1137063

Southeast US         February           1855269

Southeast US         March              1967979

Southeast US         April              1830051

Southeast US         May                1983282

Southeast US         June               1705716

Southeast US         July               1670976

Southeast US         August             1436295

Southeast US         September          1905633

Southeast US         October            1610523

Southeast US         November           1661598

Southeast US         December           1841100



52 rows selected.

Note that the output contains not only the subtotals for each region, but also the subtotals for each month. You can get the same result from a query without the CUBE operation. However, that query would be lengthy and complex and, of course, very inefficient. Such a query would look like:

SELECT NULL region, NULL month, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id 

UNION ALL

SELECT NULL, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY o.month

UNION ALL

SELECT r.name region, NULL, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY r.name

UNION ALL

SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 

SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY r.name, o.month;



REGION               MONTH     SUM(O.TOT_SALES)

-------------------- --------- ----------------

                                       59285706

                     January            4496799

                     February           4988535

                     March              5578521

                     April              5246355

                     May                5460942

                     June               4720233

                     July               5169720

                     August             4460823

                     September          4811094

                     October            4789818

                     November           4644450

                     December           4918416

Mid-Atlantic                           18923298

New England                            19756923

Southeast US                           20605485

New England          January            1527645

New England          February           1847238

New England          March              1699449

New England          April              1792866

New England          May                1698855

New England          June               1510062

New England          July               1678002

New England          August             1642968

New England          September          1726767

New England          October            1648944

New England          November           1384185

New England          December           1599942

Mid-Atlantic         January            1832091

Mid-Atlantic         February           1286028

Mid-Atlantic         March              1911093

Mid-Atlantic         April              1623438

Mid-Atlantic         May                1778805

Mid-Atlantic         June               1504455

Mid-Atlantic         July               1820742

Mid-Atlantic         August             1381560

Mid-Atlantic         September          1178694

Mid-Atlantic         October            1530351

Mid-Atlantic         November           1598667

Mid-Atlantic         December           1477374

Southeast US         January            1137063

Southeast US         February           1855269

Southeast US         March              1967979

Southeast US         April              1830051

Southeast US         May                1983282

Southeast US         June               1705716

Southeast US         July               1670976

Southeast US         August             1436295

Southeast US         September          1905633

Southeast US         October            1610523

Southeast US         November           1661598

Southeast US         December           1841100



52 rows selected.

Since a CUBE produces aggregate results for all possible combinations of the grouping columns, the output of a query using CUBE is independent of the order of columns in the CUBE operation, if everything else remains the same. This is not the case with ROLLUP. If everything else in the query remains the same, ROLLUP(a,b) will produce a slightly different result set than ROLLUP(b,a). However, the result set of CUBE(a,b) will be the same as that of CUBE(b,a). The following example illustrates this by taking the example at the beginning of this section and reversing the order of columns in the CUBE operation:

SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 

SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY CUBE(o.month, r.name);



REGION               MONTH     SUM(O.TOT_SALES)

-------------------- --------- ----------------

                                       59285706

New England                            19756923

Mid-Atlantic                           18923298

Southeast US                           20605485

                     January            4496799

New England          January            1527645

Mid-Atlantic         January            1832091

Southeast US         January            1137063

                     February           4988535

New England          February           1847238

Mid-Atlantic         February           1286028

Southeast US         February           1855269

                     March              5578521

New England          March              1699449

Mid-Atlantic         March              1911093

Southeast US         March              1967979

                     April              5246355

New England          April              1792866

Mid-Atlantic         April              1623438

Southeast US         April              1830051

                     May                5460942

New England          May                1698855

Mid-Atlantic         May                1778805

Southeast US         May                1983282

                     June               4720233

New England          June               1510062

Mid-Atlantic         June               1504455

Southeast US         June               1705716

                     July               5169720

New England          July               1678002

Mid-Atlantic         July               1820742

Southeast US         July               1670976

                     August             4460823

New England          August             1642968

Mid-Atlantic         August             1381560

Southeast US         August             1436295

                     September          4811094

New England          September          1726767

Mid-Atlantic         September          1178694

Southeast US         September          1905633

                     October            4789818

New England          October            1648944

Mid-Atlantic         October            1530351

Southeast US         October            1610523

                     November           4644450

New England          November           1384185

Mid-Atlantic         November           1598667

Southeast US         November           1661598

                     December           4918416

New England          December           1599942

Mid-Atlantic         December           1477374

Southeast US         December           1841100



52 rows selected.

This query produced the same results as the earlier query; only the order of the rows happens to be different.

13.1.5 Partial CUBE

To exclude some subtotals from the output, you can do a partial CUBE, (similar to a partial ROLLUP) by taking out column(s) from the CUBE operation and putting them into the GROUP BY clause. Here's an example:

SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 

SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY r.name, CUBE(o.month);



REGION               MONTH     SUM(O.TOT_SALES)

-------------------- --------- ----------------

New England                            19756923

New England          January            1527645

New England          February           1847238

New England          March              1699449

New England          April              1792866

New England          May                1698855

New England          June               1510062

New England          July               1678002

New England          August             1642968

New England          September          1726767

New England          October            1648944

New England          November           1384185

New England          December           1599942

Mid-Atlantic                           18923298

Mid-Atlantic         January            1832091

Mid-Atlantic         February           1286028

Mid-Atlantic         March              1911093

Mid-Atlantic         April              1623438

Mid-Atlantic         May                1778805

Mid-Atlantic         June               1504455

Mid-Atlantic         July               1820742

Mid-Atlantic         August             1381560

Mid-Atlantic         September          1178694

Mid-Atlantic         October            1530351

Mid-Atlantic         November           1598667

Mid-Atlantic         December           1477374

Southeast US                           20605485

Southeast US         January            1137063

Southeast US         February           1855269

Southeast US         March              1967979

Southeast US         April              1830051

Southeast US         May                1983282

Southeast US         June               1705716

Southeast US         July               1670976

Southeast US         August             1436295

Southeast US         September          1905633

Southeast US         October            1610523

Southeast US         November           1661598

Southeast US         December           1841100



39 rows selected.

If you compare the results of the partial CUBE operation with that of the full CUBE operation, discussed at the beginning of this section, you will notice that the partial CUBE has excluded the subtotals for each month and the grand total from the output. If you want to retain the subtotals for each month, but want to exclude the subtotals for each region, you can swap the position of r.name and o.month in the GROUP BY . . . CUBE clause, as shown here:

SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 

SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY o.month, CUBE(r.name);

One interesting thing to note is that if you have one column in the CUBE operation, it produces the same result as the ROLLUP operation. Therefore, the following two queries produce identical results:

SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 

SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY r.name, CUBE(o.month);



SELECT r.name region, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month, 

SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

GROUP BY r.name, ROLLUP(o.month);

13.1.6 The GROUPING Function

ROLLUP and CUBE produce extra rows in the output that contain subtotals and totals. When a row represents a summary over a given column or set of columns, those columns will contain NULL values. Output containing NULLs and indicating subtotals doesn't make sense to an ordinary person who is unware of the behavior of ROLLUP and CUBE operations. Does your corporate vice president (VP) care about whether you used ROLLUP or CUBE or any other operation to get him the monthly total sales for each region? Obviously, he doesn't. That's exactly why you are reading this page and not your VP.

If you know your way around the NVL function, you would probably attempt to translate each NULL value from CUBE and ROLLUP to some descriptive value, as in the following example:

SELECT NVL(TO_CHAR(o.year), 'All Years') year,

NVL(TO_CHAR(TO_DATE(o.month, 'MM'), 'Month'), 'First Quarter') month,

NVL(r.name, 'All Regions') region, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY ROLLUP (o.year, o.month, r.name);



YEAR         MONTH         REGION         SUM(O.TOT_SALES)

------------ ------------- -------------- ----------------

2000         January       New England             1018430

2000         January       Mid-Atlantic            1221394

2000         January       Southeast US             758042

2000         January       All Regions             2997866

2000         February      New England             1231492

2000         February      Mid-Atlantic             857352

2000         February      Southeast US            1236846

2000         February      All Regions             3325690

2000         March         New England             1132966

2000         March         Mid-Atlantic            1274062

2000         March         Southeast US            1311986

2000         March         All Regions             3719014

2000         First Quarter All Regions            10042570

2001         January       New England              509215

2001         January       Mid-Atlantic             610697

2001         January       Southeast US             379021

2001         January       All Regions             1498933

2001         February      New England              615746

2001         February      Mid-Atlantic             428676

2001         February      Southeast US             618423

2001         February      All Regions             1662845

2001         March         New England              566483

2001         March         Mid-Atlantic             637031

2001         March         Southeast US             655993

2001         March         All Regions             1859507

2001         First Quarter All Regions             5021285

All Years    First Quarter All Regions            15063855



27 rows selected.

The NVL function works pretty well for this example. However, if the data itself contains some NULL values, it becomes impossible to distinguish whether a NULL value represents unavailable data or a subtotal row. The NVL function will cause a problem in such a case. The following data can be used to illustrate this problem:

SELECT * FROM disputed_orders;



ORDER_NBR   CUST_NBR SALES_EMP_ID SALE_PRICE ORDER_DT  EXPECTED_ STATUS

---------- ---------- ------------ ---------- --------- --------- ---------

      1001          1         7354         99 22-JUL-01 23-JUL-01 DELIVERED

      1000          1         7354            19-JUL-01 24-JUL-01

      1002          5         7368            12-JUL-01 25-JUL-01

      1003          4         7654         56 16-JUL-01 26-JUL-01 DELIVERED

      1004          4         7654         34 18-JUL-01 27-JUL-01 PENDING

      1005          8         7654         99 22-JUL-01 24-JUL-01 DELIVERED

      1006          1         7354            22-JUL-01 28-JUL-01

      1007          5         7368         25 20-JUL-01 22-JUL-01 PENDING

      1008          5         7368         25 21-JUL-01 23-JUL-01 PENDING

      1009          1         7354         56 18-JUL-01 22-JUL-01 DELIVERED

      1012          1         7354         99 22-JUL-01 23-JUL-01 DELIVERED

      1011          1         7354            19-JUL-01 24-JUL-01

      1015          5         7368            12-JUL-01 25-JUL-01

      1017          4         7654         56 16-JUL-01 26-JUL-01 DELIVERED

      1019          4         7654         34 18-JUL-01 27-JUL-01 PENDING

      1021          8         7654         99 22-JUL-01 24-JUL-01 DELIVERED

      1023          1         7354            22-JUL-01 28-JUL-01

      1025          5         7368         25 20-JUL-01 22-JUL-01 PENDING

      1027          5         7368         25 21-JUL-01 23-JUL-01 PENDING

      1029          1         7354         56 18-JUL-01 22-JUL-01 DELIVERED



20 rows selected.

Note that the column status contains NULL values. If you want the summary status of orders for each customer, and you executed the following query (note the application of NVL to the status column), the output might surprise you.

SELECT NVL(TO_CHAR(cust_nbr), 'All Customers') customer, 

NVL(status, 'All Status') status,

COUNT(*) FROM disputed_orders

GROUP BY CUBE(cust_nbr, status);



CUSTOMER                                 STATUS                 COUNT(*)

---------------------------------------- -------------------- ----------

All Customers                            All Status                    6

All Customers                            All Status                   20

All Customers                            PENDING                       6

All Customers                            DELIVERED                     8

1                                        All Status                    4

1                                        All Status                    8

1                                        DELIVERED                     4

4                                        All Status                    4

4                                        PENDING                       2

4                                        DELIVERED                     2

5                                        All Status                    2

5                                        All Status                    6

5                                        PENDING                       4

8                                        All Status                    2

8                                        DELIVERED                     2



15 rows selected.

This output doesn't make any sense. The problem is that any time the status column legitimately contains a NULL value, the NVL function returns the string "All Status." Obviously, NVL isn't useful in this situation. However, don't worry?Oracle provides a solution to this problem through the GROUPING function.

The GROUPING function is meant to be used in conjunction with either a ROLLUP or a CUBE operation. The GROUPING function takes a grouping column name as input and returns either 1 or 0. A 1 is returned if the column's value is NULL as the result of aggregation (ROLLUP or CUBE); otherwise, 0 is returned. The general syntax of the GROUPING function is:

SELECT  . . .  [GROUPING(grouping_column_name)]  . . . 

FROM  . . . 

GROUP BY  . . .  {ROLLUP | CUBE} (grouping_column_name)

The following example illustrates the use of GROUPING function in a simple way by returning the GROUPING function results for the three columns passed to ROLLUP:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales),

GROUPING(o.year) y, GROUPING(o.month) m, GROUPING(r.name) r

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY ROLLUP (o.year, o.month, r.name);



YEAR MONTH     REGION               SUM(O.TOT_SALES)    Y     M     R

----- --------- -------------------- ---------------- ---- ----- -----

 2000 January   New England                   1018430    0     0     0

 2000 January   Mid-Atlantic                  1221394    0     0     0

 2000 January   Southeast US                   758042    0     0     0

 2000 January                                 2997866    0     0     1

 2000 February  New England                   1231492    0     0     0

 2000 February  Mid-Atlantic                   857352    0     0     0

 2000 February  Southeast US                  1236846    0     0     0

 2000 February                                3325690    0     0     1

 2000 March     New England                   1132966    0     0     0

 2000 March     Mid-Atlantic                  1274062    0     0     0

 2000 March     Southeast US                  1311986    0     0     0

 2000 March                                   3719014    0     0     1

 2000                                        10042570    0     1     1

 2001 January   New England                    509215    0     0     0

 2001 January   Mid-Atlantic                   610697    0     0     0

 2001 January   Southeast US                   379021    0     0     0

 2001 January                                 1498933    0     0     1

 2001 February  New England                    615746    0     0     0

 2001 February  Mid-Atlantic                   428676    0     0     0

 2001 February  Southeast US                   618423    0     0     0

 2001 February                                1662845    0     0     1

 2001 March     New England                    566483    0     0     0

 2001 March     Mid-Atlantic                   637031    0     0     0

 2001 March     Southeast US                   655993    0     0     0

 2001 March                                   1859507    0     0     1

 2001                                         5021285    0     1     1

                                             15063855    1     1     1



27 rows selected.

Look at the y, m, and r columns in this output. Row 4 is a region-level subtotal for a particular month and year, and therefore, the GROUPING function results in a value of 1 for the region and a value 0 for the month and year. Row 26 (the second to last) is a subtotal for all regions and months for a particular year, and therefore, the GROUPING function prints 1 for the month and the region and 0 for the year. Row 27 (the grand total) contains 1 for all the GROUPING columns.

With a combination of GROUPING and DECODE (or CASE), you can produce more readable query output when using CUBE and ROLLUP, as in the following example:

SELECT DECODE(GROUPING(o.year), 1, 'All Years', o.year) Year,

DECODE(GROUPING(o.month), 1, 'All Months', 

TO_CHAR(TO_DATE(o.month, 'MM'), 'Month')) Month,

DECODE(GROUPING(r.name), 1, 'All Regions', r.name) Region, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY ROLLUP (o.year, o.month, r.name);



YEAR             MONTH      REGION               SUM(O.TOT_SALES)

---------------- ---------- -------------------- ----------------

2000             January    New England                   1018430

2000             January    Mid-Atlantic                  1221394

2000             January    Southeast US                   758042

2000             January    All Regions                   2997866

2000             February   New England                   1231492

2000             February   Mid-Atlantic                   857352

2000             February   Southeast US                  1236846

2000             February   All Regions                   3325690

2000             March      New England                   1132966

2000             March      Mid-Atlantic                  1274062

2000             March      Southeast US                  1311986

2000             March      All Regions                   3719014

2000             All Months All Regions                  10042570

2001             January    New England                    509215

2001             January    Mid-Atlantic                   610697

2001             January    Southeast US                   379021

2001             January    All Regions                   1498933

2001             February   New England                    615746

2001             February   Mid-Atlantic                   428676

2001             February   Southeast US                   618423

2001             February   All Regions                   1662845

2001             March      New England                    566483

2001             March      Mid-Atlantic                   637031

2001             March      Southeast US                   655993

2001             March      All Regions                   1859507

2001             All Months All Regions                   5021285

All Years        All Months All Regions                  15063855



27 rows selected.

By using DECODE with GROUPING, we produced the same result that was produced by using NVL at the beginning of the section. However, the risk of mistreating a NULL data value as a summary row is eliminated by using GROUPING and DECODE. You will notice this in the following example, in which NULL data values in subtotal and total rows are treated differently by the GROUPING function than the NULL values in the summary rows:

SELECT DECODE(GROUPING(cust_nbr), 1, 'All Customers', cust_nbr) customer,

DECODE(GROUPING(status), 1, 'All Status', status) status, COUNT(*)

FROM disputed_orders

GROUP BY CUBE(cust_nbr, status);



CUSTOMER                                 STATUS                 COUNT(*)

---------------------------------------- -------------------- ----------

All Customers                                                          6

All Customers                            All Status                   20

All Customers                            PENDING                       6

All Customers                            DELIVERED                     8

1                                                                      4

1                                        All Status                    8

1                                        DELIVERED                     4

4                                        All Status                    4

4                                        PENDING                       2

4                                        DELIVERED                     2

5                                                                      2

5                                        All Status                    6

5                                        PENDING                       4

8                                        All Status                    2

8                                        DELIVERED                     2



15 rows selected.

13.1.7 GROUPING SETS

Earlier in this chapter, you saw how to generate summary information using ROLLUP and CUBE. However, the output of ROLLUP and CUBE include the rows produced by the regular GROUP BY operation along with the summary rows. Oracle9i introduced another extension to the GROUP BY clause called GROUPING SETS that you can use to generate summary information at the level you choose without including all the rows produced by the regular GROUP BY operation.

Like ROLLUP and CUBE, GROUPING SETS is also an extension of the GROUP BY clause, and can appear in a query only along with a GROUP BY clause. The syntax of GROUPING SETS is:

SELECT  . . . 

FROM  . . . 

GROUP BY GROUPING SETS (list of grouping columns)

Let's take an example to understand the GROUPING SETS operation further:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (o.year, o.month, r.name);



      YEAR MONTH     REGION               SUM(O.TOT_SALES)

---------- --------- -------------------- ----------------

                     Mid-Atlantic                  5029212

                     New England                   5074332

                     Southeast US                  4960311

           January                                 4496799

           February                                4988535

           March                                   5578521

      2000                                        10042570

      2001                                         5021285



8 rows selected.

This output contains only the subtotals at the region, month, and year levels, but that none of the normal, more detailed, GROUP BY data is included. The order of columns in the GROUPING SETS operation is not critical. The operation produces the same output regardless of the order of the columns. For example:

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,

r.name region, SUM(o.tot_sales)

FROM all_orders o JOIN region r

ON r.region_id = o.region_id

WHERE o.month BETWEEN 1 AND 3

GROUP BY GROUPING SETS (o.month, r.name, o.year);



      YEAR MONTH     REGION               SUM(O.TOT_SALES)

---------- --------- -------------------- ----------------

                     Mid-Atlantic                  5029212

                     New England                   5074332

                     Southeast US                  4960311

           January                                 4496799

           February                                4988535

           March                                   5578521

      2000                                        10042570

      2001                                         5021285



8 rows selected.