Earlier in this
chapter,
you
saw
how to use the GROUPING function to distinguish between the regular
GROUP BY rows and the summary rows produced by the GROUP BY
extensions. Oracle9i extended the concept of the
GROUPING function and introduced two more functions that you can use
with a GROUP BY clause:
These functions can be used only with a GROUP BY clause. However,
unlike the GROUPING function that can only be used with a GROUP BY
extension, the GROUPING_ID and GROUP_ID functions can be used in a
query, even without a GROUP BY extension.
 |
Although it is legal to use these two functions without a GROUP BY
extension, using GROUPING_ID and GROUP_ID without ROLLUP, CUBE, or
GROUPING SETS doesn't produce any meaningful output,
because GROUPING_ID and GROUP_ID are 0 for all regular GROUP BY rows.
|
|
The following sections discuss these two functions in detail.
13.3.1 GROUPING_ID
The syntax of
the GROUPING_ID function is as follows:
SELECT . . . , GROUPING_ID(ordered_list_of_grouping_columns)
FROM . . .
GROUP BY . . .
The GROUPING_ID function takes an ordered list of grouping columns as
input, and computes the output by working through the following
steps:
It generates the results of the GROUPING function as applied to each
of the individual columns in the list. The result of this step is a
set of ones and zeros.
It puts these ones and zeros in the same order as the order of the
columns in its argument list to produce a bit vector.
Treating this bit vector (a series of ones and zeros) as a binary
number, it converts the bit vector into a decimal (base 10) number.
The decimal number computed in Step 3 is returned as the GROUPING_ID
function's output.
The following example illustrates this process and compares the
results from GROUPING_ID with those from GROUPING:
SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total,
GROUPING(o.year) y, GROUPING(o.month) m, GROUPING(r.name) r,
GROUPING_ID (o.year, o.month, r.name) gid
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY CUBE (o.year, o.month, r.name);
YEAR MONTH REGION TOTAL Y M R GID
---- --------- -------------- ---------- --- ---- --- ------
2000 January Mid-Atlantic 1221394 0 0 0 0
2000 January New England 1018430 0 0 0 0
2000 January Southeast US 758042 0 0 0 0
2000 January 2997866 0 0 1 1
2000 February Mid-Atlantic 857352 0 0 0 0
2000 February New England 1231492 0 0 0 0
2000 February Southeast US 1236846 0 0 0 0
2000 February 3325690 0 0 1 1
2000 March Mid-Atlantic 1274062 0 0 0 0
2000 March New England 1132966 0 0 0 0
2000 March Southeast US 1311986 0 0 0 0
2000 March 3719014 0 0 1 1
2000 Mid-Atlantic 3352808 0 1 0 2
2000 New England 3382888 0 1 0 2
2000 Southeast US 3306874 0 1 0 2
2000 10042570 0 1 1 3
2001 January Mid-Atlantic 610697 0 0 0 0
2001 January New England 509215 0 0 0 0
2001 January Southeast US 379021 0 0 0 0
2001 January 1498933 0 0 1 1
2001 February Mid-Atlantic 428676 0 0 0 0
2001 February New England 615746 0 0 0 0
2001 February Southeast US 618423 0 0 0 0
2001 February 1662845 0 0 1 1
2001 March Mid-Atlantic 637031 0 0 0 0
2001 March New England 566483 0 0 0 0
2001 March Southeast US 655993 0 0 0 0
2001 March 1859507 0 0 1 1
2001 Mid-Atlantic 1676404 0 1 0 2
2001 New England 1691444 0 1 0 2
2001 Southeast US 1653437 0 1 0 2
2001 5021285 0 1 1 3
January Mid-Atlantic 1832091 1 0 0 4
January New England 1527645 1 0 0 4
January Southeast US 1137063 1 0 0 4
January 4496799 1 0 1 5
February Mid-Atlantic 1286028 1 0 0 4
February New England 1847238 1 0 0 4
February Southeast US 1855269 1 0 0 4
February 4988535 1 0 1 5
March Mid-Atlantic 1911093 1 0 0 4
March New England 1699449 1 0 0 4
March Southeast US 1967979 1 0 0 4
March 5578521 1 0 1 5
Mid-Atlantic 5029212 1 1 0 6
New England 5074332 1 1 0 6
Southeast US 4960311 1 1 0 6
15063855 1 1 1 7
48 rows selected.
Note that the GROUPING_ID is the decimal equivalent of the bit vector
generated by the individual GROUPING functions. In this output, the
GROUPING_ID has values 0, 1, 2, 3, 4, 5, 6,
and 7. Table 13-2
describes these aggregation levels.
Table 13-2. Result of GROUPING_ID(o.year, o.month, r.name)
|
Aggregation level
|
Bit vector
|
GROUPING_ID
|
|---|
|
Regular GROUP BY rows
|
0 0 0
|
0
|
|
Subtotal for Year-Month, aggregated at (Region)
|
0 0 1
|
1
|
|
Subtotal for Year-Region, aggregated at (Month)
|
0 1 0
|
2
|
|
Subtotal for Year, aggregated at (Month, Region)
|
0 1 1
|
3
|
|
Subtotal for Month-Region, aggregated at (Year)
|
1 0 0
|
4
|
|
Subtotal for Month, aggregated at (Year, Region)
|
1 0 1
|
5
|
|
Subtotal for Region, aggregated at (Year, Month)
|
1 1 0
|
6
|
|
Grand total for all levels, aggregated at (Year, Month, Region)
|
1 1 1
|
7
|
The GROUPING_ID function can be used effectively in a query to filter
rows according to your requirement. Let's say you
want only the summary rows to be displayed in the output, and not the
regular GROUP BY rows. You can use the GROUPING_ID function in the
HAVING clause to do this by restricting output to only those rows
that contain totals and subtotals (i.e., for which GROUPING_ID >
0):
SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY CUBE (o.year, o.month, r.name)
HAVING GROUPING_ID (o.year, o.month, r.name) > 0;
YEAR MONTH REGION TOTAL
--------- --------- -------------------- ----------
15063855
New England 5074332
Mid-Atlantic 5029212
Southeast US 4960311
January 4496799
January New England 1527645
January Mid-Atlantic 1832091
January Southeast US 1137063
February 4988535
February New England 1847238
February Mid-Atlantic 1286028
February Southeast US 1855269
March 5578521
March New England 1699449
March Mid-Atlantic 1911093
March Southeast US 1967979
2000 10042570
2000 New England 3382888
2000 Mid-Atlantic 3352808
2000 Southeast US 3306874
2000 January 2997866
2000 February 3325690
2000 March 3719014
2001 5021285
2001 New England 1691444
2001 Mid-Atlantic 1676404
2001 Southeast US 1653437
2001 January 1498933
2001 February 1662845
2001 March 1859507
30 rows selected.
As you can see, GROUPING_ID makes it easier to filter the output of
aggregation operations. Without the GROUPING_ID function, you have to
write a more complex query using the GROUPING function to achieve the
same result. For example, the following query uses GROUPING rather
than GROUPING_ID to display only totals and subtotals. Note the added
complexity in the HAVING clause.
SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY CUBE (o.year, o.month, r.name)
HAVING GROUPING(o.year) > 0
OR GROUPING(o.month) > 0
OR GROUPING(r.name) > 0;
YEAR MONTH REGION TOTAL
------- --------- -------------------- ----------
15063855
New England 5074332
Mid-Atlantic 5029212
Southeast US 4960311
January 4496799
January New England 1527645
January Mid-Atlantic 1832091
January Southeast US 1137063
February 4988535
February New England 1847238
February Mid-Atlantic 1286028
February Southeast US 1855269
March 5578521
March New England 1699449
March Mid-Atlantic 1911093
March Southeast US 1967979
2000 10042570
2000 New England 3382888
2000 Mid-Atlantic 3352808
2000 Southeast US 3306874
2000 January 2997866
2000 February 3325690
2000 March 3719014
2001 5021285
2001 New England 1691444
2001 Mid-Atlantic 1676404
2001 Southeast US 1653437
2001 January 1498933
2001 February 1662845
2001 March 1859507
30 rows selected.
13.3.2 GROUPING and GROUPING_ID in ORDER BY
The GROUPING and
GROUPING_ID
functions not only help you filter rows returned from queries using
CUBE and ROLLUP, they can also help you to order those rows in a
meaningful way. The order of the rows in a query's
output is not guaranteed unless you use an ORDER BY clause in the
query. However, if you order the results of a CUBE or ROLLUP query by
one dimension, the order of the results may not be meaningful with
respect to other dimensions. In such an aggregate query, you may
prefer to order the results based on the number of dimensions
involved rather than by individual dimensions. For example, when
executing the previous section's query, you may
prefer to see the output rows in the following order:
Those rows representing an aggregate in one dimension
Those rows representing an aggregate in two dimensions
Those rows representing an aggregate in three dimensions
To achieve this ordering of rows, you need to use an ORDER BY clause
that uses a combination of GROUPING and GROUPING_ID functions, as
shown in the following example:
SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total,
GROUPING_ID (o.year, o.month, r.name) gid,
GROUPING(o.year) + GROUPING(o.month) + GROUPING(r.name) sum_grouping
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY CUBE (o.year, o.month, r.name)
HAVING GROUPING(o.year) > 0
OR GROUPING(o.month) > 0
OR GROUPING(r.name) > 0
ORDER BY (GROUPING(o.year) + GROUPING(o.month) + GROUPING(r.name)),
GROUPING_ID (o.year, o.month, r.name);
YEAR MONTH REGION TOTAL GID SUM_GROUPING
------ --------- -------------- ---------- ----- ------------
2000 January 2997866 1 1
2000 February 3325690 1 1
2000 March 3719014 1 1
2001 March 1859507 1 1
2001 February 1662845 1 1
2001 January 1498933 1 1
2000 New England 3382888 2 1
2001 Mid-Atlantic 1676404 2 1
2001 Southeast US 1653437 2 1
2001 New England 1691444 2 1
2000 Mid-Atlantic 3352808 2 1
2000 Southeast US 3306874 2 1
January New England 1527645 4 1
January Mid-Atlantic 1832091 4 1
January Southeast US 1137063 4 1
February Southeast US 1855269 4 1
March Mid-Atlantic 1911093 4 1
March New England 1699449 4 1
February Mid-Atlantic 1286028 4 1
February New England 1847238 4 1
March Southeast US 1967979 4 1
2000 10042570 3 2
2001 5021285 3 2
January 4496799 5 2
March 5578521 5 2
February 4988535 5 2
New England 5074332 6 2
Mid-Atlantic 5029212 6 2
Southeast US 4960311 6 2
15063855 7 3
In this output, the aggegate rows for individual dimensions, region,
month, and year are shown first. These are followed by the aggregate
rows for two dimensions: month and region, year and region, and year
and month, respectively. The last row is the one aggregated over all
three dimensions.
13.3.3 GROUP_ID
As you saw in
previous sections, Oracle9i Database allows you
to have repeating grouping columns and multiple grouping operations
in a GROUP BY clause. Some combinations could result in duplicate
rows in the output. The GROUP_ID distinguishes between otherwise
duplicate result rows.
The syntax of the GROUP_ID function is:
SELECT . . . , GROUP_ID( )
FROM . . .
GROUP BY . . .
The GROUP_ID function takes no argument, and returns 0 through n - 1,
where n is the occurrence count for duplicates. The first occurrence
of a given row in the output of a query will have a GROUP_ID of 0,
the second occurrence of a given row will have a GROUP_ID of 1, and
so forth. The following example illustrates the use of the GROUP_ID
function:
SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total, GROUP_ID( )
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY o.year, ROLLUP (o.year, o.month, r.name);
YEAR MONTH REGION TOTAL GROUP_ID( )
---------- --------- -------------------- ---------- ----------
2000 January Mid-Atlantic 1221394 0
2000 January New England 1018430 0
2000 January Southeast US 758042 0
2000 January 2997866 0
2000 February Mid-Atlantic 857352 0
2000 February New England 1231492 0
2000 February Southeast US 1236846 0
2000 February 3325690 0
2000 March Mid-Atlantic 1274062 0
2000 March New England 1132966 0
2000 March Southeast US 1311986 0
2000 March 3719014 0
2001 January Mid-Atlantic 610697 0
2001 January New England 509215 0
2001 January Southeast US 379021 0
2001 January 1498933 0
2001 February Mid-Atlantic 428676 0
2001 February New England 615746 0
2001 February Southeast US 618423 0
2001 February 1662845 0
2001 March Mid-Atlantic 637031 0
2001 March New England 566483 0
2001 March Southeast US 655993 0
2001 March 1859507 0
2000 10042570 0
2001 5021285 0
2000 10042570 1
2001 5021285 1
28 rows selected.
Note that the value 1 is returned by the GROUP_ID function for the
last two rows. These rows are indeed duplicates of the previous two
rows. If you don't want to see the duplicates in
your result set, restrict your query's results to
GROUP_ID 0:
SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY o.year, ROLLUP (o.year, o.month, r.name)
HAVING GROUP_ID( ) = 0;
YEAR MONTH REGION TOTAL
---------- --------- -------------------- ----------
2000 January New England 1018430
2000 January Mid-Atlantic 1221394
2000 January Southeast US 758042
2000 January 2997866
2000 February New England 1231492
2000 February Mid-Atlantic 857352
2000 February Southeast US 1236846
2000 February 3325690
2000 March New England 1132966
2000 March Mid-Atlantic 1274062
2000 March Southeast US 1311986
2000 March 3719014
2001 January New England 509215
2001 January Mid-Atlantic 610697
2001 January Southeast US 379021
2001 January 1498933
2001 February New England 615746
2001 February Mid-Atlantic 428676
2001 February Southeast US 618423
2001 February 1662845
2001 March New England 566483
2001 March Mid-Atlantic 637031
2001 March Southeast US 655993
2001 March 1859507
2000 10042570
2001 5021285
26 rows selected.
This version of the query uses HAVING GROUP_ID( ) =
0 to eliminate the two duplicate totals from the result
set. GROUP_ID is only meaningful in the HAVING clause, because it
applies to summarized data. You can't use GROUP_ID
in a WHERE clause, and it wouldn't make sense to
try.