14.2 Ranking Functions

Determining the performance of a particular business entity compared to its peers is central to a wide variety of business decisions. Examples include:

Identifying assets with the highest utilization
Determining the worst selling products by region
Finding the best performing salespeople

Prior to the release of Oracle8i Database, you could use the ORDER BY clause to sort a result set on one or more columns, but any further processing to calculate rankings or percentiles had to be performed using a procedural language. Beginning with Oracle8i Database, however, you can take advantage of several new functions to either generate rankings for each row in a result set or to group rows into buckets for percentile calculations.

14.2.1 RANK, DENSE_RANK, and ROW_NUMBER

The RANK, DENSE_RANK, and ROW_NUMBER functions generate an integer value from 1 to N for each row, where N is less than or equal to the number of rows in the result set. The differences in the values returned by these functions revolves around how each one handles ties:

ROW_NUMBER: Returns a unique number for each row starting with 1. For rows that have duplicate values, numbers are arbitrarily assigned.
DENSE_RANK: Assigns a unique number for each row starting with 1, except for rows that have duplicate values, in which case the same ranking is assigned.
RANK: Assigns a unique number for each row starting with 1, except for rows that have duplicate values, in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.

An example will best illustrate the differences. First, here is the query to generate the aggregate sales data by region and customer for the year 2001:

SELECT region_id, cust_nbr, 

  SUM(tot_sales) cust_sales

FROM orders

WHERE year = 2001

GROUP BY region_id, cust_nbr

ORDER BY region_id, cust_nbr;



REGION_ID    CUST_NBR CUST_SALES

---------- ---------- ----------

         5          1    1151162

         5          2    1224992

         5          3    1161286

         5          4    1878275

         5          5    1169926

         6          6    1788836

         6          7     971585

         6          8    1141638

         6          9    1208959

         6         10    1196748

         7         11    1190421

         7         12    1182275

         7         13    1310434

         7         14    1929774

         7         15    1255591

         8         16    1068467

         8         17    1944281

         8         18    1253840

         8         19    1174421

         8         20    1413722

         9         21    1020541

         9         22    1036146

         9         23    1224992

         9         24    1224992

         9         25    2232703

        10         26    1808949

        10         27    1322747

        10         28     986964

        10         29     903383

        10         30    1216858

Notice that three of the customers (2, 23, and 24) have the same value for total sales ($1,224,992). In the next query, three function calls are added to generate rankings for each customer across all regions, and the results are then ordered by the ROW_NUMBER function to make the difference in rankings easier to observe:

SELECT region_id, cust_nbr, 

  SUM(tot_sales) cust_sales,

  RANK( ) OVER (ORDER BY SUM(tot_sales) DESC) sales_rank,

  DENSE_RANK( ) OVER (ORDER BY SUM(tot_sales) DESC) sales_dense_rank,

  ROW_NUMBER( ) OVER (ORDER BY SUM(tot_sales) DESC) sales_number

FROM orders

WHERE year = 2001

GROUP BY region_id, cust_nbr

ORDER BY sales_number;



REGION_ID   CUST_NBR  CUST_SALES SALES_RANK SALES_DENSE_RANK SALES_NUMBER

---------- ---------- ---------- ---------- ---------------- ------------

         9         25    2232703          1                1            1

         8         17    1944281          2                2            2

         7         14    1929774          3                3            3

         5          4    1878275          4                4            4

        10         26    1808949          5                5            5

         6          6    1788836          6                6            6

         8         20    1413722          7                7            7

        10         27    1322747          8                8            8

         7         13    1310434          9                9            9

         7         15    1255591         10               10           10

         8         18    1253840         11               11           11

         5          2    1224992         12               12           12

         9         23    1224992         12               12           13

         9         24    1224992         12               12           14

        10         30    1216858         15               13           15

         6          9    1208959         16               14           16

         6         10    1196748         17               15           17

         7         11    1190421         18               16           18

         7         12    1182275         19               17           19

         8         19    1174421         20               18           20

         5          5    1169926         21               19           21

         5          3    1161286         22               20           22

         5          1    1151162         23               21           23

         6          8    1141638         24               22           24

         8         16    1068467         25               23           25

         9         22    1036146         26               24           26

         9         21    1020541         27               25           27

        10         28     986964         28               26           28

         6          7     971585         29               27           29

        10         29     903383         30               28           30

Don't be confused by the ORDER BY clause at the end of the query and the ORDER BY clauses within each function call; the functions use their ORDER BY clauses internally to sort their results for the purpose of applying a ranking. Thus, each of the three functions applies its ranking algorithm to the sum of each customer's sales in descending order. The final ORDER BY clause specifies the results of the ROW_NUMBER function as the sort key for the final result set, but we could have picked any of the six columns as our sort key.

Both the RANK and DENSE_RANK functions assign the rank of 12 to the three rows with total sales of $1,224,992, while the ROW_NUMBER function assigns the ranks 12, 13, and 14 to the same rows. The difference between the RANK and DENSE_RANK functions manifests itself in the ranking assigned to the next-lowest sales total; the RANK function leaves a gap in the ranking sequence and assigns a rank of 15 to customer number 30, while the DENSE_RANK function continues the sequence with a ranking of 13.

Deciding which of the three functions to use depends on the desired outcome. If you want to identify the top 13 customers from this result set, you would use:

ROW_NUMBER: If you want exactly 13 rows without regard to ties. In this case, one of the customers who might otherwise be included in the list will be excluded from the final set.
RANK: If you want at least 13 rows but don't want to include rows that would have been excluded had there been no ties. In this case, you would retrieve 14 rows.
DENSE_RANK: If you want all customers with a ranking of 13 or less, including all duplicates. In this case, you would retrieve 15 rows.

While the previous query generates rankings across the entire result set, it is also possible to generate independent sets of rankings across multiple partitions of the result set. The following query generates rankings for customer sales within each region rather than across all regions. Note the addition of the PARTITION BY clause:

SELECT region_id, cust_nbr, SUM(tot_sales) cust_sales,

  RANK( ) OVER (PARTITION BY region_id

    ORDER BY SUM(tot_sales) DESC) sales_rank,

  DENSE_RANK( ) OVER (PARTITION BY region_id

    ORDER BY SUM(tot_sales) DESC) sales_dense_rank,

  ROW_NUMBER( ) OVER (PARTITION BY region_id

    ORDER BY SUM(tot_sales) DESC) sales_number

FROM orders

WHERE year = 2001

GROUP BY region_id, cust_nbr

ORDER BY region_id, sales_number;



REGION_ID    CUST_NBR CUST_SALES SALES_RANK SALES_DENSE_RANK SALES_NUMBER

---------- ---------- ---------- ---------- ---------------- ------------

         5          4    1878275          1                1            1

         5          2    1224992          2                2            2

         5          5    1169926          3                3            3

         5          3    1161286          4                4            4

         5          1    1151162          5                5            5

         6          6    1788836          1                1            1

         6          9    1208959          2                2            2

         6         10    1196748          3                3            3

         6          8    1141638          4                4            4

         6          7     971585          5                5            5

         7         14    1929774          1                1            1

         7         13    1310434          2                2            2

         7         15    1255591          3                3            3

         7         11    1190421          4                4            4

         7         12    1182275          5                5            5

         8         17    1944281          1                1            1

         8         20    1413722          2                2            2

         8         18    1253840          3                3            3

         8         19    1174421          4                4            4

         8         16    1068467          5                5            5

         9         25    2232703          1                1            1

         9         23    1224992          2                2            2

         9         24    1224992          2                2            3

         9         22    1036146          4                3            4

         9         21    1020541          5                4            5

        10         26    1808949          1                1            1

        10         27    1322747          2                2            2

        10         30    1216858          3                3            3

        10         28     986964          4                4            4

        10         29     903383          5                5            5

Each customer receives a ranking between one and five depending on their relation to other customers in the same region. Of the three customers with duplicate total sales, two of them are in region 9; as before, the RANK and DENSE_RANK functions generate identical rankings for both customers.

The PARTITION BY clause used in ranking functions is used to divide a result set into pieces so that rankings can be applied within each subset. This is completely different from the PARTITION BY RANGE/HASH/LIST clauses introduced in Chapter 10 for breaking a table or index into multiple pieces.

14.2.1.1 Handling NULLs

All ranking functions allow you to specify where in the ranking order NULL values should appear. This is accomplished by appending either NULLS FIRST or NULLS LAST after the ORDER BY clause of the function, as in:

SELECT region_id, cust_nbr, SUM(tot_sales) cust_sales,

  RANK( ) OVER (ORDER BY SUM(tot_sales) DESC NULLS LAST) sales_rank

FROM orders

WHERE year = 2001

GROUP BY region_id, cust_nbr;

If omitted, NULL values will either appear last in ascending rankings or first in descending rankings.

14.2.1.2 Top/bottom N queries

One of the most common uses of a ranked data set is to identify the top N or bottom N performers. Since you can't call analytic functions from the WHERE or HAVING clauses, you are forced to generate the rankings for all the rows and then use an outer query to filter out the unwanted rankings. For example, the following query uses an inline view to identify the top five salespersons for 2001:

SELECT s.name, sp.sp_sales total_sales

FROM 

 (SELECT salesperson_id, SUM(tot_sales) sp_sales,

    RANK( ) OVER (ORDER BY SUM(tot_sales) DESC) sales_rank

  FROM orders

  WHERE year = 2001

  GROUP BY salesperson_id) sp INNER JOIN salesperson s

  ON sp.salesperson_id = s.salesperson_id

WHERE sp.sales_rank <= 5

ORDER BY sp.sales_rank;



NAME                                              TOTAL_SALES

------------------------------------------------- -----------

Jeff Blake                                            1927580

Sam Houseman                                          1814327

Mark Russell                                          1784596

John Boorman                                          1768813

Carl Isaacs                                           1761814

Tim McGowan                                           1761814

14.2.1.3 FIRST/LAST

Although there is no function for returning only the top or bottom N from a ranked result set, Oracle provides functionality for identifying the first (top 1) or last (bottom 1) records in a ranked set. This is useful for queries such as the following: "Find the regions with the best and worst total sales last year." Unlike the top five salespeople example from the previous section, this query needs an additional piece of information?the size of the result set?to answer the question.

Oracle9i provides the ability to answer such queries efficiently using functions that rank the result set based on a specified ordering, identify the row with the top or bottom ranking, and report on any column available in the result set. These functions are composed of three parts:

An ORDER BY clause that specifies how to rank the result set.
The keywords FIRST and LAST to specify whether to use the top or bottom-ranked row.
An aggregate function (i.e., MIN, MAX, AVG, COUNT) used as a tiebreaker in case more than one row of the result set tie for the FIRST or LAST spot in the ranking.

The following query uses the MIN aggregate function to find the regions that rank FIRST and LAST by total sales:

SELECT

  MIN(region_id)

    KEEP (DENSE_RANK FIRST ORDER BY SUM(tot_sales) DESC) best_region,

  MIN(region_id)

    KEEP (DENSE_RANK LAST ORDER BY SUM(tot_sales) DESC) worst_region

FROM orders

WHERE year = 2001

GROUP BY region_id;



BEST_REGION WORST_REGION

----------- ------------

          7           10

The use of the MIN function in the previous query is a bit confusing: it is used only if more than one region ties for either first or last place in the ranking. If there were a tie, the row with the minimum value for region_id would be chosen. To find out if a tie actually exists, you could call each function twice using MIN for the first and MAX for the second, and see if they return the same results:

SELECT

  MIN(region_id)

    KEEP (DENSE_RANK FIRST ORDER BY SUM(tot_sales) DESC) min_best_region,

  MAX(region_id)

    KEEP (DENSE_RANK FIRST ORDER BY SUM(tot_sales) DESC) max_best_region,

  MIN(region_id)

    KEEP (DENSE_RANK LAST ORDER BY SUM(tot_sales) DESC) min_worst_region,

  MAX(region_id)

    KEEP (DENSE_RANK LAST ORDER BY SUM(tot_sales) DESC) max_worst_region

FROM orders

WHERE year = 2001

GROUP BY region_id;



MIN_BEST_REGION MAX_BEST_REGION MIN_WORST_REGION MAX_WORST_REGION

--------------- --------------- ---------------- ----------------

              7               7               10               10

In this case, there are no ties for either first or last place. Depending on the type of data you are working with, using an aggregate function as a tiebreaker can can be somewhat arbitrary.

14.2.2 NTILE

Another way rankings are commonly used is to generate buckets into which sets of rankings are grouped. For example, you may want to find those customers whose total sales ranked in the top 25%. The following query uses the NTILE function to group the customers into four buckets (or quartiles):

SELECT region_id, cust_nbr, SUM(tot_sales) cust_sales,

  NTILE(4) OVER (ORDER BY SUM(tot_sales) DESC) sales_quartile

FROM orders

WHERE year = 2001

GROUP BY region_id, cust_nbr

ORDER BY sales_quartile, cust_sales DESC;



REGION_ID    CUST_NBR CUST_SALES SALES_QUARTILE

---------- ---------- ---------- --------------

         9         25    2232703              1

         8         17    1944281              1

         7         14    1929774              1

         5          4    1878275              1

        10         26    1808949              1

         6          6    1788836              1

         8         20    1413722              1

        10         27    1322747              1

         7         13    1310434              2

         7         15    1255591              2

         8         18    1253840              2

         5          2    1224992              2

         9         23    1224992              2

         9         24    1224992              2

        10         30    1216858              2

         6          9    1208959              2

         6         10    1196748              3

         7         11    1190421              3

         7         12    1182275              3

         8         19    1174421              3

         5          5    1169926              3

         5          3    1161286              3

         5          1    1151162              3

         6          8    1141638              4

         8         16    1068467              4

         9         22    1036146              4

         9         21    1020541              4

        10         28     986964              4

         6          7     971585              4

        10         29     903383              4

The sales_quartile column in this query specifies NTILE(4) to create four buckets. The NTILE function finds each row's place in the ranking, and then assigns each row to a bucket such that every bucket contains the same number of rows. If the number of rows is not evenly divisible by the number of buckets, then the extra rows are distributed so that the number of rows per bucket differs by one at most. In the previous example, there are four buckets allocated for 30 rows, with buckets one and two containing eight rows each, and buckets three and four containing seven rows each. This approach is referred to as equiheight buckets because each bucket contains (optimally) the same number of rows.

Just like in the top N query discussed earlier, you will need to wrap the query in an inline view if you want to filter on the NTILE result:

SELECT r.name region, c.name customer, cs.cust_sales

FROM 

 (SELECT region_id, cust_nbr, SUM(tot_sales) cust_sales,

    NTILE(4) OVER (ORDER BY SUM(tot_sales) DESC) sales_quartile

  FROM orders

  WHERE year = 2001

  GROUP BY region_id, cust_nbr) cs INNER JOIN customer c

  ON cs.cust_nbr = c.cust_nbr

  INNER JOIN region r

  ON cs.region_id = r.region_id

WHERE cs.sales_quartile = 1



ORDER BY cs.cust_sales DESC;



REGION               CUSTOMER                       CUST_SALES

-------------------- ------------------------------ ----------

Northwest US         Worcester Technologies            2232703

Southwest US         Evans Supply Corp.                1944281

Southeast US         Madden Industries                 1929774

New England          Flowtech Inc.                     1878275

Central US           Alpha Technologies                1808949

Mid-Atlantic         Spartan Industries                1788836

Southwest US         Malden Labs                       1413722

Central US           Phillips Labs                     1322747

The outer query filters on sales_quartile = 1, which removes all rows not in the top 25% of sales, and then joins the region and customer dimensions to generate the final results.

14.2.3 WIDTH_BUCKET

Similar to the NTILE function, the WIDTH_BUCKET function groups rows of the result set into buckets. Unlike NTILE, however, the WIDTH_BUCKET function attempts to create equiwidth buckets, meaning that the range of values is evenly distributed across the buckets. If your data were distributed across a bell curve, therefore, you could expect the buckets representing the low and high ranges of the bell curve to contain few records, whereas the buckets representing the middle ranges would contain many records.

WIDTH_BUCKET can operate on numeric or date types, and takes the following four parameters:

The expression that generates the buckets
The value used as the start of the range for bucket #1
The value used as the end of the range for bucket #N
The number of buckets to create (N)

WIDTH_BUCKET uses the values of the second, third, and fourth parameters to generate N buckets containing comparable ranges. If the expression yields values that fall outside the range specified by the second and third parameters, the WIDTH_BUCKET function will generate two additional buckets, numbered 0 and N + 1, into which the outliers are placed. If you want to work with the entire result set, you need to make sure your values for the second and third parameters completely enclose the range of values in the result set. However, if you only wish to work with a subset of the data, you can specify values for the second and third parameters that enclose the desired range, and any rows falling outside the range will be placed into buckets 0 and N + 1.

Here's an example that uses the NTILE example from earlier to generate three buckets for the total sales per customer:

SELECT region_id, cust_nbr,

  SUM(tot_sales) cust_sales,

  WIDTH_BUCKET(SUM(tot_sales), 1, 3000000, 3) sales_buckets

FROM orders

WHERE year = 2001

GROUP BY region_id, cust_nbr

ORDER BY cust_sales;



REGION_ID   CUST_NBR CUST_SALES SALES_BUCKETS

---------- ---------- ---------- -------------

        10         29     903383             1

         6          7     971585             1

        10         28     986964             1

         9         21    1020541             2

         9         22    1036146             2

         8         16    1068467             2

         6          8    1141638             2

         5          1    1151162             2

         5          3    1161286             2

         5          5    1169926             2

         8         19    1174421             2

         7         12    1182275             2

         7         11    1190421             2

         6         10    1196748             2

         6          9    1208959             2

        10         30    1216858             2

         5          2    1224992             2

         9         24    1224992             2

         9         23    1224992             2

         8         18    1253840             2

         7         15    1255591             2

         7         13    1310434             2

        10         27    1322747             2

         8         20    1413722             2

         6          6    1788836             2

        10         26    1808949             2

         5          4    1878275             2

         7         14    1929774             2

         8         17    1944281             2

         9         25    2232703             3

Based on these parameters, the WIDTH_BUCKET function generates three buckets; the first bucket starts at 1, and the third bucket has an upper range of 3,000,000. Since there are three buckets, the ranges for each bucket will be 1 to 1,000,000, 1,000,001 to 2,000,000, and 2,000,001 to 3,000,000. When the rows are placed in the appropriate bucket, there are three rows that fall into bucket #1, a single row that falls in bucket #3, and the remaining 26 rows that fall into the second bucket.

The values 1 and 3,000,000 were chosen to guarantee that all rows in the result set would be placed into one of the three buckets. If you want to generate buckets only for rows that have aggregate sales between $1,000,000 and $2,000,000, the WIDTH_BUCKET function will place the remaining rows in the 0^th and 4^th buckets:

SELECT region_id, cust_nbr,

  SUM(tot_sales) cust_sales,

  WIDTH_BUCKET(SUM(tot_sales), 1000000, 2000000, 3) sales_buckets

FROM orders

WHERE year = 2001

GROUP BY region_id, cust_nbr

ORDER BY cust_sales;



REGION_ID   CUST_NBR CUST_SALES SALES_BUCKETS

---------- ---------- ---------- -------------

        10         29     903383             0

         6          7     971585             0

        10         28     986964             0

         9         21    1020541             1

         9         22    1036146             1

         8         16    1068467             1

         6          8    1141638             1

         5          1    1151162             1

         5          3    1161286             1

         5          5    1169926             1

         8         19    1174421             1

         7         12    1182275             1

         7         11    1190421             1

         6         10    1196748             1

         6          9    1208959             1

        10         30    1216858             1

         5          2    1224992             1

         9         24    1224992             1

         9         23    1224992             1

         8         18    1253840             1

         7         15    1255591             1

         7         13    1310434             1

        10         27    1322747             1

         8         20    1413722             2

         6          6    1788836             3

        10         26    1808949             3

         5          4    1878275             3

         7         14    1929774             3

         8         17    1944281             3

         9         25    2232703             4

Keep in mind that the WIDTH_BUCKET function does not remove rows from the result set that do not lie within the specified range; rather, they are placed into special buckets that your query can either utilize or ignore as needed.

14.2.4 CUME_DIST and PERCENT_RANK

The final two ranking functions, CUME_DIST and PERCENT_RANK, use the rank of a particular row to calculate additional information. The CUME_DIST function (short for Cumulative Distribution) calculates the ratio of the number of rows that have a lesser or equal ranking to the total number of rows in the partition. The PERCENT_RANK function calculates the ratio of a row's ranking to the number of rows in the partition using the formula:

(RRP -- 1) / (NRP -- 1)

where RRP is the "rank of row in partition," and NRP is the "number of rows in partition."

Both functions utilize DENSE_RANK for their rankings and can be specified to be in ascending or descending order. The following query demonstrates the use of these two functions (both specifying descending order) with the customer yearly sales query:

SELECT region_id, cust_nbr, 

  SUM(tot_sales) cust_sales,

  CUME_DIST( ) OVER (ORDER BY SUM(tot_sales) DESC) sales_cume_dist,

  PERCENT_RANK( ) OVER (ORDER BY SUM(tot_sales) DESC) sales_percent_rank

FROM orders

WHERE year = 2001

GROUP BY region_id, cust_nbr

ORDER BY cust_sales DESC;



REGION_ID   CUST_NBR  CUST_SALES SALES_CUME_DIST SALES_PERCENT_RANK

---------- ---------- ---------- --------------- ------------------

         9         25    2232703      .033333333                  0

         8         17    1944281      .066666667         .034482759

         7         14    1929774              .1         .068965517

         5          4    1878275      .133333333         .103448276

        10         26    1808949      .166666667         .137931034

         6          6    1788836              .2         .172413793

         8         20    1413722      .233333333         .206896552

        10         27    1322747      .266666667         .24137931

         7         13    1310434              .3         .275862069

         7         15    1255591      .333333333         .310344828

         8         18    1253840      .366666667         .344827586

         5          2    1224992      .466666667         .379310345

         9         23    1224992      .466666667         .379310345

         9         24    1224992      .466666667         .379310345

        10         30    1216858              .5         .482758621

         6          9    1208959      .533333333         .517241379

         6         10    1196748      .566666667         .551724138

         7         11    1190421              .6         .586206897

         7         12    1182275      .633333333         .620689655

         8         19    1174421      .666666667         .655172414

         5          5    1169926              .7         .689655172

         5          3    1161286      .733333333         .724137931

         5          1    1151162      .766666667         .75862069

         6          8    1141638              .8         .793103448

         8         16    1068467      .833333333         .827586207

         9         22    1036146      .866666667         .862068966

         9         21    1020541              .9         .896551724

        10         28     986964      .933333333         .931034483

         6          7     971585      .966666667         .965517241

        10         29     903383               1                  1

Let's walk through a couple of calculations for customer number 1 in the previous result set. With total sales of $1,151,162, customer number 1 ranks 23^rd in the set of 30 customers in descending order of sales. Since there are a total of 30 rows, the CUME_DIST is equal to 23/30, or .766666667. The PERCENT_RANK function yields (23 - 1) / (30 - 1) = .75862069. It should come as no surprise that each function returns identical values for the rows that have identical sales totals, since the calculations are based on rank, which is identical for all three rows.

14.2.5 Hypothetical Functions

For some types of analysis, determining what might have happened is more revealing than knowing what really happened. Oracle provides special versions of RANK, DENSE_RANK, CUME_DIST, and PERCENT_RANK that allow rankings and distributions to be calculated for hypothetical data, allowing the user to see what would have happened if a specific value (or set of values) was included in a data set.

To illustrate this concept, let's rank all customers by total sales for 2001, and then see where a hypothetical sales figure would fall in the ranking. Here is the query that generates the rankings and distributions:

SELECT cust_nbr, SUM(tot_sales) cust_sales,

  RANK( ) OVER (ORDER BY SUM(tot_sales) DESC) rank,

  DENSE_RANK( ) OVER (ORDER BY SUM(tot_sales) DESC) dense_rank,

  CUME_DIST( ) OVER (ORDER BY SUM(tot_sales) DESC) cume_dist,

  PERCENT_RANK( ) OVER (ORDER BY SUM(tot_sales) DESC) percent_rank

FROM orders

WHERE year = 2001

GROUP BY cust_nbr

ORDER BY rank;



  CUST_NBR CUST_SALES       RANK DENSE_RANK  CUME_DIST PERCENT_RANK

---------- ---------- ---------- ---------- ---------- ------------

        25    2232703          1          1 .033333333            0

        17    1944281          2          2 .066666667   .034482759

        14    1929774          3          3         .1   .068965517

         4    1878275          4          4 .133333333   .103448276

        26    1808949          5          5 .166666667   .137931034

         6    1788836          6          6         .2   .172413793

        20    1413722          7          7 .233333333   .206896552

        27    1322747          8          8 .266666667    .24137931

        13    1310434          9          9         .3   .275862069

        15    1255591         10         10 .333333333   .310344828

        18    1253840         11         11 .366666667   .344827586

         2    1224992         12         12 .466666667   .379310345

        23    1224992         12         12 .466666667   .379310345

        24    1224992         12         12 .466666667   .379310345

        30    1216858         15         13         .5   .482758621

         9    1208959         16         14 .533333333   .517241379

        10    1196748         17         15 .566666667   .551724138

        11    1190421         18         16         .6   .586206897

        12    1182275         19         17 .633333333   .620689655

        19    1174421         20         18 .666666667   .655172414

         5    1169926         21         19         .7   .689655172

         3    1161286         22         20 .733333333   .724137931

         1    1151162         23         21 .766666667    .75862069

         8    1141638         24         22         .8   .793103448

        16    1068467         25         23 .833333333   .827586207

        22    1036146         26         24 .866666667   .862068966

        21    1020541         27         25         .9   .896551724

        28     986964         28         26 .933333333   .931034483

         7     971585         29         27 .966666667   .965517241

        29     903383         30         28          1            1

Now let's see where a customer with an even million dollars of sales would have ranked:

SELECT 

  RANK(1000000) WITHIN GROUP 

    (ORDER BY SUM(tot_sales) DESC) hyp_rank,

  DENSE_RANK(1000000) WITHIN GROUP 

    (ORDER BY SUM(tot_sales) DESC) hyp_dense_rank,

  CUME_DIST(1000000) WITHIN GROUP 

    (ORDER BY SUM(tot_sales) DESC) hyp_cume_dist,

  PERCENT_RANK(1000000) WITHIN GROUP 

    (ORDER BY SUM(tot_sales) DESC) hyp_percent_rank

FROM orders

WHERE year = 2001

GROUP BY cust_nbr;



  HYP_RANK HYP_DENSE_RANK HYP_CUME_DIST HYP_PERCENT_RANK

---------- -------------- ------------- ----------------

        28             26    .903225806               .9

The WITHIN GROUP clause has the effect of injecting a fictitious row into the result set before determining the rankings. One possible use of this functionality would be to see how actual sales compare to sales targets.