The ranking functions described thus far are quite useful when comparing items within a fixed window of time, such as "last year" or "second quarter." But what if you want to perform computations using a window that slides as you progress through the data set? Oracle's windowing functions allow aggregates to be calculated for each row in a result set based on a specified window. The aggregation window can be defined in one of three ways:
By specifying a set of rows: "From the current row to the end of the partition"
By specifying a time interval: "For the 30 days preceding the transaction date"
By specifying a range of values: "All rows having a transaction amount within 5% of the current row's transaction amount"
The first set of examples will generate a window that fills the entire partition, and then show how the window can be detached from one or both ends of the partition so that it floats with the current row. All of the examples will be based on the following query, which calculates total monthly sales in 2001 for the Mid-Atlantic region:
SELECT month, SUM(tot_sales) monthly_sales FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES ---------- ------------- 1 610697 2 428676 3 637031 4 541146 5 592935 6 501485 7 606914 8 460520 9 392898 10 510117 11 532889 12 492458
The first step is to sum the monthly sales for the entire result set by specifying an "unbounded" window. Note the ROWS BETWEEN clause in the following example:
SELECT month, SUM(tot_sales) monthly_sales, SUM(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) total_sales FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES TOTAL_SALES ---------- ------------- ----------- 1 610697 6307766 2 428676 6307766 3 637031 6307766 4 541146 6307766 5 592935 6307766 6 501485 6307766 7 606914 6307766 8 460520 6307766 9 392898 6307766 10 510117 6307766 11 532889 6307766 12 492458 6307766
Each time the function executes, it sums the monthly sales from months 1 through 12; thus, the same calculation is being performed 12 times. This is a rather inefficient way to generate the yearly sales total (see Section 14.4 later in this chapter for a better method), but it should give you an idea of the syntax for building an aggregation window. The next query will create a window that spans from the top of the partition to the current row. The function identifies the month that has the maximum sales, up to and including the current month:
SELECT month, SUM(tot_sales) monthly_sales, MAX(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) max_preceeding FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES MAX_PRECEEDING ---------- ------------- -------------- 1 610697 610697 2 428676 610697 3 637031 637031 4 541146 637031 5 592935 637031 6 501485 637031 7 606914 637031 8 460520 637031 9 392898 637031 10 510117 637031 11 532889 637031 12 492458 637031
Unlike the first query, which has a window size fixed at 12 rows, this query's aggregation window grows from a single row for month 1 to 12 rows for month 12. The keywords CURRENT ROW are used to indicate that the window should end at the current row being inspected by the function. If you replace MAX in the previous query with SUM, you can calculate a running total:
SELECT month, SUM(tot_sales) monthly_sales, SUM(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) running_total FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES RUNNING_TOTAL ---------- ------------- -------------- 1 610697 610697 2 428676 1039373 3 637031 1676404 4 541146 2217550 5 592935 2810485 6 501485 3311970 7 606914 3918884 8 460520 4379404 9 392898 4772302 10 510117 5282419 11 532889 5815308 12 492458 6307766
You have now seen examples using windows that are fixed at one or both ends. The next query will define a window that floats freely with each row:
SELECT month, SUM(tot_sales) monthly_sales, AVG(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES ROLLING_AVG ---------- ------------- ----------- 1 610697 519686.5 2 428676 558801.333 3 637031 535617.667 4 541146 590370.667 5 592935 545188.667 6 501485 567111.333 7 606914 522973 8 460520 486777.333 9 392898 454511.667 10 510117 478634.667 11 532889 511821.333 12 492458 512673.5
For each of the 12 rows, the function calculates the average sales of the current month, the previous month, and the following month. The value of the ROLLING_AVG column is therefore the average sales within a three month floating window centered on the current month, with the exception that months 1 and 12 are calculated using a two-month window, since there is no previous month for month 1 or following month for month 12.
The previous windowing examples use the ROWS BETWEEN option to specify which rows to include in the aggregation. You may alternately specify a range and let Oracle determine which rows lie within the range. For example, the previous query used ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING to generate a three-month rolling average; the same results can be achieved by substituting RANGE for ROWS:
SELECT month, SUM(tot_sales) monthly_sales, AVG(SUM(tot_sales)) OVER (ORDER BY month RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES ROLLING_AVG ---------- ------------- ----------- 1 610697 519686.5 2 428676 558801.333 3 637031 535617.667 4 541146 590370.667 5 592935 545188.667 6 501485 567111.333 7 606914 522973 8 460520 486777.333 9 392898 454511.667 10 510117 478634.667 11 532889 511821.333 12 492458 512673.5
This substitution works because the month column contains integer values, so adding and subtracting 1 from the current month yields a three-month range. The next variation achieves the same results but specifies a range of +/- 1.999:
SELECT month, SUM(tot_sales) monthly_sales, AVG(SUM(tot_sales)) OVER (ORDER BY month RANGE BETWEEN 1.999 PRECEDING AND 1.999 FOLLOWING) rolling_avg FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES ROLLING_AVG ---------- ------------- ----------- 1 610697 519686.5 2 428676 558801.333 3 637031 535617.667 4 541146 590370.667 5 592935 545188.667 6 501485 567111.333 7 606914 522973 8 460520 486777.333 9 392898 454511.667 10 510117 478634.667 11 532889 511821.333 12 492458 512673.5
If you are generating a window based on a DATE column, you can specify a range in increments of days, months, or years. Since the orders table has no DATE columns, the next example shows how a date range can be specified against the order_dt column of the cust_order table:
SELECT TRUNC(order_dt) day, SUM(sale_price) daily_sales, AVG(SUM(sale_price)) OVER (ORDER BY TRUNC(order_dt) RANGE BETWEEN INTERVAL '2' DAY PRECEDING AND INTERVAL '2' DAY FOLLOWING) five_day_avg FROM cust_order WHERE sale_price IS NOT NULL AND order_dt BETWEEN TO_DATE('01-JUL-2001','DD-MON-YYYY') AND TO_DATE('31-JUL-2001','DD-MON-YYYY') GROUP BY TRUNC(order_dt); DAY DAILY_SALES FIVE_DAY_AVG --------- ----------- ------------ 16-JUL-01 112 146 18-JUL-01 180 114 20-JUL-01 50 169 21-JUL-01 50 165.333333 22-JUL-01 396 165.333333
This query generates a five-day rolling window by specifying a range of +/- two days around the truncated order date.
Oracle provides two additional aggregate functions, called FIRST_VALUE and LAST_VALUE, that can be used with windowing functions to identify the values of the first and last values in the window. In the case of the three-month rolling average query shown previously, you could display the values of all three months along with the average of the three, as in:
SELECT month, FIRST_VALUE(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) prev_month, SUM(tot_sales) monthly_sales, LAST_VALUE(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) next_month, AVG(SUM(tot_sales)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH PREV_MONTH MONTHLY_SALES NEXT_MONTH ROLLING_AVG ---------- ---------- ------------- ---------- ----------- 1 610697 610697 428676 519686.5 2 610697 428676 637031 558801.333 3 428676 637031 541146 535617.667 4 637031 541146 592935 590370.667 5 541146 592935 501485 545188.667 6 592935 501485 606914 567111.333 7 501485 606914 460520 522973 8 606914 460520 392898 486777.333 9 460520 392898 510117 454511.667 10 392898 510117 532889 478634.667 11 510117 532889 492458 511821.333 12 532889 492458 492458 512673.5
These functions are useful for queries that compare each value to the first or last value in the period, such as: "How did each month's sales compare to the first month?"
Although not technically windowing functions, the LAG and LEAD functions are included here because they allow rows to be referenced by their position relative to the current row, much like the PRECEDING and FOLLOWING clauses within windowing functions. LAG and LEAD are useful for comparing one row of a result set with another row of the same result set. For example, the query "Compute the total sales per month for the Mid-Atlantic region, including the percent change from the previous month" requires data from both the current and preceding rows to calculate the answer. This is, in effect, a two-row window, but the offset from the current row can be specified as one or more rows, making LAG and LEAD act like specialized windowing functions where only the outer edges of the window are utilized.
Here is the SQL that uses the LAG function to generate the data needed to answer the question posed in the previous paragraph:
SELECT month, SUM(tot_sales) monthly_sales, LAG(SUM(tot_sales), 1) OVER (ORDER BY month) prev_month_sales FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month ORDER BY month; MONTH MONTHLY_SALES PREV_MONTH_SALES ---------- ------------- ---------------- 1 610697 2 428676 610697 3 637031 428676 4 541146 637031 5 592935 541146 6 501485 592935 7 606914 501485 8 460520 606914 9 392898 460520 10 510117 392898 11 532889 510117 12 492458 532889
As you might expect, the LAG value for month 1 is NULL, since there is no preceding month. This would also be the case for the LEAD value for month 12. If you would like the LAG and LEAD functions to return a non-NULL value for these cases, you can specify a substitute value via the optional third parameter (see next example).
The next query utilizes the output from the previous query to generate the percentage difference from month to month. Note how a third parameter has been specified for the LAG function so that month 1 will use the current month's sales instead of a NULL value for the percentage change:
SELECT months.month month, months.monthly_sales monthly_sales, ROUND((months.monthly_sales - months.prev_month_sales) / months.prev_month_sales, 3) * 100 percent_change FROM (SELECT month, SUM(tot_sales) monthly_sales, LAG(SUM(tot_sales), 1, SUM(tot_sales)) OVER (ORDER BY month) prev_month_sales FROM orders WHERE year = 2001 AND region_id = 6 GROUP BY month) months ORDER BY month; MONTH MONTHLY_SALES PERCENT_CHANGE ---------- ------------- -------------- 1 610697 0 2 428676 -29.8 3 637031 48.6 4 541146 -15.1 5 592935 9.6 6 501485 -15.4 7 606914 21 8 460520 -24.1 9 392898 -14.7 10 510117 29.8 11 532889 4.5 12 492458 -7.6