Rules are the core of a model query. Rules specify the formulae to compute values for the cells in the spreadsheet. Use the RULES clause to specify the rules for a model query. The RULES clause encloses all the rules in parentheses, and each rule is separated from the next by a comma.
Each rule represents an assignment, and consists of a lefthand side and a righthand side. The RULES clause of one of the previous examples looks like the following:
RULES (s[2004,3] = (s[2000,3] + s[2001,3]) / 2)
The lefthand side of a rule (s[2004,3] in this example) identifies the cells to be updated using values from the righthand side of the rule. The righthand side of a rule is an expression that represents the computation to be performed. You can use any valid SQL operator or function in an expression. There are also some additional constructs that you can use in rules, that are specific to the MODEL clause.
You can use the CV( ) function in the righthand side of a rule. It returns the current value of a dimension column from the lefthand side of the rule. The following example illustrates:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004,3] = (s[2000,CV( )] + s[2001,CV( )]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2004 3 849724.5 6 2004 3 955546.5 7 2004 3 983989.5
The CV( ) function evaluates to the current value of the corresponding dimension. In this example, CV( ) evaluates to 3, corresponding to the month dimension specified on the lefthand side of the rule. The CV( ) function comes in handy when you need to evaluate a rule multiple times, and each time a dimension column takes a different value (such as in FOR loops, discussed later).
Optionally, the CV( ) function can take a dimension column as an argument. For example, we could have written CV(m) to access the current month value from the lefthand side of our rule. When no argument is specified, positional referencing is used, which means that the dimension column in the corresponding position is used.
ANY can be used as a wildcard in a rule written with positional referencing. It accepts any value for the corresponding column (including NULL). The following example illustrates the usage of ANY:
SELECT r, y, m, s FROM sales_history WHERE month <= 3 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[ANY,3] = (s[CV( ),1] + s[CV( ),2]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2000 3 1124961 6 2000 3 1039373 7 2000 3 997444 5 2001 3 562480.5 6 2001 3 519686.5 7 2001 3 498722
In this example, ANY is used as a wildcard for the year dimension, which translates into "all the values for the column year in the sales_history table." This example also illustrates why CV( ) is so important. Our rule will update every cell for March (month = 3), regardless of the year. We use CV( ) on the righthand side to capture the current year, so that we can reference the values for January and February of that same year.
|
FOR loops allow you to write a "rule" that affects a number of cells, and acts like a FOR loop in a procedural language such as PL/SQL. FOR loops are expanded at compile-time, so what looks like one rule to you is really seen by the database as many rules. More on this in a bit.
FOR loops are allowed only in the lefthand side of a rule. FOR loops allow multiple cells to be inserted by a single rule. FOR loops can take one of the following three forms:
FOR d IN (subquery | list) FOR d [LIKE pattern] FROM v1 TO v2 [INCREMENT | DECREMENT] n FOR (d1, d2, . . . ) IN (multi_column_subquery | multi_column_list)
The syntax elements are:
A single-dimension column.
A subquery returning value(s) for the dimension column.
A list of value(s) for the dimension column.
A string with a %. This pattern behaves slightly differently from the LIKE pattern used in a WHERE clause predicate. This pattern doesn't accept underscore. Values from v1 through v2 are substituted into the pattern at the position marked by %.
Two literals specifying the upper and lower bound for the dimension d.
A number to increment or decrement by. The value n must be positive.
Multiple-dimension columns in a FOR loop.
A subquery returning values for the multiple-dimension columns.
A list of values for the multiple-dimension columns.
The following example illustrates a single-column FOR loop:
SELECT r, y, m, s FROM sales_history WHERE month <= 6 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES ( s[2004, FOR m IN (SELECT DISTINCT month FROM sales_history WHERE month <= 6)] = s[2000,CV( )] ) ORDER BY y, r, m; R Y M S ------ ---------- ---------- ---------- 5 2004 1 1018430 5 2004 2 1231492 5 2004 3 1132966 5 2004 4 1195244 5 2004 5 1132570 5 2004 6 1006708 6 2004 1 1221394 6 2004 2 857352 6 2004 3 1274062 6 2004 4 1082292 6 2004 5 1185870 6 2004 6 1002970 7 2004 1 758042 7 2004 2 1236846 7 2004 3 1311986 7 2004 4 1220034 7 2004 5 1322188 7 2004 6 1137144
This query copies the sales history for the year 2000 to the year 2004, for each month, for the first six months. The following example does the same thing, but using a multiple column FOR loop:
SELECT r, y, m, s FROM sales_history WHERE month <= 6 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES ( s[FOR (y,m) IN (SELECT DISTINCT 2004, month FROM sales_history WHERE month <= 6)] = s[2000,CV( )] ) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2004 1 1018430 5 2004 2 1231492 5 2004 3 1132966 5 2004 4 1195244 5 2004 5 1132570 5 2004 6 1006708 6 2004 1 1221394 6 2004 2 857352 6 2004 3 1274062 6 2004 4 1082292 6 2004 5 1185870 6 2004 6 1002970 7 2004 1 758042 7 2004 2 1236846 7 2004 3 1311986 7 2004 4 1220034 7 2004 5 1322188 7 2004 6 1137144
The following restrictions apply to subqueries used in FOR loops:
They cannot be correlated.
They cannot be defined using the WITH clause.
They cannot return more than 10,000 rows.
The last restriction needs more explanation. The total number of rules you can specify in the RULES clause is 10,000. When you use a FOR loop, the RULES clause is expanded by unfolding the FOR loop at compile-time, with the database creating one rule for each value returned by the FOR loop. If the total number of rules, including those not generated from FOR loops, exceeds 10,000 for a given model query, you will get an error. The following example illustrates this error:
SELECT r, y, m, s FROM sales_history MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES ( s[2004, FOR m IN (SELECT ROWNUM FROM orders o1 CROSS JOIN orders o2 WHERE ROWNUM <= 10001)] = s[2000,CV( )] ) ORDER BY y, r, m; SELECT r, y, m, s * ERROR at line 1: ORA-32633: Spreadsheet subquery FOR cell index returns too many rows
In this example, the FOR loop is forced to execute 10,001 times, resulting in 10,001 rules being created, which exceeds the 10,000 rule limit. Even though the error message indicates that the limit is on the subquery of the FOR loop, the limit is actually on the total number of rules in the model. The subquery is simply the component of the model query that caused the limit to be exceeded. If a subquery returns less than 10,000 rows, but the total number of rules after unfolding all the FOR loops still exceeds 10,000, you will get an error, as illustrated in the following example:
SELECT r, y, m, s FROM sales_history MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES ( s[2004, FOR m IN (SELECT ROWNUM FROM orders o1 CROSS JOIN orders o2 WHERE ROWNUM <= 5000)] = s[2000,CV( )], s[2005, FOR m IN (SELECT ROWNUM FROM orders o1 CROSS JOIN orders o2 WHERE ROWNUM <= 5001)] = s[2001,CV( )] ) ORDER BY y, r, m; s[2005, FOR m IN (SELECT ROWNUM * ERROR at line 14: ORA-32636: Too many rules in spreadsheet
In this example, one FOR loop results in 5000 rules, and the other FOR loop results in 5001 rules, which make a total of 10001 rules. Therefore, you get the error message that indicates that you have too many rules.
IS ANY can be used as a wildcard in a rule when using symbolic referencing. It accepts any value for the corresponding column (including NULL), and returns TRUE always. This is the equivalent to the ANY wildcard used in positional referencing. IS ANY can be used only in the lefthand side of a rule. The following example illustrates the usage of IS ANY:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[y IS ANY, m=3] = (s[CV( ),m=3] + s[CV( ),m=3]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2000 3 1132966 6 2000 3 1274062 7 2000 3 1311986 5 2001 3 566483 6 2001 3 637031 7 2001 3 655993
In this example, IS ANY is used as a wildcard for the year dimension, which translates into "all the values for the column year in the sales_history table."
|
IS PRESENT returns TRUE if the cell referenced existed prior to the execution of the MODEL clause. Otherwise, if the cell was created as a result of executing a rule, or does not exist at all, IS PRESENT returns FALSE. The following example illustrates the usage of the IS PRESENT condition:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004, 3] = CASE WHEN s[2003,3] IS PRESENT THEN s[2003,3] ELSE 0 END) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2000 3 1132966 6 2000 3 1274062 7 2000 3 1311986 5 2001 3 566483 6 2001 3 637031 7 2001 3 655993 5 2004 3 0 6 2004 3 0 7 2004 3 0
In this example the IS PRESENT condition is used from within a CASE expression to test whether the cell s[2003,3] was present prior to execution of the MODEL clause. If the cell s[2003,3] was present, then the value of the cell s[2003,3] is assigned to the new cell s[2004,3]; if the cell s[2003,3] wasn't present, then the value 0 is assigned to s[2004,3]. As you can see from the result set, the referenced cell didn't satisfy the IS PRESENT condition. You can tell that this is the case, because each of the 2004 rows has been given a 0 value for estimated March (m=3) sales.
The PRESENTV function returns a value based on the existence of a cell prior to the execution of the MODEL clause. PRESENTV can be used only on the righthand side of a rule and takes the following form:
PRESENTV(cell, exp1, exp2)
The syntax elements are:
A cell reference
Expressions that resolve to a value for the cell referenced
PRESENTV returns exp1 if the referenced cell existed prior to the execution of the MODEL clause; otherwise, the function returns exp2. The following example does the same thing as the IS PRESENT example in the previous section, but using the PRESENTV function instead of a CASE and IS PRESENT:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004,3] = PRESENTV(s[2003,3], s[2003,3], 0)) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2000 3 1132966 6 2000 3 1274062 7 2000 3 1311986 5 2001 3 566483 6 2001 3 637031 7 2001 3 655993 5 2004 3 0 6 2004 3 0 7 2004 3 0
In this example, the value for the cell s[2004,3] is determined based on whether the cell s[2003,3] existed before the execution of the MODEL clause. If the cell s[2003,3] existed, its value will be assigned to the cell s[2004,3]. If the cell s[2003,3] didn't exist, a value 0 will be assigned to the cell s[2004,3]. As it appears from the result set, the cell s[2003,3] didn't exist in any of the partitions prior to the execution of the MODEL clause. You can tell that this is the case, because each of the 2004 rows has been given a 0 value for estimated March (m=3) sales.
The syntax of the PRESENTNNV function is of the same form as that of the PRESENTV function, and like PRESENTV, it can be used only on the righthand side of a rule. PRESENTNNV means "present not null value," and returns exp1 if a cell existed prior to the execution of the MODEL clause, and had a NOT NULL value; otherwise, the function returns exp2. The following example illustrates the usage of PRESENTNNV function:
SELECT r, y, m, s FROM sales_history MODEL UNIQUE SINGLE REFERENCE RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004,3] = PRESENTNNV(s[2000,3], s[2000,3], 0)) ORDER BY y, r, m; R Y M S --------- ---------- ---------- ---------- 5 2004 3 1132966 6 2004 3 1274062 7 2004 3 1311986
The cell s[2000,3] existed prior to the execution of the MODEL clause, and had a NOT NULL value. You can know this, because all occurrences of s[2004,3] are non-zero in the result set.
In most of the examples you have seen so far in this chapter, one cell in the righthand side of a rule has been used to assign a value for one cell in the lefthand side. Or, if more than one cell has been used, each cell has been explicitly referenced. However, there are situations in which you want to use a set of cells on the righthand side to assign one value to a cell on the lefthand side. You can do that by using an aggregate function, such as AVG, COUNT, SUM, MAX, MIN, applied to the multiple cells on the righthand side. This is illustrated in the following example:
SELECT r, y, m, s FROM sales_history MODEL UNIQUE SINGLE REFERENCE RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004, 3] = AVG(s)[y BETWEEN 1995 AND 2003,3]) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2004 3 849724.5 6 2004 3 955546.5 7 2004 3 983989.5
This example uses the syntax AVG(s)[y BETWEEN 1995 AND 2003,3] to generate the average of all March sales (month 3) between 1995 and 2003 inclusive. In our data, this range encompases: s[1995,3], s[1996,3], s[1997,3], s[1998,3], s[1999,3], s[2000,3], s[2001,3], s[2002,3], and s[2003,3]. When you invoke an aggregate function, be sure to place only the measure name within the parentheses. All the dimensions go outside the parentheses.
A MODEL clause usually consists of multiple rules. Quite often, those rules are interdependent. Cells computed by one rule are often used as input to other rules. In such cases, it is very important that rules are evaluated in a proper order. To influence the order of rule evaluation, you can qualify the RULES clause using two options: SEQUENTIAL ORDER and AUTOMATIC ORDER. The syntax to use is:
RULES [ [SEQUENTIAL | AUTOMATIC] ORDER ]
The following sections describe the difference between these two approaches to the order in which rules are evaluated.
If you don't specify the ordering option in the RULES clause, SEQUENTIAL ORDER is enforced. The rules are evaluated in the order they appear in the RULES clause (also known as, lexical order). The following example illustrates evaluation of the rules in sequential order:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES SEQUENTIAL ORDER ( s[2002,3] = (s[2000,3] + s[2001,3])/2, s[2003,3] = s[2002,3] * 1.1 ) ORDER BY y, r, m; R Y M S ---------- ---------- ---------- ---------- 5 2002 3 849724.5 6 2002 3 955546.5 7 2002 3 983989.5 5 2003 3 934696.95 6 2003 3 1051101.15 7 2003 3 1082388.45
In this query, the first rule computes the cell s[2002,3], and the second rule uses the resulting value to compute the cell s[2003,3]. If you reverse the order of the rules, you won't get the same results:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL UNIQUE SINGLE REFERENCE RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES SEQUENTIAL ORDER ( s[2003,3] = s[2002,3] * 1.1, s[2002,3] = (s[2000,3] + s[2001,3])/2 ) ORDER BY y, r, m; R Y M S ---------- ---------- ---------- ---------- 5 2002 3 849724.5 6 2002 3 955546.5 7 2002 3 983989.5 5 2003 3 6 2003 3 7 2003 3
Notice the NULL values returned by the cell s[2003,3] in all the partitions, in this example's output. We asked for sequential ordering of the rules, and the rule to compute s[2003,3] appears before the rule to compute s[2002,3]. The rule to compute s[2003,3] uses s[2002,3]. Since the cell s[2002,3] doesn't exist before the second rule is evaluated, it's value is NULL, and the value for s[2003,3] ends up being NULL as well.
With AUTOMATIC ORDER, the evaluation order of the rules is determined using a dependency graph. This is done automatically in a way that ensures that a rule computing a new value for a cell is executed prior to that cell being used to supply a value to another rule. All you need to do to get this behavior is to specify AUTOMATIC ORDER after the RULE keyword. The NULL output of the previous example can be avoided by automatic ordering of the rules, as illustrated in the following example:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL UNIQUE SINGLE REFERENCE RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES AUTOMATIC ORDER ( s[2003,3] = s[2002,3] * 1.1, s[2002,3] = (s[2000,3] + s[2001,3])/2 ) ORDER BY y, r, m; R Y M S --------- ---------- ---------- ---------- 5 2002 3 849724.5 6 2002 3 955546.5 7 2002 3 983989.5 5 2003 3 934696.95 6 2003 3 1051101.15 7 2003 3 1082388.45
In this version of the query, the automatic ordering of the rules ensures that the second rule is evaluated before the first rule.
A model with automatic ordering of rules is referred to as an automatic order model, whereas a model with sequential ordering of rules is referred to as a sequential order model. In an automatic order model, a cell can be assigned a value only once, because if a cell is assigned a value more than once, the dependency graph will involve a cycle, and rule evaluation will go into an infinite loop. If you attempt to assign a value to a given cell more than once, you will get an error, as in the following example:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES AUTOMATIC ORDER ( s[2002,3] = (s[2000,3] + s[2001,3])/2, s[2002,3] = 20000 ) ORDER BY y, r, m; s[2002,3] = 20000 * ERROR at line 11: ORA-32630: multiple assignment in automatic order SPREADSHEET
However, in a sequential order spreadsheet, you can assign a value to a cell more than once. When you do that, you should remember that the last assignment will be reflected in the final outcome of the query, as illustrated in the following example:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL UNIQUE SINGLE REFERENCE RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES ( s[2002,3] = (s[2000,3] + s[2001,3])/2, s[2002,3] = 20000 ) ORDER BY y, r, m; R Y M S --------- ---------- ---------- ---------- 5 2002 3 20000 6 2002 3 20000 7 2002 3 20000
In this example, the initial assignment of the cell s[2002,3] is overwritten by the value 20000 assigned by the last rule in the list. One more thing to notice about this query is that it doesn't specify an ordering option for the rules. Thus, SEQUENTIAL ORDER is used by default.
Why not use AUTOMATIC ORDER all the time? Whenever you are sure about the sequence of rules, you should use SEQUENTIAL ORDER. By doing so, you are saving the database from the overhead of building the dependency graph and determining the rule order every time the query is executed. There are also cases, such as when you must assign a value to a cell, and then later assign another value to the same cell, that preclude automatic ordering.
The rules in the RULES clause allow you to update existing cells, and to create new cells in a model. If the cell specified by the lefthand side of a rule is present in your model, the value for that cell is updated. If the cell doesn't exist, a new row, corresponding to that cell, is inserted into the result set of your model query. The necessary values for the columns other than the measure columns in any newly inserted row will be derived from its partition and dimension values. This is the default semantics, and is known as UPSERT (update or insert) semantics.
The alternative to UPSERT semantics is to use UPDATE semantics. You can specify which to use in the RULES clause:
RULES [UPSERT | UPDATE] [SEQUENTIAL ORDER | AUTOMATIC ORDER]
In UPDATE semantics, if the cell specified by the lefthand side of a rule is present in the model, it is updated. If the cell doesn't exist, the assignment is ignored.
The following example illustrates UPDATE semantics, in which existing cells are updated, and updates to non-existent cells are ignored:
SELECT r, y, m, s FROM sales_history MODEL UNIQUE SINGLE REFERENCE RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES UPDATE ( s[2001,3] = 20000, s[2002,3] = 20000 ) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2001 3 20000 6 2001 3 20000 7 2001 3 20000
In this example, the model has the cell s[2001,3], and for each of three partitions, for the regions numbered 5, 6, and 7. The first rule arbitrarily stores the value 20000 into the s[2001,3] cell for each region. The second rule attempts to do the same for the cell s[2002,3], but since that cell doesn't already exist in the spreadsheet, the second rule is ignored. This is how UPDATE semantics work.
However, with UPSERT semantics, new cells will be inserted for s[2002,3], as illustrated by the following example:
SELECT r, y, m, s FROM sales_history MODEL UNIQUE SINGLE REFERENCE RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES UPSERT ( s[2001,3] = 20000, s[2002,3] = 20000 ) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2001 3 20000 6 2001 3 20000 7 2001 3 20000 5 2002 3 20000 6 2002 3 20000 7 2002 3 20000
With UPSERT semantics, when you specify a previously non-existent cell and assign a value to that cell, the database inserts a new row into the result set by combining the dimension, measure, and partition information.
New cells cannot be inserted in the following situations:
As a result of using the ANY wildcard on the lefthand side
As a result of using symbolic referencing on the lefthand side
The ANY wildcard is essentially a predicate that filters selected rows from the population of currently existing rows. When you use a predicate in the WHERE clause of a SELECT statement, that predicate can never generate new rows. Likewise, you can't use a predicate to generate new rows in a model.
Symbolic referencing is also a predicate, and this is a subtle, but important point to understand. For example, the cell reference s[y=2003, m=3] is loosely equivalent to a WHERE clause such as WHERE y=2003 AND m=3. Such a WHERE clause can never generate new rows , and neither can such a cell reference.
|