Referencing cells in a spreadsheet is one of the basic requirements of model queries. You reference a cell by qualifying all the dimensions in a partition. Cells in a spreadsheet can be referenced in one of the two ways?symbolic cell referencing and positional cell referencing.
In a symbolic cell reference, you specify each dimension using a boolean expression, such as:
s[y=2004, m=3]
An example query with a symbolic cell reference is:
SELECT r, y, m, s FROM sales_history WHERE month <= 10 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[y=2004,m=3] = 200000) ORDER BY y, r, m;
Look at the RULES clause in this example, and see that each cell is referenced symbolically by specifying a value for each dimension. In the RULES clause, s refers to the measure sales declared in the MEASURES clause. This measure is structured in a two-dimensional array, as defined by the DIMENSION BY clause. The dimensions are year (y) and month (m). To reference any cell in the two-dimensional array, you need to specify both the dimensions. In the preceding example, the cell for March 2004 is referenced by specifying a value for the year dimension (y=2004) and the month dimension (m=3). You need to specify the dimensions in the same order as they appear in the DIMENSION BY clause.
In a positional cell reference, each dimension is implied by its position in the DIMENSION BY clause. The example from the previous section can be rewritten using positional cell reference as follows:
s[2004,3]
An example query with a positional cell reference is:
SELECT r, y, m, s FROM sales_history WHERE month <= 10 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004,3] = 200000) ORDER BY y, r, m;
In this query's RULES clause each cell is referenced positionally by specifying a value corresponding to each column listed in the DIMENSION BY clause. Since the DIMENSION BY clause has two columns (year y, month m), the first value in s[2004,3] refers to the column year, and the second value refers to the column month.
You may write queries containing both positional and symbolic cell referencing. For example:
SELECT r, y, m, s FROM sales_history WHERE month <= 10 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004, m=3] = (s[2000,m=3] + s[2001,m=3]) / 2) ORDER BY y, r, m;
In this query, the RULES clause, s[2004, m=3], contains both positional and symbolic cell referencing. The first dimension (year) is specified positionally, whereas the second dimension (month) is specified symbolically.
SQL models may involve two types of non-available value: existing cells with a NULL value, and non-existing cells. In the MODEL clause, any missing cells are treated as NULLs. Whether they are missing or existing cells with NULL values, the MODEL clause allows you to treat them in either of two ways?IGNORE or KEEP.
You can keep the NULL values by specifying KEEP NAV in the MODEL clause. KEEP NAV is the default behavior. Alternatively, you can specify IGNORE NAV in the MODEL clause to return the following values for NULL, depending on the data type of the measure:
0 for numeric data types
01-JAN-2000 for datetime (DATE, TIMESTAMP, etc.) data types
An empty string for character (CHAR, VARCHAR2, etc.) data types
NULL for all other data types
The following two examples illustrate the usage of KEEP NAV and IGNORE NAV. The sales history data in the table sales_history has NULL values for the month 12. Therefore, the following query returns NULL values for the measure s:
SELECT r, y, m, s FROM sales_history WHERE month = 12 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004,12] = (s[2000,12] + s[2001,12]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2004 12 6 2004 12 7 2004 12
As you can see KEEP NAV is the default behavior. If you want zeros instead of the NULL values for the computed measure s, you can use the IGNORE NAV option in the MODEL clause, as shown in the following example:
SELECT r, y, m, s FROM sales_history WHERE month = 12 MODEL IGNORE NAV RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004,12] = (s[2000,12] + s[2001,12]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2004 12 0 6 2004 12 0 7 2004 12 0
Whether you choose to keep or ignore NULL values depends on your application.
There are two ways to specify the uniqueness of the rows in a MODEL query. The default option is to use UNIQUE DIMENSION, which means that the combination of columns in the PARTITION BY and DIMENSION BY forms the unique key of the input data. When you don't specify any uniqueness condition, or when you specify UNIQUE DIMENSION, the database engine performs a check on the input data to ensure that each cell of the model has at most one row for each combination of PARTITION BY and DIMENSION BY columns. For example:
SELECT r, y, m, s FROM sales_history WHERE month <= 3 MODEL UNIQUE DIMENSION PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004, 3] = (s[2000,3] + s[2001,3]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2000 1 1018430 5 2000 2 1231492 5 2000 3 1132966 6 2000 1 1221394 6 2000 2 857352 6 2000 3 1274062 7 2000 1 758042 7 2000 2 1236846 7 2000 3 1311986 5 2001 1 509215 5 2001 2 615746 5 2001 3 566483 6 2001 1 610697 6 2001 2 428676 6 2001 3 637031 7 2001 1 379021 7 2001 2 618423 7 2001 3 655993 5 2004 3 849724.5 6 2004 3 955546.5 7 2004 3 983989.5
If you are sure that your input data is keyed on the PARTITION BY and DIMENSION BY columns, you can specify UNIQUE SINGLE REFERENCE instead of UNIQUE DIMENSION. When you specify UNIQUE SINGLE REFERENCE, the database engine will not perform the uniqueness check on the entire input data. Rather it will check that all the cells referenced in the righthand side of the rules each correspond to just one row of input data. The reduced checking done by the UNIQUE SINGLE REFERENCE option may improve performance when querying large amounts of data. The following example illustrates the usage of the UNIQUE SINGLE REFERENCE option:
SELECT r, y, m, s FROM sales_history WHERE month <= 3 MODEL UNIQUE SINGLE REFERENCE PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004, 3] = (s[2000,3] + s[2001,3]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2000 1 1018430 5 2000 2 1231492 5 2000 3 1132966 6 2000 1 1221394 6 2000 2 857352 6 2000 3 1274062 7 2000 1 758042 7 2000 2 1236846 7 2000 3 1311986 5 2001 1 509215 5 2001 2 615746 5 2001 3 566483 6 2001 1 610697 6 2001 2 428676 6 2001 3 637031 7 2001 1 379021 7 2001 2 618423 7 2001 3 655993 5 2004 3 849724.5 6 2004 3 955546.5 7 2004 3 983989.5
If you are using UNIQUE DIMENSION, and the input data doesn't satisfy the uniqueness condition of the PARTITION BY and DIMENSION BY columns, you will get an error, as illustrated in the following example:
SELECT r, y, m, s FROM sales_history WHERE month >= 10 MODEL UNIQUE DIMENSION PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004, 10] = (s[2000,10] + s[2001,10]) / 2) ORDER BY y, r, m; FROM sales_history * ERROR at line 2: ORA-32638: Non unique addressing in spreadsheet dimensions
This example returns an error because, in our example data, we have deliberately created duplicate rows for November 2000 and 2001. It doesn't matter that we aren't referencing data from that month in our rule. The duplication causes an error, because data for that month represents a cell somewhere in our model.
The same query with the UNIQUE SINGLE REFERENCE option will not cause any error, because the cells referenced in the righthand side of the rules satisfy the required uniqueness condition:
SELECT r, y, m, s FROM sales_history WHERE month >= 10 MODEL UNIQUE SINGLE REFERENCE PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004, 10] = (s[2000,10] + s[2001,10]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2000 10 1099296 5 2000 11 922790 5 2000 11 922790 5 2000 12 6 2000 10 1020234 6 2000 11 1065778 6 2000 11 1065778 6 2000 12 7 2000 10 1073682 7 2000 11 1107732 7 2000 11 1107732 7 2000 12 5 2001 10 549648 5 2001 11 461395 5 2001 11 461395 5 2001 12 6 2001 10 510117 6 2001 11 532889 6 2001 11 532889 6 2001 12 7 2001 10 536841 7 2001 11 553866 7 2001 11 553866 7 2001 12 5 2004 10 824472 6 2004 10 765175.5 7 2004 10 805261.5
Notice the duplicate rows of data in the above output for month 11 in the years 2000 and 2001. If our rules referenced that data, the query would have caused an error. For example:
SELECT r, y, m, s FROM sales_history WHERE month >= 10 MODEL UNIQUE SINGLE REFERENCE PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004, 11] = (s[2000,11] + s[2001,11]) / 2) ORDER BY y, r, m; FROM sales_history * ERROR at line 2: ORA-32638: Non unique addressing in spreadsheet dimensions
This query fails, because there are multiple input rows for a single cell referenced by the rule. Which of the available rows for a given cell should the database choose to use? The answer is that the database doesn't know the answer. That's why the database throws an error. Without this checking for duplicate rows, the database would not be able to guarantee repeatable results.
The objective of all SQL queries is to return a result set. With a model query, you have two options: you can choose to return all the rows represented in the model, or you can choose to return only those rows updated by the rules. Returning all the rows is the default behavior. Use the following clause to specify which behavior you desire:
RETURN [ALL | UPDATED] ROWS
The RETURN clause belongs immediately after the MODEL keyword, except when you are using any cell reference options such as IGNORE NAV, KEEP NAV, UNIQUE DIMENSION, or UNIQUE SINGLE REFERENCE. If you are using cell reference options, then those cell reference options need to come before the RETURN clause.
The following example illustrates the default behavior:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004,3] = (s[2000,3] + s[2001,3]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2000 3 1132966 6 2000 3 1274062 7 2000 3 1311986 5 2001 3 566483 6 2001 3 637031 7 2001 3 655993 5 2004 3 849724.5 6 2004 3 955546.5 7 2004 3 983989.5
The sales_history table has rows for the year 2000 and 2001. The rows for the year 2004 are computed based on the rules. Since the query didn't specify a RETURN clause, all the rows that satisfy the WHERE condition are returned.
To return only updated rows, use the RETURN UPDATED ROWS option, as in the following MODEL query:
SELECT r, y, m, s FROM sales_history WHERE month = 3 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004,3] = (s[2000,3] + s[2001,3]) / 2) ORDER BY y, r, m; R Y M S ----- ---------- ---------- ---------- 5 2004 3 849724.5 6 2004 3 955546.5 7 2004 3 983989.5
This time, only the new rows generated by the query are returned. For the purpose of the RETURN clause, the newly generated rows are also considered "UPDATED ROWS." If the model query had updated some existing rows, those rows would also have been returned in the result set.