# 18.2 Cell References

Referencing cells in a spreadsheet is one of the basic requirements of model queries. You reference a cell by qualifying all the dimensions in a partition. Cells in a spreadsheet can be referenced in one of the two ways?symbolic cell referencing and positional cell referencing.

#### 18.2.1 Symbolic Cell References

In a symbolic cell reference, you specify each dimension using a boolean expression, such as:

`s[y=2004, m=3]`

An example query with a symbolic cell reference is:

```SELECT r, y, m, s

FROM sales_history

WHERE month <= 10

MODEL

RETURN UPDATED ROWS

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES (s[y=2004,m=3] = 200000)

ORDER BY y, r, m;```

Look at the RULES clause in this example, and see that each cell is referenced symbolically by specifying a value for each dimension. In the RULES clause, s refers to the measure sales declared in the MEASURES clause. This measure is structured in a two-dimensional array, as defined by the DIMENSION BY clause. The dimensions are year (y) and month (m). To reference any cell in the two-dimensional array, you need to specify both the dimensions. In the preceding example, the cell for March 2004 is referenced by specifying a value for the year dimension (y=2004) and the month dimension (m=3). You need to specify the dimensions in the same order as they appear in the DIMENSION BY clause.

#### 18.2.2 Positional Cell References

In a positional cell reference, each dimension is implied by its position in the DIMENSION BY clause. The example from the previous section can be rewritten using positional cell reference as follows:

`s[2004,3]`

An example query with a positional cell reference is:

```SELECT r, y, m, s

FROM sales_history

WHERE month <= 10

MODEL

RETURN UPDATED ROWS

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES (s[2004,3] = 200000)

ORDER BY y, r, m;```

In this query's RULES clause each cell is referenced positionally by specifying a value corresponding to each column listed in the DIMENSION BY clause. Since the DIMENSION BY clause has two columns (year y, month m), the first value in s[2004,3] refers to the column year, and the second value refers to the column month.

#### 18.2.3 Combined Positional and Symbolic References

You may write queries containing both positional and symbolic cell referencing. For example:

```SELECT r, y, m, s

FROM sales_history

WHERE month <= 10

MODEL

RETURN UPDATED ROWS

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES (s[2004, m=3] = (s[2000,m=3] + s[2001,m=3]) / 2)

ORDER BY y, r, m;```

In this query, the RULES clause, s[2004, m=3], contains both positional and symbolic cell referencing. The first dimension (year) is specified positionally, whereas the second dimension (month) is specified symbolically.

#### 18.2.4 NULL Measures and Missing Cells

SQL models may involve two types of non-available value: existing cells with a NULL value, and non-existing cells. In the MODEL clause, any missing cells are treated as NULLs. Whether they are missing or existing cells with NULL values, the MODEL clause allows you to treat them in either of two ways?IGNORE or KEEP.

You can keep the NULL values by specifying KEEP NAV in the MODEL clause. KEEP NAV is the default behavior. Alternatively, you can specify IGNORE NAV in the MODEL clause to return the following values for NULL, depending on the data type of the measure:

• 0 for numeric data types

• 01-JAN-2000 for datetime (DATE, TIMESTAMP, etc.) data types

• An empty string for character (CHAR, VARCHAR2, etc.) data types

• NULL for all other data types

The following two examples illustrate the usage of KEEP NAV and IGNORE NAV. The sales history data in the table sales_history has NULL values for the month 12. Therefore, the following query returns NULL values for the measure s:

```SELECT r, y, m, s

FROM sales_history

WHERE month = 12

MODEL

RETURN UPDATED ROWS

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES  (s[2004,12] = (s[2000,12] + s[2001,12]) / 2)

ORDER BY y, r, m;

R          Y          M          S

----- ---------- ---------- ----------

5       2004         12

6       2004         12

7       2004         12```

As you can see KEEP NAV is the default behavior. If you want zeros instead of the NULL values for the computed measure s, you can use the IGNORE NAV option in the MODEL clause, as shown in the following example:

```SELECT r, y, m, s

FROM sales_history

WHERE month = 12

MODEL

IGNORE NAV

RETURN UPDATED ROWS

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES  (s[2004,12] = (s[2000,12] + s[2001,12]) / 2)

ORDER BY y, r, m;

R          Y          M          S

----- ---------- ---------- ----------

5       2004         12          0

6       2004         12          0

7       2004         12          0```

Whether you choose to keep or ignore NULL values depends on your application.

#### 18.2.5 UNIQUE DIMENSION/UNIQUE SINGLE REFERENCE

There are two ways to specify the uniqueness of the rows in a MODEL query. The default option is to use UNIQUE DIMENSION, which means that the combination of columns in the PARTITION BY and DIMENSION BY forms the unique key of the input data. When you don't specify any uniqueness condition, or when you specify UNIQUE DIMENSION, the database engine performs a check on the input data to ensure that each cell of the model has at most one row for each combination of PARTITION BY and DIMENSION BY columns. For example:

```SELECT r, y, m, s

FROM sales_history

WHERE month <= 3

MODEL

UNIQUE DIMENSION

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES (s[2004, 3] = (s[2000,3] + s[2001,3]) / 2)

ORDER BY y, r, m;

R          Y          M          S

----- ---------- ---------- ----------

5       2000          1    1018430

5       2000          2    1231492

5       2000          3    1132966

6       2000          1    1221394

6       2000          2     857352

6       2000          3    1274062

7       2000          1     758042

7       2000          2    1236846

7       2000          3    1311986

5       2001          1     509215

5       2001          2     615746

5       2001          3     566483

6       2001          1     610697

6       2001          2     428676

6       2001          3     637031

7       2001          1     379021

7       2001          2     618423

7       2001          3     655993

5       2004          3   849724.5

6       2004          3   955546.5

7       2004          3   983989.5```

If you are sure that your input data is keyed on the PARTITION BY and DIMENSION BY columns, you can specify UNIQUE SINGLE REFERENCE instead of UNIQUE DIMENSION. When you specify UNIQUE SINGLE REFERENCE, the database engine will not perform the uniqueness check on the entire input data. Rather it will check that all the cells referenced in the righthand side of the rules each correspond to just one row of input data. The reduced checking done by the UNIQUE SINGLE REFERENCE option may improve performance when querying large amounts of data. The following example illustrates the usage of the UNIQUE SINGLE REFERENCE option:

```SELECT r, y, m, s

FROM sales_history

WHERE month <= 3

MODEL

UNIQUE SINGLE REFERENCE

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES (s[2004, 3] = (s[2000,3] + s[2001,3]) / 2)

ORDER BY y, r, m;

R          Y          M          S

----- ---------- ---------- ----------

5       2000          1    1018430

5       2000          2    1231492

5       2000          3    1132966

6       2000          1    1221394

6       2000          2     857352

6       2000          3    1274062

7       2000          1     758042

7       2000          2    1236846

7       2000          3    1311986

5       2001          1     509215

5       2001          2     615746

5       2001          3     566483

6       2001          1     610697

6       2001          2     428676

6       2001          3     637031

7       2001          1     379021

7       2001          2     618423

7       2001          3     655993

5       2004          3   849724.5

6       2004          3   955546.5

7       2004          3   983989.5```

If you are using UNIQUE DIMENSION, and the input data doesn't satisfy the uniqueness condition of the PARTITION BY and DIMENSION BY columns, you will get an error, as illustrated in the following example:

```SELECT r, y, m, s

FROM sales_history

WHERE month >= 10

MODEL

UNIQUE DIMENSION

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES (s[2004, 10] = (s[2000,10] + s[2001,10]) / 2)

ORDER BY y, r, m;

FROM sales_history

*

ERROR at line 2:

This example returns an error because, in our example data, we have deliberately created duplicate rows for November 2000 and 2001. It doesn't matter that we aren't referencing data from that month in our rule. The duplication causes an error, because data for that month represents a cell somewhere in our model.

The same query with the UNIQUE SINGLE REFERENCE option will not cause any error, because the cells referenced in the righthand side of the rules satisfy the required uniqueness condition:

```SELECT r, y, m, s

FROM sales_history

WHERE month >= 10

MODEL

UNIQUE SINGLE REFERENCE

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES (s[2004, 10] = (s[2000,10] + s[2001,10]) / 2)

ORDER BY y, r, m;

R          Y          M          S

----- ---------- ---------- ----------

5       2000         10    1099296

5       2000         11     922790

5       2000         11     922790

5       2000         12

6       2000         10    1020234

6       2000         11    1065778

6       2000         11    1065778

6       2000         12

7       2000         10    1073682

7       2000         11    1107732

7       2000         11    1107732

7       2000         12

5       2001         10     549648

5       2001         11     461395

5       2001         11     461395

5       2001         12

6       2001         10     510117

6       2001         11     532889

6       2001         11     532889

6       2001         12

7       2001         10     536841

7       2001         11     553866

7       2001         11     553866

7       2001         12

5       2004         10     824472

6       2004         10   765175.5

7       2004         10   805261.5```

Notice the duplicate rows of data in the above output for month 11 in the years 2000 and 2001. If our rules referenced that data, the query would have caused an error. For example:

```SELECT r, y, m, s

FROM sales_history

WHERE month >= 10

MODEL

UNIQUE SINGLE REFERENCE

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES (s[2004, 11] = (s[2000,11] + s[2001,11]) / 2)

ORDER BY y, r, m;

FROM sales_history

*

ERROR at line 2:

This query fails, because there are multiple input rows for a single cell referenced by the rule. Which of the available rows for a given cell should the database choose to use? The answer is that the database doesn't know the answer. That's why the database throws an error. Without this checking for duplicate rows, the database would not be able to guarantee repeatable results.

#### 18.2.6 Returning Rows

The objective of all SQL queries is to return a result set. With a model query, you have two options: you can choose to return all the rows represented in the model, or you can choose to return only those rows updated by the rules. Returning all the rows is the default behavior. Use the following clause to specify which behavior you desire:

`RETURN [ALL | UPDATED] ROWS`

The RETURN clause belongs immediately after the MODEL keyword, except when you are using any cell reference options such as IGNORE NAV, KEEP NAV, UNIQUE DIMENSION, or UNIQUE SINGLE REFERENCE. If you are using cell reference options, then those cell reference options need to come before the RETURN clause.

The following example illustrates the default behavior:

```SELECT r, y, m, s

FROM sales_history

WHERE month = 3

MODEL

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES  (s[2004,3] = (s[2000,3] + s[2001,3]) / 2)

ORDER BY y, r, m;

R          Y          M          S

----- ---------- ---------- ----------

5       2000          3    1132966

6       2000          3    1274062

7       2000          3    1311986

5       2001          3     566483

6       2001          3     637031

7       2001          3     655993

5       2004          3   849724.5

6       2004          3   955546.5

7       2004          3   983989.5```

The sales_history table has rows for the year 2000 and 2001. The rows for the year 2004 are computed based on the rules. Since the query didn't specify a RETURN clause, all the rows that satisfy the WHERE condition are returned.

To return only updated rows, use the RETURN UPDATED ROWS option, as in the following MODEL query:

```SELECT r, y, m, s

FROM sales_history

WHERE month = 3

MODEL

RETURN UPDATED ROWS

PARTITION BY (region_id r)

DIMENSION BY (year y, month m)

MEASURES (sales s)

RULES  (s[2004,3] = (s[2000,3] + s[2001,3]) / 2)

ORDER BY y, r, m;

R          Y          M          S

----- ---------- ---------- ----------

5       2004          3   849724.5

6       2004          3   955546.5

7       2004          3   983989.5```

This time, only the new rows generated by the query are returned. For the purpose of the RETURN clause, the newly generated rows are also considered "UPDATED ROWS." If the model query had updated some existing rows, those rows would also have been returned in the result set.

 Mastering Oracle SQL, 2nd Edition
 Preface
 Chapter 1. Introduction to SQL
 Chapter 2. The WHERE Clause
 Chapter 3. Joins
 Chapter 4. Group Operations
 Chapter 5. Subqueries
 Chapter 6. Handling Temporal Data
 Chapter 7. Set Operations
 Chapter 8. Hierarchical Queries
 Chapter 9. DECODE and CASE
 Chapter 10. Partitioning
 Chapter 11. PL/SQL
 Chapter 12. Objects and Collections