Let's take an example to understand the basic elements of a model query. The sales_history table holds the sales data for three regions for each of the 12 months of the years 2000 and 2001. We want to forecast sales for the first three months of the year 2004, by using a simple formula: the sales for each region for each month of 2004 will be forecasted to be the average sales for that region and that month for years 2000 and 2001. Mathematically, our formula looks as follows:
sales_2004 = (sales_2000 + sales_2001) / 2
Using the MODEL clause introduced in Oracle Database 10g, this forecasting model can be written into a SQL query as follows:
SELECT r, y, m, s FROM sales_history WHERE month <= 3 MODEL RETURN UPDATED ROWS PARTITION BY (region_id r) DIMENSION BY (year y, month m) MEASURES (sales s) RULES (s[2004, FOR m in (1,2,3)] = (s[2000,CV( )] + s[2001,CV( )]) / 2) ORDER BY y, r, m; R Y M S ---------- ---------- ---------- ---------- 5 2004 1 763822.5 5 2004 2 923619 5 2004 3 849724.5 6 2004 1 916045.5 6 2004 2 643014 6 2004 3 955546.5 7 2004 1 568531.5 7 2004 2 927634.5 7 2004 3 983989.5 9 rows selected.
The preceding query is called a model query, and introduces some new keywords: MODEL, PARTITION BY, DIMENSION BY, MEASURES, and RULES.
The keyword MODEL marks the start of the MODEL clause. The MODEL clause enables you to work with the relational data as a multidimensional array, which is referred to as a model. Once you've arranged your data into an array, you perform spreadsheet-like calculations.
The PARTITION BY clause defines logical blocks of the model. You can think of the PARTITION BY clause as separating the data into multiple models, each model being of the same structure, but containing a different subset of the data. This is very similar to the effect of the PARTITION BY clause used with the analytical functions discussed in Chapter 13. If you wish to apply the same calculations to multiple subsets of your data, and you wish each subset to be independent of the other, then partition your data such that each partition corresponds to one of those subsets.
The DIMENSION BY clause specifies the dimensions of the multidimensional array created by the MODEL clause. The columns in the DIMENSION BY clause uniquely identify a cell in a partition of the multidimensional array. The dimensions in a model query are equivalent to the dimensions in a star schema. In the example under discussion, the columns year and month are specified as the dimensions, which indicates that each partition will be a two-dimensional array, and a combination of year and the month values will identify each cell.
The columns specified in the MEASURES clause are the columns on which the spreadsheet calculations are performed. Measures in a model query are equivalent to the measures in the fact table of a star schema. In our example in this section, the sales column is identified as the measure, and the spreadsheet calculations (estimating future sales) are performed on that column.
The RULES keyword introduces the clause specifying the formulae for calculations that you wish to perform. We'll talk more about rules in Section 18.3.
When you execute a MODEL query, the MODEL clause is almost the last clause to be executed. Only SELECT and ORDER BY come later. Thus, to see the data feeding into a model, you need only remove the MODEL clause, execute the remaining query, and look at the output.
A discussion of aliases is in order. Look carefully at the preceding query, and you'll see that aliases are specified in both the SELECT and MODEL clauses. The SELECT and ORDER BY clauses "see" the data that is returned from the MODEL clause. Thus, when you give a column an alias in your MODEL clause, you must use that same alias to refer to the column in your SELECT and ORDER BY clauses. Your SELECT clause may provide yet another alias, which will become the column name "seen" by the user or application program executing the query.