Partitioned Views

Partitioned views are used to access data that has been horizontally split, or partitioned, across multiple tables. These tables can be in the same or different databases, or even spread across multiple servers. Partitioning of tables is used to spread the I/O and processing load of large tables across multiple disks or servers. Partitioned views allow access to the tables as if they were one. For example, a multinational company could have an order table with millions of rows. To speed access to the data, the table could be partitioned into multiple tables, with identical column definitions, each with a check constraint specifying a specific country code. Listing 27.8 shows a sample CREATE TABLE command to create the tables.

Listing 27.8 Creating the Base Tables for a Partitioned View

CREATE TABLE UK_Order (
   Ord_No    INT,
   Country   Char(2) CHECK (Country = 'UK'),
   Cust_id   INT
   PRIMARY KEY (Ord_no, Country)
   )
go
CREATE TABLE US_Order (
   Ord_No    INT,
   Country   Char(2) CHECK (Country = 'US'),
   Cust_id   INT
   PRIMARY KEY (Ord_no, Country)
   )
go
CREATE TABLE CA_Order (
   Ord_No    INT,
   Country   Char(2) CHECK (Country = 'CA'),
   Cust_id   INT
   PRIMARY KEY (Ord_no, Country)
   )
go

Creating a Partitioned View

To create a partitioned view to access the tables, the view definition combines data from multiple tables using the UNION ALL operator. If all the tables are on the same server, it is referred to as a local partitioned view; if the tables span multiple servers, it is considered a distributed partitioned view. Listing 27.9 shows the CREATE VIEW statement to create a local partitioned view on the tables in the preceding listing.

Listing 27.9 Creating a Local Partitioned View

CREATE VIEW World_Sales
AS
SELECT *
FROM UK_Order
       UNION ALL
SELECT *
FROM US_Order
       UNION ALL
SELECT *
FROM CA_Order

When designing your partitioning scheme, it should be clear what data belongs to which table. This can be accomplished by defining a check constraint on the column(s) that holds the data used to partition the tables, as shown in Listing 27.8.

A view containing UNION ALL statements is not technically considered a partitioned view unless the view definition meets the requirements for partitioned views. A view that doesn't meet the requirements can still be created and queries run against it, but no updates are allowed and queries against the view cannot take advantage of the optimization enhancements provided by partitioned views. The following restrictions must be met to create views that SQL Server can recognize as partitioned views and optimize them effectively:

A CHECK constraint must exist on the partitioning column that specifies the data that resides in each of the table partitions. There cannot be any overlap between the key ranges of the CHECK constraints.
Tables cannot be referenced more than once in the view.
Tables in the view cannot have any indexes created on computed columns.
The PRIMARY KEY constraints must be defined on the same number of columns for each table in the view and the partitioning columns must be a part of the primary key of the table.
All columns must be specified for each table in the view and the columns cannot be referenced more than once in the select list. Using SELECT * in the view definition is the easiest way to ensure these column requirements are met.
The columns for each table must be in the same ordinal position in the select list and must be of the same type (including data type, precision, scale, and collation).
The column used to define the partitioning scheme cannot be an identity, default, timestamp, or computed column and cannot allow NULL values.
There can be only one constraint defined on the partitioning column.

Queries against partitioned views that meet all of these rules will be optimized using the partitioned view optimizations that are supported by the SQL Server 2000 query optimizer.

When the CHECK constraint specifies the range of values each partition holds, the constraint can be used by the query optimizer to determine which table(s) to access if the partitioning column is specified in a search argument. If no CHECK constraint exists on the partitioning column, the query optimizer cannot make any assumptions about the values stored in different tables and will search all the underlying base tables.

Modifying Data Through Partitioned Views

Data can be modified through a partitioned view as long as it meets all the requirements specified in the previous section to be considered a truly partitioned view, and also meets the standard requirements for inserts, updates, and deletes against partitioned views. In addition, the following restrictions apply:

All columns must be specified for INSERT statements, even if the columns allow NULL values or have DEFAULT constraints defined.
The DEFAULT keyword cannot be specified in the VALUES clause of an insert or the SET clause of an update.
Inserts are not allowed if any of the participating tables contain identity columns.
Inserts and updates are not allowed if any of the participating tables columns contain a timestamp column.
The view cannot contain any self-joins with the same view or any of the participating tables.

NOTE

Data can be modified through partitioned views only in the Enterprise and Developer Editions of SQL Server 2000.

If a partitioned view is not updatable, you can circumvent the update restrictions by creating an INSTEAD OF trigger on the view. The INSTEAD OF trigger can be designed to determine which underlying table(s) to modify and submit the appropriate SQL statements to the underlying tables. Be sure to include appropriate error handling into the trigger to ensure no duplicate rows are inserted. For more information on creating INSTEAD OF triggers, see Chapter 29.

Distributed Partitioned Views

A distributed partitioned view is defined just like a partitioned view, but one or more of the base tables resides on a different server. You can create distributed partitioned views by first adding linked server definitions for each server containing tables that participate in the partitioned view and referencing the tables using the full four-part name (server.database.owner.table) or using an OPENDATASOURCE- or OPENROWSET-based name. If using the OPENDATASOURCE or OPENROWSET function, the function must specify a table name rather than a pass-through query. (For more information on configuring and using linked servers, see Chapter 19, "Managing Linked and Remote Servers.")

To help optimize performance of distributed partitioned views, set the lazy schema validation option using the sp_serveroption system procedure. This helps ensure the query processor does not request metadata for any of the linked tables until data is actually needed from the remote table partition. Listing 27.10 displays an example of a distributed partitioned view.

Listing 27.10 Creating a Distributed Partitioned View

CREATE VIEW World_Sales
AS
-- Get data from remote table on UKServer
SELECT *
  FROM UKServer.Northwind.dbo.Order
UNION ALL
  SELECT *
    FROM Northwind.dbo.Order
UNION ALL
-- Get data from remote table on CAServer
  SELECT *
    FROM CAServer.Northwind.dbo.Order

Typically, each server participating in the distributed view would have a copy of the World_Sales view created on it, with fully qualified names referencing the tables on the other servers. Therefore, each of three servers in this scenario would contain one table (the local data) and a World_Sales view to access data from the other countries.

If properly defined, SQL Server 2000 will attempt to optimize the performance of distributed partitioned views by minimizing the amount of data transferred between member servers. The query processor uses OLE DB to retrieve the CHECK constraint definitions from each member table. This allows the query processor to map the specified search arguments to the appropriate table(s). The query execution plan then accesses only the necessary tables and retrieves only those remote rows needed to complete the SQL statement.

SQL Server 2000 introduces support for updateable distributed partitioned views. Data modifications are performed against the view, allowing true transparency. The view is accessed as if it were a base table, with the user or application unaware of the actual location of the data. If configured properly, SQL Server will determine via the WHERE clause specified in the update query which partition defined in the view must be updated rather than updating all tables in the join.