Partitioning Data

Partitioning the data across multiple files within a database, or even across multiple databases on multiple servers, can alleviate many of the woes associated with VLDBs. By breaking the database up into smaller, more manageable "chunks," the time and performance constraints of the VLDB are reduced.

When multiple servers are used to partition data, they are referred to as federated servers. Partitioning data across multiple servers adds a high degree of administrative overhead and should really only be considered when your performance or storage needs absolutely exceed that of a single server. With SQL Server 2000 on Windows 2000 Datacenter supporting 32 processors and 64GB of RAM, only a small percentage of installations will need the added capacity of federated servers.

Vertical Partitioning

Vertical partitioning is used to split a "wide" table with many or large columns into two or more tables joined by a common key, or to separate multiple tables for performance or manageability. Splitting a wide table is useful with tables containing text or image data, or a subset of rarely accessed columns. This could be considered denormalizing the table, which is acceptable if it improves performance. In truth, however, partitioning is often done to compensate for data that wasn't properly normalized in the first place.

When vertically partitioning multiple tables, filegroups can be leveraged to spread the tables over multiple disks to improve I/O and allow backup by filegroup. When vertically partitioning across filegroups, planning the location of data is critical. Tables should be located to optimize I/O. Consider a database with four tables: orders, items, products, and suppliers. If orders and items are frequently joined in queries, and the same is true for products and suppliers, placing orders and products on one filegroup and items and suppliers on the other should improve performance. Remember to consider how this placement will affect your backup plan.

By using vertical partitioning across multiple servers, you can scale your database for high performance and high storage capacity. In this case, rather than the tables residing in multiple filegroups within a database, they are spread over two or more servers. These servers are generally configured as linked servers, and the data is queried using the fully qualified four-part object name. The following statement joins the local products table with the suppliers table on a linked server:

SELECT ProductName, CompanyName 
FROM Products p JOIN
RemoteServer.Northwind2.dbo.Suppliers s
ON p.supplierid= s.supplierid

When modifying data on linked servers, you must perform a distributed transaction. This can be referenced through the API functions of a client application or by executing the BEGIN DISTRIBUTED TRANSACTION statement. This ensures that data on both the local and linked server is updated or rolled back as a unit.

Horizontal Partitioning

When data is partitioned horizontally, the data in a table is split across multiple tables based on values within the data. The division here is to break up long, as opposed to wide tables. For example, a single table containing customer data for the year could split into 12 tables, each containing rows that correspond to a particular month. Check constraints on each table ensure that only data for the correct month is inserted into each table; the check constraint is also referenced by the query optimizer, allowing it to build an execution plan that uses only the appropriate tables. As with vertical partitioning, these tables can be spread among multiple filegroups, databases, or servers, depending on the degree to which you must scale. A view can be created that accesses all the tables (known as member tables) using the UNION operator, providing access to the data that is transparent to the application. Views that access tables on the local server are referred to as Local Partitioned Views and views that reference member tables on remote servers are referred to as Distributed Partitioned Views.

Using Distributed Partitioned Views

When distributing data using partitioned views, distribute the data by a value that spreads the data evenly over all servers, or by a value, such as region, that distributes the data with a logical association with the server location. This will minimize the need for distributed queries. SQL Server 2000 introduces support for updateable distributed partitioned views. Data modifications are performed against the view, allowing true transparency. The view is accessed as if it were a base table, with the user or application unaware of the actual location of the data. If configured properly, SQL Server will determine via the where clause specified in the update query which partition defined in the view must be updated rather than updating all tables in the join. For more information on defining distributed partitioned views, see Chapter 27, "Creating and Managing Views in SQL Server."