Distributed Queries

This chapter will first look at what needs to be in place to support a pure read-only, distributed query requirement. For read only, Microsoft offers the most data access power because not as many things need to be coordinated for reads. Distributed queries are easily created for most types of datasources via OLE DB providers and ODBC drivers. This even includes flat files. Basically, any datasource that can provide the ODBC or OLE DB compliance information can serve as a datasource in the Microsoft SQL Server environment. OLE DB providers and ODBC expose their data in tabular objects called rowsets that can be referenced in Transact-SQL statements as if they were a SQL Server table.

One of the previously mentioned distributed data examples was that of a company with disparate (and distributed) datasources (refer to Figure 32.3). Now take this a step further and implement a distributed query that reads across one or more servers. More specifically, the sales director needs to see a weekly report of all customers, their most current addresses, and the "YTD business" (sales) of each of these customers. The Excel spreadsheet is being used for the address information because this spreadsheet (SWCustomers.xls) comes directly from the sales force and is highly accurate.

Unfortunately, the customer information on the SQL Server tables (CustomersPlus table) is not updated very often and has a tendency to become quickly out of date. However, the YTD business sales data on this CustomersPlus table is maintained via triggers and is up to the minute. The report is called Customer Orders Report (CustOrders.rpt). This report can easily be fulfilled via a single distributed query that will join the customer information in Excel with that of the OLTP Customer YTD business data on SQL Server 2000, as seen in Figure 32.5.

Figure 32.5. Cross-system report (distributed query).

graphics/32fig05.gif

In general, you will need to create a SQL Query that joins the CustomerPlus table on SQL Server 2000 in the Southwind database with the Excel spreadsheet on the sales director's machine (SWCustomers.xls).

SELECT a.CustomerID, a.CompanyName, b.ContactName, b.Address, 
b.City, b.Region, b.PostalCode, b.Country, b.Phone,
b.Fax, a.YTDBusiness
FROM CustomersPlus AS a
INNER JOIN SWCustomers (from the Excel spreadsheet) AS b
ON a.CustomerID = b.CustomerID

The previous Transact-SQL statement won't run this way because it is not referencing the tables or Excel spreadsheet correctly yet. A few things will need to be set up on the SQL Server side first, such as defining a linked server and a login. You will also need to modify the FROM statement to reflect a distributed datasource location (a linked server location). For this requirement, you will do the following:

Define a linked server reference that will represent the Excel spreadsheet as if it were a relational table. (This Linked Server reference will be called ExcelSW.)
Set up a linked server login for SQL Server 2000 to use to access this spreadsheet. This is needed when using the OLE DB provider of Microsoft.Jet.OLEDB.4.0.