A federated system consists of:
A DB2 instance that operates as a federated server
One or more data sources
A database that acts as the federated database
Clients (users and applications) that access the federated database and data sources
With a federated system, you can send distributed requests to multiple data sources within a single SQL statement. For example, you can join data that is located in a DB2 UDB table, an Oracle table, and a Sybase view in a single SQL statement.
The federated server is a database manager instance to which application processes connect and submit requests. Any number of DB2 instances can be configured to function as federated servers. You can use existing DB2 instances as your federated server, or you can create new ones specifically for the federated system. The DB2 federated instance that manages the federated system is called a server because it responds to requests from end users and client applications.
The federated server often sends parts of the requests it receives to the data sources for processing. A pushdown operation is an operation that is processed remotely. The federated instance is referred to as the federated server, even though it acts as a client when it pushes down requests to the data sources.
There are two main features distinguishing a federated server from other application servers:
A federated server is configured to receive requests that might be partially or entirely intended for data sources. The federated server distributes these requests to the data sources.
A federated server uses DRDA communication protocols (such as SNA and TCP/IP) to communicate with DB2 family instances. However, unlike other application servers, a federated server uses other protocols to communicate with non-DB2 family instances.
Typically, a federated system data source is a relational DBMS instance (such as Oracle, Informix, Microsoft SQL Server, or Sybase) and one or more databases that are supported by the instance. However, there are other types of data sources (such as life sciences data sources and search algorithms) that you can include in your federated system.
Data sources are semi-autonomous:
Local operations can continue simultaneously.
Data sources can join or leave the federation.
Data sources cooperate to support two-phase commit.
For example, the federated server can send queries to Oracle data sources at the same time that Oracle applications can access these data sources. A DB2 federated system does not monopolize or restrict access to the other data sources, beyond integrity and locking constraints.
Users and applications interface with the federated database managed by the federated server. The federated database contains catalog entries that identify data sources and their characteristics. The federated server consults the information stored in the federated database system catalog and the data source wrapper to determine the best plan for processing SQL statements.
The federated database system catalog contains:
Information about objects at the data sources.
Information about the objects in the federated database.
The catalog in a federated database is called the global catalog because it contains information about the entire federated system.
DB2 query optimizer uses the information in the global catalog and the data source wrapper to plan the best way to process SQL statements.
The information stored in the global catalog includes remote and local information, such as column names, column data types, column default values, and index information.
Remote catalog information is the information or name used by the data source.
Local catalog information is the information or name used by the federated database.
The global catalog also includes other information about the data sources. For example, it includes information the federated server uses to connect to the data source and map the federated user authorizations to the data source user authorizations.
NOTE
For relational data sources, the information stored in the global catalog includes both remote and local information. For non-relational data sources, the information stored in the global catalog varies from data source to data source.
To see the data source table information that is stored in the global catalog, query the federated SYSCAT.TABLES, SYSCAT.TABOPTIONS, SYSCAT.COLUMNS, and SYSCAT.COLOPTIONS catalog views.
To obtain data from data sources, users and applications submit queries in DB2 SQL to the federated database. When a query is submitted, the DB2 SQL Compiler uses information in the global catalog and the data source wrapper to help it process the query. This includes information about connecting to the data source, server attributes, mappings, index information, and processing statistics.
As part of the SQL Compiler process, the query optimizer analyzes a query. The Compiler develops alternative strategies, called access plans, for processing the query. Access plans might call for the query to be:
Processed by the data sources
Processed by the federated server
Processed partly by the data sources and partly by the federated server
DB2 evaluates the access plans primarily on the basis of information about the data source capabilities and the data. The wrapper and the global catalog contain this information. DB2 decomposes the query into segments that are called query fragments. Typically, it is more efficient to push down a query fragment to a data source if the data source can process the fragment.
NOTE
The processing of segments of a query at a data source instead of at the federated server is called push-down processing.
However, the query optimizer takes into account other factors, such as:
The amount of data that needs to be processed
The processing speed of the data source
The amount of data that the fragment will return
The communication bandwidth
The query optimizer generates local and remote access plans for processing a query fragment, based on resource cost. DB2 then chooses the plan it believes will process the query with the least resource cost.
If any of the fragments are to be processed by data sources, DB2 submits these fragments to the data sources. After the data sources process the fragments, the results are retrieved and returned to DB2. If DB2 performed any part of the processing, it combines its results with the results retrieved from the data source. DB2 then returns all results to the client.
DB2 Federated Server does not push down a query fragment if the data source cannot process it or if the federated server can process it faster than the data source can process it. The ability by DB2 to process SQL that is not supported by a data source is called compensation. A federated server compensates for the loss of functionality at the data source either by simulating the data source function or by returning the set of data to the federated server and performing the function locally.
The federated server compensates for lack of functionality at the data source in two ways:
It can ask the data source to use one or more operations that are equivalent to the DB2 function stated in the query.
It can return the set of data to the federated server and perform the function locally.
With compensation, the federated server can support the full DB2 SQL dialect for queries against data sources, even data sources with weak SQL support or no SQL support. You must use the DB2 SQL dialect with a federated system, except in a pass-through session.
In a federated system, a special DB2 session submits SQL statements directly to data sources, using the SQL dialect associated with that data source. You use a pass-through session to perform an operation that is not possible with DB2 SQL/API or to perform actions not supported by SQL. You can use both static and dynamic SQL in a pass-through session.
For example, use a pass-through session to create a procedure, create an index, or perform queries in the native dialect of the data source.
CREATE USER MAPPING FOR USER SERVER SYBSERVER OPTIONS (REMOTE_AUTHID 'xxxxxxxx', REMOTE_PASSWORD 'yyyyyyyy'); SET PASSTHRU SYBSERVER; SELECT count(*) FROM dbo.sysobjects; SET PASSTHRU RESET;
The following restrictions apply to pass-through sessions:
You must use the SQL dialect or language commands of the data source. As a result, you do not query a nickname, but instead query the data source objects directly.
When performing UPDATE or DELETE operations in a pass-through session, you cannot use the WHERE CURRENT OF CURSOR condition.
Wrappers are mechanisms by which the federated server interacts with data sources. The federated server uses routines stored in a library called a wrapper module to implement a wrapper. These routines allow the federated server to perform operations such as connecting to a data source and retrieving data from it iteratively. Typically, the DB2 federated instance owner uses the CREATE WRAPPER statement to register a wrapper in the federated system.
create WRAPPER sqlnet;
There are wrappers for each supported data source. Some wrappers have default wrapper names. When you use the default name (Table 7.1) to create the wrapper, the federated server automatically picks up the data source library associated with the wrapper.
Data Source | Default Wrapper Name(s) |
---|---|
DB2 UDB for Linux, UNIX, and Windows | DRDA |
DB2 UDB for z/OS and OS/390 | DRDA |
DB2 UDB for iSeries | DRDA |
DB2 UDB for VM and VSE | DRDA |
Informix | INFORMIX |
Oracle | SQLNet or Net8 |
Microsoft SQL Server | no default |
ODBC | OLEDB |
Sybase | CTLIB, DBLIB |
BLAST | no default |
Documentum | no default |
Microsoft Excel | no default |
Table-structured files | no default |
XML | no default |
There is one wrapper for each type of data source. For example, suppose that you:
Want to access data in three DB2 for z/OS database tables, one DB2 for iSeries table, two Informix tables, and one Informix view.
Need to create only two wrappers: one for the DB2 data source objects (DRDA wrapper) and one for the Informix data source objects (INFORMIX wrapper). After these wrappers are registered in the federated database, you can use these wrappers to access other objects from these data sources.
You use the server definitions and nicknames to identify the specifics (name, location, and so forth) of each data source object. A wrapper can:
Connect to the data source using the standard connection API of that data source.
Submit queries to the data source.
For data sources that do not support SQL, one of two actions will occur:
For data sources that support SQL, the query is submitted in SQL.
For data sources that do not support SQL, the query is translated into the native query language of the source or into a series of source API calls.
Receive results sets from the data source using the data source standard API of that data source.
Respond to the federated server queries about the default data type mappings for a data source.
Respond to the federated server queries about the default function mappings for a data source.
Wrapper options are used to configure the wrapper or to define how DB2 uses the wrapper. Currently, there is only one wrapper option, DB2_FENCED. The DB2_FENCED wrapper option indicates whether the wrapper is fenced or trusted by DB2. A fenced wrapper operates under some restrictions.
The current default setting for DB2_FENCED is N. The following example sets the DB2_FENCED option on for wrapper SQLNET:
alter WRAPPER sqlnet OPTIONS(set DB2_FENCED 'Y');
After wrappers are created for the data sources, the federated instance owner defines the data sources to the federated database. The instance owner supplies a name to identify the data source and other information that pertains to the data source. If the data source is an RDBMS, this information includes:
The type and version of the RDBMS
The database name for the data source on the RDBMS
Metadata that is specific to the RDBMS
For example, a DB2 family data source can have multiple databases. The definition must specify which database the federated server can connect to. In contrast, an Oracle data source has one database, and the federated server can connect to the database without knowing its name. The database name is not included in the federated server definition of an Oracle data source.
The name and other information that the instance owner supplies to the federated server are collectively called a server definition. Data sources answer requests for data and are servers in their own right.
The CREATE SERVER and ALTER SERVER statements are used to create and modify a server definition.
create server HOST390 type DB2/390 version 7.1 wrapper drda authorization TST1TXP password xxxxxxxx OPTIONS (node 'HDB1NODE', dbname 'HDB1', password 'Y');
When you create server definitions, it is important to understand the options that you can specify about the server. Server options specify data integrity, location, security, and performance information. Some server options are specific to data sources. Some server options configure the wrapper, and some affect the way DB2 uses the wrapper. Server options are specified as parameters in the CREATE SERVER and ALTER SERVER statements.
The federated server options:
Compatibility options: COLLATING_SEQUENCE, IGNORE_UDT
Data integrity options: IUD_APP_SVPT_ENFORCE
Location options: CONNECTSTRING, DBNAME, IFILE
Security options: FOLD_ID, FOLD_PW, PASSWORD
Performance options: COMM_RATE, CPU_RATIO, IO_RATIO, LOGIN_TIMEOUT, PACKET_SIZE, PLAN_HINTS, PUSHDOWN, TIMEOUT, VARCHAR_NO_TRAILING_BLANKS
To set a server option value temporarily, use the SET SERVER OPTION statement. This statement overrides the value for the duration of a single connection to the federated database. The overriding value does not get stored in the global catalog.
An example statement to set the passwords that will not need to be validated at the data source when the server option PASSWORD is set to N follows:
set server option password to 'N' for server HOST390;
The CPU_RATIO option indicates:
The processing capacity of the data source compared with the federated server.
The data source is faster or the federated server is slower if the value is less than 1.
The data source is slower or the federated server is faster if the value is greater than 1.
The federated server and the data source have the same processing capacity if the value is set to 1. It is the default value.
This is an example of setting the CPU_RATIO to indicate that the data source SUNSERV is 2 times faster than the federated server. Setting this value low influences the optimizer indicating that if an element of an SQL statement can be pushed down to the data source, it can be performed faster by the data source. Therefore, the optimizer tends to give more work to the datasource and to push down to the data source the more complex SQL to perform.
ALTER SERVER SUNSERV OPTIONS (ADD CPU_RATIO '0.5');
Here is an example to ensure that when authorization IDs are sent to your server HOST390, the case of the IDs will remain unchanged. Let's assume that this data source has started to run on an upgraded CPU that's only half as fast as your local CPU. You will need to inform the optimizer of this statistic:
ALTER SERVER HOST390 OPTIONS (ADD FOLD_ID 'N', SET CPU_RATIO '2.0');
The IO_RATIO option indicates:
The disk I/O rate of the data source compared with the federated server.
The data source is faster or the federated server is slower if the value is less than 1.
The data source is slower or the federated server is faster if the value is greater than 1.
The federated server and the data source have the same disk I/O rate if the value is set to 1. It is the default value.
This is an example of setting the CPU_RATIO to indicate that the federated server runs twice as fast as the CPU that supports HPSERV and the I/O devices at the federated server process data one and one-half times as fast as the I/O devices at HPSERV server.
ALTER SERVER HPSERV OPTIONS (ADD CPU_RATIO '2', IO_RATIO '1.5');
The COMM_RATE option indicates:
The communications data rate between the federated server and the data source.
Default value is 2, meaning 2,000,000 bytes/second.
Values can be set only in whole numbers, with each whole number meaning 1,000,000 bytes/second.
1 means 1,000,000 bytes/second
3 means 3,000,000 bytes/second
Setting a lower value influences the federated server's cost-based optimizer by indicating it should send more SQL, and SQL of greater complexity, to the data source to reduce the volume of data that has to cross the network between the data source and the federated server.
This is an example of setting COMM_RATE to indicate the network communications data rate to data source SUNSERV is 1,000,000 bytes/second.
ALTER SERVER SUNSERV OPTIONS (ADD COMM_RATE '1');
When a federated server needs to push down a request to a data source, the server must first establish a connection to the data source. The server does this by using a valid user ID and password to that data source. By default, the federated server attempts to access the data source with the user ID and password that are used to connect to DB2. If the user ID and password are the same between the federated server and the data source, the connection is established. If the user ID and password to access the federated server differ from the user ID and password to access a data source, you must define an association between the two authorizations. Once you define the association, distributed requests can be sent to the data source. This association is called a user mapping.
You use the CREATE USER MAPPING statement and the user option REMOTE_PASSWORD to map the passwords. Use the ALTER USER MAPPING statement to modify an existing user mapping.
create user mapping for TPHAN server HOST390 OPTIONS (remote_authid 'TST1TXP', remote_password 'xxxxxxxx');
User options provide authorization and accounting string information for user mappings between the federated server and a data source. These options can be used with any data source that supports user ID and password authorization. For example, suppose that a user has the same ID but different passwords for the federated database and a data source. For the user to access the data source, it is necessary to map the passwords to one another.
These options are used with the CREATE USER MAPPING statement:
ACCOUNTING_STRING is used to specify a DRDA accounting string. Valid settings include any string of length 255 or less. This option is required only if accounting information is needed. The default setting is none.
REMOTE_AUTHID indicates the authorization ID used at the data source. Valid settings include any string of length 255 or less. If this option is not specified, the ID used to connect to database is used. The default setting is none.
REMOTE_DOMAIN indicates the Windows NT domain used to authenticate users connecting to this data source. Valid settings include any valid Windows NT domain name. If this option is not specified, the data source will authenticate using the default authentication domain for that database. The default setting is none.
REMOTE_PASSWORD indicates the authorization password used at the data source. Valid settings include any string of length 32 or less. If this option is not specified, the password used to connect to the database is used. The default setting is none.
This is an example of a user, TPHAN, who uses a federated database to connect to a DB2 UDB for OS/390 and z/OS data source called HOST390. He uses one authorization ID to access DB2 and another authorization ID to access HOST390, and he has created a mapping between these two IDs. He has been using the same password with both IDs, but now he decides to use a separate password, pw390, with the ID for HOST390. Accordingly, he needs to map his federated database password to pw390.
alter user mapping for TPHAN server HOST390 OPTIONS (ADD REMOTE_PASSWORD 'pw390');
Another example is where Tom uses a local database to connect to an Oracle data source called MYORA. He accesses the local database under the authorization ID TPHAN; TPHAN maps to ORAID, the authorization ID under which he accesses MYORA. Tom is going to start accessing MYORA under a new ID, PHANTOM. So TPHAN now needs to map to PHANTOM.
alter user mapping for TPHAN server MYORA options ( SET REMOTE_AUTHID 'PHANTOM' );
After you create the server definitions and user mappings, the federated instance owner creates the nicknames. A nickname is an identifier that is used to reference the object located at the data sources that you want to access. The objects that nicknames identify are referred to as data source objects.
Table 7.2 shows the data source objects you can reference when you create a nickname.
Data Source | Reference Objects |
---|---|
DB2 UDB for Linux, UNIX, and Windows | Nicknames, summary tables, tables, views |
DB2 UDB for z/OS and OS/390 | Tables, views |
DB2 UDB for iSeries | Tables, views |
DB2 UDB for VM and VSE | Tables, views |
Informix | Tables, views, synonyms |
Oracle | Tables, views |
Microsoft SQL Server | Tables, views |
ODBC | Tables, views |
Sybase | Tables, views |
BLAST | FASTA files indexed for BLAST search algorithms |
Documentum | Objects and registered tables in a Documentum Docbase |
Microsoft Excel | .xls files (only the first sheet in the workbook is accessed) |
Table-structured files | .txt files (text files that meet a verify specific format) |
XML | Sets of items in an XML document |
Nicknames are not alternative names for data source objects in the same way that aliases are alternative names. They are pointers by which the federated server references these objects. Nicknames are typically defined with the CREATE NICKNAME statement.
When an end user or a client application submits a distributed request to the federated server, the request does not need to specify the data sources. Instead, it references the data source objects by their nicknames. The nicknames are mapped to specific objects at the data source. The mappings eliminate the need to qualify the nicknames by data source names. The location of the data source objects is transparent to the end user or the client application.
When you create a nickname for a data source object, metadata about the object is added to the global catalog. The query optimizer uses this metadata, and the information in the wrapper, to facilitate access to the data source object. For example, if the nickname is for a table that has an index, the global catalog contains information about the index. The wrapper contains the mappings between the DB2 data types and the data source data types.
create nickname DEPT for HOST390.DB2TTEST.DEPT;
NOTE
With the current version, you cannot execute DB2 utility operations (LOAD, REORG, REORGCHK, IMPORT, RUNSTATS, and so on) on nicknames.
You can supply the global catalog with additional metadata information about the nicknamed object. This metadata describes values in certain columns of the data source object. You assign this metadata to parameters that are called column options. The column options tell the wrapper to handle the data in a column differently than it normally would.
Column options are used to provide other information to the wrapper, as well. For example, for XML data sources, a column option is used to tell the wrapper the Xpath expression to use when the wrapper parses the column out of the XML document. The SQL compiler and query optimizer use the metadata to develop better plans for accessing the data.
The primary purpose of column options is to provide information about nickname columns to the SQL compiler. Setting column options for one or more columns to Y allows the SQL compiler to consider additional pushdown possibilities for predicates that perform the evaluation operation. This assists the SQL compiler in reaching global optimization.
These columns options are:
NUMERIC_STRING
VARCHAR_NO_TRAILING_BLANKS
You can define column options in the CREATE NICKNAME and ALTER NICKNAME statements. For example, a column with the data type of VARCHAR doesn't have trailing blanks. The nickname for the table is NCKTAB1, and the local name for the column is COL1:
alter nickname NCKTAB1 alter column COL1 OPTIONS ( ADD VARCHAR_NO_TRAILING_BLANKS 'Y')
The data types at the data source must map to corresponding DB2 data types so that the federated server can retrieve data from data sources. For most data sources, the default type mappings are in the wrappers. The default type mappings for DB2 data sources are in the DRDA wrapper. The default type mappings for Informix are in the INFORMIX wrapper, and so forth.
For some nonrelational data sources, you must specify data type information in the CREATE NICKNAME statement. The corresponding DB2 for Linux, UNIX, and Windows data types must be specified for each column of the data source object when the nickname is created. Each column must be mapped to a particular field or column in the data source object.
For example:
The Oracle type FLOAT maps by default to the DB2 type DOUBLE.
The Oracle type DATE maps to the DB2 type TIMESTAMP.
The DB2 for z/OS type DATE maps by default to the DB2 type DATE.
When values from a data source column are returned to the federated database, the values conform fully to the DB2 data type to which the data source column is mapped. If this is a default mapping, the values also conform fully to the data source type in the mapping. For example, suppose an Oracle table with a FLOAT column is defined to the federated database. The default mapping of Oracle FLOAT to DB2 DOUBLE automatically applies to that column. Consequently, the values that are returned from the column will conform fully to both FLOAT and DOUBLE.
For some wrappers, you can change the format or length of values that are returned. You do this by changing the DB2 data type to which the values must conform. For example, the Oracle data type DATE is used as a timestamp; the Oracle DATE data type contains century, year, month, day, hour, minute, and second. It is the default; the Oracle DATE data type maps to the DB2 TIMESTAMP data type. Suppose that several Oracle table columns have a data type of DATE. You want queries of these columns to return only the hour, minute, and second. You can override the default data type mapping so that the Oracle DATE data type maps to the DB2 TIME data type. When Oracle DATE columns are queried, only the time portion of the timestamp values is returned to DB2.
create type mapping MY_ORACLE_DATE from SYSIBM.TIME to SERVER TYPE ORACLE TYPE DATE;
NOTE
If you change a type mapping, nicknames created before the type mapping change do not reflect the new mapping.
DB2 federated servers do not support:
LONG VARCHAR
LONG VARGRAPHIC
DATALINK
User-defined data types (UDTs) created at the data source
You cannot create a user-defined mapping for these data types. However, you can create a nickname for a view at the data source that is identical to the table that contains the user-defined data types. The view must cast the user-defined type column to the built-in, or system, type.
A nickname can be created for a remote table that contains LONG VARCHAR columns. However, the results will be mapped to a local DB2 data type that is not LONG VARCHAR.
For the federated server to recognize a data source function, the function must be mapped against an existing DB2 function. DB2 supplies default mappings between existing built-in data source functions and built-in DB2 functions. For most data sources, the default function mappings are in the wrappers. The default function mappings from DB2 for Linux, UNIX, and Windows functions to DB2 for z/OS functions are in the DRDA wrapper. The default function mappings from DB2 for Linux, UNIX, and Windows functions to Sybase functions are in the CTLIB and DBLIB wrappers, and so forth.
To use a data source function that the federated server does not recognize, you must create a function mapping. The mapping you create is between the data source function and a counterpart function at the federated database.
Function mappings are typically used when a new built-in function and a new user-defined function become available at the data source. Function mappings are also used when a DB2 counterpart function does not exist; you must create one on the DB2 federated server that meets the following requirements:
If the data source function has input parameters:
The DB2 counterpart function must have the same number of input parameters that the data source function has.
The data types of the input parameters for the DB2 counterpart function must be compatible with the corresponding data types of the input parameters for data source function.
If the data source function has no input parameters:
The DB2 counterpart function cannot have any input parameters.
The DB2 counterpart function can be either a complete function or a function template. A function template is a DB2 function that you create to invoke a function on a data source. The federated server recognizes a data source function when there is a mapping between the data source function and a counterpart function at the federated database. You can create a function template to act as the counterpart when no counterpart exists.
NOTE
When you create a function mapping, it is possible that the return values from a function evaluated at the data source will be different than the return values from a compatible function evaluated at the DB2 federated database. DB2 will use the function mapping, but it might result in an SQL syntax error or unexpected results.
However, unlike a regular function, a function template has no executable code. After you create a function template, you must then create the function mapping between the template and the data source function. You create a function template with the CREATE FUNCTION statement, using the AS TEMPLATE parameter. You create a function mapping by using the CREATE FUNCTION MAPPING statement. When the federated server receives queries specifying the function template, the federated server will invoke the data source function.
The CREATE FUNCTION MAPPING statement includes parameters called function mapping options. You can assign values that pertain to the mapping or to the data source function within the mapping. For example, you can include estimated statistics on the overhead that will be consumed when the data source function is invoked. The query optimizer uses these estimates to decide whether the function should be invoked by the data source or by the DB2 federated database.
When you create a nickname for a data source table, information about any indexes that the data source table has is added to the global catalog. The query optimizer uses this information to expedite the processing of distributed requests. The catalog information about a data source index is a set of metadata and is called an index specification. A federated server does not create an index specification when you create a nickname for:
A table that has no indexes
A view, which typically does not have any index information stored in the remote catalog
A data source object that does not have a remote catalog from which the federated server can obtain the index information
NOTE
You cannot create an index specification for an Informix view.
Suppose that a nickname is created for a table that has no index, but the table acquires an index later. Suppose that a table acquires a new index, in addition to the ones it had when the nickname was created. Because index information is supplied to the global catalog at the time the nickname is created, the federated server is unaware of the new indexes.
Similarly, when a nickname is created for a view, the federated server is unaware of the underlying table (and its indexes) from which the view was generated. In these circumstances, you can supply the necessary index information to the global catalog. You can create an index specification for tables that have no indexes. The index specification tells the query optimizer which column or columns in the table to search on to find data quickly.
In a federated system, you use the CREATE INDEX statement against a nickname to supply index specification information to the global catalog. If a table acquires a new index, the CREATE INDEX statement that you create will reference the nickname for the table and contain the information about the index of the data source table. If a nickname is created for a view, the CREATE INDEX statement that you create will reference the nickname for the view and contain information about the index of the underlying table for the view.
create unique index IDEPT on EMPLOYEE (WORKDEPT, JOB) SPECIFICATION ONLY;