9.5 Mapping with Other Structures

In projects that involve extracting data from one or more sources and moving to a new target data structure, there is an additional task of structure mapping. The last chapter described the process of column-level mapping. Too often that is the only mapping exercise that is done. Failure to consider structure issues can lead to inaccurate results.

Mapping is much easier if the data structures for source and target systems is represented in third-normal form. If you can do that, you can always transform the data to the third-normal form in order to move it correctly.

After you have completed the column mapping as described in the previous chapter, you begin the structure mapping process. Each target table should be traced back to find the various tables data is to come from.

Single-Source Tables

If the data comes from a single table, you need to look at primary keys only. If the primary key of the source table maps to a primary key of the target table, it is a good match. If not, you need to look further.

If the columns in the source table that map to the primary key of the target are not a primary key or constitute only part of a source table primary key, you need to determine if it could be a primary key for the target table. It only need be a primary key in the source over the column set that is being mapped to the target. Thus, if not all of the source table columns are mapped, it is possible that the new, proposed key is a primary key over that subset. You need to execute a test in the source data for this key to determine if it works. If it does, there is no issue; if not, you can see the extent to which the data does not support that combination as a primary key.

If the target key is not fully mapped from the source, a problem may also exist. You need to determine if the columns that are mapped constitute a valid key in the source, as described previously. If so, you can do the move by adding a NULL to the missing columns. If not, you have an issue. Of course, this requires that the target system allows NULL values in multicolumn primary keys.

Multiple-Source Tables

When data lands in the target table from more than one source table, you need to not only verify the primary key compatibility but ensure that the data can be properly aggregated for the target.

If the source tables are vertical or horizontal merge pairs, you already have confidence that a merge can occur. You only need to verify that the merged rows end up with a primary key that is acceptable to the target. This is performed by using the same logic described previously for a single-source table but using it instead on the merged set.

If the target table contains data from more than two source tables, each table should participate in a merge relationship with the other tables. It may be a mixture of horizontal and vertical merges. You need to be able to combine the tables using information from structure analysis to get a single-target-row image. This image is used to validate primary key compatibility, as described previously.

The multiple-merge case can become complex. It is best accomplished by first picking two tables to combine to get an intermediate output image. This image should include only columns matched to target columns. A new image is then created by adding one more table. This process is completed until all tables are in the image. At each step you need to ensure that a proper merge synonym step exists to make the match. This becomes the key to combining data. The merge key must be preserved throughout the matching process.

If the source tables are not in a merge synonym relationship, you may still be able to accomplish the task. The merge relationship may have been missed in structure analysis. If the target uses only some of the source columns, it may exist on that subset even if it does not on the total set of source columns. The subsets of the tables can be analyzed to see if the merge will work.

If the multiple-source tables have a primary / foreign key relationship but are not a merge relationship, you can still move the data, provided primary key tests are positive. You will be creating a denormalized target table. If that is what the target wants, it will work. Target systems frequently denormalize data to achieve better query performance. You only need to satisfy yourself that it can be done.