8.4 Mapping with Other Columns

One of the most important uses of value property analysis outputs is in mapping columns together between systems. You may be moving data from one system to another. You may be summarizing data from one system into summary columns of a target system. You may be combining data from two or more source systems into a single column of a target system.

If the target column is not populated, mapping is done at the properties level only. If you have a property list for the target and have developed a property list for the source through data profiling, mapping can proceed. If the target is already populated, its property list should have been generated or validated through data profiling. This provides more information to map to and increases the confidence in the accuracy of the mapping.

Business Meaning

The first check is at the business meaning level. Is the business meaning of the columns the same or are they different? The physical content may make them look the same but in fact they may not be. The business analysts must ensure that you are not mixing apples and oranges.

Storage Properties

The data types for the corresponding columns being mapped need not be the same if the content can be transformed. For example, dates stored in a CHARACTER column may be able to be mapped to a DATE column without a problem. If all values in the source system are valid dates, the column is transformable.

Length checks also must be performed. The target only needs to be large enough to hold the largest value found in the source. It does not need to be as large as defined in the source. This applies to CHARACTER length columns, integer/small integer columns, and decimal precision. For example, a 30-character column can be mapped to a 25-character column if all values found are 25 characters in length or less.

Value Properties

The valid values as determined by discrete lists, ranges, text rules, and null value rules need to be compared. This needs to be done even if the target is not populated, because you do not want to put data into a target that violates any of the target rules.

Value lists need to be mapped at the individual value level to determine if the encoding of one system is truly identical to that of the target. The semantic meaning of each value needs to match across the systems. You can map different encodings by mapping values together that mean the same thing even if their representation is different. This provides the input to data transformation processes that need to be executed when the data is moved.

Ending up with values that cannot be mapped because of differences in ranges or discrete value lists or other rule differences exposes invalid mapping data. These are values that are valid in the source system but not valid in the target system. The user must determine on a column-by-column or value-by-value basis what needs to be done with these values.

One of the most common mistakes made in moving data is to automatically assume that a 1-byte CHARACTER column in one system can be mapped to a 1-byte CHARACTER column in the other system without looking at the data. A real example is the gender code that in one source system was encoded 1 and 2 and in the target system was encoded F and M. Additionally, a second source system was encoded M and F but had some other curious values stored. In this example the user moved the data without looking at values and created a mess in the target system.