This chapter begins the
examination of the most important technology available to the data quality
assurance team: data profiling.

Note to the reader: This text uses the terms "column" and "table" throughout the data
profiling chapters for consistency. Data profiling is applied to data from a wide
variety of sources, and those sources use different terminology for the same
constructs. Consider "table" the equivalent of "file," "entity," "relation," or
"segment," and "column" the equivalent of "data element," "attribute," or "field."

The text uses the term "data profiling repository" to mean a place to record all of
the information used in and derived from the data profiling process. Much of this
information is metadata. However, I do not want to confuse the reader by calling it a
metadata repository. A user could store this information in an existing metadata
repository, provided it is robust enough to hold all of the types of information
involved. Otherwise, the user could use the repository supplied by a data profiling
software vendor or build their own. It is not unreasonable to expect that much of this
information would later be moved to an enterprise metadata repository once data
profiling is complete.