12.4 Remedies for Value Rule Violations

Issues raised by value tests can follow the same path as issues from other data profiling activities when they reveal inaccurate data as the cause of deviations in the output. Value tests also lend themselves to an additional remedy: adding the tests to the operational environment so that data stewards can monitor the data over time.

This is useful for catching changes that degrade the quality of data, for capturing metrics on the impact of improvements, and for catching one-time problems such as the loss of a batch of data or an attempt to push data to a summary store before all detail data has been collected.

Transaction Checkers

Most value tests do not apply to executing transactions because they deal with values over a group of data. However, it is possible to perform continuous monitoring of transactions by caching the last n transactions and then periodically, such as every minute or 10 minutes, executing the value test against the cached set. The cache would be designed to kick out the oldest transaction data every time a new one is entered. This is a circular cache.

Although this does little to validate a single transaction, it can catch hot spots where data accuracy is rapidly deteriorating. Profiles of value distributions are particularly appropriate for this type of monitoring. Most other value tests are not suited to it.
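The circular cache described above can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the class name TransactionChecker, the field name, and the expected_share ranges are all hypothetical, and the value test shown is a simple check of a value distribution against expected fractions.

```python
from collections import Counter, deque


class TransactionChecker:
    """Keep the last n transactions and run a distribution test over them."""

    def __init__(self, n, field, expected_share):
        # A deque with maxlen drops the oldest entry whenever a new one
        # is appended to a full cache, which gives the circular behavior.
        self.cache = deque(maxlen=n)
        self.field = field
        # Hypothetical expectations: value -> (min_fraction, max_fraction).
        self.expected_share = expected_share

    def record(self, transaction):
        """Call once per executed transaction."""
        self.cache.append(transaction)

    def check(self):
        """Call periodically; return (value, observed_fraction) deviations."""
        if not self.cache:
            return []
        counts = Counter(t[self.field] for t in self.cache)
        total = len(self.cache)
        deviations = []
        for value, (low, high) in self.expected_share.items():
            share = counts.get(value, 0) / total
            if not (low <= share <= high):
                deviations.append((value, share))
        return deviations
```

A scheduler would call check() at the chosen interval, such as every minute, and route any deviations to the data steward.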

Periodic Checkers

Value tests are particularly suited for execution on a periodic basis: daily, weekly, monthly, or quarterly. They are also useful when performing extractions for moving data to decision support stores. In fact, every extraction should include some tests on the data to ensure that it is a reasonable set of data to push forward. It is much more difficult to back out data after the fact than to catch errors before they are loaded.
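One way to picture a test step inside an extraction is as a gate in front of the load. The sketch below assumes each value test is a tuple of a name, a function that computes a value over the whole batch, and an expected low and high bound. The function names and the tuple layout are illustrative assumptions, not a standard interface.

```python
def run_value_tests(rows, tests):
    """Return (name, result) for every test whose result is out of range."""
    failures = []
    for name, compute, low, high in tests:
        result = compute(rows)
        if not (low <= result <= high):
            failures.append((name, result))
    return failures


def extract_and_load(rows, tests, load):
    """Load the batch only if every value test passes."""
    failures = run_value_tests(rows, tests)
    if failures:
        # Hold the batch for review instead of backing it out later.
        return failures
    load(rows)
    return []
```

For example, a test list might contain ("row_count", len, 900, 1100) to catch a lost or partial batch before it reaches the decision support store.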

Value tests are also very useful to execute on batches of data imported into your corporation from external sources. For example, you may be getting data feeds from marketing companies, from divisions of your own company, and so on. It makes sense to run some basic tests on the data as part of the acceptance process.

Periodic checks suggest that the steward of the data be able to modify the expectations on a periodic basis as well. Each value rule can have a data profiling repository entry that includes documentation for the test, expectations, and result sets. This facilitates comparing results from one period to another and tracking the changes to expectations.
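A repository entry of this kind could be modeled roughly as follows. The structure is an assumption made for illustration: the class name ValueRuleEntry, its fields, and the use of a low and high bound as the expectation are hypothetical, and a real data profiling repository would likely store more detail.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ValueRuleEntry:
    name: str
    documentation: str
    # History of expectations as (effective_date, low, high).
    expectations: list = field(default_factory=list)
    # History of results as (period_end, value, within_expectation).
    results: list = field(default_factory=list)

    def set_expectation(self, effective, low, high):
        """Record a new expectation without discarding earlier ones."""
        self.expectations.append((effective, low, high))

    def expectation_for(self, day):
        """Return the most recent expectation in effect on the given day."""
        applicable = [e for e in self.expectations if e[0] <= day]
        return max(applicable, key=lambda e: e[0]) if applicable else None

    def record_result(self, period_end, value):
        """Store a period's result and report whether it met expectations."""
        exp = self.expectation_for(period_end)
        within = exp is not None and exp[1] <= value <= exp[2]
        self.results.append((period_end, value, within))
        return within
```

Keeping both histories in the entry lets the steward compare results across periods and see which expectation was in force when each result was judged.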