10.5 Simple Data Rule Remedies

The analysis of data rules generally involves creating several new data rules that are not implemented anywhere in the operational systems. Violations indicate the presence of inaccurate data or of excessive exceptions to rules. Because most system developers are not rule oriented, this presents a large amount of new material that can be incorporated into their operational systems or company procedures to prevent errors from getting into databases or to improve the design of the systems themselves.

System developers are becoming more and more aware of rule approaches to building better systems. This encompasses not only data rules but also process rules. It leads to better documentation of rules, standardization of how and where rules are executed in operational systems, and the use of rules for feedback.

The data profiling process will identify a number of data rules, along with test results from applying them to real data. The logical process to follow at this point is to evaluate the usefulness of the rules and to determine how they can play a role in improving operational systems. Not all rules can be handled the same way. Some are more suitable for direct monitoring of transactions and some are not. It is the job of the analyst to determine the best disposition of each rule.

Data Rule Evaluation

Not all data rules are equal. Some have important business implications if not followed, and others do not.

Some rules have no violations. This is often because checks exist somewhere in the data acquisition process that enforce the rule, thus making it impossible for violations to exist after the data is entered. For example, if screen specifications for entering orders and subsequent steps always use the CURRENT DATE to fill in date fields for ORDER_DATE, SHIPPING_DATE, and RECEIVED_DATE, these columns will always have correct values and the ordering rules will not turn up violations.

It is always important to test rules in the data profiling process regardless of whether you think they are enforced in operational systems or not. You are sometimes fooled into thinking they cannot be violated only to discover that there is an alternative way for data to enter that causes problems. However, after profiling the data, you have a better idea of which rules expose most of the problems.
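The ordering rule from the example above can be tested against extracted data with very little code. The following is a minimal sketch, not a definitive implementation. It assumes the rule is ORDER_DATE <= SHIPPING_DATE <= RECEIVED_DATE, that rows arrive as dictionaries, and that a missing date simply means the step has not happened yet. The ORDER_ID column and the sample rows are made up for illustration.

```python
from datetime import date

DATE_COLUMNS = ("ORDER_DATE", "SHIPPING_DATE", "RECEIVED_DATE")

def ordering_violations(rows):
    """Return the rows whose dates are out of order.

    Missing dates (None) are skipped, on the assumption that an order
    may not have been shipped or received yet.
    """
    violations = []
    for row in rows:
        present = [row[c] for c in DATE_COLUMNS if row.get(c) is not None]
        # Compare each date with the next date that is present.
        if any(later < earlier for earlier, later in zip(present, present[1:])):
            violations.append(row)
    return violations

# Illustrative sample data (hypothetical values).
orders = [
    {"ORDER_ID": 1, "ORDER_DATE": date(2003, 1, 6),
     "SHIPPING_DATE": date(2003, 1, 8), "RECEIVED_DATE": date(2003, 1, 12)},
    {"ORDER_ID": 2, "ORDER_DATE": date(2003, 1, 9),
     "SHIPPING_DATE": date(2003, 1, 7), "RECEIVED_DATE": None},
    {"ORDER_ID": 3, "ORDER_DATE": date(2003, 1, 10),
     "SHIPPING_DATE": None, "RECEIVED_DATE": None},
]
violating_ids = [row["ORDER_ID"] for row in ordering_violations(orders)]
```

In this sample only order 2 is flagged, because it shipped before it was ordered. An empty result on real data suggests, but does not prove, that the rule is enforced upstream.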

You cannot put all data rules into operational settings. They will consume too many system resources and slow down the systems. If you attempt to, some database administrator or other steward of the operational systems will take them out to achieve the performance goals of the applications. Each rule should therefore be assigned to one of the following four categories.

Rules That Can Be Checked During Data Entry

These are rules that can be checked during data entry without incurring significant processing overhead. The checking logic can be placed in code at the data entry station (usually a personal computer) and requires no server time to execute. These data rules generally involve only interaction between values in the transaction. They do not generally involve reading data at the server side.
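A check of this kind looks only at values inside the transaction itself. The sketch below illustrates the idea under stated assumptions: the column names (QUANTITY, PAYMENT_TYPE, CARD_NUMBER) and the rules themselves are hypothetical, and Python stands in for whatever language the data entry station actually uses.

```python
def check_order_entry(txn):
    """Return a list of error messages; an empty list means the entry passes.

    Every rule here uses only values present in the transaction, so no
    server-side data is read.
    """
    errors = []
    if txn["QUANTITY"] <= 0:
        errors.append("QUANTITY must be positive")
    paying_by_card = txn["PAYMENT_TYPE"] == "CREDIT_CARD"
    has_card_number = bool(txn.get("CARD_NUMBER"))
    if paying_by_card and not has_card_number:
        errors.append("CARD_NUMBER is required for credit card payments")
    if not paying_by_card and has_card_number:
        errors.append("CARD_NUMBER must be empty unless paying by credit card")
    return errors

good_entry = {"QUANTITY": 3, "PAYMENT_TYPE": "CREDIT_CARD", "CARD_NUMBER": "4111"}
bad_entry = {"QUANTITY": 0, "PAYMENT_TYPE": "CREDIT_CARD", "CARD_NUMBER": ""}
good_errors = check_order_entry(good_entry)
bad_errors = check_order_entry(bad_entry)
```

Returning every failed rule at once, rather than stopping at the first, lets the person entering the data correct the whole screen in one pass.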

Rules That Should Be Checked During Transaction Processing

These data rules are important enough to justify spending valuable server-side processing time on checking them in order to prevent improper data from entering a database. They are checked at the server end because they require access to data that is not in the transaction. They usually execute within the context of the transaction, sending rejection or confirmation information back to the data entry station before the transaction completes.
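The following sketch shows what such a server-side check might look like. It is an illustration under assumptions, not a prescribed design: the customer and orders tables, the credit-limit rule, and the function names are all hypothetical, and an in-memory SQLite database stands in for the operational database.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory database with one customer (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            credit_limit INTEGER NOT NULL
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            amount INTEGER NOT NULL
        );
        INSERT INTO customer VALUES (1, 500);
    """)
    return conn

def submit_order(conn, customer_id, amount):
    """Check the rule and insert the order inside one transaction.

    Returns (accepted, message) so the caller can send a rejection or a
    confirmation back to the data entry station.
    """
    with conn:  # commits on normal exit, rolls back if an exception is raised
        row = conn.execute(
            "SELECT credit_limit FROM customer WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        if row is None:
            return (False, "unknown customer")
        if amount > row[0]:
            return (False, "amount exceeds credit limit")
        conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
    return (True, "accepted")

conn = make_demo_db()
results = [
    submit_order(conn, 1, 200),  # within the limit
    submit_order(conn, 1, 900),  # over the limit
    submit_order(conn, 7, 50),   # customer does not exist
]
stored = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The rule needs the customer's credit limit, which is not part of the order transaction, so it cannot be evaluated at the data entry station. Only the first order reaches the database.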

Rules That Should Be Deferred For Execution

These are rules that are more complex and require extensive machine resources to execute. They show few violations in the data. The checking for these rules can be deferred until a periodic sweep of the database can be done.

Rules That Do Not Need To Be Checked

These are rules that do not show violations. Analysis of operational systems assures you that the rule will not be violated on data entering the database.

You may also be willing to leave unchecked some other rules that do not have a significant number of violations and for which the consequences of a violation are not important. This choice would be made to avoid overloading operational systems.

You may want to re-execute these rules in the data profiling venue periodically to ensure that they continue to have little operational value. Systems change a lot, and so do people executing data acquisition processes. A rule may move up in importance if a change starts causing more inaccuracies.
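The four categories can be summarized as a small decision procedure. The sketch below is one possible reading of the text, not a definitive method: the RuleProfile fields, the order in which the tests are applied, and the negligible_rate threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class RuleProfile:
    name: str
    violation_rate: float    # fraction of profiled rows that violate the rule
    enforced_upstream: bool  # analysis shows the systems already prevent violations
    high_impact: bool        # a violation has important business consequences
    needs_server_data: bool  # rule reads data that is not in the transaction
    expensive: bool          # rule needs extensive machine resources to execute

def disposition(rule, negligible_rate=0.001):
    """Suggest a category for a rule (the threshold is an assumed value)."""
    if rule.violation_rate == 0 and rule.enforced_upstream:
        return "no operational check"
    if rule.violation_rate <= negligible_rate and not rule.high_impact:
        return "no operational check"
    if rule.expensive:
        return "deferred sweep"
    if rule.needs_server_data:
        return "transaction processing"
    return "data entry"

# Hypothetical rules with made-up profiling results.
date_order = RuleProfile("date ordering", 0.0, True, True, False, False)
card_number = RuleProfile("card number present", 0.02, False, True, False, False)
credit_limit = RuleProfile("credit limit", 0.01, False, True, True, False)
ledger_balance = RuleProfile("ledger balance", 0.002, False, True, True, True)

suggestions = [disposition(r) for r in
               (date_order, card_number, credit_limit, ledger_balance)]
```

Re-running the same procedure after each periodic profiling pass shows when a rule should move to a different category because its violation rate has changed.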

Data Rule Checkers for Transactions

Checkers in transaction code can be implemented for rules in the second category previously mentioned. These can be implemented as screen logic, application server programs, database procedures, or by the insertion of a rule engine for processing. When this remedy is used, the data profiling repository should keep track of where the rule is executing.

Note that almost all rules for individual business objects can be checked this way. Because only the data of a single object is involved, the check never requires many resources.

The point in the data flow process where a rule is executed can be influenced by what data is available at each point in the process. If the data entry screens contain all of the columns in the rule, the checking can be done right on the screen and the transaction stopped before it enters the transaction path. If the rule requires fetching data that is not on the screen but is already in the database, checking may be deferred until later in the process.
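The placement decision described above reduces to a comparison between the columns a rule uses and the columns present on the screen. A minimal sketch follows; the function name and the column names are hypothetical.

```python
def execution_point(rule_columns, screen_columns):
    """Return where a rule can run and which columns must be fetched.

    If every column the rule needs is on the data entry screen, the rule
    can run there; otherwise it has to run where the database is reachable.
    """
    missing = sorted(set(rule_columns) - set(screen_columns))
    if not missing:
        return ("data entry screen", [])
    return ("server", missing)

screen = ["ORDER_DATE", "SHIPPING_DATE", "QUANTITY", "CUSTOMER_ID"]
on_screen = execution_point(["ORDER_DATE", "SHIPPING_DATE"], screen)
on_server = execution_point(["QUANTITY", "CREDIT_LIMIT"], screen)
```

Recording the returned location alongside the rule keeps track of where each rule is executing.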

Data Rule Checkers for Periodic Checks

Batch programs can be written that sweep the database periodically and look for rule violations. This should be done for rules that are not appropriate for transaction checking. Almost none of the single business object rules fall into this category.
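A periodic sweep is typically a query that spans multiple business objects and would be too costly to run on every transaction. The sketch below assumes a hypothetical rule (a customer's open orders must not exceed the customer's credit limit in total) and a hypothetical schema, with an in-memory SQLite database standing in for the operational one.

```python
import sqlite3

# Finds customers whose open orders add up to more than their credit limit.
SWEEP_SQL = """
SELECT c.customer_id, c.credit_limit, SUM(o.amount) AS open_total
FROM customer AS c
JOIN orders AS o ON o.customer_id = c.customer_id
WHERE o.status = 'OPEN'
GROUP BY c.customer_id, c.credit_limit
HAVING SUM(o.amount) > c.credit_limit
ORDER BY c.customer_id
"""

def build_sweep_db():
    """Create sample data (made-up values) for the sweep."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,
                               credit_limit INTEGER NOT NULL);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER NOT NULL,
                             amount INTEGER NOT NULL,
                             status TEXT NOT NULL);
        INSERT INTO customer VALUES (1, 100), (2, 100);
        INSERT INTO orders (customer_id, amount, status) VALUES
            (1, 60, 'OPEN'), (1, 50, 'OPEN'),
            (2, 60, 'OPEN'), (2, 50, 'CLOSED');
    """)
    return db

def sweep(db):
    """Run the rule over the whole database and return the violations."""
    return db.execute(SWEEP_SQL).fetchall()

sweep_violations = sweep(build_sweep_db())
```

Each individual order here is within the credit limit, so a per-transaction check on the order amount alone would pass. Only the aggregate across orders exposes the violation for customer 1.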

Improving Business Procedures

Analysis of the results of data rule execution may uncover the need to alter business policies and practices. This can occur in either direction. A rule often violated may indicate that the corporation is really not following the rule. Evaluation of the business process may uncover the fact that the rule is no longer applicable and should be discarded.

Similarly, the frequent violation of a rule that is deemed to be important may lead to changes in business practices that ensure either that the rule is honored all of the time or that acceptable exceptions are made. Analysis of results may also indicate the opportunity to provide feedback to people generating the data. They may not be aware of the mistakes they are making or of the existence of a rule.

Maintaining a Data Rule Repository

If you have gone through the process of generating a library of data rules, tested the rules against the data, and then used the output to generate issues for system improvements, it only makes sense that you would want to preserve all of this information for future use. It is extremely important to either capture the data rule in the data profiling repository or move it to a rule engine repository or an enterprise repository at the end of data profiling. You will want to use these rules to check improvements, to ensure that data continues to conform, and to serve as the basis for system change decisions in the future.
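As a rough illustration of what a repository entry might hold, the sketch below stores each rule with its test results and its execution location, and serializes the collection to JSON. The record layout, field names, and sample figures are assumptions; a real data profiling repository or rule engine will have its own format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RuleRecord:
    name: str
    expression: str    # the rule stated as text
    rows_tested: int   # rows examined in the last profiling run
    violations: int    # violations found in that run
    executes_in: str   # where the rule is enforced, if anywhere

def save_rules(records):
    """Serialize rule records to a JSON string."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)

def load_rules(text):
    """Rebuild rule records from a JSON string produced by save_rules."""
    return [RuleRecord(**item) for item in json.loads(text)]

# Hypothetical entries with made-up test results.
library = [
    RuleRecord("date ordering",
               "ORDER_DATE <= SHIPPING_DATE <= RECEIVED_DATE",
               10000, 0, "not checked"),
    RuleRecord("credit limit",
               "open order total <= CREDIT_LIMIT",
               10000, 37, "periodic sweep"),
]
restored = load_rules(save_rules(library))
```

Keeping the test results with the rule makes it possible to compare a later profiling run against the earlier one and see whether a remedy actually reduced violations.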