3.2 Data Accuracy Decay

Data that is accurate when initially created can become inaccurate within a database over time. The data value does not change; its accuracy does. Some data elements are subject to accuracy decay and some are not. A few examples illustrate the concept.

Personal information in an employee database easily becomes wrong. People move, they change their marital status, they complete new education programs, they change telephone numbers. Most employees do not run into HR and fill out a form every time something changes in their life. The information in HR reflects the accuracy at the time they initially joined the company or the last time an update was done. Inventory-on-hand information can become wrong due to spoilage, parts taken and used, and new transactions not processed.

The value carried on the books for capital assets can change due to market demand changes, changes in the environment in which the asset is used, unusual wear and tear, or damage done and not reported. A state driver's license database indicates that a person has acceptable eyesight to drive without glasses. However, during the time since the license was issued, the person's eyesight may deteriorate to the point where she cannot safely drive without glasses. The inaccuracy will not be corrected until a renewal requires a new eye test.

All of these examples show that a change occurred in the object being represented in the database and the database was not updated to reflect it. Another way of saying this is that a transaction needed to be processed and was not. Such "missing transactions" are commonplace in the real world.

Not all data elements are subject to decay. Information that defines an object generally does not decay, whereas other information about the object is often subject to decay. Good database designers will note the decay characteristic of a data element as part of the metadata and will design processes to verify or update the information as needed by the consumers of the data. For example, in an inventory application, the data elements PART_NUMBER, PART_DESCRIPTION, and UNIT_OF_MEASURE would not be considered subject to decay, whereas QUANTITY_ON_HAND, SUPPLIER_ID, and STORAGE_BIN_NUMBER would be.
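As a rough illustration of recording the decay characteristic in metadata, the sketch below tags the inventory elements named above. The `ElementMetadata` class, its field names, and the 365-day review interval are hypothetical choices made for this example; only the decay/no-decay classification comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ElementMetadata:
    """Hypothetical metadata entry for one data element."""
    name: str
    decay_prone: bool
    # Assumed field: how often the value should be re-verified, in days.
    review_interval_days: Optional[int] = None


# Classification follows the inventory example in the text;
# the review intervals are placeholder values, not recommendations.
INVENTORY_ELEMENTS = [
    ElementMetadata("PART_NUMBER", decay_prone=False),
    ElementMetadata("PART_DESCRIPTION", decay_prone=False),
    ElementMetadata("UNIT_OF_MEASURE", decay_prone=False),
    ElementMetadata("QUANTITY_ON_HAND", decay_prone=True, review_interval_days=365),
    ElementMetadata("SUPPLIER_ID", decay_prone=True, review_interval_days=365),
    ElementMetadata("STORAGE_BIN_NUMBER", decay_prone=True, review_interval_days=365),
]


def decay_prone_elements(elements: List[ElementMetadata]) -> List[str]:
    """Return the names of elements flagged as subject to decay."""
    return [e.name for e in elements if e.decay_prone]
```

A verification process could then iterate over `decay_prone_elements(INVENTORY_ELEMENTS)` to decide which columns need a periodic check.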

Another characteristic of decay-prone data elements is that the overall accuracy of the element tends to follow a sawtooth graph over time. Data is entered. Over time, the older data gets progressively more inaccurate (it decays). The accuracy of the element is determined by the mix of older data versus newer data, which tends to be more accurate. A corrective action occurs that pulls the data back into a higher state of accuracy.

In the previously cited examples, the corrective event for HR information may be an annual data review conducted with all employees; for inventory, an annual inventory exercise; for capital equipment, an annual reappraisal. Note that for the driver's license example there is no event that brings all records, or even a set of them, into accuracy at a single point in time. It is a continuous database, with new records and renewals occurring on individual timelines. It does not follow a sawtooth pattern because decay and correction occur continuously across the database. However, it does have an inaccuracy component due to decay that remains fairly constant over time. This relationship of accuracy and time is depicted in Figure 3.2.

Figure 3.2: Accuracy of decayable elements over time.
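The two patterns can be sketched numerically. The model below is a simplification and not taken from the text: it assumes each record has a constant chance of becoming inaccurate each month (`monthly_decay`), that a corrective event restores full accuracy, and that in the continuous case renewals are spread evenly over the cycle. Function and parameter names are illustrative.

```python
def periodic_accuracy(month: int, monthly_decay: float, period: int = 12) -> float:
    """Expected accuracy when ALL records are corrected together every `period` months.

    Accuracy resets to 1.0 at each corrective event and then decays,
    which produces the sawtooth shape.
    """
    age = month % period  # months since the last corrective event
    return (1.0 - monthly_decay) ** age


def continuous_accuracy(monthly_decay: float, period: int = 12) -> float:
    """Expected accuracy when corrections are staggered evenly across records.

    At any moment the database holds an even mix of record ages
    0 .. period-1, so the average stays constant over time.
    """
    return sum((1.0 - monthly_decay) ** age for age in range(period)) / period


# Sawtooth: sample the periodic model over two cycles.
sawtooth = [periodic_accuracy(m, monthly_decay=0.02) for m in range(24)]

# Flat line: the continuous model gives a single steady value.
steady = continuous_accuracy(monthly_decay=0.02)
```

Under these assumptions the periodic series climbs back to 1.0 at months 0 and 12 and falls in between, while the staggered case sits at a constant level between the sawtooth's peak and trough.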

A good HR system has new employees verify all HR information immediately after initial entry into the database, asks employees to review all personal information each time they make any change, and requests a specific review by the employee whenever no review has been done in a year (or less if you can get away with it). Another mechanism is to make the review a part of the annual employee evaluation process. It would not hurt to monitor marriage and birth announcements as well.
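The "no review in a year" rule is easy to express as a check. The following is a minimal sketch; the function name, the 365-day threshold, and the idea of storing a last-review date per employee are assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumed policy: request a review once a full year has passed.
REVIEW_INTERVAL = timedelta(days=365)


def review_due(last_review: date, today: date,
               interval: timedelta = REVIEW_INTERVAL) -> bool:
    """Return True if the employee's data has gone unreviewed for `interval` or longer."""
    return today - last_review >= interval


# Hypothetical last-review dates keyed by employee id.
last_reviews = {
    "E001": date(2023, 1, 1),
    "E002": date(2023, 6, 1),
}

# Employees who should be asked to review their personal information.
due = [emp for emp, reviewed in last_reviews.items()
       if review_due(reviewed, today=date(2024, 1, 1))]
```

A shorter interval can be passed as `interval` where the organization is willing to ask more often.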

Unfortunately, most companies are not willing to be that concerned about accuracy problems caused by data decay. This is not because they have weighed the cost of a continuous updating process and found it higher than the cost of inaccuracy. It is more a case of not thinking about the problem at all. The cost of a corrective program is not very high once it has been put in place. It will generally return more value than it costs.

A proper way of handling decay-prone data elements is to identify them in the database design and indicate this in the metadata repository. Periodic events should be planned to update the information so that the database recovers from decay. The times at which these events occur should be recorded in the database and/or the metadata repository so that business users of the database know what to expect. The metadata repository should also record the rate of change found at each periodic audit. A good data analyst knows the rate of decay and the probable accuracy index for any element at any given point in time.
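One way an analyst might turn a recorded audit result into a probable accuracy figure is sketched below. It assumes, purely for illustration, that decay proceeds at a constant rate per month between audits and that an audit restores the element to a known accuracy level; neither assumption comes from the text, and the function names are hypothetical.

```python
def monthly_decay_rate(fraction_changed: float, months_between_audits: float) -> float:
    """Estimate a constant monthly decay rate from one audit.

    `fraction_changed` is the share of values the audit had to correct
    after `months_between_audits` months without correction.
    """
    if not 0.0 <= fraction_changed < 1.0:
        raise ValueError("fraction_changed must be in [0, 1)")
    if months_between_audits <= 0:
        raise ValueError("months_between_audits must be positive")
    # Solve (1 - rate) ** months = 1 - fraction_changed for rate.
    return 1.0 - (1.0 - fraction_changed) ** (1.0 / months_between_audits)


def probable_accuracy(months_since_audit: float, monthly_rate: float,
                      accuracy_after_audit: float = 1.0) -> float:
    """Estimate the accuracy of an element some months after its last audit."""
    return accuracy_after_audit * (1.0 - monthly_rate) ** months_since_audit


# Example with made-up numbers: an annual audit corrected 19% of values.
rate = monthly_decay_rate(0.19, months_between_audits=12)
halfway = probable_accuracy(6, rate)  # estimate six months after the audit
```

If audits show that decay is not steady (for example, seasonal inventory movement), a different model would be needed; the point is only that the stored audit rate makes such an estimate possible.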