2.7 How Important Is It to Get Close?

You can achieve very high levels of accuracy of data within a database if enough energy and resources are applied. Although data accuracy is only one component of data quality, it is clearly the single most important component. If the data values are just plain wrong, missing, or inconsistent, any attempt to use the data will be tainted. Every quality improvement program must begin by dealing with data accuracy.

Most decision support applications have a tolerance level for inaccurate data. Inaccuracies up to the tolerance level allow the application to provide high-quality decisions. The inaccuracies do not change the outcome from what it would be if the data were 100% accurate, provided the data inaccuracies are not unduly biased.

Above the tolerance level, the data will lead to wrong decisions, but these will go unnoticed because the decisions are not obviously bad. This is a dangerous situation because the company is acting wrongly on data that it believes to be good. It leads to inefficiencies that are not noticed. At some higher level of inaccuracy, the data is no longer believed and has no effect on decisions because it is not used. Figure 2.4 depicts the relationship between usefulness and accuracy as a step function, with steps at the tolerance levels.

Figure 2.4: Step function influence on tolerance levels.
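The three zones described above can be sketched in a few lines of Python. This is only an illustration: the two threshold values, the names `TOLERANCE_LEVEL` and `DISBELIEF_LEVEL`, and the function `decision_zone` are assumptions made for the example. Real thresholds differ from one application to the next and are rarely known in advance.

```python
from enum import Enum


class Zone(Enum):
    RIGHT_DECISIONS = "inaccuracies tolerated, decisions are sound"
    UNNOTICED_WRONG_DECISIONS = "wrong decisions that go unnoticed"
    DATA_NOT_USED = "data not believed and not used"


# Hypothetical thresholds, expressed as the fraction of inaccurate values.
# They are invented for this sketch and do not come from any measurement.
TOLERANCE_LEVEL = 0.02
DISBELIEF_LEVEL = 0.15


def decision_zone(inaccuracy_rate, tolerance=TOLERANCE_LEVEL,
                  disbelief=DISBELIEF_LEVEL):
    """Map a fraction of inaccurate values to one of the three zones."""
    if not 0.0 <= inaccuracy_rate <= 1.0:
        raise ValueError("inaccuracy_rate must be between 0 and 1")
    if inaccuracy_rate <= tolerance:
        return Zone.RIGHT_DECISIONS
    if inaccuracy_rate < disbelief:
        return Zone.UNNOTICED_WRONG_DECISIONS
    return Zone.DATA_NOT_USED


# A small change in accuracy that crosses a threshold changes the zone.
for rate in (0.025, 0.018):
    print(f"{rate:.3f} -> {decision_zone(rate).value}")
```

Under these assumed thresholds, lowering the inaccuracy rate from 2.5% to 1.8% moves the data from the unnoticed-wrong-decisions zone into the right-decisions zone, even though the change itself is small.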

Most business analysts have no idea how to compute the tolerance levels for decisions they make. Because they have no idea how bad the data is, they must believe what they see.

This highlights two important aspects of working to improve the accuracy of data. First, you have a good chance of bringing data accuracy back into the first zone, where inaccuracies stay within the tolerance level and decisions are sound. You do not have to get to zero errors; you just need to get into the good zone. Second, you provide valuable information to the decision makers on the relative accuracy of the data. Without an assessment program, you either believe the data blindly or mistrust it, in which case you use it with caution or not at all.

Because decision-making efficiency is a step function of data accuracy, it follows that small improvements in the accuracy can lead to very large payoffs in value. If the quantity of inaccuracies is putting you in the wrong decision zone, and improvements move you into the zone of right decisions, the difference in value to the corporation can be enormous.

If you have no data quality program, there is probably huge potential value in instituting one. You cannot know how much value is there, because you are blindly using the data you have and have no idea how bad it is. Lowering the percentage of inaccurate data will almost always pay off handsomely for early improvements. As you get closer to zero errors, the cost will prove excessive in comparison to the gain. However, you can get very close before the crossover occurs.
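The crossover point can be illustrated with a small Python sketch. The cost and gain figures below are invented purely for illustration, and `crossover_step` is a hypothetical helper. The sketch assumes that each successive improvement step has a known cost and a known gain, which in practice you would have to estimate.

```python
def crossover_step(steps):
    """Return the index of the first improvement step whose cost
    exceeds its gain, or None if every step still pays off."""
    for index, (cost, gain) in enumerate(steps):
        if cost > gain:
            return index
    return None


# Invented (cost, gain) pairs for successive reductions in inaccuracy.
# Early steps are cheap and valuable; later steps cost more and gain less.
improvement_steps = [(10, 200), (20, 80), (40, 50), (90, 30), (250, 10)]

print(crossover_step(improvement_steps))  # prints 3
```

With these assumed figures, the first three steps return more than they cost, and the crossover occurs at the fourth step (index 3), where further cleanup costs more than it gains.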