2.5 Can Total Accuracy Be Achieved?

The short answer is no. Any database will always contain some amount of inaccurate data. It may contain no invalid data at all. However, as we have seen, being valid is not the same thing as being accurate.

Data accuracy is much like air quality. You can never hope to reach 100% pure air in an area where people live and work. It is just not possible. However, most people can distinguish between good air quality and poor air quality, even though both contain some level of impurity. People value higher-quality air over lower-quality air, and they know the difference.

Data accuracy works the same way. In most databases, improving the accuracy of the data can change the perception of its quality from poor to good, even though some inaccuracies persist. It is a rare application that demands 100% accurate data to satisfy its requirements.

A database in which 5% of the data elements are inaccurate will probably be very troublesome to most users. The same database with 0.5% of its data elements inaccurate would probably be very useful and considered high quality.
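One way to see why the gap between 5% and 0.5% matters so much is to ask how often a whole record contains at least one wrong element. The sketch below is only an illustration: it assumes errors fall independently on each element and uses a hypothetical record width of 20 elements, neither of which comes from the text.

```python
def record_error_probability(element_error_rate, elements_per_record):
    """Probability that a record has at least one inaccurate element.

    Assumes (for illustration only) that each element is wrong
    independently with the same probability.
    """
    all_correct = (1.0 - element_error_rate) ** elements_per_record
    return 1.0 - all_correct


# Hypothetical record width; real tables will differ.
ELEMENTS_PER_RECORD = 20

for rate in (0.05, 0.005):
    p = record_error_probability(rate, ELEMENTS_PER_RECORD)
    print(f"element error rate {rate:.1%}: "
          f"{p:.1%} of records touched by at least one error")
```

Under these assumptions, roughly 64% of records carry at least one error at the 5% rate, against under 10% at the 0.5% rate. Real errors are rarely independent, so the figures indicate scale rather than predict a particular database.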

Another important concept to understand is that data inaccuracies arise for a variety of reasons. Some of these are:

  • wrong values entered

  • data entry people who do not care to do the job right

  • confusing and contradictory data entry screens or forms

  • procedures that allow for data to not be entered or not be entered on time

  • procedures or policies that promote entering wrong values

  • poorly defined database systems

If you can identify and correct all of the sources except the first one, you can reach very high levels of data accuracy. You are left with only the simple mistake, such as a person who meant "blue" but entered "black". Data entry technology and best practices exist that can minimize the number of these errors as well.

In almost all cases where poor data quality is reported, no effort has been made to identify root causes of wrong values. Without finding root causes, improvements in the quality are not going to occur. Whenever effort is spent to identify root causes and correct them, improvements follow. The improvements are almost always noticeable and impressive.

All of the causes other than simple entry mistakes tend to produce a clustering of data inaccuracies around the faulty process. These clusters are easier to find and correct than the random errors that occur just because people enter data mistakenly. If all we had left were the random errors of people, the errors would be more evenly distributed throughout the database, would be small in number, and would have minimal impact on the uses of the data.
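The clustering described above suggests a simple check: compare the error rate of each data source against the overall rate and look for outliers. The following is a minimal sketch, assuming you already have audit results in which each sampled value is labeled with the process that produced it and whether it was found to be inaccurate. The source names, the sample counts, and the threshold factor of 3 are all hypothetical.

```python
from collections import defaultdict


def error_rate_by_source(audit_rows):
    """Return {source: fraction of audited values found inaccurate}.

    audit_rows is a list of (source, is_inaccurate) pairs.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for source, is_inaccurate in audit_rows:
        totals[source] += 1
        if is_inaccurate:
            errors[source] += 1
    return {source: errors[source] / totals[source] for source in totals}


def flag_clusters(audit_rows, factor=3.0):
    """List sources whose error rate exceeds factor times the overall rate.

    The factor is an arbitrary illustrative threshold, not a standard.
    """
    rows = list(audit_rows)
    if not rows:
        return []
    overall = sum(1 for _, bad in rows if bad) / len(rows)
    if overall == 0:
        return []
    rates = error_rate_by_source(rows)
    return sorted(s for s, rate in rates.items() if rate > factor * overall)


# Hypothetical audit sample: two sources with only scattered errors,
# one source with a concentration of them.
sample = (
    [("web_form", False)] * 198 + [("web_form", True)] * 2
    + [("call_center", False)] * 198 + [("call_center", True)] * 2
    + [("branch_7", False)] * 80 + [("branch_7", True)] * 20
)

print(error_rate_by_source(sample))
print(flag_clusters(sample))
```

In this made-up sample the overall error rate is 4.8%, while one source alone runs at 20%, so it is the only one flagged. A flag like this says where to look for a root cause; it does not say what the cause is.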

So, the long answer is yes. You can get accurate data to a degree that makes it highly useful for all intended requirements.