Chapter 4: Data Quality Assurance

The previous chapters define accurate data. They explain why data matters and why accurate data matters in particular. They describe how complex the topic really is: accurate data is not easy to achieve. They show that data can go wrong in many different places. They also show that you can identify much, but not all, inaccurate data, and that you can fix only a small part of what you find.

You can achieve visible improvements in data accuracy in the short term, with a respectable payoff. However, getting your databases to very low levels of inaccuracy and keeping them there is a long-term process.

Data accuracy problems can occur anywhere in the sea of data residing in corporate information systems, and at many points in the life cycle and journeys of that data. If left uncontrolled, the data will in all probability become inaccurate enough to cause high costs to the corporation. To control accuracy, you must therefore control it at many different points. Because data can become inaccurate through processes performed by many people across the corporation, controlling accuracy is not a task for a small, isolated group but a wide-reaching activity involving many people.

Data accuracy cannot be "fixed" once and then left alone. Without continuous control, data quickly reverts to poor quality. Data quality assurance must therefore be ongoing. The effort should intensify over time as practitioners become more educated and experienced in the tasks needed to reach and maintain high levels of data accuracy.

This chapter outlines the basic elements of a data quality assurance program. It focuses on data accuracy, a single dimension of data and information quality. This is not to say that the other dimensions should not also be addressed. However, data accuracy is the most important dimension, and controlling it must come first.