Over the past thirty years I have helped organizations in many different countries design and deploy a wide range of IT applications. Throughout this time, the topic of data accuracy and quality has been ever present during both the development and the operation of these applications. In many instances, however, even though the IT development group and business managers recognized the need for improved data quality, time pressures to get the project into production prevented teams from addressing the data quality issue in more than a superficial manner.
Lack of attention to data quality and accuracy in enterprise systems can have downstream ramifications. I remember working with an overseas bank on a data warehousing system a few years ago. The bank was struggling to deliver consistent business intelligence to its business users. On one occasion, a business manager discovered that financial summary data in the data warehouse was wrong by many millions of dollars. I visited the bank several months later, and I was told that the reason for the error had still not been found. Trying to analyze a data quality problem caused by upstream applications is time-consuming and expensive. The problem must be corrected at the source before the error is replicated to other applications.
Lack of accuracy in data not only erodes end-user confidence in IT applications, it can also have a significant financial impact on the business. As I write this, I am reading a report from the Data Warehousing Institute on data quality that estimates that poor-quality customer data costs U.S. businesses a staggering $611 billion a year in postage, printing, and staff overhead. The same report states that nearly 50% of the companies surveyed have no plans for managing or improving data quality. At the same time, almost half the survey respondents think the quality of their data is worse than everyone thinks.
These results clearly demonstrate a gap between perception and reality regarding the quality of data in many corporations. The report goes on to state that "although some companies understand the importance of high-quality data, most are oblivious to the true business impact of defective or substandard data." To solve this problem, companies need to become more educated about the importance of both data quality and techniques to improve it. This is especially important given that the world economy is becoming more and more information-driven. Companies with access to timely and accurate information have a significant business advantage over their competitors.
I must admit that when I was approached to write the foreword for this book, I had some reservations. As a practitioner, I have found that books on data quality are often very theoretical and involve esoteric concepts that are difficult to relate to real-world applications. In truth, I suspect this may be one reason why data quality receives less attention than it deserves. We need education that enables designers and developers to apply data quality concepts and techniques easily and rapidly to application development projects.
When I read this book I was pleasantly surprised, and my concerns in this regard vanished. The author, Jack Olson, has a background that enables him to address the topic of data quality and accuracy from a practical viewpoint. As he states in the preface, "Much of the literature on data quality discusses what I refer to as the outside-in approach. This book covers the inside-out approach. To make the inside-out approach work, you need good analytical tools and a talented and experienced staff of data analysts…. You also need a thorough understanding of what the term inaccurate data means." The bottom line for me is that the book presents techniques that you can immediately apply to your applications projects. I hope that you will find the book as useful as I did and that the ideas presented will help you improve the quality and accuracy of the data in your organization.