From about 1950 through today, there has been a clear evolution in the use of computer-generated data from simple historical record keeping to ever more active roles. This trend does not show signs of slowing down. Data is generated by more people, is used in the execution of more tasks by more people, and is used in corporate decision making more than ever before.
The more technology we develop, the more users demand from it.
Over the past 50 years, the role played by information systems has been in a state of constant change. Every IT department has devoted a significant share of its resources to implementing new or replacement systems. A new system is often obsolete by the time it is deployed, because another replacement technology appeared while it was being built. This drives the department to replace the just-finished system yet again. This process of "continuous evolution" has never stopped, and probably will not for a number of years into the future.
The constant need to remodel systems as fast as they are developed has been driven by enormously fast technology innovation in hardware, communications, and software. No organization has been able to keep up with the rapid pace of technological change. All organizations have been chasing the technology curve in the hope of eventually reaching a stable point, where new systems can survive for a while. They will not reach stability for a long time in the future, as much more technology is being born as this is written.
The need to change is also fueled by the rapid change in the nature of the companies themselves. Mergers and acquisitions drive very rapid and important changes as companies try to merge information systems. Changes in product lines or changes in markets served drive many hastily implemented changes into information systems. For example, the decision to "go global" can wreak havoc on currency, date, address, and other data elements already in place. Changes driven by business events are generally the ones made most quickly, with the least planning, and they usually yield the worst results.
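To make the "go global" example concrete, here is a minimal Python sketch of how a date element already in place can become ambiguous once a second regional convention enters the system. The stored value, the two format strings, and the helper names `parse_as` and `is_ambiguous` are illustrative assumptions, not taken from any particular system.

```python
from datetime import date, datetime

# Two conventions assumed for illustration: US month/day/year
# and European day/month/year.
US_FORMAT = "%m/%d/%Y"
EU_FORMAT = "%d/%m/%Y"


def parse_as(raw: str, fmt: str) -> date:
    """Parse a date string under one specific convention."""
    return datetime.strptime(raw, fmt).date()


def is_ambiguous(raw: str) -> bool:
    """True when the string is valid under both conventions
    and the two readings are different calendar dates."""
    readings = set()
    for fmt in (US_FORMAT, EU_FORMAT):
        try:
            readings.add(parse_as(raw, fmt))
        except ValueError:
            # Not a valid date under this convention; skip it.
            pass
    return len(readings) > 1


# Hypothetical value stored by a system that served one country only.
raw_value = "03/04/2001"
us_reading = parse_as(raw_value, US_FORMAT)  # March 4, 2001
eu_reading = parse_as(raw_value, EU_FORMAT)  # April 3, 2001
```

Nothing in the stored string says which reading was intended, so once records from both regions share a column, values such as this one can no longer be interpreted reliably without knowing where each record originated.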
External changes also cause hastily implemented patches to existing systems: tax law changes, accounting changes such as those experienced in recent years, the Y2K problem, the euro conversion, and on and on. This rapid evolution has meant that systems have been developed hastily and changed aggressively, with few useful standards for development and control. The software industry has never developed effective standards similar to those the hardware and construction industries enjoy (through blueprints), nor does it have the luxury of time to think through everything it does before committing to systems. The result is that many, if not all, of our systems are very rough edged. These rough edges particularly show through in the quality of the data and the information derived from the data.
A lot of this rapid change happened in order to push information systems into more of the tasks of the enterprise and to involve more people in the process. The Internet promises to involve all people and all tasks in the scope of information systems. At some time in the future, all companies will have an information system backbone through which almost all activity will be effected. As a result, information systems become bigger, more complex, and, hopefully, more important every time a new technology is thrown in. The data becomes more and more important.
Just about everything in organizations has been "databased." There are personnel databases, production databases, billing and collection databases, sales management databases, customer databases, marketing databases, supply chain databases, accounting databases, financial management databases, and on and on. Whenever anyone wants to know something, they instinctively run to a PC to query a database. It is difficult to imagine that less than 25 years ago there were no PCs and data was not collected on many of the corporate objects and activities of today.
I participated in an audit for a large energy company a few years ago that inventoried over 5,000 databases and tens of thousands of distinct data elements in their corporate information systems. Most corporations do not know how much data they are actually handling on a daily basis.
Not only has most corporate information been put into databases, but it has been replicated into data warehouses, data marts, operational data stores, and business objects. As new ways are discovered to use data, there is a tendency to duplicate the primary data in order to satisfy the new need. The most dramatic example today is the wave of customer relationship management (hereafter, CRM) projects proliferating throughout the IT world.
Replication often includes aggregating data, combining data from multiple sources, putting data into data structures that are different from the original structure, and adding time period information. Often the original data cannot be recognized or found in the aggregations. As a result, errors detected in the aggregations often cannot be traced back to primary instances of data containing the errors.
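The loss of traceability described above can be sketched in a few lines of Python. The order records, the region names, and the helpers `aggregate` and `candidate_sources` are hypothetical; the point is only that a summary row keeps the total and drops the identity of the records behind it.

```python
from collections import defaultdict

# Hypothetical primary records: (order_id, region, amount).
orders = [
    ("A-1", "east", 120.0),
    ("A-2", "east", 95.0),
    ("A-3", "east", 9500.0),  # keying error: 95.0 was intended
    ("B-1", "west", 110.0),
    ("B-2", "west", 130.0),
]


def aggregate(records):
    """Summarize amounts by region, as a replicated store might.
    The order identifiers are discarded in the process."""
    totals = defaultdict(float)
    for _order_id, region, amount in records:
        totals[region] += amount
    return dict(totals)


def candidate_sources(records, region):
    """Tracing a suspicious total requires the primary records;
    every record in the region is a candidate for the error."""
    return [order_id for order_id, rec_region, _ in records
            if rec_region == region]


totals = aggregate(orders)
suspects = candidate_sources(orders, "east")
```

The "east" total looks wrong, yet the aggregated value alone gives no way to tell that order A-3 caused it. Someone holding only the replica sees a bad number, and finding the faulty instance means returning to the primary data, assuming it is still available in its original form.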
In addition to replicating, there are attempts to integrate the data of multiple databases inside interactive processes. Some of this integration includes reaching across company boundaries into databases of suppliers, customers, and others, examples of which are shown in Figure 1.1.
Adding the demands of replication and integration on top of operational systems adds greatly to the complexity of information systems and places huge burdens on the content of the primary operational systems. Data quality problems get magnified through all of these channels. Figure 1.2 indicates aspects of integration, operation, and replication.
Along with the increasing complexity of systems comes an increase in the impact of inaccurate data. In the primary systems, a wrong value may have little or no impact. It may cause a glitch in the processing of an order, resulting in some small annoyance to fix. However, as this wrong value is propagated to higher-level decision support systems, it may trigger an incorrect reordering of a product or give a decision maker the wrong information on which to base the expansion of a manufacturing line. These later consequences can be much larger than the original one.
Although a single wrong value is not likely to cause such drastic results, the cumulative effect of multiple wrong values in the same attribute can deliver very wrong results. Processes that generate wrong values rarely generate only one inaccurate instance.
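A small Python sketch can illustrate the difference between one wrong value and many wrong values in the same attribute. The numbers are invented for illustration: 100 records whose true value is 10.0, and a hypothetical faulty process that stores 0.0 instead. The helper names `corrupt` and `relative_error` are likewise assumptions.

```python
from statistics import mean

# Invented data: 100 records whose correct value is 10.0.
true_values = [10.0] * 100


def corrupt(values, n_wrong, wrong_value):
    """Return a copy in which the first n_wrong entries are
    replaced by wrong_value, mimicking a faulty process."""
    return [wrong_value] * n_wrong + list(values[n_wrong:])


def relative_error(values, reference):
    """Relative deviation of the mean from the reference mean."""
    ref_mean = mean(reference)
    return abs(mean(values) - ref_mean) / ref_mean


# One isolated wrong value versus a process that errs repeatedly.
one_wrong = corrupt(true_values, 1, 0.0)
many_wrong = corrupt(true_values, 30, 0.0)

error_one = relative_error(one_wrong, true_values)    # about 1%
error_many = relative_error(many_wrong, true_values)  # about 30%
```

With a single bad record the average shifts by roughly one percent, which a decision maker would probably never notice. When the same faulty process touches 30 of the 100 records, the average is off by roughly 30 percent, enough to change a reordering or expansion decision.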