2.2 Principle of Unintended Uses

The previously cited examples demonstrate that you cannot separate data from uses. To assess the quality of data, you must first collect a thorough specification of the intended uses and then judge the data as to its suitability for those uses. In a perfect world, database builders would gather all requirements and then craft a database design and applications to match them.

In the real world there is a serious problem when dealing with "unintended uses." These are uses that were not known or defined at the time the databases were designed and implemented.

Unintended uses arise for a large variety of reasons. Some examples follow:

The company expands to new markets.
The company purchases another company and consolidates applications.
External requirements are received, such as a new tax law.
The company grows its usage, particularly in decision making.
The company migrates to a new, packaged application that has different needs.

This represents the single biggest problem with databases. Unintended uses proliferate at ever-expanding rates. You cannot anticipate all uses for a database when initially building it unless it is a database with unimportant content. In the real world you can expect (and in fact depend on) a large number of unintended uses appearing with surprising regularity. Each of these can cause a good database to become a bad database. Two things are needed: anticipation in database design and flexibility in implementations.

Need for Anticipation in Database Design

Database designers need to be schooled in the principles of data quality. By doing so, they will be able to avoid some data quality problems from occurring when unintended uses appear. In the least, they should be schooled to be diligent in the careful and thorough documentation of the content. This means that metadata repositories should be more prevalent, more used, and more valued by information systems groups.

Anticipation also includes questioning each database design decision in light of what might appear in the future. For example, name and address fields should anticipate the company's growth into markets in other countries where the structure and form of elements may vary. Another example is anticipating sales amounts in multiple national currencies.

Database and application designers should be discouraged from using confusing and complicated data encoding methods. Many bad techniques proliferated in the past that were the result of the limited capacity and slow speed of storage devices. These are no longer excuses for making data structures and encoding schemes overly complicated.

A good database design is one that is resilient in the face of unintended uses. This principle and the techniques to achieve it must be taught to the newer generations of information system designers.

Need for Flexibility in Implementations

We know that changes will be made to our systems. Corporations always change. They change so much that keeping up with the changes is a major headache for any CIO. Unintended uses is one of the reasons for change. When a new use appears, its requirements need to be collected and analyzed against the data they intend to use. This needs to be done up front and thoroughly. If the database is not up to the task of the new uses, either the new use needs to be changed or discarded, or the database and its data-generating applications must be upgraded to satisfy the requirements. This analysis concept needs to be incorporated into all new uses of data.

It is amazing how many companies do not think this way. Too often the data is force fit into the new use with poor results. How many times have you heard about a data warehouse project that completed but yielded nonsense results from queries? This generally is the result of force fitting data to a design it just does not match.

Database systems are better able to accept changes if they are designed with flexibility in the first place. Relational-based systems tend to be more flexible than older database technologies. Systems with thorough, complete, and current metadata will be much easier to change than those lacking metadata.

Most information system environment do a very poor job of creating and maintaining metadata repositories. Part of the blame goes to the repository vendors who have built insufficient systems. Part goes to practitioners who fail to take the time to use the metadata systems that are there. Part goes to lack of education and awareness of how important these things really are.

Note

Organizations must improve their use of metadata repositories hugely if they have any hope of improving data quality and recouping some of the losses they are regularly incurring.