To determine where you should go, you must first understand where you are.
This maxim defines the entire philosophy behind this phase of the database-design process. You must devote some time to gaining a clear understanding of your organization's database for these reasons:
To determine whether the database supports the organization's current information requirements
To uncover existing structural deficiencies
To determine how the database needs to evolve so that it will support the organization's future information requirements
You can use the existing database as a resource for developing a new database. However, you must carefully judge which aspects of the current database remain useful and which aspects should be discarded. You can make these judgments by answering the following questions:
What types of data does the organization use?
How does the organization use that data?
How does the organization manage and maintain that data?
The answers to these questions provide you with vital information that you can use to design a database that best suits your organization's needs.
You can best answer these questions by analyzing your organization's existing database. It's very likely that the organization is using some type of database, and it can probably be associated with one of the following categories:
Paper-based databasesalso known as file systemstypically consist of various forms and handwritten documents stored in file folders or bound in notebooks. The folders and notebooks are identified by some coding scheme (e.g., unique numbers or colored tabs) and stored in file cabinets. These cabinets are likely to be identified by some coding scheme as well, depending on the size of the database.
Legacy databases have been in existence and in use for several years or more and consist of various types of data structures and character-based user interface screens that all reside on a mainframe computer or personal computer. The capability, functionality, and effectiveness of the structures and screens are quite dependent upon the programming language and database-management software used to create them. In general, the structures and screens are crude by today's standards because they were created at a time when programming languages and database software were not as sophisticated as those we've come to know since the mid-1990s.
Human-knowledge bases (loosely defined) are based on the memory of one or more employees within an organization. These individuals have a specific amount of knowledge regarding a given aspect of the organization (e.g., customer information or product details), and they are crucial to conducting the organization's business.
The goal of your analysis is to determine the types of data the organization uses, how the organization manages and maintains that data, and how the organization views and uses the data. You can reduce the time it takes to define the preliminary field and table structures for the new database if you conduct this investigation properly.
During the analysis, you review the various ways the organization collects and presents its data, and you conduct a set of interviews with users and management. You then use the information you've gathered to define a preliminary field list and to help you determine the tables that should be included in the initial database structure. If your analysis reveals that the current database is poorly designed, you can take precautions to ensure that you don't make the same mistakes in the new database. Despite whatever shortcomings the current database may have, it can still help you identify a number of the fields and tables that you should include in the new database.
There's one rule you should keep first and foremost in your mind as you're analyzing the current database:
Do not adopt the current database structure as the basis for the new database structure.
Following this rule will help you avert unnecessary errors and aid in maximizing your design efforts.
Every so often, there's a point during the analysis when a novice database developer (and sometimes an experienced one, as well) will stop and think, "This database doesn't look too bad. Let's just end the analysis here and use this database as the basis for the new one." This is a particularly bad idea because every hidden problem within the current database structure will be transferred into the new database. These types of problems include awkward table structures, poorly defined relationships, and inconsistent field specifications; they will invariably surface later and at the least opportune times. Therefore, you should do your best to avoid this perilous situation by following the rule above. Just remember that it's always better to define a new database structure explicitly than to copy an existing structure. After all, if the old database didn't have problems, you wouldn't be building a new one.
You'll typically analyze two types of databases during this part of the design process: paper-based databases and legacy databases. Many organizations use both types of databases to some degree, and you perform the same basic analysis process on each of them. There are minor differences in the way you analyze a paper-based database and a legacy database, to be sure, but the differences have more to do with the databases themselves than with the overall analysis process. You needn't be concerned with these differences, however, because I've seamlessly incorporated them into the analysis process presented in this book.
A paper-based database incorporates data that is literally collected, stored, and maintained on paper. The paper used in this type of database appears in a variety of shapes, sizes, and configurations. Some of the more common formats include index cards, hand-written reports, and various types of preprinted forms. Anyone who has ever worked in an office for a business or organization is very familiar with this type of database.
You'll find that analyzing this type of database can be a daunting task. One of your most immediate problems is finding someone who completely understands how the database works so you can learn its use and purpose. There are several problems with the paper-based database itself, especially in terms of the way data is collected and managed. This type of database typically contains inconsistent data, erroneous data, duplicate data, redundant data, incomplete entries, and old data that should have been purged from the database long ago. Clearly, the only reason you'd analyze this type of database is to identify items that you could incorporate into the new database. For example, you can extract individual pieces of data from various sections of a form in the paper-based database and transform them into fields in the new database.
A legacy database is a database that has been in existence and in use for five years or more. Mainframe databases typically fall into this category, as do older PC-based databases. There are several reasons that "legacy" is used as part of the name for this type of database. First, it suggests that the database has been around for a long time, possibly longer than anyone can clearly remember. Second, the word "legacy" may mean that the individual who originally created the database has either shifted responsibilities within the organization or is working for someone else, and, thus, the database has become his or her legacy to the organization. Third, the term implies the disturbing possibility that no single individual completely understands the database structure or how it is implemented in the DBMS software program.
Mainframe legacy databases present some special problems in the analysis process. One problem stems from the fact that a number of older mainframe databases are based on hierarchical or network database models. If neither you nor anyone in the organization has a firm understanding of these models, it will take you some time to decipher the structure of the database. In this case, you'll find it very helpful to make printouts of the data in each of the database structures.
Even if a legacy database is based on the relational model, there's no particular guarantee that the structure is sound. Unfortunately, there are many instances where the people who created these databases didn't completely understand the concept of a relational database. (After you have read this book, you won't fall into that group.) As a result, many older databases have improper or inefficient structures.
Numerous PC-based legacy databases are improperly or inefficiently designed as well. Many of them were originally designed and implemented in dBase II and dBase III, which were nonrelational database-management systems. As a result, the databases implemented within these systems could not take advantage of the benefits provided by the relational model. Two characteristics commonly associated with these types of databases are duplicate fields and redundant data, which (as you'll learn later) can cause serious problems with data integrity.
Analyzing a legacy database is somewhat easier than analyzing a paper-based database because a legacy database is typically more organized and structured than a paper-based database, the structures within the database are explicitly defined, and there is usually a software application program that people use to interact with the data in the database. (The application program is valuable to you during the analysis process because it can reveal a lot of information about the data structures and the tasks performed against the data in the legacy database.) The time it will take you to perform a proper analysis will depend to some degree on the platform (mainframe or PC), the DBMS used to implement the legacy database, and the software application program.
The key point to remember when you're analyzing either a paper-based or a legacy database is that you should proceed through the process patiently and methodically so that you can ensure a thorough and accurate analysis.