A field specification incorporates various elements that define every attribute of a field. All of the elements within the specification are categorized as general elements, physical elements, or logical elements. These element categories enable you to focus on a distinct aspect of the field as you're defining the specification, and they provide a way for you to find a particular element quite easily.
Here are the elements within each category:
General Elements: Field Name, Parent Table, Label, Specification Type, Source Specification, Shared By, Alias(es), Description
Physical Elements: Data Type, Length, Decimal Places, Character Support, Input Mask, Display Format
Logical Elements: Key Type, Key Structure, Uniqueness, Null Support, Values Entered By, Required Value, Default Value, Range of Values, Edit Rule, Comparisons Allowed, Operations Allowed
Figure 9.1 shows an example of a Field Specifications sheet. We'll use this sheet (or various portions of it) as we work on field specification examples throughout the remainder of the book.
Items under the General Elements category represent the most fundamental attributes of the field. They provide information on the field's purpose, the name of the table(s) in which the field appears, and the pseudonyms the field assumes under certain circumstances.
This is the set of absolute minimal words that uniquely identifies a particular field throughout the database. You created and refined field names earlier in the database-design process (see Chapter 7), so you'll just take each name and use it as the setting for this element.
The table that incorporates a given field within its structure is known as the field's parent table. This is the only table in which the field will appear unless the field is participating in establishing a relationship. (You'll learn more about this exception in Chapter 10.) For example, STUDENTS is the parent table of the STUDFIRST NAME field.
This is an alternate name (typically a shorter form of the field name) by which you can identify the field within an end-user application interface that you create for the database. For example, you might use QTY ON HAND as a label for a field named QUANTITY ON HAND because many people in the organization are already accustomed to this particular name. Labels can be particularly useful when you want to conserve space on a data-entry screen or squeeze more fields into a particular report.
Avoid the temptation of using the label as the official field name within the table structure; otherwise, you make it possible for someone to misinterpret or incorrectly identify the field. Always use the most precise and accurate name as the official field name and then use the label (judiciously, of course) within your end-user interface applications. This will enable you to make a distinction between the two at all times.
The elements you set for a given field depend upon the type of specification you define for the field. You can define a specification in three ways:
Unique. This is the default specification for all fields except those that serve as a template for other fields or those that participate within a table relationship as foreign keys. You can incorporate all but the Source Specification element for this type of specification, and the element settings you establish will apply only to the field indicated in the Field Name element.
Generic. This specification serves as a template for other field specifications and helps you ensure consistent definitions for fields that have the same general meaning. For example, you could create this type of specification for a generic STATE field and then use it as the basis for every other STATE field in the database. Fields such as CUSTSTATE, EMPSTATE, and VENDSTATE all have the same meaning (they represent a state within the United States), but there is enough of an obvious distinction between them to require that they remain separate fields. (If you recall, you learned about generic fields in Chapter 6 when you were developing the preliminary field list and in Chapter 7 when you were working with the Elements of the Ideal Field.)
A generic specification requires you to use a nonspecific field name and element settings that are as broad and general as possible. You can, however, incorporate any element except Parent Table, Label, Shared By, Alias(es), and Source Specification.
Replica. This is the default specification for a field based on a generic field or a field that serves as a foreign key within a table relationship, and it draws a majority of its element settings from an existing specification. You can incorporate elements that were not already incorporated by the source specification, and you can alter any element settings drawn from the source specification.
You'll learn how to define each type of specification in the section "Using Unique, Generic, and Replica Field Specifications" later in the chapter.
This element is set only on a Replica specification and indicates the name of the specific field specification upon which the current specification is based. (You'll see a good example of this element in the next section as well.)
This element indicates the names of other tables that share this field. The only table names that should appear here are those that have an explicit relationship to the field's parent table. For example, assume you have a data table called EMPLOYEES that is related to two subset tables called PART-TIME EMPLOYEES and FULL-TIME EMPLOYEES via a field called EMPLOYEE ID NUMBER. As you create a field specification for EMPLOYEE ID NUMBER, you would use "PART-TIME EMPLOYEES, FULL-TIME EMPLOYEES" as the setting for this element.
This is a name (or set of names) that you use for the field in very rare circumstances. One instance in which you would use an alias is when there must be two occurrences of the field in the same table. Let's assume that an organization is accustomed to identifying its employees by unique values within an EMPLOYEE ID NUMBER field. Now, consider the SUBSIDIARIES table structure in Figure 9.2 (this is a partial structure only).
In this instance, each subsidiary has a president and a vice president. Both of these individuals must be represented in the table because of their positions within the subsidiary organization, so there are two EMPLOYEE ID NUMBER fields in the table structure. Proper database design, however, dictates that there can only be one occurrence of this field within the table; there is an obvious problem here. The only solution is to use an alias for one or both occurrences of the EMPLOYEE ID NUMBER field. For instance, you could (for sake of clarity) use PRESIDENT ID as an alias for the first occurrence of EMPLOYEE ID NUMBER and VICE PRESIDENT ID as an alias for the second occurrence of EMPLOYEE ID NUMBER. With the aliases in place, both employees are properly represented within the table. Figure 9.3 shows the revised table structure.
Although using an alias is acceptable under these circumstances, you should use them very judiciously; otherwise, they can become difficult to manage and maintain, eventually conceal or disguise the true meaning of the original fields, and cause you to misunderstand what the data actually represents. This issue will become even clearer when you begin to establish table relationships.
This is a complete interpretation of the field. Composing a field description is extremely beneficial because it forces you (and everyone in the organization) to think carefully about the nature of the data that will be stored in the field. You can be relatively sure that the field requires further refinement if you have difficultly composing a suitable description.
Earlier in the database-design process, you learned a set of guidelines for composing a table description. Similarly, there is a set of guidelines that governs how you compose a proper field description.
Use a statement that accurately identifies the field and clearly states its purpose. The description should supplement the field name in terms of defining what the field represents. It should also state the field's role within the table or its relationship to the table's subject. Here's an example of such a description:
CustCitythe metropolitan area in which a customer resides or conducts business. This is an integral component of a customer's complete address.
Write a clear and succinct statement. The description should be free of confusing sentences or ambiguous phrases. Although the description should be as complete as possible, use the minimum number of words necessary to convey the required information. As you've seen with table descriptions, verbose statements are difficult to read and understand.
Refrain from restating or rephrasing the field name. Neither of these practices does anything to illuminate the identity or purpose of the field. Remember that the purpose of a description is to provide a complete interpretation of the field. Here's an example of a poor description:
CustLast Namethe last name of a customer.
A description is far more useful when you write it in this manner:
CustLast Namethe surname of a customer, whether original or by marriage, that we use in all formal communications and correspondence with that customer.
Avoid using technical jargon, acronyms, or abbreviations. Although some people within the organization will understand these types of idioms, its better for you to use terminology that everyone understands. Remember that a description must be as clear as possible to anyone who reads it. For example, you should avoid this type of statement:
Employee ID Numbera unique number used to identify an employee within the organization. It is a component of the SSP.
The problem with this description is that there is no inherent way to determine the meaning of the acronym SSP. You could resolve this problem by spelling out the complete term, but it would be better for you to restate the purpose of the field.
Do not include implementation-specific information. There's no reason to include the fact that a given field appears on a particular data-entry screen or is used within a specific piece of programming code. This type of information is more appropriate for the implementation phase of the overall database-development process.
Do not make this description dependent upon the description of another field. Each description should be as complete as possible and independent of every other description in the database. Interdependent descriptions introduce unnecessary confusion and can inadvertently obscure the field's true identity and purpose. Avoid using a description such as this:
Item Reorder Levelminimum number of items that must exist for a particular product. (See description for Quantity On Hand).
Do not use examples. As you learned in Chapter 7, using examples in a description is a bad idea because they depend on supplemental information to convey their full meaning. You can ensure that a description is clear and succinct by keeping it absolutely free of examples.
Figure 9.4 shows the General Elements section of a Field Specifications sheet for an EMPLOYEE ID NUMBER field.
This category pertains to the structure of a field. Its elements are expressed in general terms because each RDBMS program implements them in a slightly different manner. Establishing these elements during this phase of the design process helps you ensure consistent field definitions throughout the database and reduces the time it will take you to implement the field structures in an RDBMS program.
This element indicates the nature of the data that the field stores.
In Chapter 1, you learned that Structured Query Language, or SQL, is the standard language used to create, modify, maintain, and query relational databases. SQL is actually a fully documented standard set forth jointly by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). Although the current version of the standard (as of this writing) is SQL/3, most major RDBMS programs implement much of the previous version, SQL/92.
The SQL standard defines seven major data types, and each data type has one or more uniquely named variations. Here's a brief definition of each data type.
Many RDBMS programs provide additional data types beyond those specified by the standard, which are known as extended data types. Examples of extended data types include MONEY/CURRENCY, BOOLEAN (for True or False values), SERIAL/ROWID (for unique row identifiers), and BYTE/BLOB (for unstructured binary data).
I've presented the SQL standard data types because you will encounter them (or variations thereof ) in practically every RDBMS program. I have not provided much detail on these data types, however, because they are not implemented consistently across all RDBMS programs; you must consult your RDBMS's documentation to determine which data types the RDBMS supports and how the RDBMS implements them.
You can use any of the SQL data types (except Interval ) as the setting for the Data Type element of a given specification. Due to their inconsistent implementation, however, I recommend that you use one of the following general data types as the setting for this element instead.
This data type stores any combination of letters, numbers, keyboard characters, or special characters. Keyboard characters include the comma, dollar sign, exclamation mark, percentage sign, and period. Special characters include the copyright symbol, the trademark symbol, and the symbol for pi.
This data type stores only whole numbers and real numbers. It will not accept numbers with leading zeroes (e.g., 0000234) because they are not genuine numbers.
This data type stores dates, times, or a combination of both.
These data types are quite suitable for indicating the nature of the data that the field stores, and they are certainly much easier for users and management to understand. Using general data types will help you avoid unnecessary confusion, especially when you're reviewing the specification with users and management.
I use these general data types as the basis for all further data type references and examples throughout the remainder of the book.
This element specifies the total number of characters that a user can enter for any given field value. The RDBMS program you use to implement the database will determine the maximum number of characters you can set for this element. Although you can theoretically set the Length element for any data type, you should be aware that some RDBMS programs do not allow you to specify a length for a numeric field. Instead, the RDBMS program sets the length of a numeric field based on the type of number the field stores, such as an integer, a long integer, or a real number.
This denotes the number of digits to the right of the decimal point in a real number. The number of digits determines the real number's precision. For example, many businesses require that all currency values have four digits of precision to the right of the decimal point.
This element indicates the type of characters that a user can enter into a given field value. Setting and enforcing this element helps you ensure that the user cannot introduce meaningless data into the field, thus enhancing field-level integrity.
Let's say you're working with a CUSTSTATE field and its data type is alphanumeric. This data type is appropriate for the field because it allows a user to incorporate letters as part of a given field value. But it also allows him to use numbers, keyboard characters, and extended characters, which means that he can enter a meaningless value into the fieldthere are no state names or state abbreviations that contain characters other than letters. You solve this problem by using the Character Support element to define the characters that the user can incorporate within a field value. (I address the issue of a valid combination of letters in the "Logical Elements" section.)
You can choose to include or exclude any of the following types of characters:
Letters all letters of the alphabet including foreign language letters such as é and ñ.
Numbers 0 through 9.
Keyboard characters any standard character other than letters and numbers, such as asterisk, ampersand, bracket, caret, comma, equals sign, exclamation point, parenthesis, percent sign, period, pound sign, question mark, quote, semicolon, slash, or vertical bar. Note that the Field Specifications sheet includes examples of the characters that belong to this category.
Special characters any character that you can produce only through specific combinations of standard keys and the CTRL, ALT, and SHIFT keys, or with the aid of a special software program. Characters in this category include complex mathematical symbols, the copyright symbol, fractions, the symbol for pi, and the trademark symbol. The Field Specifications sheet includes examples of these characters as well.
This element specifies the manner in which a user should enter data into the field. For example, there are many ways to enter a date, such as "01/01/02," "01-01-02," and "01-Jan-2002." Using an input mask helps you ensure that a user enters values into the field consistently and (in this case) prevents confusion over the meaning of the date sequence.
RDBMS programs implement input masks in various ways, so you should use a relatively generic setting for this element. ( You can assign multiple input masks, if appropriate.) For example, you could use "mm/dd/yy" as the input mask for a date field. This mask indicates the sequence of the date components (month, day, year), the structure of the date (two numbers per component, e.g., 05/16/02), and the date component separator (the slash).
This element governs the appearance of a field's value when it is displayed on a screen or printed within a document. A display format enables you to present the field value in a more meaningful or readable fashion than the manner in which it was entered. For example, "03/13/88" might be the way you enter a given date, but "March 13, 1988" is much easier to read and comprehend.
Use a generic setting for this element, just as you did with the Input Mask; RDBMS programs implement display formats in various ways as well. For example, you can use "Month Day, Year" as a display format for a DATE HIRED field. You can also use a complete sentence to indicate a display format, such as the one in this example of a display format setting for a COMPANY NAME field.
Each word should start with a capital letter.
Figure 9.5 shows the Physical Elements section of a Field Specifications sheet for an EMPLOYEE ID NUMBER field.
This category pertains mainly to the values within a field. Its elements govern matters such as whether each value should be unique, when a value should be entered, whether a value can be edited, and the types of comparisons and operations that can be performed on each value. Setting these elements helps you establish and enforce a large part of field-level integrity.
This element designates a field's role within a table, which you identified as you were establishing a primary key for the table. As you already know, a field can serve as a non-key, a primary key, or an alternate key. In Chapter 10, you'll learn all about foreign keys and when to designate a field as a foreign key on the Field Specifications sheet.
This element denotes whether a field designated as a primary key is acting as a simple (single-field) primary key or as part of a composite (multifield) primary key.
This element indicates whether a field's values are unique. You set it as "Unique" when the Key Type element is set to "Primary"; otherwise, you'll typically set this element as "Non-unique."
When you work with a non-key field, think about how its values are going to be used so that you can determine whether they should be unique. Consider the DEPARTMENTS table structure in Figure 9.6.
In this example, the EMPLOYEE ID NUMBER field identifies the person who manages a particular department. Assuming that a person is allowed to manage only one department at any given time, the values in this field should be unique; therefore, you should set the Uniqueness element for this field as "Unique."
This specifies whether a field accepts null values. "No Nulls" is the setting you'll commonly use for this element, especially when a field serves as a primary key or an alternate key, or when the field's Required Value element is set to "Yes." You can set this element to "Nulls Allowed," however, when there is a valid reason for a field to accept null values. A CUSTCOUNTY field, for example, must accept nulls because a customer may not know the name of the county in which she lives. (Of course, it will no longer be null once she supplies the county name.)
Remember that a null does not represent a blankit represents a missing or unknown value. Users commonly make the mistake of using a blank to represent a meaningful value, such as "None," "Not Applicable," "No Response," and "Not Wanted." If these values are valid for a particular field, then make sure you include them in the Range of Values element for the field. Above all, use nulls judiciously and do not use blanks!
This element indicates the source of a field's values. Either a user will enter values into the field manually or a database application program will enter them automatically; the application program can provide values for the field only if the person who developed the program provided a means for it to generate the values. Note that the setting that represents the database application program is "System."
This denotes whether a user is required to enter a value for a field. Although you'll typically set this element to "No" for most of the fields in a table, you must set it to "Yes" when the field serves as the primary key. You may also need to set Required Value to "Yes" for a field such as CUSTZIPCODEa letter or package you send to a given customer must include a zip code in order for the Postal Service to handle it properly and accurately.
This is a value that a user can enter into a field when a more appropriate value is not yet available and nulls are disallowed. Use a default value very judiciously, and only if it is meaningful. For example, "WA" is a meaningful default value for a CUSTSTATE field when the vast majority of your customers live in Washington state. Conversely, "01/01/96" is not a good default value for a DATE HIRED field because it is a completely arbitrary value that has no real meaning.
This element specifies every possible valid value for a field. You can set this element in various ways, such as with a lower and upper limit (1,000 to 9,999) or with a specific list of values ("WA," "OR," "ID," "MT"). There are three categories under which you can establish a range of values:
Generala complete collection of every possible value for this field. For example, the general range of values for a CUSTSTATE field might include all valid abbreviations for every state in the United States.
Integrity specifica collection of values based on the field's role within a table relationship. (You'll learn all about this category in Chapter 10.)
Business specific a collection of values generated by a particular business requirement. Organizations commonly have various requirements that limit the range of values for a field. In an organization that conducts its business strictly in the Pacific Northwest, for example, the valid range of values for a CUSTSTATE field are "WA," "OR," "ID," and "MT." (You'll learn more about this category in Chapter 11.)
You're concerned only with the general range of values during this stage of the database-design process, and you'll revisit the Range of Values element later when you establish table relationships and business rules.
It's important to note that "Other" and "Miscellaneous" are two values that you do not want to set within any category of the Range of Values element. Both values are nonspecific and absolutely meaningless within this context and are a sign of mental laziness in that their very presence indicates a need to review the field for possible refinement. You can avoid unnecessary confusion and potential problems by refraining from using these values.
This element designates at what point a user can enter a value into a field and whether he can modify that value. You set this element to one of these four options:
Enter Now, Edits Allowed. A user must enter a value for this field when she creates a new record in the field's parent table. She can then edit the value at any time.
Enter Later, Edits Allowed. A user has the option of entering a value for this field when he creates a new record in the field's parent table. This does not imply in any way that the field's value can be null for all time; the user must enter a value for this field at some point in the near future. After he's entered the value, he can then edit it at any time.
Enter Now, Edits Not Allowed. A user must enter a value for this field when she creates a new record in the field's parent table, but she cannot edit it at any time whatsoever.
Enter Later, Edits Not Allowed. A user has the option of entering a value for this field when he creates a new record in the field's parent table. This does not imply in any way that the field's value can be null for all time; the user must enter a value for this field at some point in the near future. After he's entered the value, he cannot edit it at any time whatsoever.
You should use a default value when you set the Edit Rule element to the second or fourth option; this will keep the field's value from being null until such time that the user enters an appropriate value.
This indicates the types of comparisons a user can apply to a given field value when he's retrieving information from the field. There are six types of comparisons: equal to (=), not equal to (|), greater than (>), less than (<), greater than or equal to (>=), and less than or equal to (<=). This element also indicates whether a user can compare a given field value to any of the following:
Another value within the same field. When a field serves as a primary key, this option applies to the values of related foreign key fields. (You'll learn more about this in the next chapter.)
A value of another field within the parent table or from some other table in the database.
A value expression, which is some form of operation involving field values, literal values, or a combination of both. It returns a single value that you can then use for the comparison: (RETAIL PRICE 2.50) is an example of a value expression.
Controlling the types of comparisons a user can apply to the field's values enables you to keep him from making meaningless comparisons. Let's say that he's working with an EMPLOYEE ID NUMBER field based on a numeric data type. Unless you indicate otherwise, he can make a comparison such as this one:
Is an Employee ID Number in the Employees table greater than or equal to an Employee ID Number in the Part-Time Employees table?
Although a "greater than or equal to" comparison is generally acceptable in a numeric field, it is not appropriate in this instance; there is no valid reason for him to make this type of comparison.
Similarly, it would be pointless for him to make a comparison between a given EMPLOYEE ID NUMBER value and the value of another numeric field within the EMPLOYEES table or some other table within the database; therefore, a comparison such as this is invalid:
Is an Employee ID Number in the Employees table greater than or equal to a Quantity On Hand in the Products table?
It is both suitable and reasonable, however, for him to make a comparison between a given EMPLOYEE ID NUMBER value within the EMPLOYEES table and another EMPLOYEE ID NUMBER value within a related data table or related subset table. This comparison, then, is a valid one:
Is an Employee ID Number in the Employees table equal to an Employee ID Number in the Part-Time Employees table?"
There are instances when it is perfectly suitable for the user to compare a particular value of one field to the value of a completely different field. For example, it is totally logical for him to make the following comparison between a DATE SHIPPED field and a DATE ORDERED field:
Is the current value of Date Shipped greater than or equal to the current value of Date Ordered?
It's fortunate that he can make this type of comparisonhe certainly doesn't want the value of DATE SHIPPED to be earlier than the value of DATE ORDERED!
As you set the Comparisons Allowed element for a given field, think about how you're going to use the field's values so that you can designate the appropriate comparisons. It's very likely that you'll review this element later in the design process when you establish table relationships and define business rules.
This element specifies the types of operations that a user can perform on the field's values. There are five types of operations: addition (+), subtraction (), multiplication (x), division (÷), and concatenation. (Obviously, any combination of these operations is valid as well.) This element also indicates whether an operation can incorporate any of the following:
Another value within the same field
A value from another field within the parent table or from some other table in the database
The result of a value expression (which, as you recall, is itself some form of operation involving field values, literal values, or a combination of both, that returns a single value)
You can prevent the user from defining meaningless operations by limiting the types of operations that he can perform on the field's values. Let's consider the EMPLOYEE ID NUMBER, DATE SHIPPED, and DATE ORDERED fields once again. There is no reason for the user to perform mathematical operations on a pair of EMPLOYEE ID NUMBER values within the EMPLOYEES table, nor is there any reason for him to perform such operations using a given EMPLOYEE ID NUMBER value and some other numeric field's value. In the case of the DATE SHIPPED field, however, it is suitable to perform some of these operations using a given DATE SHIPPED value and the value of some other appropriate date field within the database. For example, the user might need to subtract DATE ORDERED from DATE SHIPPED to determine the time that elapsed between the date that the customer placed the order and the date that the items within the order were shipped to the customer.
As you set the Operations Allowed element for a given field, think about how you're going to use the field's values so that you can designate the appropriate operations. It's very likely that you'll review this element later in the design process as you define business rules.
Figure 9.7 shows the Logical Elements section of a Field Specifications sheet for an EMPLOYEE ID NUMBER field.