Understanding Normalization

Normalization is simply a set of rules that will ultimately make your life easier when you're acting as a database administrator. It's the art of organizing your database in such a way that your tables are related where appropriate and flexible for future growth.

The sets of rules used in normalization are called normal forms. If your database design follows the first set of rules, it's considered in the first normal form. If the first three sets of rules of normalization are followed, your database is said to be in the third normal form.

Throughout this hour, you'll learn about each rule in the first, second, and third normal forms and, we hope, will follow them as you create your own applications. You'll be using a sample set of tables for a students and courses database and taking it to the third normal form.

Problems with the Flat Table

Before launching into the first normal form, you have to start with something that needs to be fixed. In the case of a database, it's the flat table. A flat table is like a spreadsheet?it has many, many columns. There are no relationships between multiple tables; all the data you could possibly want is right there in that flat table. This scenario is inefficient and consumes more physical space on your hard drive than a normalized database.

In your students and courses database, assume you have the following fields in your flat table:

  • StudentName? The name of the student.

  • CourseID1? The ID of the first course taken by the student.

  • CourseDescription1? The description of the first course taken by the student.

  • CourseInstructor1? The instructor of the first course taken by the student.

  • CourseID2? The ID of the second course taken by the student.

  • CourseDescription2? The description of the second course taken by the student.

  • CourseInstructor2? The instructor of the second course taken by the student.

  • Repeat CourseID, CourseDescription, and CourseInstructor columns many more times to account for all the classes students can take during their academic career.

With what you've learned so far, you should be able to identify the first problem area: CourseID, CourseDescription, and CourseInstructor columns are repeated groups.

Eliminating redundancy is the first step in normalization, so next you'll take this flat table to first normal form. If your table remained in its flat format, you could have a lot of unclaimed space, and a lot of space being used unnecessarily?not an efficient table design.

First Normal Form

The rules for the first normal form are as follows:

  • Eliminate repeating information.

  • Create separate tables for related data.

If you think about the flat table design, with many repeated sets of fields for the students and courses database, you can identify two distinct topics: students and courses. Taking your students and courses database to the first normal form would mean that you create two tables: one for students and one for courses, as shown in Figure 14.9.

Figure 14.9. Breaking the flat table into two tables.


Your two tables now represent a one-to-many relationship of one student to many courses. Students can take as many courses as they want and are not limited to the number of CourseID/CourseDescription/CourseInstructor groupings that existed in the flat table.

The next step is to put the tables into second normal form.

Second Normal Form

The rule for the second normal form is

  • No non-key attributes depend on a portion of the primary key.

In plain English, this means that if fields in your table are not entirely related to a primary key, you have more work to do. In the students and courses example, you need to break out the courses into their own table and modify the students_courses table. CourseID, CourseDesc, and CourseInstructor can become a table called courses with a primary key of CourseID. The students_courses table should then just contain two fields: StudentID and CourseID. You can see this new design in Figure 14.10.

Figure 14.10. Taking your tables to second normal form.


This structure should look familiar to you as a many-to-many relationship using an intermediary mapping table. The third normal form is the last form we'll look at, and you'll find it's just as simple to understand as the first two.

Third Normal Form

The rule for the third normal form is

  • No attributes depend on other non-key attributes.

This rule simply means that you need to look at your tables and see whether you have more fields that can be broken down further and that aren't dependent on a key. Think about removing repeated data and you'll find your answer?instructors. Inevitably, an instructor will teach more than one class. However, CourseInstructor is not a key of any sort. So, if you break out this information and create a separate table purely for the sake of efficiency and maintenance (as shown in Figure 14.11), that's the third normal form.

Figure 14.11. Taking your tables to third normal form.


Third normal form is usually adequate for removing redundancy and allowing for flexibility and growth. The next section will give you some pointers for the thought process involved in database design and where it fits in the overall design process of your application.

    Part III: Getting Involved with the Code