Relational Databases

Relational Databases

The frustration with the inadequate capabilities of network and hierarchical databases resulted in the invention of the relational data model. The relational data model took the idea of the network database some several steps further. Relational models — just like hierarchical and network models — are based upon tables and use parent/child relationships. (Though this relationship was implemented through column values as opposed to a low-level physical pointer defining the relationship; more on that later in the chapter.)

Tables

A table is a basic building unit of the relational database. It is a fairly intuitive way of organizing data and has been around for centuries. A table consists of rows and columns (called records and fields in database jargon). Each table has a unique name in the database (i.e., unique fully qualified name, the one that includes schema or database name as a prefix).

Note 

The Dot (.) notation in a fully qualified name is commonly used in the programming world to describe hierarchy of the objects and their properties. This could refer not only to the database objects but also to the structures, user-defined types, and such. For example, a table field in an MS SQL Server database could be referred to as ACME.DBO.CUSTOMER. CUST_ID_N where ACME is a database name, DBO is the table owner (Microsoft standard), CUSTOMER is the name of the table, and CUST_ID_N is the column name in the CUSTOMER table.

Cross-References 

See Chapter 4 for more on table and other database object names.

Each field has a unique name within the table, and any table must have at least one field. The number of fields per table is usually limited, the actual limitation being dependent on a particular implementation. Unlike legacy database structure, records in a table are not stored or retrieved in any particular order (although, records can be arranged in a particular order by means of using clustered index — discussed in Chapter 4); the task of sorting the record in relational databases systems (RDBMS) is relegated to SQL.

A record thus is composed of a number of cells, where each cell has a unique name and might contain some data. A table that has no records is called an empty table.

Data within the field must be of the same type, for example, the field AMOUNT contains only numbers, and field DESCRIPTION, only words. The set of the data within one field is said to be column's domain.

Note 

Early databases — relational or otherwise — were designed to contain only text data; modern databases store anything that could be converted into binary format: pictures, movies, audio records, and so on.

The good relational design would make sure that such a record describes an entity — another relational database term to be discussed later in the book but worth mentioning here. To put it in other words, the record should not contain irrelevant information: CUSTOMER table deals with the customer information only, its records should not contain information about, say, products that this customer ordered.

Note 

The process of grouping the relevant data together, eliminating redundancies along the way is called normalization and will be discussed in Chapter 2. It is not part of SQL per se, but it does impose limits on the SQL query efficiency.

There is no theoretical limit on the number of rows a table could have, though some implementations impose restrictions; also there are (or at least ought to be) practical considerations to the limits: data retrieval speed, amount of storage, and so on.

Relationships

Tables in RDBMS might or might not be related. As it was mentioned before, RDBMS is built upon parent/child relationship notion (hence the name — relational), but unlike"in legacy databases (hierarchical, network) these relations are based solely on the values in the table columns; these relationships are meaningful in logical terms, not in low-level computer specific pointers. Let's take the example of our fictitious order entry database (the one that we will design, build, and use throughout the book). The ORDER_HEADER table is related to CUSTOMER table since both of these tables have a common set of values: The field ORDHDR_CUSTID_FN (customer ID) in ORDER_HEADER (and its values) corresponds to CUST_ID_N in CUSTOMER. The field CUST_ID_N is said to be a primary key for the CUSTOMER table and a foreign key for the ORDER_HEADER table (under different name).

Primary key

The primary key holds more than one job in RDBMS. We've said already that it is used to define a relationship; but its primary role is to uniquely identify each record in a table.

In the days of legacy databases, the records were always stored in some predefined order; if such an order had to be broken (because somebody had inserted records in a wrong order or business rule was changed), then the whole table (and, most likely, the whole database) had to be rebuilt. The RDBMS abolishes fixed order for the records, but it still needs some mechanism of identifying the records uniquely, and the primary key, based on the idea of a field (or fields) that contains set unique values, serves exactly this purpose.

By it is very nature, the primary key cannot be empty; this means that in a table with defined primary key, the primary key fields must contain data for each record.

Note 

Though it is not a requirement to have a primary key on each and every table, it is considered to be a good practice to have one; in fact, many RDBMS implementations would warn you if you create a table without defining a primary key. Some purists go even further, specifying that the primary key should be meaningless in the sense that they would use some generated unique value (like EMPLOYEE_ID) instead of, say, Social Security numbers (despite that these are unique as well).

A primary key could consist of one or more columns, i.e., though some fields may contain duplicate values, their combination (set) is unique through the entire table. A key that consists of several columns is called a composite key.

Note 

In the world of RDBMS, only tables that have primary keys can be related. Though the primary key is a cornerstone for defining relation in RDBMS, the actual implementations (especially early ones) have not always provided a built-in support for this logical concept. In practice, the task of enforcing uniqueness of a chosen primary key was the responsibility of programmers (requiring them to check for existing values before inserting new records, for example). Today all major relational database products have built-in support for primary keys; on a very basic level this means that the database does its own checking for unique constraint violations and will raise an error whenever an attempt to insert a duplicate record is made.

Foreign key

Let's go back to our CUSTOMER and ORDER_HEADER tables. By now you understand why the CUST_ID_N was designated as a primary key — it has unique value, no customer can possibly have more than one ID, and no ID could be assigned to more than one customer. To track what customers placed which orders, you need something that will provide a link between customers and their orders.

Table ORDER_HEADER has its own primary key — ORDHDR_ID_N which uniquely identifies orders; in addition to that it will have a foreign key ORDHDR_CUSTID_FN field. The values in that field correspond to the values in the CUST_ID_N primary key field for the CUSTOMER table. Note that, unlike the primary key, the foreign key is not required to be unique — one customer could place several orders.

Now, by looking into ORDER_HEADER table you can find which customers placed particular orders. The table ORDER_HEADER became related to table CUSTOMER. It became easy to find a customer based on orders, or find orders for a customer. You no longer need to know database layout, order of the records in the table, or master some low-level proprietary programming language to query data; it's now possible to run ad-hoc queries formulated in standard English-like language — the Structured Query Language.

Invasion of RDBMS

In spite of the clear advantages of the relational database model, it took some time for it to become workable. One of the main reasons was the hardware. The logically clear and clean model proved to be quite a task to implement, and even then it required much more in terms of memory and processing power than legacy databases.

The development of relational databases was driven by the need of the medium to big businesses to gather, preserve, and analyze data. In 1965, Gordon Moore, the cofounder of Intel, made his famous observation that the number of transistors per square inch on the integrated circuits (IC) doubles every year ever since the IC were invented. Surprisingly, this rule still holds true. More powerful machines made it feasible to implement and sell RDBMS; cheap memory and powerful processors made them fast; perpetually growing appetites for information made RDBMS products a commodity, drastically cutting their price down. Today, according to some estimates, less than 10 percent of the market is being held by the database legacy "dinosaurs" — mostly because of significant investment made by their owners more than 20 years ago. For better or for worse, relational database systems have come to rule on planet Earth.