8.1 Representing Hierarchical Information

Let's look at an example to understand how we can represent hierarchical information in a relational database. As a basis for the example, we'll use an organization chart showing how one employee reports to another within a large organization, as shown in Figure 8-1.

Figure 8-1. An organization chart
figs/mos2_0801.gif


Figure 8-1 represents a hierarchy of employees. The information regarding an employee, his manager, and the reporting relationship need to be represented in one table, employee, as shown in the Entity Relationship Diagram in Figure 8-2.

Figure 8-2. The reporting relationship
figs/mos2_0802.gif


In Figure 8-2, the employee table refers to itself. The column manager_emp_id refers to the emp_id column of the same table. To represent hierarchical data, you need to make use of a relationship such as when one column of a table references another column of the same table. When such a relationship is implemented using a database constraint, it is known as self-referential integrity constraint. The corresponding CREATE TABLE statement will look as follows:

CREATE TABLE employee (

emp_id          NUMBER (4) CONSTRAINT emp_pk PRIMARY KEY,

fname           VARCHAR2 (15) NOT NULL, 

lname           VARCHAR2 (15) NOT NULL, 

dept_id         NUMBER (2) NOT NULL,

manager_emp_id  NUMBER (4) CONSTRAINT emp_fk REFERENCES employee(emp_id),

salary          NUMBER (7,2) NOT NULL,

hire_date       DATE NOT NULL, 

job_id          NUMBER (3));

As a basis for the examples in this chapter, we'll use the following sample data:

SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date

FROM employee;



   EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID    SALARY HIRE_DATE 

--------- ---------- --------- -------------- --------- --------- 

     7369 SMITH             20           7902       800 17-DEC-80 

     7499 ALLEN             30           7698      1600 20-FEB-81 

     7521 WARD              30           7698      1250 22-FEB-81 

     7566 JONES             20           7839      2000 02-APR-81 

     7654 MARTIN            30           7698      1250 28-SEP-81 

     7698 BLAKE             30           7839      2850 01-MAY-80 

     7782 CLARK             10           7839      2450 09-JUN-81 

     7788 SCOTT             20           7566      3000 19-APR-87 

     7839 KING              10                     5000 17-NOV-81 

     7844 TURNER            30           7698      1500 08-SEP-81 

     7876 ADAMS             20           7788      1100 23-MAY-87 

     7900 JAMES             30           7698       950 03-DEC-81 

     7902 FORD              20           7566      3000 03-DEC-81 

     7934 MILLER            10           7782      1300 23-JAN-82

The employee table has two important aspects:

  • The column manager_emp_id

  • The emp_fk constraint

The column manager_emp_id stores the emp_id of the employee's manager. For example, the manager_emp_id for Smith is 7902, which means that Ford is Smith's manager. The employee King doesn't have a manager_emp_id, which indicates that King is the uppermost employee. To be able to represent the uppermost employee, the manager_emp_id column must be nullable.

There is a foreign key constraint on the manager_emp_id column. This enforces the rule that any value put in the manager_emp_id column must be the emp_id of a valid employee. Such a constraint is not mandatory when representing hierarchical information. However, it is a good practice to define database constraints to enforce such business rules.

Before moving on to the following sections on manipulating hierarchies, we will introduce some hierarchy terminology. The following list defines terms that we'll use often when working with hierarchical data:


Node

A row in a table that represents a specific entry in a hierarchical tree structure. For example, in Figure 8-1 each employee is considered to be a node.


Parent

A node that is one level up in a tree. In Figure 8-1, King is the parent of Blake, and Blake is the parent of Martin. The term parent node is sometimes used in place of just parent.


Child

A node that is one level down in a tree. In Figure 8-1, Blake is a child of King. Blake, in turn, has five children: Allen, Ward, Martin, Turner, and James. The term child node is sometimes used in place of just child.


Root

The uppermost node in a hierarchical structure. The definition of a root is that it has no parent. In Figure 8-1, King is the root. You can have only one root in any given tree, but it's worth noting that you can have multiple trees in a hierarchical table. If our employee table stored information on employees from multiple companies, we would have one root per company. The term root node is sometimes used in place of root.


Leaf

A node with no children, and sometimes called a leaf node. Leaf nodes are the antitheses of root nodes, and represent the lowest levels of a tree structure. The leaf nodes in Figure 8-1 are Adams, Smith, Allen, Ward, Martin, Turner, James, and Miller. Leaf nodes do not all need to be at the same level, but they do need to be without children.


Level

A layer of nodes. In Figure 8-1, King constitutes one level. Jones, Blake, and Clark constitute the next level down, and so forth.