8.2 Simple Hierarchy Operations

The processes for extracting some types of information from a table storing hierarchical data are relatively simple, and can be performed using the techniques that we have discussed so far in this book. Extracting more complex information requires using some new SQL constructs, which we'll discuss in Section 8.3. In this section, we'll discuss the hierarchy operations that can be performed using what you've learned so far.

8.2.1 Finding Root Nodes

Finding the root nodes of a hierarchy tree is easy; look for the nodes with no parents. You may have more than one hierarchy in a table, and consequently more than one root node. In the employee table we discussed earlier, the value for manager_emp_id is NULL for the uppermost employee, and only for the uppermost employee. The following query searches for cases where manager_emp_id is NULL, thereby returning the root node:

SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date 

FROM employee 

WHERE manager_emp_id IS NULL;



   EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID    SALARY HIRE_DATE 

--------- ---------- --------- -------------- --------- --------- 

     7839 KING              10                     5000 17-NOV-81

Because the manager_emp_id column defines the hierarchy, it's important that it always contain correct data. While populating data in this table, we must make sure to specify a manager_emp_id for every row other than the row for the uppermost employee. The uppermost employee doesn't report to anyone (doesn't have a manager), and hence manager_emp_id is not applicable for him. If we leave out manager_emp_id values for employees that do have managers, those employees will erroneously show up as root nodes.

8.2.2 Finding a Node's Immediate Parent

You may wish to link nodes to their immediate parents. For example, you might want to print a report showing each employee's manager. The name of each employee's manager can be derived by joining the employee table to itself. This type of join is a self join (discussed in Chapter 3). The following query returns the desired result:

SELECT e.lname "Employee", m.lname "Manager"

FROM employee e, employee m

WHERE e.manager_emp_id = m.emp_id;



Employee   Manager

---------- ----------

SMITH      FORD

ALLEN      BLAKE

WARD       BLAKE

JONES      KING

MARTIN     BLAKE

BLAKE      KING

CLARK      KING

SCOTT      JONES

TURNER     BLAKE

ADAMS      SCOTT

JAMES      BLAKE

FORD       JONES

MILLER     CLARK



13 rows selected.

Note this query results in only 13 rows, although the employee table has 14 rows:

SELECT COUNT(*) FROM employee;



  COUNT(*)

----------

        14

The reason that only 13 rows are returned from the self join is simple. Our query lists employees and their managers. But since the uppermost employee KING doesn't have any manager, that row is not produced in the output. If you want all the employees to be produced in the result, you need an outer join, as in the following example:

SELECT e.lname "Employee", m.lname "Manager"

FROM employee e LEFT OUTER JOIN employee m

ON e.manager_emp_id = m.emp_id ;



Employee             Manager

-------------------- --------------------

SMITH                FORD

ALLEN                BLAKE

WARD                 BLAKE

JONES                KING

MARTIN               BLAKE

BLAKE                KING

CLARK                KING

SCOTT                JONES

KING

TURNER               BLAKE

ADAMS                SCOTT

JAMES                BLAKE

FORD                 JONES

MILLER               CLARK



14 rows selected.

Outer joins were discussed in detail in Chapter 3.

8.2.3 Finding Leaf Nodes

The opposite problem from finding root nodes, which have no parents, is to find leaf nodes, which have no children. Employees who do not manage anyone are the leaf nodes in the hierarchy tree shown in Figure 8-1. At first glance, the following query seems like it should list all employees from the employee table who are not managers of any other employee:

SELECT * FROM employee

WHERE emp_id NOT IN (SELECT manager_emp_id FROM employee);

However, when you execute this statement, you will see "No rows selected." Why? It is because the manager_emp_id column contains a NULL value in one row (for the uppermost employee), and NULLs can't be compared to any data value. Therefore, to get the employees who don't manage anyone, you need to rewrite the query as follows:

SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date 

FROM employee e

WHERE emp_id NOT IN 

(SELECT manager_emp_id FROM employee 

WHERE manager_emp_id IS NOT NULL);



   EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID    SALARY HIRE_DATE 

--------- ---------- --------- -------------- --------- --------- 

     7369 SMITH             20           7902       800 17-DEC-80 

     7499 ALLEN             30           7698      1600 20-FEB-81 

     7521 WARD              30           7698      1250 22-FEB-81 

     7654 MARTIN            30           7698      1250 28-SEP-81 

     7844 TURNER            30           7698      1500 08-SEP-81 

     7876 ADAMS             20           7788      1100 23-MAY-87 

     7900 JAMES             30           7698       950 03-DEC-81 

     7934 MILLER            10           7782      1300 23-JAN-82 



8 rows selected.

In this example, the subquery returns the emp_id's of all the managers. The outer query then returns all the employees, except the ones returned by the subquery. This query can also be written as a correlated subquery using EXISTS instead of IN:

SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date 

FROM employee e

WHERE NOT EXISTS 

(SELECT emp_id FROM employee e1 WHERE e.emp_id = e1.manager_emp_id);



   EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID    SALARY HIRE_DATE 

--------- ---------- --------- -------------- --------- --------- 

     7369 SMITH             20           7902       800 17-DEC-80 

     7499 ALLEN             30           7698      1600 20-FEB-81 

     7521 WARD              30           7698      1250 22-FEB-81 

     7654 MARTIN            30           7698      1250 28-SEP-81 

     7844 TURNER            30           7698      1500 08-SEP-81 

     7876 ADAMS             20           7788      1100 23-MAY-87 

     7900 JAMES             30           7698       950 03-DEC-81 

     7934 MILLER            10           7782      1300 23-JAN-82 



8 rows selected.

The correlated subquery checks each employee to see whether he is the manager of any other employee. If not, then that particular employee is included in the result set.

Oracle Database 10g introduces a pseudocolumn, CONNECT_BY_ISLEAF, which you can also use to identify leaf nodes. We discuss this pseudocolumn in Section 8.6.