The processes for extracting some types of information from a table storing hierarchical data are relatively simple, and can be performed using the techniques that we have discussed so far in this book. Extracting more complex information requires using some new SQL constructs, which we'll discuss in Section 8.3. In this section, we'll discuss the hierarchy operations that can be performed using what you've learned so far.
Finding the root nodes of a hierarchy tree is easy; look for the nodes with no parents. You may have more than one hierarchy in a table, and consequently more than one root node. In the employee table we discussed earlier, the value for manager_emp_id is NULL for the uppermost employee, and only for the uppermost employee. The following query searches for cases where manager_emp_id is NULL, thereby returning the root node:
SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date FROM employee WHERE manager_emp_id IS NULL; EMP_ID LNAME DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE --------- ---------- --------- -------------- --------- --------- 7839 KING 10 5000 17-NOV-81
Because the manager_emp_id column defines the hierarchy, it's important that it always contain correct data. While populating data in this table, we must make sure to specify a manager_emp_id for every row other than the row for the uppermost employee. The uppermost employee doesn't report to anyone (doesn't have a manager), and hence manager_emp_id is not applicable for him. If we leave out manager_emp_id values for employees that do have managers, those employees will erroneously show up as root nodes.
You may wish to link nodes to their immediate parents. For example, you might want to print a report showing each employee's manager. The name of each employee's manager can be derived by joining the employee table to itself. This type of join is a self join (discussed in Chapter 3). The following query returns the desired result:
SELECT e.lname "Employee", m.lname "Manager" FROM employee e, employee m WHERE e.manager_emp_id = m.emp_id; Employee Manager ---------- ---------- SMITH FORD ALLEN BLAKE WARD BLAKE JONES KING MARTIN BLAKE BLAKE KING CLARK KING SCOTT JONES TURNER BLAKE ADAMS SCOTT JAMES BLAKE FORD JONES MILLER CLARK 13 rows selected.
Note this query results in only 13 rows, although the employee table has 14 rows:
SELECT COUNT(*) FROM employee; COUNT(*) ---------- 14
The reason that only 13 rows are returned from the self join is simple. Our query lists employees and their managers. But since the uppermost employee KING doesn't have any manager, that row is not produced in the output. If you want all the employees to be produced in the result, you need an outer join, as in the following example:
SELECT e.lname "Employee", m.lname "Manager" FROM employee e LEFT OUTER JOIN employee m ON e.manager_emp_id = m.emp_id ; Employee Manager -------------------- -------------------- SMITH FORD ALLEN BLAKE WARD BLAKE JONES KING MARTIN BLAKE BLAKE KING CLARK KING SCOTT JONES KING TURNER BLAKE ADAMS SCOTT JAMES BLAKE FORD JONES MILLER CLARK 14 rows selected.
Outer joins were discussed in detail in Chapter 3.
The opposite problem from finding root nodes, which have no parents, is to find leaf nodes, which have no children. Employees who do not manage anyone are the leaf nodes in the hierarchy tree shown in Figure 8-1. At first glance, the following query seems like it should list all employees from the employee table who are not managers of any other employee:
SELECT * FROM employee WHERE emp_id NOT IN (SELECT manager_emp_id FROM employee);
However, when you execute this statement, you will see "No rows selected." Why? It is because the manager_emp_id column contains a NULL value in one row (for the uppermost employee), and NULLs can't be compared to any data value. Therefore, to get the employees who don't manage anyone, you need to rewrite the query as follows:
SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date FROM employee e WHERE emp_id NOT IN (SELECT manager_emp_id FROM employee WHERE manager_emp_id IS NOT NULL); EMP_ID LNAME DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE --------- ---------- --------- -------------- --------- --------- 7369 SMITH 20 7902 800 17-DEC-80 7499 ALLEN 30 7698 1600 20-FEB-81 7521 WARD 30 7698 1250 22-FEB-81 7654 MARTIN 30 7698 1250 28-SEP-81 7844 TURNER 30 7698 1500 08-SEP-81 7876 ADAMS 20 7788 1100 23-MAY-87 7900 JAMES 30 7698 950 03-DEC-81 7934 MILLER 10 7782 1300 23-JAN-82 8 rows selected.
In this example, the subquery returns the emp_id's of all the managers. The outer query then returns all the employees, except the ones returned by the subquery. This query can also be written as a correlated subquery using EXISTS instead of IN:
SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date FROM employee e WHERE NOT EXISTS (SELECT emp_id FROM employee e1 WHERE e.emp_id = e1.manager_emp_id); EMP_ID LNAME DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE --------- ---------- --------- -------------- --------- --------- 7369 SMITH 20 7902 800 17-DEC-80 7499 ALLEN 30 7698 1600 20-FEB-81 7521 WARD 30 7698 1250 22-FEB-81 7654 MARTIN 30 7698 1250 28-SEP-81 7844 TURNER 30 7698 1500 08-SEP-81 7876 ADAMS 20 7788 1100 23-MAY-87 7900 JAMES 30 7698 950 03-DEC-81 7934 MILLER 10 7782 1300 23-JAN-82 8 rows selected.
The correlated subquery checks each employee to see whether he is the manager of any other employee. If not, then that particular employee is included in the result set.
|