8.4 Complex Hierarchy Operations

Using Oracle's hierarchical SQL extensions, you can perform complex, hierarchical queries much more easily than you would be able to do using standard, ANSI SQL.

8.4.1 Finding the Number of Levels

Previously we showed how the LEVEL pseudocolumn generates a level number for each record when we use the START WITH . . . CONNECT BY clause. You can use the following query to determine the number of levels in the hierarchy by finding the maximum level number returned by the LEVEL pseudocolumn:

SELECT MAX(LEVEL) 

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY PRIOR emp_id = manager_emp_id;



MAX(LEVEL)

----------

         4

To determine the number of employees at each level, group the results by LEVEL and count the number of employees in each distinct group. For example:

SELECT LEVEL, COUNT(emp_id) 

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY PRIOR emp_id = manager_emp_id

GROUP BY LEVEL;



    LEVEL COUNT(EMP_ID)

--------- -------------

        1             1

        2             3

        3             8

        4             2

8.4.2 Listing Records in Hierarchical Order

One of the very common programming challenges SQL programmers face is to list records in a hierarchy in their proper hierarchical order. For example, you might wish to list employees with their subordinates underneath them, as in the following query:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || lname "Employee",

       emp_id, manager_emp_id

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY PRIOR emp_id = manager_emp_id;



    LEVEL Employee        EMP_ID MANAGER_EMP_ID 

--------- ------------ --------- -------------- 

        1 KING              7839                

        2   JONES           7566           7839 

        3     SCOTT         7788           7566 

        4       ADAMS       7876           7788 

        3     FORD          7902           7566 

        4       SMITH       7369           7902 

        2   BLAKE           7698           7839 

        3     ALLEN         7499           7698 

        3     WARD          7521           7698 

        3     MARTIN        7654           7698 

        3     TURNER        7844           7698 

        3     JAMES         7900           7698 

        2   CLARK           7782           7839 

        3     MILLER        7934           7782 



14 rows selected.

Notice that by using the expression LPAD(' ',2*(LEVEL - 1)), we are able to align employee names in a manner that corresponds to their level. As the level number increases, the number of spaces returned by the expression increases, and the employee name is further indented.

The previous query lists all the employees in the employee table. If you want to filter out certain employees based on some condition, then you can use a WHERE clause in your hierarchical query. Here is an example:

SELECT LEVEL, LPAD('    ',2*(LEVEL - 1)) || lname "Employee",

       emp_id, manager_emp_id, salary

FROM employee

WHERE salary > 2000

START WITH manager_emp_id IS NULL

CONNECT BY manager_emp_id = PRIOR emp_id;



    LEVEL Employee        EMP_ID MANAGER_EMP_ID     SALARY

--------- ------------ --------- --------------  ---------

        1 KING              7839                      5000

        3     SCOTT         7788           7566       3000

        3     FORD          7902           7566       3000

        2   BLAKE           7698           7839       2850

        2   CLARK           7782           7839       2450

This query lists records with salary > 2000. The WHERE clause restricts the rows returned by the query without affecting other rows in the hierarchy. In our example, the WHERE condition filtered JONES out of the result, but the employees below JONES in the hierarchy (SCOTT and FORD) are not filtered out, and are still indented as they were when JONES was present. The WHERE clause must come before the START WITH . . . CONNECT BY clause in a hierarchical query; otherwise, you'll get a syntax error.

Though the WHERE clause comes before the START WITH . . . CONNECT BY construct, the filtering happens after the complete hierarchy tree is built.


As discussed earlier, the START WITH clause is optional?i.e., you can have a CONNECT BY without a START WITH. When the START WITH clause is missing, effectively the query doesn't specify where to start building the hierarchy. In that situation, each row of the table is considered a root, and a hierarchy is built for each row. For example:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || lname "Employee",

       emp_id, manager_emp_id, salary

FROM employee

CONNECT BY manager_emp_id = PRIOR emp_id;



LEVEL Employee                 EMP_ID MANAGER_EMP_ID     SALARY

----- -------------------- ---------- -------------- ----------

    1 SCOTT                      7788           7566       3000

    2   ADAMS                    7876           7788       1100

    1 FORD                       7902           7566       3000

    2   SMITH                    7369           7902        800

    1 ALLEN                      7499           7698       1600

    1 WARD                       7521           7698       1250

    1 JAMES                      7900           7698        950

    1 TURNER                     7844           7698       1500

    1 MARTIN                     7654           7698       1250

    1 MILLER                     7934           7782       1300

    1 ADAMS                      7876           7788       1100

    1 JONES                      7566           7839       2000

    2   SCOTT                    7788           7566       3000

    3     ADAMS                  7876           7788       1100

    2   FORD                     7902           7566       3000

    3     SMITH                  7369           7902        800

    1 CLARK                      7782           7839       2450

    2   MILLER                   7934           7782       1300

    1 BLAKE                      7698           7839       2850

    2   ALLEN                    7499           7698       1600

    2   WARD                     7521           7698       1250

    2   JAMES                    7900           7698        950

    2   TURNER                   7844           7698       1500

    2   MARTIN                   7654           7698       1250

    1 SMITH                      7369           7902        800

    1 KING                       7839                      5000

    2   JONES                    7566           7839       2000

    3     SCOTT                  7788           7566       3000

    4       ADAMS                7876           7788       1100

    3     FORD                   7902           7566       3000

    4       SMITH                7369           7902        800

    2   CLARK                    7782           7839       2450

    3     MILLER                 7934           7782       1300

    2   BLAKE                    7698           7839       2850

    3     ALLEN                  7499           7698       1600

    3     WARD                   7521           7698       1250

    3     JAMES                  7900           7698        950

    3     TURNER                 7844           7698       1500

    3     MARTIN                 7654           7698       1250



39 rows selected.

This example returns the hierarchy tree for each row in the table. In the organization tree under KING, SCOTT is at level 3; however, in the organization tree under JONES, SCOTT is at level 2, and under the organization tree headed by himself, SCOTT is at level 1.

8.4.3 Checking for Ascendancy

Another common operation on hierarchical data is to check for ascendancy. In an organization chart, you may ask whether one employee has authority over another. For example: "Does JONES have any authority over BLAKE?" To find out, you need to search for BLAKE in the subtree headed by JONES. If you find BLAKE in the subtree, then you know that BLAKE either directly or indirectly reports to JONES. If you don't find BLAKE in the subtree, then you know that JONES doesn't have any authority over BLAKE. The following query searches for BLAKE in the subtree headed by JONES:

SELECT * 

FROM employee

WHERE lname = 'BLAKE'

START WITH lname = 'JONES'

CONNECT BY manager_emp_id = PRIOR emp_id;



no rows selected

The START WITH . . . CONNECT BY clause in this example generates the subtree headed by JONES, and the WHERE clause filters this subtree to find BLAKE. As you can see, no rows are returned. This means that BLAKE was not found in JONES's subtree, so you know that JONES has no authority over BLAKE. Let's take a look at another example that produces positive results. This time we'll check to see whether JONES has any authority over SMITH:

SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date 

FROM employee

WHERE lname = 'SMITH'

START WITH lname = 'JONES'

CONNECT BY manager_emp_id = PRIOR emp_id;



    EMP_ID LNAME         DEPT_ID MANAGER_EMP_ID     SALARY HIRE_DATE 

---------- ---------- ---------- -------------- ---------- --------- 

      7369 SMITH              20           7902        800 17-DEC-80

This time, SMITH was found in the list of employees in JONES's subtree, so you know that at some level JONES has management authority over SMITH.

8.4.4 Deleting a Subtree

Let's assume that the organization we are dealing with splits, and JONES and all his subordinates form a new company. Therefore, we don't need to maintain JONES and his subordinates in our employee table. Furthermore, we need to delete the entire subtree headed by JONES, as shown in Figure 8-1, from our table. We can do this by using a subquery as in the following example:

DELETE FROM employee

WHERE emp_id IN 

(SELECT emp_id FROM employee

START WITH lname = 'JONES'

CONNECT BY manager_emp_id = PRIOR emp_id);



5 rows deleted.

In this example, the subquery generates the subtree headed by JONES, and returns the emp_ids of the employees in that subtree, including JONES's. The outer query then deletes the records with these emp_id values from the employee table.

8.4.5 Listing Multiple Root Nodes

An interesting variation on the problem of listing the root node of a hierarchy is to find and list the root nodes from several hierarchies that are all stored in the same table. For example, you might consider department managers to represent root nodes, and you might further wish to list all department managers found in the employee table.

There are no constraints on the employees belonging to any department. However, you can assume that if A reports to B and B reports to C, and A and C belong to the same department, then B also belongs to the same department. If an employee's manager belongs to another department, then that employee is the uppermost employee, or manager, of his department.

Therefore, to find the uppermost employee in each department, you need to search the tree for those employees whose managers belong to a different department than their own. You can do that using the following query:

SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date 

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY manager_emp_id = PRIOR emp_id

AND dept_id != PRIOR dept_id;



EMP_ID LNAME     DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE 

------ -------- -------- -------------- ------ --------- 

  7839 KING           10                  5000 17-NOV-81 

  7566 JONES          20           7839   2000 02-APR-81 

  7698 BLAKE          30           7839   2850 01-MAY-80

In this example, the extra condition (dept_id != PRIOR dept_id) added to the CONNECT BY clause restricts the output to only those employees whose managers belong to a different department than their own.

8.4.6 Listing the Top Few Levels of a Hierarchy

Another common task in dealing with hierarchical data is listing the top few levels of a hierarchy tree. For example, you may want to list top management employees in an organization. Let's assume that the top two levels in our organization chart constitute top management. You can then use the LEVEL pseudocolumn to identify those employees, as in the following example:

SELECT emp_id, lname, dept_id, manager_emp_id, salary, hire_date 

FROM employee

WHERE LEVEL <= 2 

START WITH manager_emp_id IS NULL

CONNECT BY manager_emp_id = PRIOR emp_id;



EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE 

------ --------- ---------- -------------- ------ --------- 

  7839 KING              10                  5000 17-NOV-81 

  7566 JONES             20           7839   2000 02-APR-81 

  7698 BLAKE             30           7839   2850 01-MAY-80 

  7782 CLARK             10           7839   2450 09-JUN-81

In this example, the LEVEL <= 2 condition in the WHERE clause restricts the results to only those employees in the top two levels of the organization chart.

8.4.7 Aggregating a Hierarchy

Another challenging requirement is to aggregate a hierarchy. For example, you may want to sum the salaries of all employees reporting to a specific employee. Or, you may want to consider each employee as a root, and for each employee print out the sum of the salaries of all subordinate employees.

The first problem is relatively simple. Earlier we described how to select a subtree headed by an employee. You can easily sum the salaries of all employees in such a subtree. For example:

SELECT SUM(salary) 

FROM employee

START WITH lname = 'JONES'

CONNECT BY manager_emp_id = PRIOR emp_id;



SUM(SALARY)

-----------

      9900

The START WITH lname = 'JONES' clause generates the subtree headed by JONES, and the SUM(salary) expression sums the salary of employees in this subtree.

The second problem, a seemingly simple extension of the first, is relatively complex. You want to consider each employee as a root, and for each employee you want to sum the salaries of all employees in its subtree. In essence, you want to repeat the previous query for each employee in the table. The following SQL uses an inline view to achieve this:

SELECT t2.lname, t2.salary,

(SELECT SUM(t1.salary) FROM employee t1

START WITH t1.lname = t2.lname

CONNECT BY t1.manager_emp_id = PRIOR t1.emp_id) sum_salary

FROM employee t2;



LNAME                    SALARY SUM_SALARY

-------------------- ---------- ----------

SMITH                       800        800

ALLEN                      1600       1600

WARD                       1250       1250

JONES                      2000       9900

MARTIN                     1250       1250

BLAKE                      2850       9400

CLARK                      2450       3750

SCOTT                      3000       4100

KING                       5000      28050

TURNER                     1500       1500

ADAMS                      1100       1100

JAMES                       950        950

FORD                       3000       3800

MILLER                     1300       1300



14 rows selected.

In this example, the START WITH . . . CONNECT BY clause in the inline view generates a subtree for each employee. The inline view executes once for every row in the outer employee table. For each row in the outer employee table, the inline view generates a subtree headed by this employee, and returns the sum of salaries for all the employees in this subtree to the main query.

The result set provides two numbers for each employee. The first number, salary, is the employee's own salary. The second number, sum_salary, is the sum of the salaries of all employees under him (including himself/herself). Often programmers resort to PL/SQL to solve this type of problem. However, this query, which combines the power of hierarchical queries with that of inline views, solves the problem in a much more concise and elegant way.

8.4.8 Ordering Hierarchical Data

Sorting the results from a hierarchical query is a more interesting problem than it first may sound. A hierarchical query with a START WITH . . . CONNECT BY . . . construct displays the results in an arbitrary order, as shown in the following example:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || lname "EMPLOYEE", 

       emp_id, manager_emp_id

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY PRIOR emp_id = manager_emp_id;



     LEVEL EMPLOYEE                 EMP_ID MANAGER_EMP_ID

---------- -------------------- ---------- --------------

         1 KING                       7839

         2   JONES                    7566           7839

         3     SCOTT                  7788           7566

         4       ADAMS                7876           7788

         3     FORD                   7902           7566

         4       SMITH                7369           7902

         2   BLAKE                    7698           7839

         3     ALLEN                  7499           7698

         3     WARD                   7521           7698

         3     MARTIN                 7654           7698

         3     TURNER                 7844           7698

         3     JAMES                  7900           7698

         2   CLARK                    7782           7839

         3     MILLER                 7934           7782

As always, you can use an ORDER BY clause to order the result rows in the way you want. However, in the case of a hierarchical query, an ORDER BY clause can destroy the hierarchical nature of the data returned by the query. This is shown in the following example, which orders the results by last name:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || lname "EMPLOYEE", 

       emp_id, manager_emp_id

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY PRIOR emp_id = manager_emp_id

ORDER BY lname;



     LEVEL EMPLOYEE                 EMP_ID MANAGER_EMP_ID

---------- -------------------- ---------- --------------

         4       ADAMS                7876           7788

         3     ALLEN                  7499           7698

         2   BLAKE                    7698           7839

         2   CLARK                    7782           7839

         3     FORD                   7902           7566

         3     JAMES                  7900           7698

         2   JONES                    7566           7839

         1 KING                       7839

         3     MARTIN                 7654           7698

         3     MILLER                 7934           7782

         3     SCOTT                  7788           7566

         4       SMITH                7369           7902

         3     TURNER                 7844           7698

         3     WARD                   7521           7698

As you can see from this output, it is impossible to identify the hierarchical relationship between the rows. To resolve this problem, you can use the SIBLINGS (in Oracle9i and later) keyword in the ORDER BY clause, to order the hierarchical data while at the same time preserving the hierarchy. Oracle does this by sorting at each level while ensuring that child nodes remain underneath their parents. For example:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || lname "EMPLOYEE", 

       emp_id, manager_emp_id

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY PRIOR emp_id = manager_emp_id

ORDER SIBLINGS BY lname;



     LEVEL EMPLOYEE                 EMP_ID MANAGER_EMP_ID

---------- -------------------- ---------- --------------

         1 KING                       7839

         2   BLAKE                    7698           7839

         3     ALLEN                  7499           7698

         3     JAMES                  7900           7698

         3     MARTIN                 7654           7698

         3     TURNER                 7844           7698

         3     WARD                   7521           7698

         2   CLARK                    7782           7839

         3     MILLER                 7934           7782

         2   JONES                    7566           7839

         3     FORD                   7902           7566

         4       SMITH                7369           7902

         3     SCOTT                  7788           7566

         4       ADAMS                7876           7788

In this example's output, BLAKE, CLARK, and JONES are siblings, and they are displayed in ascending order. So are BLAKE's children: ALLEN, JAMES, MARTIN, TURNER, and WARD.

8.4.9 Finding the Path to a Node

You can list the entire path of a given node starting from the root node using the SYS_CONNECT_BY_PATH function (in Oracle9i and later). This function takes two arguments: a column name and a character string. The function then returns a list containing each value of the column from the root node to the current node, separating values by the character string you provide. For example:

SELECT SYS_CONNECT_BY_PATH(lname, '#')

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY PRIOR emp_id = manager_emp_id;



SYS_CONNECT_BY_PATH(LNAME,'#')

---------------------------------

#KING

#KING#JONES

#KING#JONES#SCOTT

#KING#JONES#SCOTT#ADAMS

#KING#JONES#FORD

#KING#JONES#FORD#SMITH

#KING#BLAKE

#KING#BLAKE#ALLEN

#KING#BLAKE#WARD

#KING#BLAKE#MARTIN

#KING#BLAKE#TURNER

#KING#BLAKE#JAMES

#KING#CLARK

#KING#CLARK#MILLER

The preceding query lists the full organizational path for each employee starting at the top. For example, #KING#JONES#FORD#SMITH shows the complete reporting relation of SMITH in the organization.

To understand the usefulness of the SYS_CONNECT_BY_PATH function, think of a trail in a park. The branches of such a trail are illustrated in Figure 8-3.

Figure 8-3. Trails
figs/mos2_0803.gif


The various points in the trail, and the distance between them is stored in a table, trail:

CREATE TABLE trail (

  start_point CHAR,

  end_point   CHAR,

  distance    NUMBER

);



INSERT INTO trail VALUES ('A','B', 3);

INSERT INTO trail VALUES ('A','C', 2.5);

INSERT INTO trail VALUES ('B','D', 2);

INSERT INTO trail VALUES ('B','E', 1.5);

INSERT INTO trail VALUES ('C','F', 2.5);

INSERT INTO trail VALUES ('C','G', 2.5);

INSERT INTO trail VALUES ('G','H', 3.5);

INSERT INTO trail VALUES ('E','J', 1.5);

COMMIT;

You need to find the total distance of each point in the trail from the starting point "A." The following query uses SYS_CONNECT_BY_PATH to print the distance of each point concatenated with the distances of each of its ancestors in the tree:

SELECT end_point, 

       SUBSTR(SYS_CONNECT_BY_PATH(distance,'+'),2) total_distance

FROM trail

START WITH start_point = 'A'

CONNECT BY start_point = PRIOR end_point;



E TOTAL_DISTANCE

- --------------------

B 3

D 3+2

E 3+1.5

J 3+1.5+1.5

C 2.5

F 2.5+2.5

G 2.5+2.5

H 2.5+2.5+3.5

The SUBSTR function takes out the first "+" in the query's output. Now, each of the total_distance expressions, one for each point, can be evaluated to compute the total distance. One way to evaluate such expressions is to write an eval function, as shown in the following code:

CREATE OR REPLACE FUNCTION eval (exp IN VARCHAR2) RETURN NUMBER IS

   result NUMBER;

BEGIN

     EXECUTE IMMEDIATE 'SELECT ' || exp || ' FROM DUAL' INTO result;

     RETURN result;

EXCEPTION

     WHEN OTHERS THEN

     RETURN NULL;

END;

/

The following example uses the eval function to compute the total distance of each point in the trail from the starting point A:

SELECT end_point, 

       eval(SUBSTR(SYS_CONNECT_BY_PATH(distance,'+'),2)) total_distance

FROM trail

START WITH start_point = 'A'

CONNECT BY start_point = PRIOR end_point;



E TOTAL_DISTANCE

- --------------

B              3

D              5

E            4.5

J              6

C            2.5

F              5

G              5

H            8.5

From this output, it is easy to figure out how far each point is in the trail from the starting point "A."