In the previous section, you saw how to perform some operations on a hierarchical tree by using simple SQL techniques. Operations such as traversing a tree, finding levels, etc., require more complex SQL statements, and also require the use of features designed specifically for working with hierarchical data. Oracle provides some extensions to ANSI SQL to facilitate these operations. But before looking at the Oracle SQL extensions, let's look at how you can traverse a tree using ANSI SQL, and at the problems you'll encounter when doing that.
Say you want to list each employee with his manager. Using regular Oracle SQL, you can perform self outer-joins on the employee table, as shown here:
SELECT e_top.lname, e_2.lname, e_3.lname, e_4.lname FROM employee e_top LEFT OUTER JOIN employee e_2 ON e_top.emp_id = e_2.manager_emp_id LEFT OUTER JOIN employee e_3 ON e_2.emp_id = e_3.manager_emp_id LEFT OUTER JOIN employee e_4 ON e_3.emp_id = e_4.manager_emp_id WHERE e_top.manager_emp_id IS NULL; LNAME LNAME LNAME LNAME -------------------- -------------------- -------------------- ------ KING JONES FORD SMITH KING JONES SCOTT ADAMS KING BLAKE TURNER KING BLAKE ALLEN KING BLAKE WARD KING CLARK MILLER KING BLAKE MARTIN KING BLAKE JAMES 8 rows selected.
The query returns eight rows, corresponding to the eight branches of the tree. To get those results, the query performs a self join on four instances of the employee table. Four employee table instances are needed in this statement because there are four levels to the hierarchy. Each level is represented by one copy of the employee table. The outer join is required because one employee (KING) has a NULL value in the manager_emp_id column.
This type of query has several drawbacks. First of all, you need to know the number of levels in an organization chart when you write the query, and it's not realistic to assume that you will know that information. It's even less realistic to think that the number of levels will remain stable over time. Moreover, you need to join four instances of the employee table together for a four-level hierarchy. Imagine an organization with 20 levels?you'd need to join 20 instances of the table! Such a join would cause a huge performance problem.
To circumvent problems such as these, Oracle has provided some extensions to ANSI SQL. Oracle provides the following three constructs to effectively and efficiently perform hierarchical queries:
The START WITH . . . CONNECT BY clause
The PRIOR operator
The LEVEL pseudocolumn
The following sections discuss these three Oracle extensions in detail.
You can extract information in hierarchical form from a table containing hierarchical data by using the SELECT statement's START WITH . . . CONNECT BY clause. The syntax for this clause is:
[[START WITH condition1] CONNECT BY condition2]
The syntax elements are:
Specifies the root row(s) of the hierarchy. All rows that satisfy condition1 are considered root rows. If you don't specify the START WITH clause, all rows are considered root rows, which is usually not desirable. You can include a subquery in condition1.
Specifies the relationship between parent rows and child rows in the hierarchy. The relationship is expressed as a comparison expression, where columns from the current row are compared to corresponding parent columns. condition2 must contain the PRIOR operator, which is used to identify columns from the parent row. condition2 cannot contain a subquery.
PRIOR is a built-in Oracle SQL operator that is used with hierarchical queries only. In a hierarchical query, the CONNECT BY clause specifies the relationship between parent and child rows. When you use the PRIOR operator in an expression in the CONNECT BY condition, the expression following the PRIOR keyword is evaluated for the parent row of the current row in the query. In the following example, PRIOR is used to connect each row to its parent by connecting manager_emp_id in the child to emp_id in the parent:
SELECT lname, emp_id, manager_emp_id FROM employee START WITH manager_emp_id IS NULL CONNECT BY PRIOR emp_id = manager_emp_id; LNAME EMP_ID MANAGER_EMP_ID -------------------- ---------- -------------- KING 7839 JONES 7566 7839 SCOTT 7788 7566 ADAMS 7876 7788 FORD 7902 7566 SMITH 7369 7902 BLAKE 7698 7839 ALLEN 7499 7698 WARD 7521 7698 MARTIN 7654 7698 TURNER 7844 7698 JAMES 7900 7698 CLARK 7782 7839 MILLER 7934 7782 14 rows selected.
The PRIOR column does not need to be listed first. The previous query could be restated as:
SELECT lname, emp_id, manager_emp_id FROM employee START WITH manager_emp_id IS NULL CONNECT BY manager_emp_id = PRIOR emp_id;
The preceding two PRIOR examples list all the employees in the organization, because each query uses the START WITH clause to begin with the top-most employee (with NULL manager_emp_id). Instead of reporting out the whole organization chart, you may want to list only the subtree under a given employee, JONES for example. To do this, you can modify the START WITH condition so that it specifies JONES as the root of the query. For example:
SELECT lname, emp_id, manager_emp_id FROM employee START WITH lname = 'JONES' CONNECT BY manager_emp_id = PRIOR emp_id; LNAME EMP_ID MANAGER_EMP_ID -------------------- ---------- -------------- JONES 7566 7839 SCOTT 7788 7566 ADAMS 7876 7788 FORD 7902 7566 SMITH 7369 7902
Since this query considers JONES as the root of the hierarchy, only the employees that belong to the organization tree under JONES (including JONES himself) are returned by the query. Be careful while using conditions such as lname = 'JONES' in hierarchical queries. In this case, if we have two JONES in our organization, the result returned by the hierarchy may be wrong. It is better to use primary or unique key columns, such as emp_id, as the condition in such situations.
In the previous example, we listed the portion of our organization chart headed by the specific employee named "JONES." There can be situations when you may need to print the organization chart headed by any employee that meets a specific condition. For example, you may want to list all employees under the employee who has been working in the company for the longest time. In this case, the starting point of the query (the root) is dependent on a condition. Therefore, you have to use a subquery to generate the information needed to evaluate the condition and pass that information to the main query, as in the following example:
SELECT lname, emp_id, manager_emp_id FROM employee START WITH hire_date = (SELECT MIN(hire_date) FROM employee) CONNECT BY manager_emp_id = PRIOR emp_id; LNAME EMP_ID MANAGER_EMP_ID -------------------- ---------- -------------- BLAKE 7698 7839 ALLEN 7499 7698 WARD 7521 7698 MARTIN 7654 7698 TURNER 7844 7698 JAMES 7900 7698 6 rows selected.
Note the START WITH clause in this example. The subquery in the START WITH clause returns the minimum hire_date in the table, which represents the hire_date of the employee with the longest service. The main query uses this information as the starting point of the hierarchy and lists the organization structure under this employee.
While using a subquery in the START WITH clause, be aware of how many rows will be returned by the subquery. If more than one row is returned when you are expecting just one row (indicated by the = sign), the query will generate an error. You can get around this by replacing = with the IN operator, but be warned that the hierarchical query may then end up dealing with multiple roots.
Since the CONNECT BY condition specifies the parent-child relationship, it cannot contain a loop (also known as a cycle). If a row is both parent (direct ancestor) and child (direct descendent) of another row, then you have a loop. For example, if the employee table had the following two rows, they would represent a loop:
EMP_ID LNAME DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE ------ ---------- --------- -------------- --------- --------- 9001 SMITH 20 9002 1800 15-NOV-61 9002 ALLEN 30 9001 11600 16-NOV-61
|
When a parent-child relationship involves two or more columns, you need to use the PRIOR operator before each parent column. Let's take as an example an assembly in a manufacturing plant. An assembly may consist of several subassemblies, and a given subassembly may further contain one or more subassemblies. All of these are stored in a table, assembly:
DESC assembly Name Null? Type -------------------------- -------- -------------- assembly_type NOT NULL VARCHAR2(4) assembly_id NOT NULL NUMBER(6) description NOT NULL VARCHAR2(20) parent_assembly_type VARCHAR2(4) parent_assembly_id NUMBER(6)
Column assembly_type and assembly_id constitute the primary key of this table, and the columns parent_assembly_type and parent_assembly_id together constitute the self-referential foreign key. Therefore, if you want to perform a hierarchical query on this table, you need to include both columns in the START WITH and the CONNECT BY clauses. You also need to use the PRIOR operator before each parent column, as shown in the following example:
SELECT * FROM assembly START WITH parent_assembly_type IS NULL AND parent_assembly_id IS NULL CONNECT BY parent_assembly_type = PRIOR assembly_type AND parent_assembly_id = PRIOR assembly_id; ASSE ASSEMBLY_ID DESCRIPTION PARE PARENT_ASSEMBLY_ID ---- ----------- -------------------- ---- ------------------ A 1234 Assembly A#1234 A 1256 Assembly A#1256 A 1234 B 6543 Part Unit#6543 A 1234 A 1675 Part Unit#1675 B 6543 X 9943 Repair Zone 1 X 5438 Repair Unit #5438 X 9943 X 1675 Readymade Unit #1675 X 5438 7 rows selected.
As we explained earlier, the term level refers to one layer of nodes. For example, in Figure 8-1, the root node (consisting of employee KING) is level 1. The next layer (employees JONES, BLAKE, CLARK) is level 2, and so forth. Oracle provides a pseudocolumn, LEVEL, to represent these levels in a hierarchy tree. Whenever you use the START WITH . . . CONNECT BY clauses in a hierarchical query, you can use the pseudocolumn LEVEL to return the level number for each row returned by the query. The following example illustrates the use of the LEVEL pseudocolumn:
SELECT level, lname, emp_id, manager_emp_id FROM employee START WITH manager_emp_id IS NULL CONNECT BY manager_emp_id = PRIOR emp_id; LEVEL LNAME EMP_ID MANAGER_EMP_ID ---------- -------------------- ---------- -------------- 1 KING 7839 2 JONES 7566 7839 3 SCOTT 7788 7566 4 ADAMS 7876 7788 3 FORD 7902 7566 4 SMITH 7369 7902 2 BLAKE 7698 7839 3 ALLEN 7499 7698 3 WARD 7521 7698 3 MARTIN 7654 7698 3 TURNER 7844 7698 3 JAMES 7900 7698 2 CLARK 7782 7839 3 MILLER 7934 7782 14 rows selected.
Note that each employee is now associated with a number, represented by the pseudocolumn LEVEL, that corresponds to that employee's level in the organization chart (see Figure 8-1).