8.3 Oracle SQL Extensions

In the previous section, you saw how to perform some operations on a hierarchical tree by using simple SQL techniques. Operations such as traversing a tree, finding levels, etc., require more complex SQL statements, and also require the use of features designed specifically for working with hierarchical data. Oracle provides some extensions to ANSI SQL to facilitate these operations. But before looking at the Oracle SQL extensions, let's look at how you can traverse a tree using ANSI SQL, and at the problems you'll encounter when doing that.

8.3.1 Tree Traversal Using ANSI SQL

Say you want to list each employee with his manager. Using regular Oracle SQL, you can perform self outer-joins on the employee table, as shown here:

SELECT e_top.lname, e_2.lname, e_3.lname, e_4.lname 

FROM employee e_top LEFT OUTER JOIN employee e_2 

                                    ON e_top.emp_id = e_2.manager_emp_id

                    LEFT OUTER JOIN employee e_3 

                                    ON e_2.emp_id = e_3.manager_emp_id

                    LEFT OUTER JOIN employee e_4 

                                    ON e_3.emp_id = e_4.manager_emp_id

WHERE e_top.manager_emp_id IS NULL;



LNAME                LNAME                LNAME                LNAME

-------------------- -------------------- -------------------- ------

KING                 JONES                FORD                 SMITH

KING                 JONES                SCOTT                ADAMS

KING                 BLAKE                TURNER

KING                 BLAKE                ALLEN

KING                 BLAKE                WARD

KING                 CLARK                MILLER

KING                 BLAKE                MARTIN

KING                 BLAKE                JAMES



8 rows selected.

The query returns eight rows, corresponding to the eight branches of the tree. To get those results, the query performs a self join on four instances of the employee table. Four employee table instances are needed in this statement because there are four levels to the hierarchy. Each level is represented by one copy of the employee table. The outer join is required because one employee (KING) has a NULL value in the manager_emp_id column.

This type of query has several drawbacks. First of all, you need to know the number of levels in an organization chart when you write the query, and it's not realistic to assume that you will know that information. It's even less realistic to think that the number of levels will remain stable over time. Moreover, you need to join four instances of the employee table together for a four-level hierarchy. Imagine an organization with 20 levels?you'd need to join 20 instances of the table! Such a join would cause a huge performance problem.

To circumvent problems such as these, Oracle has provided some extensions to ANSI SQL. Oracle provides the following three constructs to effectively and efficiently perform hierarchical queries:

  • The START WITH . . . CONNECT BY clause

  • The PRIOR operator

  • The LEVEL pseudocolumn

The following sections discuss these three Oracle extensions in detail.

8.3.2 START WITH . . . CONNECT BY and PRIOR

You can extract information in hierarchical form from a table containing hierarchical data by using the SELECT statement's START WITH . . . CONNECT BY clause. The syntax for this clause is:

[[START WITH condition1]  CONNECT BY condition2]

The syntax elements are:


START WITH condition1

Specifies the root row(s) of the hierarchy. All rows that satisfy condition1 are considered root rows. If you don't specify the START WITH clause, all rows are considered root rows, which is usually not desirable. You can include a subquery in condition1.


CONNECT BY condition2

Specifies the relationship between parent rows and child rows in the hierarchy. The relationship is expressed as a comparison expression, where columns from the current row are compared to corresponding parent columns. condition2 must contain the PRIOR operator, which is used to identify columns from the parent row. condition2 cannot contain a subquery.

PRIOR is a built-in Oracle SQL operator that is used with hierarchical queries only. In a hierarchical query, the CONNECT BY clause specifies the relationship between parent and child rows. When you use the PRIOR operator in an expression in the CONNECT BY condition, the expression following the PRIOR keyword is evaluated for the parent row of the current row in the query. In the following example, PRIOR is used to connect each row to its parent by connecting manager_emp_id in the child to emp_id in the parent:

SELECT lname, emp_id, manager_emp_id

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY PRIOR emp_id = manager_emp_id;



LNAME                    EMP_ID MANAGER_EMP_ID

-------------------- ---------- --------------

KING                       7839

JONES                      7566           7839

SCOTT                      7788           7566

ADAMS                      7876           7788

FORD                       7902           7566

SMITH                      7369           7902

BLAKE                      7698           7839

ALLEN                      7499           7698

WARD                       7521           7698

MARTIN                     7654           7698

TURNER                     7844           7698

JAMES                      7900           7698

CLARK                      7782           7839

MILLER                     7934           7782



14 rows selected.

The PRIOR column does not need to be listed first. The previous query could be restated as:

SELECT lname, emp_id, manager_emp_id

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY manager_emp_id = PRIOR emp_id;

The preceding two PRIOR examples list all the employees in the organization, because each query uses the START WITH clause to begin with the top-most employee (with NULL manager_emp_id). Instead of reporting out the whole organization chart, you may want to list only the subtree under a given employee, JONES for example. To do this, you can modify the START WITH condition so that it specifies JONES as the root of the query. For example:

SELECT lname, emp_id, manager_emp_id

FROM employee

START WITH lname = 'JONES'

CONNECT BY manager_emp_id = PRIOR emp_id;



LNAME                    EMP_ID MANAGER_EMP_ID

-------------------- ---------- --------------

JONES                      7566           7839

SCOTT                      7788           7566

ADAMS                      7876           7788

FORD                       7902           7566

SMITH                      7369           7902

Since this query considers JONES as the root of the hierarchy, only the employees that belong to the organization tree under JONES (including JONES himself) are returned by the query. Be careful while using conditions such as lname = 'JONES' in hierarchical queries. In this case, if we have two JONES in our organization, the result returned by the hierarchy may be wrong. It is better to use primary or unique key columns, such as emp_id, as the condition in such situations.

In the previous example, we listed the portion of our organization chart headed by the specific employee named "JONES." There can be situations when you may need to print the organization chart headed by any employee that meets a specific condition. For example, you may want to list all employees under the employee who has been working in the company for the longest time. In this case, the starting point of the query (the root) is dependent on a condition. Therefore, you have to use a subquery to generate the information needed to evaluate the condition and pass that information to the main query, as in the following example:

SELECT lname, emp_id, manager_emp_id

FROM employee

START WITH hire_date = (SELECT MIN(hire_date) FROM employee)

CONNECT BY manager_emp_id = PRIOR emp_id;



LNAME                    EMP_ID MANAGER_EMP_ID

-------------------- ---------- --------------

BLAKE                      7698           7839

ALLEN                      7499           7698

WARD                       7521           7698

MARTIN                     7654           7698

TURNER                     7844           7698

JAMES                      7900           7698



6 rows selected.

Note the START WITH clause in this example. The subquery in the START WITH clause returns the minimum hire_date in the table, which represents the hire_date of the employee with the longest service. The main query uses this information as the starting point of the hierarchy and lists the organization structure under this employee.

While using a subquery in the START WITH clause, be aware of how many rows will be returned by the subquery. If more than one row is returned when you are expecting just one row (indicated by the = sign), the query will generate an error. You can get around this by replacing = with the IN operator, but be warned that the hierarchical query may then end up dealing with multiple roots.

Since the CONNECT BY condition specifies the parent-child relationship, it cannot contain a loop (also known as a cycle). If a row is both parent (direct ancestor) and child (direct descendent) of another row, then you have a loop. For example, if the employee table had the following two rows, they would represent a loop:

EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID    SALARY HIRE_DATE 

------ ---------- --------- -------------- --------- --------- 

  9001 SMITH             20           9002      1800 15-NOV-61 

  9002 ALLEN             30           9001     11600 16-NOV-61

The pseudocolumn CONNECT_BY_ISCYCLE and the keyword NOCYCLE, both introduced in Oracle Database 10g, help identify and ignore cycles. These are discussed later in the chapter.


When a parent-child relationship involves two or more columns, you need to use the PRIOR operator before each parent column. Let's take as an example an assembly in a manufacturing plant. An assembly may consist of several subassemblies, and a given subassembly may further contain one or more subassemblies. All of these are stored in a table, assembly:

DESC assembly



 Name                       Null?    Type

 -------------------------- -------- --------------

 assembly_type              NOT NULL VARCHAR2(4)

 assembly_id                NOT NULL NUMBER(6)

 description                NOT NULL VARCHAR2(20)

 parent_assembly_type                VARCHAR2(4)

 parent_assembly_id                  NUMBER(6)

Column assembly_type and assembly_id constitute the primary key of this table, and the columns parent_assembly_type and parent_assembly_id together constitute the self-referential foreign key. Therefore, if you want to perform a hierarchical query on this table, you need to include both columns in the START WITH and the CONNECT BY clauses. You also need to use the PRIOR operator before each parent column, as shown in the following example:

SELECT * FROM assembly

START WITH parent_assembly_type IS NULL

AND parent_assembly_id IS NULL

CONNECT BY parent_assembly_type = PRIOR assembly_type

AND parent_assembly_id = PRIOR assembly_id;



ASSE ASSEMBLY_ID DESCRIPTION          PARE PARENT_ASSEMBLY_ID

---- ----------- -------------------- ---- ------------------

A           1234 Assembly A#1234

A           1256 Assembly A#1256      A                  1234

B           6543 Part Unit#6543       A                  1234

A           1675 Part Unit#1675       B                  6543

X           9943 Repair Zone 1

X           5438 Repair Unit #5438    X                  9943

X           1675 Readymade Unit #1675 X                  5438



7 rows selected.

8.3.3 The LEVEL Pseudocolumn

As we explained earlier, the term level refers to one layer of nodes. For example, in Figure 8-1, the root node (consisting of employee KING) is level 1. The next layer (employees JONES, BLAKE, CLARK) is level 2, and so forth. Oracle provides a pseudocolumn, LEVEL, to represent these levels in a hierarchy tree. Whenever you use the START WITH . . . CONNECT BY clauses in a hierarchical query, you can use the pseudocolumn LEVEL to return the level number for each row returned by the query. The following example illustrates the use of the LEVEL pseudocolumn:

SELECT level, lname, emp_id, manager_emp_id

FROM employee

START WITH manager_emp_id IS NULL

CONNECT BY manager_emp_id = PRIOR emp_id;



     LEVEL LNAME                    EMP_ID MANAGER_EMP_ID

---------- -------------------- ---------- --------------

         1 KING                       7839

         2 JONES                      7566           7839

         3 SCOTT                      7788           7566

         4 ADAMS                      7876           7788

         3 FORD                       7902           7566

         4 SMITH                      7369           7902

         2 BLAKE                      7698           7839

         3 ALLEN                      7499           7698

         3 WARD                       7521           7698

         3 MARTIN                     7654           7698

         3 TURNER                     7844           7698

         3 JAMES                      7900           7698

         2 CLARK                      7782           7839

         3 MILLER                     7934           7782



14 rows selected.

Note that each employee is now associated with a number, represented by the pseudocolumn LEVEL, that corresponds to that employee's level in the organization chart (see Figure 8-1).