6- Hierarchical Queries

A relational database is based on sets, with each table representing a set. However, some types of information are not directly amenable to the set data structure. Think, for example, of an organization chart, a bill of materials in a manufacturing and assembly plant, or a family tree. These types of information are hierarchical in nature and are most conveniently represented as a tree structure.

To represent hierarchical data, we need a relationship in which one column of a table references another column of the same table. When such a relationship is enforced with a database constraint, it is known as a self-referential integrity constraint. The corresponding CREATE TABLE statement looks as follows:

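CREATE TABLE EMPLOYEE

(

EMP_ID NUMBER(4) CONSTRAINT EMP_PK PRIMARY KEY,

LNAME VARCHAR2(15) NOT NULL,

DEPT_ID NUMBER(2) NOT NULL,

-- Self-referencing foreign key: each manager must be an existing employee

MANAGER_EMP_ID NUMBER(4) CONSTRAINT EMP_FK REFERENCES EMPLOYEE(EMP_ID),

-- SALARY and HIRE_DATE (types assumed) are referenced by later queries in this section

SALARY NUMBER(7,2),

HIRE_DATE DATE

);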

The column MANAGER_EMP_ID stores the EMP_ID of the employee's manager. The foreign key constraint on MANAGER_EMP_ID points to the EMP_ID column of the same table, which enforces the rule that any value we put in MANAGER_EMP_ID must be the EMP_ID of a valid employee. Such a constraint is not mandatory for representing hierarchical information. However, it is good practice to define database constraints that enforce business rules like this one.
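To see the self-referential constraint at work, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Oracle, an assumption made only so the example runs anywhere. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; the table is a trimmed version of EMPLOYEE, and the rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Trimmed EMPLOYEE table: MANAGER_EMP_ID references EMP_ID in the same table.
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EMP_ID         INTEGER PRIMARY KEY,
        LNAME          TEXT    NOT NULL,
        DEPT_ID        INTEGER NOT NULL,
        MANAGER_EMP_ID INTEGER REFERENCES EMPLOYEE (EMP_ID)
    )""")

# Hypothetical rows: KING has no manager (a NULL passes the constraint),
# and JONES reports to KING.
conn.execute("INSERT INTO EMPLOYEE VALUES (7839, 'KING', 10, NULL)")
conn.execute("INSERT INTO EMPLOYEE VALUES (7566, 'JONES', 20, 7839)")

try:
    # 9999 is not an existing EMP_ID, so this row violates the constraint.
    conn.execute("INSERT INTO EMPLOYEE VALUES (7788, 'SCOTT', 20, 9999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
print(rejected, row_count)  # True 2
```

Note that the NULL manager of the top employee is accepted: a foreign key only checks non-NULL values, which is what lets the root of the tree exist.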

1．Simple Hierarchical Operations：

A．Finding the Root Node：

Finding the root of a hierarchy tree is easy; we look for the one node with no parent.
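The text gives no query for this step, so here is a sketch of the obvious one, WHERE MANAGER_EMP_ID IS NULL, run through Python's sqlite3 module on hypothetical sample rows (SQLite stands in for Oracle; the SQL is the same). Note the use of IS NULL: a test such as MANAGER_EMP_ID = NULL is never TRUE and matches nothing.

```python
import sqlite3

# Hypothetical sample rows: (EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID)
ROWS = [
    (7839, "KING", 10, None),
    (7566, "JONES", 20, 7839),
    (7698, "BLAKE", 30, 7839),
    (7788, "SCOTT", 20, 7566),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, LNAME TEXT, "
             "DEPT_ID INTEGER, MANAGER_EMP_ID INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", ROWS)

# The root is the row with no parent: its MANAGER_EMP_ID is NULL.
roots = conn.execute(
    "SELECT LNAME FROM EMPLOYEE WHERE MANAGER_EMP_ID IS NULL").fetchall()

# Comparing with = NULL yields UNKNOWN for every row, so nothing is returned.
wrong = conn.execute(
    "SELECT LNAME FROM EMPLOYEE WHERE MANAGER_EMP_ID = NULL").fetchall()

print(roots, wrong)  # [('KING',)] []
```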

B．Finding a Node's Immediate Parent：

We may wish to link nodes to their immediate parents. For example, we might want to print a report showing each employee's manager. The name of each employee's manager can be derived by joining the EMPLOYEE table to itself. This type of join is a self join. The following query returns the desired result:

SELECT E.LNAME "Employee", M.LNAME "Manager"

FROM EMPLOYEE E, EMPLOYEE M

WHERE E.MANAGER_EMP_ID = M.EMP_ID;

The reason that only 13 rows are returned from the self join is simple. This query lists employees and their managers. But since the uppermost employee KING doesn't have any manager, that row is not produced in the output. If we want all the employees to be produced in the result, we need an outer join, as in the following example:

SELECT E.LNAME "Employee", M.LNAME "Manager"

FROM EMPLOYEE E, EMPLOYEE M

WHERE E.MANAGER_EMP_ID = M.EMP_ID (+);
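The difference between the two joins is easy to check on a small data set. This sketch uses Python's sqlite3 module with hypothetical sample rows. SQLite does not accept Oracle's (+) notation, so the sketch uses the equivalent ANSI LEFT OUTER JOIN, which Oracle also supports.

```python
import sqlite3

ROWS = [  # hypothetical sample rows: (EMP_ID, LNAME, MANAGER_EMP_ID)
    (7839, "KING", None),
    (7566, "JONES", 7839),
    (7698, "BLAKE", 7839),
    (7788, "SCOTT", 7566),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, LNAME TEXT, "
             "MANAGER_EMP_ID INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", ROWS)

# Inner self join: KING has no matching manager row, so he drops out.
inner = conn.execute("""
    SELECT E.LNAME, M.LNAME
    FROM EMPLOYEE E JOIN EMPLOYEE M ON E.MANAGER_EMP_ID = M.EMP_ID
    ORDER BY E.EMP_ID""").fetchall()

# Outer self join, equivalent to E.MANAGER_EMP_ID = M.EMP_ID (+):
# every employee is kept, and KING's manager comes back as NULL (None).
outer = conn.execute("""
    SELECT E.LNAME, M.LNAME
    FROM EMPLOYEE E LEFT OUTER JOIN EMPLOYEE M ON E.MANAGER_EMP_ID = M.EMP_ID
    ORDER BY E.EMP_ID""").fetchall()

print(len(inner), len(outer))  # 3 4
print(outer[-1])               # ('KING', None)
```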

C．Finding the Leaf Nodes：

The opposite problem from finding the root node, which has no parent, is to find leaf nodes, which have no children. Employees who do not manage anyone are the leaf nodes in the hierarchy tree shown in Figure 8-1. At first glance, the following query seems like it should list all employees from the EMPLOYEE table who are not managers of any other employee:

SELECT * FROM EMPLOYEE

WHERE EMP_ID NOT IN (SELECT MANAGER_EMP_ID FROM EMPLOYEE);

However, when we execute this statement, we see "No rows selected." Why? The MANAGER_EMP_ID column contains a NULL in one row (the uppermost employee), and a comparison with NULL evaluates to UNKNOWN rather than TRUE or FALSE. Because NOT IN requires EMP_ID to differ from every value the subquery returns, including that NULL, the condition is never TRUE and no row qualifies.

Therefore, to get the employees who don't manage anyone, we need to rewrite the query as follows:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE

FROM EMPLOYEE E

WHERE EMP_ID NOT IN (SELECT MANAGER_EMP_ID FROM EMPLOYEE

WHERE MANAGER_EMP_ID IS NOT NULL);

In this example, the subquery returns the EMP_IDs of all the managers. The outer query then returns all the employees, except the ones returned by the subquery. This query can also be written as a correlated subquery using EXISTS instead of IN:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE

FROM EMPLOYEE E

WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE E1

WHERE E.EMP_ID = E1.MANAGER_EMP_ID);
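The NULL trap and both fixes can be reproduced in a few lines. This sketch runs the three queries against hypothetical sample rows in an in-memory SQLite database through Python's sqlite3 module; SQLite is an assumption made for portability, and it applies the same three-valued logic to NOT IN as Oracle does.

```python
import sqlite3

ROWS = [  # hypothetical sample rows: (EMP_ID, LNAME, MANAGER_EMP_ID)
    (7839, "KING", None),
    (7566, "JONES", 7839),
    (7698, "BLAKE", 7839),
    (7788, "SCOTT", 7566),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, LNAME TEXT, "
             "MANAGER_EMP_ID INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", ROWS)

def names(sql):
    return [r[0] for r in conn.execute(sql)]

# The subquery's result contains KING's NULL, so NOT IN is never TRUE.
naive = names("SELECT LNAME FROM EMPLOYEE WHERE EMP_ID NOT IN "
              "(SELECT MANAGER_EMP_ID FROM EMPLOYEE) ORDER BY LNAME")

# Filtering the NULL out of the subquery fixes it.
fixed = names("SELECT LNAME FROM EMPLOYEE WHERE EMP_ID NOT IN "
              "(SELECT MANAGER_EMP_ID FROM EMPLOYEE WHERE MANAGER_EMP_ID IS NOT NULL) "
              "ORDER BY LNAME")

# NOT EXISTS only asks whether a matching row exists; a NULL simply never matches.
exists = names("SELECT LNAME FROM EMPLOYEE E WHERE NOT EXISTS "
               "(SELECT 1 FROM EMPLOYEE E1 WHERE E.EMP_ID = E1.MANAGER_EMP_ID) "
               "ORDER BY LNAME")

print(naive, fixed, exists)  # [] ['BLAKE', 'SCOTT'] ['BLAKE', 'SCOTT']
```

The NOT EXISTS form is the more robust of the two, because it stays correct even if someone later forgets the IS NOT NULL filter.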

2．Oracle SQL Extensions：

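Suppose we want to list every chain of command in the organization, from the top employee down to each employee at the bottom of the tree. Using plain SQL, without Oracle's hierarchical extensions, we can perform self outer joins on the EMPLOYEE table, as shown here: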

SELECT E_TOP.LNAME, E_2.LNAME, E_3.LNAME, E_4.LNAME

FROM EMPLOYEE E_TOP, EMPLOYEE E_2, EMPLOYEE E_3, EMPLOYEE E_4

WHERE E_TOP.MANAGER_EMP_ID IS NULL

AND E_TOP.EMP_ID = E_2.MANAGER_EMP_ID (+)

AND E_2.EMP_ID = E_3.MANAGER_EMP_ID (+)

AND E_3.EMP_ID = E_4.MANAGER_EMP_ID (+);

The query returns eight rows, corresponding to the eight branches of the tree. To get those results, the query performs a self join on four instances of the EMPLOYEE table, because there are four levels in the hierarchy and each level is represented by one copy of the table. The condition E_TOP.MANAGER_EMP_ID IS NULL selects KING, the only employee without a manager, as the top of every branch. The outer joins are required because not every branch reaches the fourth level: without them, any branch that ends at the second or third level would be dropped from the result.

This type of query has several drawbacks. First, we need to know the number of levels in the organization chart when we write the query, and it is not realistic to assume that we will know that. It is even less realistic to expect the number of levels to remain stable over time. Moreover, we need to join four instances of the EMPLOYEE table for a four-level hierarchy. For an organization with 20 levels, we would need to join 20 instances of the table, which is cumbersome to write and can cause a serious performance problem.

A．START WITH...CONNECT BY and PRIOR：

We can extract information in hierarchical form from a table containing hierarchical data by using the SELECT statement's START WITH...CONNECT BY clause. The syntax for this clause is:

[[START WITH condition1] CONNECT BY condition2]

The syntax elements are:

START WITH condition1

Specifies the root row(s) of the hierarchy. All rows that satisfy condition1 are considered root rows. If we don't specify the START WITH clause, all rows are considered root rows, which is usually not desirable. condition1 can include a subquery.

CONNECT BY condition2

Specifies the relationship between parent rows and child rows in the hierarchy. The relationship is expressed as a comparison expression in which columns from the current row are compared to the corresponding columns of the parent row. condition2 must contain the PRIOR operator, which identifies columns from the parent row. condition2 cannot contain a subquery.

PRIOR is a built-in Oracle SQL operator that is used only with hierarchical queries. When we use the PRIOR operator in an expression in the CONNECT BY condition, the expression following the PRIOR keyword is evaluated for the parent row of the current row. In the following example, PRIOR connects each row to its parent by matching MANAGER_EMP_ID in the child to EMP_ID in the parent:

Example:

SELECT LNAME, EMP_ID, MANAGER_EMP_ID

FROM EMPLOYEE

START WITH MANAGER_EMP_ID IS NULL

CONNECT BY PRIOR EMP_ID = MANAGER_EMP_ID;

LNAME EMP_ID MANAGER_EMP_ID

-------------------- ---------- ----------------

KING 7839

JONES 7566 7839

SCOTT 7788 7566

ADAMS 7876 7788

FORD 7902 7566

The PRIOR column does not need to be listed first. The previous query could be restated as:

SELECT LNAME, EMP_ID, MANAGER_EMP_ID

FROM EMPLOYEE

START WITH MANAGER_EMP_ID IS NULL

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;
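One way to picture what START WITH ... CONNECT BY PRIOR does is as a depth-first walk. The sketch below is a plain-Python model of that behavior, not Oracle's implementation: start_with selects the roots, is_child(prior, row) plays the role of the CONNECT BY condition with prior standing for the PRIOR (parent) row, and the recursion depth is the LEVEL. The function name connect_by and the sample rows are hypothetical. Siblings come out in table order here, whereas Oracle does not guarantee any sibling order.

```python
from typing import Callable, Iterator

Row = dict

def connect_by(rows: list[Row],
               start_with: Callable[[Row], bool],
               is_child: Callable[[Row, Row], bool]) -> Iterator[tuple[int, Row]]:
    """Yield (level, row) in depth-first pre-order, like START WITH ... CONNECT BY."""
    def walk(prior: Row, level: int) -> Iterator[tuple[int, Row]]:
        yield level, prior
        for row in rows:                 # siblings come out in table order
            if is_child(prior, row):
                yield from walk(row, level + 1)

    for root in rows:
        if start_with(root):
            yield from walk(root, 1)

# Hypothetical sample rows.
EMPLOYEE = [
    {"EMP_ID": 7839, "LNAME": "KING",  "MANAGER_EMP_ID": None},
    {"EMP_ID": 7566, "LNAME": "JONES", "MANAGER_EMP_ID": 7839},
    {"EMP_ID": 7698, "LNAME": "BLAKE", "MANAGER_EMP_ID": 7839},
    {"EMP_ID": 7788, "LNAME": "SCOTT", "MANAGER_EMP_ID": 7566},
    {"EMP_ID": 7902, "LNAME": "FORD",  "MANAGER_EMP_ID": 7566},
    {"EMP_ID": 7876, "LNAME": "ADAMS", "MANAGER_EMP_ID": 7788},
]

# START WITH MANAGER_EMP_ID IS NULL
# CONNECT BY PRIOR EMP_ID = MANAGER_EMP_ID
result = [(level, row["LNAME"]) for level, row in connect_by(
    EMPLOYEE,
    start_with=lambda r: r["MANAGER_EMP_ID"] is None,
    is_child=lambda prior, r: r["MANAGER_EMP_ID"] == prior["EMP_ID"])]
print(result)
# [(1, 'KING'), (2, 'JONES'), (3, 'SCOTT'), (4, 'ADAMS'), (3, 'FORD'), (2, 'BLAKE')]
```

Because the condition is just a function of the parent and child rows, swapping which side PRIOR appears on (as in the two queries above) makes no difference to the model. This sketch does no loop detection, so cyclic data would recurse until Python raises RecursionError.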

Because the CONNECT BY condition defines the parent-child relationship, the data it walks must not contain a loop. If a row is both a parent (direct ancestor) and a child (direct descendant) of another row, we have a loop; if the hierarchical walk runs into one, Oracle raises the error ORA-01436: CONNECT BY loop in user data. For example, if the EMPLOYEE table had the following two rows, they would represent a loop:

EMP_ID LNAME DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE

------ ---------- --------- -------------- --------- ---------

9001 SMITH 20 9002 1800 15-NOV-61

9002 ALLEN 30 9001 11600 16-NOV-61
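A loop like this is why a naive depth-first walk would never finish. The sketch below extends such a walk with a check of the ancestor path, so that a loop raises an error instead of recursing forever. The function name connect_by_checked and the use of ValueError are hypothetical choices; the two rows are the ones shown above.

```python
from typing import Callable

Row = dict

def connect_by_checked(rows: list[Row], root: Row,
                       is_child: Callable[[Row, Row], bool]) -> list[tuple[int, str]]:
    """Depth-first walk from root; raise ValueError if a row reappears among its ancestors."""
    out: list[tuple[int, str]] = []

    def walk(prior: Row, level: int, ancestors: frozenset) -> None:
        if prior["EMP_ID"] in ancestors:
            raise ValueError(f"CONNECT BY loop in user data at EMP_ID {prior['EMP_ID']}")
        out.append((level, prior["LNAME"]))
        for row in rows:
            if is_child(prior, row):
                walk(row, level + 1, ancestors | {prior["EMP_ID"]})

    walk(root, 1, frozenset())
    return out

def reports_to(prior: Row, row: Row) -> bool:
    # CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID
    return row["MANAGER_EMP_ID"] == prior["EMP_ID"]

LOOP = [
    {"EMP_ID": 9001, "LNAME": "SMITH", "MANAGER_EMP_ID": 9002},
    {"EMP_ID": 9002, "LNAME": "ALLEN", "MANAGER_EMP_ID": 9001},
]

try:
    connect_by_checked(LOOP, LOOP[0], reports_to)
    loop_detected = False
except ValueError as exc:
    loop_detected = True
    print(exc)  # CONNECT BY loop in user data at EMP_ID 9001
```

Tracking the ancestors of the current path, rather than every row visited so far, is what distinguishes a genuine loop from a row that is merely reached again through a different branch.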


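When a parent-child relationship involves two or more columns, we need to use the PRIOR operator before each parent column. In the following query, each ASSEMBLY row is identified by the pair (ASSEMBLY_TYPE, ASSEMBLY_ID) and refers to its parent through (PARENT_ASSEMBLY_TYPE, PARENT_ASSEMBLY_ID):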

SELECT * FROM ASSEMBLY

START WITH PARENT_ASSEMBLY_TYPE IS NULL AND PARENT_ASSEMBLY_ID IS NULL

CONNECT BY PARENT_ASSEMBLY_TYPE = PRIOR ASSEMBLY_TYPE

AND PARENT_ASSEMBLY_ID = PRIOR ASSEMBLY_ID;
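To check that both key columns really take part in the parent-child link, here is a sketch using Python's sqlite3 module. SQLite has no CONNECT BY, so the sketch swaps in a standard recursive common table expression (WITH RECURSIVE): the anchor member plays the role of START WITH, the recursive member's join condition plays the role of CONNECT BY, and the CTE alias t stands for the PRIOR row. Oracle 11g Release 2 and later support a similar recursive WITH clause. The DESCRIPTION column and all rows are hypothetical. The "Orphan" row shares a parent ASSEMBLY_ID with real assemblies but not a parent ASSEMBLY_TYPE, so it is correctly left out.

```python
import sqlite3

ROWS = [  # hypothetical: (ASSEMBLY_TYPE, ASSEMBLY_ID, DESCRIPTION, PARENT_ASSEMBLY_TYPE, PARENT_ASSEMBLY_ID)
    ("A", 1, "Car",     None, None),
    ("B", 1, "Engine",  "A",  1),
    ("B", 2, "Chassis", "A",  1),
    ("C", 1, "Piston",  "B",  1),
    ("C", 2, "Axle",    "B",  2),
    ("C", 3, "Orphan",  "X",  1),   # parent type X does not exist
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ASSEMBLY (
    ASSEMBLY_TYPE TEXT, ASSEMBLY_ID INTEGER, DESCRIPTION TEXT,
    PARENT_ASSEMBLY_TYPE TEXT, PARENT_ASSEMBLY_ID INTEGER,
    PRIMARY KEY (ASSEMBLY_TYPE, ASSEMBLY_ID))""")
conn.executemany("INSERT INTO ASSEMBLY VALUES (?, ?, ?, ?, ?)", ROWS)

tree = conn.execute("""
    WITH RECURSIVE tree (ASSEMBLY_TYPE, ASSEMBLY_ID, DESCRIPTION, LVL) AS (
        -- anchor member: the START WITH condition
        SELECT ASSEMBLY_TYPE, ASSEMBLY_ID, DESCRIPTION, 1
        FROM ASSEMBLY
        WHERE PARENT_ASSEMBLY_TYPE IS NULL AND PARENT_ASSEMBLY_ID IS NULL
        UNION ALL
        -- recursive member: both key columns must match the parent (t = PRIOR row)
        SELECT a.ASSEMBLY_TYPE, a.ASSEMBLY_ID, a.DESCRIPTION, t.LVL + 1
        FROM ASSEMBLY a
        JOIN tree t ON a.PARENT_ASSEMBLY_TYPE = t.ASSEMBLY_TYPE
                   AND a.PARENT_ASSEMBLY_ID   = t.ASSEMBLY_ID
    )
    SELECT DESCRIPTION, LVL FROM tree ORDER BY LVL, DESCRIPTION""").fetchall()

print(tree)
# [('Car', 1), ('Chassis', 2), ('Engine', 2), ('Axle', 3), ('Piston', 3)]
```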

B．The LEVEL Pseudocolumn：

In a hierarchy tree, the term level refers to one layer of nodes. Oracle provides a pseudocolumn, LEVEL, to represent these levels. Whenever we use the START WITH...CONNECT BY clauses in a hierarchical query, we can use the LEVEL pseudocolumn to return the level number of each row returned by the query: root rows are at level 1, their children at level 2, and so on.

The following example illustrates the use of the LEVEL pseudocolumn:

SELECT LEVEL, LNAME, EMP_ID, MANAGER_EMP_ID

FROM EMPLOYEE

START WITH MANAGER_EMP_ID IS NULL

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

LEVEL LNAME EMP_ID MANAGER_EMP_ID

----- ----- -------------------- ---------------

1 KING 7839

2 JONES 7566 7839

3 SCOTT 7788 7566

4 ADAMS 7876 7788

3．Complex Hierarchical Operations：

A．Finding the Number of Levels：

Previously we showed how the LEVEL pseudocolumn generates a level number for each record when we use the START WITH...CONNECT BY clause. We can use the following query to determine the number of levels in the hierarchy by counting the number of distinct level numbers returned by the LEVEL pseudocolumn:

SELECT COUNT(DISTINCT LEVEL)

FROM EMPLOYEE

START WITH MANAGER_EMP_ID IS NULL

CONNECT BY PRIOR EMP_ID = MANAGER_EMP_ID;

COUNT(DISTINCTLEVEL)

--------------------

4

B．Listing Records in Hierarchical Order：

One of the most common challenges SQL programmers face is listing the records of a hierarchy in their proper hierarchical order. For example, we might wish to list employees with their subordinates underneath them, as in the following query:

SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || LNAME "Employee", EMP_ID, MANAGER_EMP_ID

FROM EMPLOYEE

START WITH MANAGER_EMP_ID IS NULL

CONNECT BY PRIOR EMP_ID = MANAGER_EMP_ID;

LEVEL Employee EMP_ID MANAGER_EMP_ID

--------- ------------ --------- --------------

1 KING 7839

2   JONES 7566 7839

3     SCOTT 7788 7566

4       ADAMS 7876 7788

Notice that the expression LPAD(' ',2*(LEVEL - 1)) lets us indent each employee name according to its level. As the level number increases, the expression returns more spaces, so each employee name is indented further than his manager's.
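The indentation trick relies on two Oracle behaviors: Oracle treats a zero-length string as NULL, so LPAD(' ', 0) at level 1 yields NULL, and the || operator treats NULL as an empty string, so the root is printed without indentation. The sketch below models both in Python; lpad and concat are hypothetical helper names, and this lpad covers only the cases used here.

```python
from typing import Optional

def lpad(s: str, n: int, pad: str = " ") -> Optional[str]:
    """Model of Oracle LPAD(s, n, pad). Oracle treats '' as NULL, so n < 1 yields None."""
    if n < 1:
        return None
    if len(s) >= n:
        return s[:n]                      # too long: keep the leftmost n characters
    return (pad * n)[: n - len(s)] + s    # left-fill with the pad pattern

def concat(*parts: Optional[str]) -> str:
    """Model of Oracle ||, which treats a NULL operand as an empty string."""
    return "".join(p for p in parts if p is not None)

# (LEVEL, LNAME) pairs in hierarchical order, as returned by the query above.
rows = [(1, "KING"), (2, "JONES"), (3, "SCOTT"), (4, "ADAMS")]

for level, lname in rows:
    # LPAD(' ', 2*(LEVEL - 1)) || LNAME
    print(level, concat(lpad(" ", 2 * (level - 1)), lname))
```

Each level adds two spaces, so the printed names form the same staircase as the report above.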

Instead of reporting the whole organization chart, we may want to list only the subtree under a given employee, JONES for example. To do this, we can modify the START WITH condition so that it specifies JONES as the root of the query:

SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || LNAME "Employee", EMP_ID, MANAGER_EMP_ID, SALARY

FROM EMPLOYEE

START WITH LNAME = 'JONES'

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

LEVEL Employee EMP_ID MANAGER_EMP_ID SALARY

--------- ------------ --------- -------------- ---------

1 JONES 7566 7839 2000

2   SCOTT 7788 7566 3000

3     ADAMS 7876 7788 1100

2   FORD 7902 7566 3000

3     SMITH 7369 7902 800

Notice that because we asked the query to treat JONES as the root of the hierarchy, it assigned level 1 to JONES, level 2 to the employees reporting directly to him, and so forth; in the full organization chart, JONES is at level 2. Be careful when using conditions such as LNAME = 'JONES' in hierarchical queries. If our organization had two employees named JONES, both would become roots and the result would mix their subtrees. It is better to use primary key or unique key columns, such as EMP_ID, in such conditions.
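The duplicate-name problem is easy to demonstrate. This sketch uses Python's sqlite3 module, with a recursive CTE standing in for START WITH ... CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID because SQLite has no CONNECT BY. The rows are hypothetical, including a second, unrelated JONES. The {start_with} placeholder is filled only with constant SQL fragments, never with user input.

```python
import sqlite3

ROWS = [  # hypothetical sample rows: (EMP_ID, LNAME, MANAGER_EMP_ID)
    (7839, "KING",  None),
    (7566, "JONES", 7839),
    (7698, "BLAKE", 7839),
    (7788, "SCOTT", 7566),
    (7876, "ADAMS", 7788),
    (7999, "JONES", 7698),   # a second, unrelated JONES
    (7499, "ALLEN", 7999),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, LNAME TEXT, "
             "MANAGER_EMP_ID INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", ROWS)

SUBTREE = """
    WITH RECURSIVE tree (EMP_ID, LNAME, LVL) AS (
        SELECT EMP_ID, LNAME, 1 FROM EMPLOYEE WHERE {start_with}
        UNION ALL
        SELECT e.EMP_ID, e.LNAME, t.LVL + 1
        FROM EMPLOYEE e JOIN tree t ON e.MANAGER_EMP_ID = t.EMP_ID
    )
    SELECT LNAME, LVL FROM tree ORDER BY EMP_ID"""

# Starting from a name picks up every JONES and merges their subtrees.
by_name = conn.execute(SUBTREE.format(start_with="LNAME = ?"), ("JONES",)).fetchall()
# Starting from the primary key picks exactly one root.
by_id = conn.execute(SUBTREE.format(start_with="EMP_ID = ?"), (7566,)).fetchall()

print(by_id)    # [('JONES', 1), ('SCOTT', 2), ('ADAMS', 3)]
print(by_name)  # also contains the second JONES and ALLEN
```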

In this example, we listed the portion of the organization chart headed by a specific employee. Sometimes we need to print the organization chart headed by whichever employee meets a specific condition. For example, we may want to list all employees under the employee who has worked for the company the longest. In this case, the starting point of the query (the root) depends on a condition, so we use a subquery to compute that information and pass it to the main query:

SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || LNAME "Employee", EMP_ID, MANAGER_EMP_ID, SALARY

FROM EMPLOYEE

START WITH HIRE_DATE = (SELECT MIN(HIRE_DATE) FROM EMPLOYEE)

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

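Here the START WITH condition uses a subquery: it first finds the earliest hire date and uses it as the starting condition, and the query then retrieves all records below that root.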

While using a subquery in the START WITH clause, be aware of how many rows will be returned by the subquery. If more than one row is returned when we are expecting just one row (indicated by the = sign), the query will generate an error. We can get around this by replacing = with the IN operator, but be warned that the hierarchical query may then end up dealing with multiple roots.

C．Checking for Ascendancy：

Another common operation on hierarchical data is to check for ascendancy. In an organization chart, we may ask whether one employee has authority over another. For example: "Does JONES have any authority over BLAKE?" To find out, we need to search for BLAKE in the subtree headed by JONES. If we find BLAKE in the subtree, then we know that BLAKE either directly or indirectly reports to JONES. If we don't find BLAKE in the subtree, then we know that JONES doesn't have any authority over BLAKE. The following query searches for BLAKE in the subtree headed by JONES:

SELECT * FROM EMPLOYEE

WHERE LNAME = 'BLAKE'

START WITH LNAME = 'JONES'

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

The START WITH...CONNECT BY clause in this example generates the subtree headed by JONES, and the WHERE clause filters this subtree to find BLAKE. The query returns no rows, which means that BLAKE was not found in JONES' subtree, so JONES has no authority over BLAKE.
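The same check can be sketched outside SQL: index each manager's direct reports, then search the subtree under the would-be boss, just as the query generates JONES' subtree and filters it. The dictionary layout and the has_authority name are hypothetical, the rows are sample data, and EMP_IDs are used instead of names for the reason given earlier.

```python
from collections import defaultdict

# Hypothetical sample rows: EMP_ID -> (LNAME, MANAGER_EMP_ID)
EMPLOYEE = {
    7839: ("KING", None),
    7566: ("JONES", 7839),
    7698: ("BLAKE", 7839),
    7788: ("SCOTT", 7566),
    7876: ("ADAMS", 7788),
}

# Index children by manager so each step down the tree is a dictionary lookup.
children = defaultdict(list)
for emp_id, (_, mgr) in EMPLOYEE.items():
    if mgr is not None:
        children[mgr].append(emp_id)

def has_authority(boss_id: int, emp_id: int) -> bool:
    """True if emp_id appears in the subtree headed by boss_id (boss_id itself excluded)."""
    stack = list(children[boss_id])
    while stack:
        current = stack.pop()
        if current == emp_id:
            return True
        stack.extend(children[current])
    return False

print(has_authority(7566, 7698))  # False: BLAKE is not under JONES
print(has_authority(7566, 7876))  # True: ADAMS -> SCOTT -> JONES
```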

D．Deleting a Subtree：

Let's assume that the organization we are dealing with splits, and JONES and all his subordinates form a new company. We then no longer need to maintain JONES and his subordinates in our EMPLOYEE table; instead, we need to delete the entire subtree headed by JONES, as shown in Figure 8-1, from the table. We can do this by using a subquery, as in the following example:

DELETE FROM EMPLOYEE

WHERE EMP_ID IN (SELECT EMP_ID FROM EMPLOYEE

START WITH LNAME = 'JONES'

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID);

In this example, the subquery generates the subtree headed by JONES, and returns the EMP_IDs of the employees in that subtree, including JONES'. The outer query then deletes the records with these EMP_ID values from the EMPLOYEE table.
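Here is the same delete sketched against SQLite through Python's sqlite3 module, with a recursive CTE standing in for CONNECT BY because SQLite has no CONNECT BY. The subtree root is chosen by EMP_ID rather than by name, and the rows are hypothetical.

```python
import sqlite3

ROWS = [  # hypothetical sample rows: (EMP_ID, LNAME, MANAGER_EMP_ID)
    (7839, "KING",  None),
    (7566, "JONES", 7839),
    (7698, "BLAKE", 7839),
    (7788, "SCOTT", 7566),
    (7902, "FORD",  7566),
    (7876, "ADAMS", 7788),
    (7499, "ALLEN", 7698),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, LNAME TEXT, "
             "MANAGER_EMP_ID INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", ROWS)

# The CTE collects JONES and everyone below him; DELETE removes exactly those rows.
conn.execute("""
    WITH RECURSIVE subtree (EMP_ID) AS (
        SELECT EMP_ID FROM EMPLOYEE WHERE EMP_ID = ?
        UNION ALL
        SELECT e.EMP_ID FROM EMPLOYEE e JOIN subtree s ON e.MANAGER_EMP_ID = s.EMP_ID
    )
    DELETE FROM EMPLOYEE WHERE EMP_ID IN (SELECT EMP_ID FROM subtree)""", (7566,))

remaining = [r[0] for r in conn.execute("SELECT LNAME FROM EMPLOYEE ORDER BY EMP_ID")]
print(remaining)  # ['ALLEN', 'BLAKE', 'KING']
```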

E．Listing Multiple Root Nodes：

An interesting variation on the problem of listing the root node of a hierarchy is to find and list the root nodes of several hierarchies that are all stored in the same table. For example, we might consider department managers to be root nodes, and we might wish to list all department managers found in the EMPLOYEE table.

No constraint controls which department an employee belongs to relative to his manager. However, we can assume that if A reports to B, B reports to C, and A and C belong to the same department, then B belongs to that department as well.

If an employee's manager belongs to another department, then that employee is the uppermost employee, or manager, of his department. Therefore, to find the uppermost employee in each department, we search the tree for employees whose managers belong to a different department than their own. The following query does this by following a parent-child link only when the department changes (DEPT_ID != PRIOR DEPT_ID); KING, as the root, is returned as the head of his own department:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE

FROM EMPLOYEE

START WITH MANAGER_EMP_ID IS NULL

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID

AND DEPT_ID != PRIOR DEPT_ID;

EMP_ID LNAME DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE

------ -------- -------- -------------- ------ ---------

7839 KING 10 5000 17-NOV-81

7566 JONES 20 7839 2975 02-APR-81

7698 BLAKE 30 7839 2850 01-MAY-81
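The effect of the extra DEPT_ID != PRIOR DEPT_ID condition can be reproduced with Python's sqlite3 module, where a recursive CTE stands in for CONNECT BY (SQLite has no CONNECT BY). The recursive member follows a manager-to-employee link only when the department changes. The rows are hypothetical; CLARK is included to show that a same-department subordinate of KING is not reported.

```python
import sqlite3

ROWS = [  # hypothetical sample rows: (EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID)
    (7839, "KING",  10, None),
    (7566, "JONES", 20, 7839),
    (7698, "BLAKE", 30, 7839),
    (7782, "CLARK", 10, 7839),   # same department as KING: not a department head
    (7788, "SCOTT", 20, 7566),
    (7499, "ALLEN", 30, 7698),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, LNAME TEXT, "
             "DEPT_ID INTEGER, MANAGER_EMP_ID INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", ROWS)

# Mirrors CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID AND DEPT_ID != PRIOR DEPT_ID.
heads = [r[0] for r in conn.execute("""
    WITH RECURSIVE tree (EMP_ID, LNAME, DEPT_ID) AS (
        SELECT EMP_ID, LNAME, DEPT_ID FROM EMPLOYEE WHERE MANAGER_EMP_ID IS NULL
        UNION ALL
        SELECT e.EMP_ID, e.LNAME, e.DEPT_ID
        FROM EMPLOYEE e JOIN tree t
          ON e.MANAGER_EMP_ID = t.EMP_ID AND e.DEPT_ID <> t.DEPT_ID
    )
    SELECT LNAME FROM tree ORDER BY EMP_ID""")]
print(heads)  # ['JONES', 'BLAKE', 'KING']
```

One consequence of this design: because the walk stops at every same-department link, a department head whose manager was itself reached only through such a link (for example, a hypothetical department-40 employee reporting to CLARK) would never be found. The assumption stated above does not rule that case out.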

F．Listing the Top Few Levels of a Hierarchy：

Another common task in dealing with hierarchical data is listing the top few levels of a hierarchy tree. For example, we may want to list top management employees in an organization. Let's assume that the top two levels in our organization chart constitute top management. We can then use the LEVEL pseudocolumn to identify those employees, as in the following example:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE

FROM EMPLOYEE

WHERE LEVEL <= 2

START WITH MANAGER_EMP_ID IS NULL

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;
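Oracle applies the WHERE clause of a hierarchical query after it has built the hierarchy, so WHERE LEVEL <= 2 still walks the whole tree and then discards the deeper rows. Writing the limit into the CONNECT BY condition instead (CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID AND LEVEL <= 2) stops the walk at level 2 and returns the same rows. The sketch below shows the two variants with a recursive CTE in SQLite (via Python's sqlite3 module) standing in for CONNECT BY; the rows are hypothetical.

```python
import sqlite3

ROWS = [  # hypothetical sample rows: (EMP_ID, LNAME, MANAGER_EMP_ID)
    (7839, "KING",  None),
    (7566, "JONES", 7839),
    (7698, "BLAKE", 7839),
    (7788, "SCOTT", 7566),
    (7876, "ADAMS", 7788),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, LNAME TEXT, "
             "MANAGER_EMP_ID INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", ROWS)

TREE = """
    WITH RECURSIVE tree (EMP_ID, LNAME, LVL) AS (
        SELECT EMP_ID, LNAME, 1 FROM EMPLOYEE WHERE MANAGER_EMP_ID IS NULL
        UNION ALL
        SELECT e.EMP_ID, e.LNAME, t.LVL + 1
        FROM EMPLOYEE e JOIN tree t ON e.MANAGER_EMP_ID = t.EMP_ID
        {prune}
    )
    SELECT LNAME FROM tree {keep} ORDER BY EMP_ID"""

def names(sql):
    return [r[0] for r in conn.execute(sql)]

# Like WHERE LEVEL <= 2: the whole tree is built, then deeper rows are discarded.
filtered = names(TREE.format(prune="", keep="WHERE LVL <= 2"))

# Like CONNECT BY ... AND LEVEL <= 2: the walk never descends below level 2.
pruned = names(TREE.format(prune="WHERE t.LVL < 2", keep=""))

print(filtered, pruned)  # ['JONES', 'BLAKE', 'KING'] ['JONES', 'BLAKE', 'KING']
```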

G．Aggregating a Hierarchy：

Another challenging requirement on hierarchical data is to aggregate a hierarchy. For example, we may want to sum the salaries of all employees reporting to a specific employee. Or we may want to treat each employee as a root and, for each one, report the sum of the salaries of all of his subordinates.

I．Summing the salaries of a given employee and all of his subordinates：

SELECT SUM(SALARY) FROM EMPLOYEE

START WITH LNAME = 'JONES'

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

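II．Summing the salaries of each employee and all of his subordinates：

Unlike the first problem, here we must treat every node as a root and, for each node, compute the total salary of that employee and all of his subordinates. In effect, we have to run the query from the first problem once for every employee, which we can do with a correlated scalar subquery: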

SELECT LNAME, SALARY, (SELECT SUM(SALARY) FROM EMPLOYEE T1

START WITH EMP_ID = T2.EMP_ID -- correlate on the primary key, not on LNAME

CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID) SUM_SALARY

FROM EMPLOYEE T2;
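The correlated subquery re-walks a subtree for every employee, so lower-level employees are visited many times. For comparison, the sketch below uses a different technique in plain Python: one post-order pass that sums each subtree once and passes the total up to the manager. It also answers the first problem (JONES' total). The salaries and the data layout are hypothetical, and, like the SQL, the sketch assumes the data contains no loops.

```python
from collections import defaultdict

# Hypothetical sample rows: EMP_ID -> (LNAME, MANAGER_EMP_ID, SALARY)
EMPLOYEE = {
    7839: ("KING",  None, 5000),
    7566: ("JONES", 7839, 2975),
    7698: ("BLAKE", 7839, 2850),
    7788: ("SCOTT", 7566, 3000),
    7902: ("FORD",  7566, 3000),
    7876: ("ADAMS", 7788, 1100),
    7499: ("ALLEN", 7698, 1600),
}

# Direct reports of each manager.
children: dict[int, list[int]] = defaultdict(list)
for emp_id, (_, mgr, _) in EMPLOYEE.items():
    if mgr is not None:
        children[mgr].append(emp_id)

def subtree_totals() -> dict[int, int]:
    """Map each EMP_ID to SUM(SALARY) over that employee and all of his subordinates."""
    totals: dict[int, int] = {}

    def visit(emp_id: int) -> int:
        # Each subtree is summed exactly once; its total feeds the manager's total.
        total = EMPLOYEE[emp_id][2] + sum(visit(c) for c in children[emp_id])
        totals[emp_id] = total
        return total

    for emp_id, (_, mgr, _) in EMPLOYEE.items():
        if mgr is None:  # start from every root
            visit(emp_id)
    return totals

totals = subtree_totals()
print(totals[7566])  # 10075 (JONES + SCOTT + ADAMS + FORD)
```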

4．Restrictions on Hierarchical Queries：

A．A hierarchical query can't use a join.

B．A hierarchical query cannot select data from a view that involves a join. (Restrictions A and B apply to Oracle8i and earlier; Oracle9i lifted them.)

C．We can use an ORDER BY clause within a hierarchical query; however, ORDER BY overrides the hierarchical ordering produced by the START WITH...CONNECT BY clause. Therefore, unless all we care about is the level number, it doesn't make sense to use ORDER BY in a hierarchical query. (Oracle9i and later also provide ORDER SIBLINGS BY, which sorts the rows within each group of siblings while preserving the hierarchical order.)