6- Set Operations

 
1. Introduction to Set Operations:
There are situations when we need to combine the results from two or more SELECT statements. SQL enables us to handle these requirements by using set operations. The result of each SELECT statement can be treated as a set, and SQL set operations can be applied on those sets to arrive at a final result. Oracle SQL supports the following four set operations:
· UNION ALL
· UNION
· MINUS
· INTERSECT
 
SQL statements containing these set operators are referred to as compound queries, and each SELECT statement in a compound query is referred to as a component query. Two SELECTs can be combined into a compound query by a set operation only if they satisfy the following two conditions:
·The result sets of both the queries must have the same number of columns.
·The datatype of each column in the second result set must match the datatype of its corresponding column in the first result set.
Note: the datatypes do not have to be identical as long as Oracle can convert between them implicitly, which it does only within the same datatype group (for example, CHAR and VARCHAR2, or two numeric types).
 
These conditions are also referred to as union compatibility conditions. The term union compatibility is used even though these conditions apply to other set operations as well. Set operations are often called vertical joins, because the result combines data from two or more SELECTS based on columns instead of rows. The generic syntax of a query involving a set operation is:
 
<component query>
 
{UNION | UNION ALL | MINUS | INTERSECT}
 
<component query>
 
2. Set Operators:
The following list briefly describes the four set operations supported by Oracle SQL:
·UNION ALL
Combines the results of two SELECT statements into one result set.
·UNION
Combines the results of two SELECT statements into one result set, and then eliminates any duplicate rows from that result set.
·MINUS
Takes the result set of one SELECT statement, and removes those rows that are also returned by a second SELECT statement.
·INTERSECT
Returns only those rows that are returned by each of two SELECT statements.
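
The four operators can be tried out on a pair of tiny tables. The following is a minimal sketch rather than Oracle code: it runs the SQL through Python's built-in sqlite3 module, the tables `a` and `b` are hypothetical, and SQLite spells Oracle's MINUS as EXCEPT.

```python
import sqlite3

# Two tiny single-column tables stand in for the two component queries.
# (Hypothetical data; SQLite is used only so the sketch is runnable.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def run(operator):
    """Combine SELECTs on a and b with the given set operator.

    ORDER BY 1 is added only to make the output order deterministic.
    """
    sql = f"SELECT n FROM a {operator} SELECT n FROM b ORDER BY 1"
    return [row[0] for row in conn.execute(sql)]

union_all = run("UNION ALL")   # every row from both queries, duplicates kept
union = run("UNION")           # distinct rows from either query
intersect = run("INTERSECT")   # distinct rows returned by both queries
minus = run("EXCEPT")          # SQLite's spelling of Oracle's MINUS

print(union_all)   # [1, 2, 2, 2, 3, 3, 4]
print(union)       # [1, 2, 3, 4]
print(intersect)   # [2, 3]
print(minus)       # [1]
```

Note that only UNION ALL preserves duplicates; the other three operators return distinct rows.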
 
A. UNION ALL:
The UNION ALL operator merges the result sets of two component queries. This operation returns rows retrieved by either of the component queries. It simply merges the output of its component queries, without caring about any duplicates in the final result set.
 
B. UNION:
The UNION operator returns all distinct rows retrieved by two component queries. The UNION operation eliminates duplicates while merging rows retrieved by either of the component queries.
 
To eliminate duplicate rows, a UNION operation has to do extra work compared with UNION ALL: it must sort (or otherwise de-duplicate) and filter the result set. As a side effect, the output of UNION ALL is not sorted, whereas the output of UNION often comes back sorted; that ordering is not guaranteed, so add an ORDER BY clause if the order matters. This extra work introduces a performance overhead. A query involving UNION takes extra time compared with the same query using UNION ALL, even if there are no duplicates to remove. Therefore, unless we have a valid need to retrieve only distinct rows, we should use UNION ALL instead of UNION for better performance.
 
C. INTERSECT:
INTERSECT returns only the rows retrieved by both component queries. Compare this with UNION, which returns the rows retrieved by any of the component queries. If UNION acts like 'OR', INTERSECT acts like 'AND'.
 
D. MINUS:
MINUS returns all rows from the first SELECT that are not also returned by the second SELECT.
 
Example:
Query 1:
SELECT CUST_NBR, NAME
 FROM CUSTOMER
 WHERE REGION_ID = 5;
 CUST_NBR NAME
---------- ------------------------------
         1 Cooper Industries
         2 Emblazon Corp.
         3 Ditech Corp.
         4 Flowtech Inc.
         5 Gentech Industries
 
Query 2:
SELECT C.CUST_NBR, C.NAME
 FROM CUSTOMER C
 WHERE C.CUST_NBR IN (SELECT O.CUST_NBR
                     FROM CUST_ORDER O, EMPLOYEE E
                     WHERE O.SALES_EMP_ID = E.EMP_ID
                     AND E.LNAME = 'MARTIN');
 CUST_NBR NAME
---------- ------------------------------
         4 Flowtech Inc.
         8 Zantech Inc.
 
Subtracting the second result set from the first:
SELECT CUST_NBR, NAME
FROM CUSTOMER
WHERE REGION_ID = 5
MINUS
SELECT C.CUST_NBR, C.NAME
FROM CUSTOMER C
WHERE C.CUST_NBR IN (SELECT O.CUST_NBR
                       FROM CUST_ORDER O, EMPLOYEE E
                       WHERE O.SALES_EMP_ID = E.EMP_ID
                          AND E.LNAME = 'MARTIN');
 CUST_NBR NAME
---------- ------------------------------
         1 Cooper Industries
         2 Emblazon Corp.
         3 Ditech Corp.
         5 Gentech Industries
You might wonder why we don't see "Zantech Inc." in the output. MINUS only ever returns rows from the first component query, and "Zantech Inc." is returned only by the second. An important thing to note here is that the execution order of component queries in a set operation is from top to bottom. The results of UNION, UNION ALL, and INTERSECT will not change if we alter the ordering of component queries. However, the result of MINUS will be different if we alter the order of the component queries. If we rewrite the previous query by switching the positions of the two SELECTs, we get a completely different result:
 
SELECT C.CUST_NBR, C.NAME
FROM CUSTOMER C
WHERE C.CUST_NBR IN (SELECT O.CUST_NBR
                        FROM CUST_ORDER O, EMPLOYEE E
                        WHERE O.SALES_EMP_ID = E.EMP_ID
                          AND E.LNAME = 'MARTIN')
MINUS
SELECT CUST_NBR, NAME
FROM CUSTOMER
WHERE REGION_ID = 5;
 
 CUST_NBR NAME
---------- ------------------------------
         8 Zantech Inc.
 
In a MINUS operation, rows may be returned by the second SELECT that are not also returned by the first. These rows are not included in the output.
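
The order dependence of MINUS can be reproduced with a small sketch. It assumes two hypothetical tables, `region5` and `martin`, that simply hold the two result sets shown above, and it uses Python's sqlite3 module, where Oracle's MINUS is written EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in tables holding the output of Query 1 and Query 2.
conn.executescript("""
    CREATE TABLE region5 (cust_nbr INTEGER, name TEXT);
    CREATE TABLE martin  (cust_nbr INTEGER, name TEXT);
    INSERT INTO region5 VALUES
        (1, 'Cooper Industries'), (2, 'Emblazon Corp.'), (3, 'Ditech Corp.'),
        (4, 'Flowtech Inc.'), (5, 'Gentech Industries');
    INSERT INTO martin VALUES (4, 'Flowtech Inc.'), (8, 'Zantech Inc.');
""")

def combine(first, operator, second):
    """Return the customer numbers of `first <operator> second`, sorted."""
    sql = (f"SELECT cust_nbr FROM {first} {operator} "
           f"SELECT cust_nbr FROM {second} ORDER BY 1")
    return [row[0] for row in conn.execute(sql)]

# EXCEPT (Oracle: MINUS) depends on which query comes first.
region5_minus_martin = combine("region5", "EXCEPT", "martin")
martin_minus_region5 = combine("martin", "EXCEPT", "region5")

# INTERSECT gives the same answer either way round.
intersect_ab = combine("region5", "INTERSECT", "martin")
intersect_ba = combine("martin", "INTERSECT", "region5")

print(region5_minus_martin)  # [1, 2, 3, 5]
print(martin_minus_region5)  # [8]
print(intersect_ab, intersect_ba)  # [4] [4]
```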
 
3. Using Set Operations to Compare Two Tables:
The following query uses both MINUS and UNION ALL to compare two tables for equality. The query depends on each table having either a primary key or at least one unique index.
 
(SELECT * FROM CUSTOMER_KNOWN_GOOD
 MINUS
 SELECT * FROM CUSTOMER_TEST)   -- rows of table A minus rows of table B
 
UNION ALL               -- the parentheses on both sides ensure the MINUS operations run first
 
(SELECT * FROM CUSTOMER_TEST             -- rows of table B minus rows of table A
 MINUS
 SELECT * FROM CUSTOMER_KNOWN_GOOD);
 
We can look at it as the union of two compound queries. The parentheses ensure that both MINUS operations take place first before the UNION ALL operation is performed. The result of the first MINUS query will be those rows in CUSTOMER_KNOWN_GOOD that are not also in CUSTOMER_TEST. The result of the second MINUS query will be those rows in CUSTOMER_TEST that are not also in CUSTOMER_KNOWN_GOOD. The UNION ALL operator simply combines these two result sets for convenience. If no rows are returned by this query, then we know that both tables have identical rows. Any rows returned by this query represent differences between the CUSTOMER_TEST and CUSTOMER_KNOWN_GOOD tables.
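
As a rough illustration of this comparison, the sketch below runs the same idea in SQLite through Python's sqlite3 module. Several things here are assumptions made for the example: the two-column table layout, the sample rows (including the misspelled 'Panasonik'), and the extra `source` label column. SQLite also does not accept parenthesized compound queries directly, so each MINUS (EXCEPT in SQLite) is wrapped in a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: the tables differ only in the row for customer 3.
conn.executescript("""
    CREATE TABLE customer_known_good (cust_nbr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_test       (cust_nbr INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer_known_good VALUES (1, 'Sony'), (2, 'Samsung'), (3, 'Panasonic');
    INSERT INTO customer_test       VALUES (1, 'Sony'), (2, 'Samsung'), (3, 'Panasonik');
""")

# Each subquery plays the role of one parenthesized MINUS in the Oracle query.
DIFF_SQL = """
    SELECT 'known_good only' AS source, d1.* FROM
        (SELECT * FROM customer_known_good EXCEPT SELECT * FROM customer_test) AS d1
    UNION ALL
    SELECT 'test only' AS source, d2.* FROM
        (SELECT * FROM customer_test EXCEPT SELECT * FROM customer_known_good) AS d2
"""

def table_differences():
    """Rows present in one table but not the other, sorted for stable output."""
    return sorted(conn.execute(DIFF_SQL).fetchall())

differences_before = table_differences()

# After repairing the bad row, the two tables hold identical rows.
conn.execute("UPDATE customer_test SET name = 'Panasonic' WHERE cust_nbr = 3")
differences_after = table_differences()

print(differences_before)
# [('known_good only', 3, 'Panasonic'), ('test only', 3, 'Panasonik')]
print(differences_after)   # []
```

The `source` column is not part of the original query; it is added only to show which side each differing row came from.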
 
If one or both tables may contain duplicate rows, the short form above can give a wrong answer, because MINUS discards duplicates. In that case we must use a more general form of the query, which uses row counts to detect duplicates: group each table's rows and count them, then subtract the grouped results in both directions. Rows whose counts match in both tables drop out of the final result; rows whose counts differ appear in it.
 
(SELECT C1.*, COUNT(*) FROM CUSTOMER_KNOWN_GOOD C1
 GROUP BY C1.CUST_NBR, C1.NAME
 MINUS
 SELECT C2.*, COUNT(*) FROM CUSTOMER_TEST C2
 GROUP BY C2.CUST_NBR, C2.NAME)      -- grouped counts of table A minus grouped counts of table B
 
UNION ALL
 
(SELECT C3.*, COUNT(*) FROM CUSTOMER_TEST C3
 GROUP BY C3.CUST_NBR, C3.NAME
 MINUS
 SELECT C4.*, COUNT(*) FROM CUSTOMER_KNOWN_GOOD C4
 GROUP BY C4.CUST_NBR, C4.NAME);     -- grouped counts of table B minus grouped counts of table A
 
   CUST_NBR NAME                             COUNT(*)
----------- ------------------------------ ----------
          2 Samsung                                 1  -- result of table A minus table B
          3 Panasonic                               3
          2 Samsung                                 2  -- result of table B minus table A
          3 Panasonic                               1
 
These results indicate that one table (CUSTOMER_KNOWN_GOOD) has one record for "Samsung", whereas the second table (CUSTOMER_TEST) has two records for the same customer. Also, one table (CUSTOMER_KNOWN_GOOD) has three records for "Panasonic", whereas the second table (CUSTOMER_TEST) has one record for the same customer. Both the tables have the same number of rows (two) for "Sony", and therefore "Sony" doesn't appear in the output.
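
The count-based comparison can be sketched the same way. The row counts below follow the output above; the customer number 1 for "Sony" and the use of SQLite (through Python's sqlite3 module, with EXCEPT in place of MINUS) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key here, so duplicate rows are allowed.
# Sony: 2 vs 2 rows, Samsung: 1 vs 2 rows, Panasonic: 3 vs 1 rows.
conn.executescript("""
    CREATE TABLE customer_known_good (cust_nbr INTEGER, name TEXT);
    CREATE TABLE customer_test       (cust_nbr INTEGER, name TEXT);
    INSERT INTO customer_known_good VALUES
        (1, 'Sony'), (1, 'Sony'), (2, 'Samsung'),
        (3, 'Panasonic'), (3, 'Panasonic'), (3, 'Panasonic');
    INSERT INTO customer_test VALUES
        (1, 'Sony'), (1, 'Sony'), (2, 'Samsung'), (2, 'Samsung'),
        (3, 'Panasonic');
""")

# One grouped, counted view of a table; {t} is filled in with the table name.
COUNTED = "SELECT cust_nbr, name, COUNT(*) AS cnt FROM {t} GROUP BY cust_nbr, name"

def counted_minus(first, second):
    """Grouped row counts of `first` that have no exact match in `second`."""
    sql = f"{COUNTED.format(t=first)} EXCEPT {COUNTED.format(t=second)} ORDER BY 1"
    return conn.execute(sql).fetchall()

good_minus_test = counted_minus("customer_known_good", "customer_test")
test_minus_good = counted_minus("customer_test", "customer_known_good")

# The short form without counts sees no difference, because EXCEPT/MINUS
# works on distinct rows only.
short_form = conn.execute(
    "SELECT * FROM customer_known_good EXCEPT SELECT * FROM customer_test"
).fetchall()

print(good_minus_test)  # [(2, 'Samsung', 1), (3, 'Panasonic', 3)]
print(test_minus_good)  # [(2, 'Samsung', 2), (3, 'Panasonic', 1)]
print(short_form)       # []
```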
 
Duplicate rows are not possible in tables that have a primary key or at least one unique index. Use the short form of the table comparison query for such tables.
 
4. Using NULLs in Compound Queries:
As we know, NULL doesn't have a datatype, and NULL can be used in place of a value of any datatype. If we purposely select NULL as a column value in a component query, Oracle no longer has two datatypes to compare in order to see whether the two component queries are compatible.
 
For character columns, this is no problem. For example:
 
SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL
 UNION
SELECT 2 NUM, NULL STRING FROM DUAL;
 
       NUM STRING
---------- --------
         1 DEFINITE
         2
Notice that Oracle considers the character string 'DEFINITE' from the first component query to be compatible with the NULL value supplied for the corresponding column in the second component query.
 
However, if a NUMBER or a DATE column of a component query is set to NULL, we must explicitly tell Oracle what "flavor" of NULL to use. Otherwise, we'll encounter errors.
 
For example:
SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL
 UNION
SELECT NULL NUM, 'UNKNOWN' STRING FROM DUAL;
 
SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL
       *
ERROR at line 1:
ORA-01790: expression must have same datatype as corresponding expression
 
Note that the use of NULL in the second component query causes a datatype mismatch between the first column of the first component query and the first column of the second component query. Using NULL for a DATE column causes the same problem.
 
In these cases, we need to cast the NULL to a suitable datatype to fix the problem, as in the following example:
 
SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL
 UNION
SELECT TO_NUMBER(NULL) NUM, 'UNKNOWN' STRING FROM DUAL;
 
       NUM STRING
---------- --------
         1 DEFINITE
           UNKNOWN
This union compatibility problem with NULLs is encountered in Oracle8i. However, there is no such problem in Oracle9i, which is smart enough to know which flavor of NULL to use in a compound query.
 
5. Rules and Restrictions on Set Operations:
There are some other rules and restrictions that apply to the set operations.
 
Column names for the result set are derived from the first SELECT.
 
If we want to use ORDER BY in a query involving set operations, we must place the ORDER BY at the end of the entire statement. The ORDER BY clause can appear only once, at the end of the compound query. The component queries can't have individual ORDER BY clauses.
 
SELECT CUST_NBR, NAME
 FROM CUSTOMER
WHERE REGION_ID = 5
UNION
SELECT EMP_ID, LNAME
 FROM EMPLOYEE
WHERE LNAME = 'MARTIN'
ORDER BY CUST_NBR;
 
 CUST_NBR NAME
---------- ---------------------
         1 Cooper Industries
         2 Emblazon Corp.
         3 Ditech Corp.
         4 Flowtech Inc.
         5 Gentech Industries
 
Note that the column name used in the ORDER BY clause of this query is taken from the first SELECT. We couldn't order these results by EMP_ID. If we attempt to ORDER BY EMP_ID, we will get an error, as in the following example:
 
SELECT CUST_NBR, NAME
 FROM CUSTOMER
WHERE REGION_ID = 5
    UNION
SELECT EMP_ID, LNAME
 FROM EMPLOYEE
WHERE LNAME = 'MARTIN'
ORDER BY EMP_ID;
ORDER BY EMP_ID
         *
ERROR at line 8:
ORA-00904: invalid column name
 
The ORDER BY clause doesn't recognize the column names of the second SELECT. To avoid confusion over column names, it is a common practice to ORDER BY column positions:
 
SELECT CUST_NBR, NAME
 FROM CUSTOMER
WHERE REGION_ID = 5
    UNION
SELECT EMP_ID, LNAME
 FROM EMPLOYEE
WHERE LNAME = 'MARTIN'
ORDER BY 1;
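
Both rules (column names come from the first SELECT, and one positional ORDER BY at the end sorts the whole compound query) can be checked with a short sketch. It uses Python's sqlite3 module rather than Oracle, and the sample rows, including employee number 7654, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows, inserted out of order on purpose.
conn.executescript("""
    CREATE TABLE customer (cust_nbr INTEGER, name TEXT, region_id INTEGER);
    CREATE TABLE employee (emp_id INTEGER, lname TEXT);
    INSERT INTO customer VALUES (2, 'Emblazon Corp.', 5), (1, 'Cooper Industries', 5);
    INSERT INTO employee VALUES (7654, 'MARTIN');
""")

# A single ORDER BY at the very end sorts the combined result;
# position 1 refers to the first column of the compound query.
cur = conn.execute("""
    SELECT cust_nbr, name FROM customer WHERE region_id = 5
    UNION
    SELECT emp_id, lname FROM employee WHERE lname = 'MARTIN'
    ORDER BY 1
""")

# The result set's column names are those of the first SELECT.
column_names = [col[0] for col in cur.description]
rows = cur.fetchall()

print(column_names)  # ['cust_nbr', 'name']
print(rows)
# [(1, 'Cooper Industries'), (2, 'Emblazon Corp.'), (7654, 'MARTIN')]
```

SQLite is more lenient than Oracle about which names it accepts in the ORDER BY of a compound query, so this sketch does not try to reproduce the ORA-00904 error.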
 
Unlike ORDER BY, we can use GROUP BY and HAVING clauses in component queries.
 
Component queries are executed from top to bottom. If we want to alter the sequence of execution, we can use parentheses appropriately.
 