4- Subqueries

 
1. Subquery concepts:
A subquery is a SELECT statement that is nested within another SQL statement. For the purpose of this discussion, we will call the SQL statement that contains a subquery the containing statement.
 
Subqueries are executed prior to execution of the containing SQL statement, and the result set generated by the subquery is discarded after the containing SQL statement has finished execution (correlated subqueries, covered in section 3, are the exception: they run once per candidate row). Thus, a subquery can be thought of as a temporary table with statement scope.
 
A subquery may either be correlated with its containing SQL statement, meaning that it references one or more columns from the containing statement, or it might reference nothing outside itself, in which case it is called a noncorrelated subquery.
 
A less-commonly-used but powerful variety of subquery, called the inline view, occurs in the FROM clause of a select statement. Inline views are always noncorrelated; they are evaluated first and behave like unindexed tables cached in memory for the remainder of the query.
 
2. Noncorrelated subqueries:
Noncorrelated subqueries allow each row from the containing SQL statement to be compared to a set of values. They fall into three categories, depending on the number of rows and columns returned in their result set:
    A.Single-row, single-column subqueries
    B.Multiple-row, single-column subqueries
    C.Multiple-column subqueries
Depending on the category, different sets of operators may be employed by the containing SQL statement to interact with the subquery.
 
A. Single-Row, Single-Column Subqueries 
A subquery that returns a single row with a single column is treated like a scalar by the containing statement; not surprisingly, these types of subqueries are known as scalar subqueries. The subquery may appear on either side of a condition, and the usual comparison operators (=, <, >, !=, <=, >=) are employed.
    SELECT lname
     FROM employee
     WHERE salary > (SELECT AVG(salary)
                       FROM EMPLOYEE);
As this query demonstrates, it can be perfectly reasonable for a subquery to reference the same tables as the containing query. In fact, subqueries are frequently used to isolate a subset of records within a table.
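
The scalar subquery above can be tried out with a small runnable sketch. This one uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is self-contained); the employee rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Smith", 800), ("Allen", 1600), ("Ward", 1250),
     ("Jones", 2975), ("Blake", 2850)],
)

# The scalar subquery yields a single value (the average salary, 1895 here),
# which the containing query compares against each row's salary.
above_average = [
    row[0]
    for row in conn.execute(
        "SELECT lname FROM employee "
        "WHERE salary > (SELECT AVG(salary) FROM employee) "
        "ORDER BY lname"
    )
]
print(above_average)
```

Both the containing query and the subquery read the same employee table, which is the point made in the paragraph above.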
 
Some points to note about noncorrelated subqueries:
.The FROM clause may contain any type of noncorrelated subquery.
.The SELECT and ORDER BY clauses may contain scalar subqueries.
.The GROUP BY clause may not contain subqueries.
.The START WITH and CONNECT BY clauses, used for querying hierarchical data, may contain subqueries.
 
B. Multiple-Row, Single-Column Subqueries
When a subquery returns more than one row, it is not possible to use only comparison operators, since a single value cannot be directly compared to a set of values. However, a single value can be compared to each value in a set. To accomplish this, the special keywords ANY and ALL may be used with comparison operators to determine if a value is equal to (or less than, greater than, etc.) any members of the set or all members of the set.
 
    1). The ALL keyword:
    SELECT fname, lname
    FROM employee
     WHERE dept_id = 3 AND salary >= ALL (SELECT salary
                                            FROM employee
                                           WHERE dept_id = 3);
    FNAME                LNAME
    -------------------- --------------------
    Mark                 Russell
    The subquery returns the set of salaries for department 3, and the containing query checks each employee in the department to see if her salary is greater than or equal to every salary returned by the subquery. Thus, this query retrieves the name of the highest paid person in department 3. While every employee has a salary >= some of the salaries in the department (at least her own), only the highest paid employee has a salary >= all of the salaries in the department. If multiple employees tie for the highest salary in the department, multiple names will be returned.
 
    2). The ANY keyword:
    SELECT fname, lname
     FROM employee
     WHERE dept_id = 3 AND NOT salary < ANY (SELECT salary
                                               FROM employee
                                              WHERE dept_id = 3);
    There are almost always multiple ways to phrase the same query. One of the challenges of writing SQL is striking the right balance between efficiency and readability. In this case, I might prefer using AND salary >= ALL over AND NOT salary < ANY because the first variation is easier to understand; however, the latter form might prove more efficient, since each evaluation of the subquery results requires from 1 to N comparisons when using ANY versus as many as N comparisons when using ALL.
    If there are 100 people in the department, each of the 100 salaries needs to be compared to the entire set of 100. When using ANY, the comparison can stop as soon as a larger salary is identified in the set, whereas using ALL requires all 100 comparisons to confirm that there are no larger salaries in the set. Whether this difference shows up in practice depends on the optimizer, so it is worth checking the execution plan rather than assuming.
 
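    Note: comparing ALL and ANY:
    A. Readability: the ALL form can be easier to read than the ANY form; see the two examples above.
    B. Performance: to confirm an ALL condition, the value must be compared against every row returned by the subquery, so a subquery returning N rows means N comparisons. With ANY, the comparison stops as soon as one row satisfies the condition, so the number of comparisons ranges from 1 to N. When picking a small number of rows out of a large data set, the ANY form may therefore perform better.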
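
The equivalence between the two phrasings can be checked with a minimal sketch in plain Python. The salary list and the helper names `ge_all` and `not_lt_any` are hypothetical, and Python's `all()` and `any()` only model the SQL quantifiers here; this says nothing about how Oracle evaluates them internally.

```python
# Hypothetical salaries for one department (two people tie for the top).
dept_salaries = [900, 1200, 1500, 1500]

def ge_all(value, values):
    """Models `salary >= ALL (subquery)`: the comparison must hold for every value."""
    return all(value >= v for v in values)

def not_lt_any(value, values):
    """Models `NOT salary < ANY (subquery)`: no value in the set may be larger."""
    return not any(value < v for v in values)

top_by_all = [s for s in dept_salaries if ge_all(s, dept_salaries)]
top_by_any = [s for s in dept_salaries if not_lt_any(s, dept_salaries)]
print(top_by_all, top_by_any)
```

Both filters keep the same rows, including both tied top salaries. In this Python model `all()` stops at the first failing comparison just as `any()` stops at the first match, so the comparison-count argument is best treated as a hypothesis to verify on a real database.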
 
    ORA-01427: single-row subquery returns more than one row
    What the error message is trying to convey is that a multiple-row subquery has been identified where only a single-row subquery is allowed. If we are not absolutely certain that our subquery will return exactly one row, we must include ANY or ALL to ensure our code doesn't fail in the future.
 
    3). The IN keyword:
    Using IN with a subquery is functionally equivalent to using = ANY, and returns TRUE if a match is found in the set returned by the subquery.

    Finding members of one set that do not exist in another set, typically written with NOT IN, is referred to as an anti-join. As the name implies, an anti-join is the opposite of a join; rows from table A are returned if the specified data is not found in table B.
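
A small sketch of IN and of an anti-join written with NOT IN, again using Python's sqlite3 module as a stand-in for Oracle. The customer and cust_order rows are hypothetical sample data, not the article's schema.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_nbr INTEGER, name TEXT);
    CREATE TABLE cust_order (order_nbr INTEGER, cust_nbr INTEGER);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO cust_order VALUES (100, 1), (101, 1), (102, 3);
""")

# IN: customers whose number appears in the set returned by the subquery.
with_orders = [r[0] for r in conn.execute(
    "SELECT name FROM customer "
    "WHERE cust_nbr IN (SELECT cust_nbr FROM cust_order) ORDER BY name")]

# NOT IN (anti-join): customers whose number is absent from that set.
without_orders = [r[0] for r in conn.execute(
    "SELECT name FROM customer "
    "WHERE cust_nbr NOT IN (SELECT cust_nbr FROM cust_order) ORDER BY name")]

print(with_orders, without_orders)
```

Acme appears once in the IN result even though it has two orders, because the subquery is only tested for membership.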
 
C. Multiple-Column Subqueries
A subquery may also return several columns, which lets a single statement set or compare multiple columns at once.
Example:
UPDATE monthly_orders
   SET (tot_orders, max_order_amt, min_order_amt, tot_amt) =
       (SELECT COUNT(*), MAX(sale_price), MIN(sale_price), SUM(sale_price)
          FROM cust_order
         WHERE order_dt >= TO_DATE('01-NOV-2001','DD-MON-YYYY')
           AND order_dt < TO_DATE('01-DEC-2001','DD-MON-YYYY')
           AND cancelled_dt IS NULL)
 WHERE month = 11 AND year = 2001;
 
Such subqueries may also be used in the WHERE clause of a SELECT, UPDATE, or DELETE statement.
Example:
DELETE FROM line_item
 WHERE (order_nbr, part_nbr) IN
   (SELECT c.order_nbr, p.part_nbr
         FROM cust_order c, line_item li, part p
        WHERE c.ship_dt IS NULL AND c.cancelled_dt IS NULL
          AND c.order_nbr = li.order_nbr
          AND li.part_nbr = p.part_nbr
          AND p.status = 'DISCONTINUED');
Note the use of the IN operator in the WHERE clause. Two columns are listed together in parentheses prior to the IN keyword. Values in these two columns are compared to the set of two values returned by each row of the subquery. If a match is found, the row is removed from the line_item table.
 
3. Correlated subqueries:
A subquery that references one or more columns from its containing SQL statement is called a correlated subquery. Unlike noncorrelated subqueries, which are executed exactly once prior to execution of the containing statement, a correlated subquery is executed once for each candidate row in the intermediate result set of the containing query.
 
Example:
SELECT p.part_nbr, p.name
   FROM supplier s, part p
    WHERE s.name = 'Acme Industries'
     AND s.supplier_id = p.supplier_id
     AND 10 <= (SELECT COUNT(*) FROM cust_order co, line_item li
                 WHERE li.part_nbr = p.part_nbr
                   AND li.order_nbr = co.order_nbr
                   AND co.order_dt >= TO_DATE('01-DEC-2001','DD-MON-YYYY'));
 
The reference to p.part_nbr is what makes the subquery correlated; values for p.part_nbr must be supplied by the containing query before the subquery can execute. If there are 10,000 parts in the part table, but only 100 are supplied by Acme Industries, the subquery will be executed once for each of the 100 rows in the intermediate result set created by joining the part and supplier tables, rather than once up front before the containing query runs.
 
Correlated subqueries are often used to test whether relationships exist without regard to cardinality. The EXISTS operator is used for these types of queries:
    SELECT p.part_nbr, p.name, p.unit_cost
     FROM part p
     WHERE EXISTS (SELECT 1 FROM line_item li, cust_order co
                   WHERE li.part_nbr = p.part_nbr
                      AND li.order_nbr = co.order_nbr
                      AND co.ship_dt >= TO_DATE('01-JAN-2002','DD-MON-YYYY'));
As long as the subquery returns one or more rows, the EXISTS condition is satisfied without regard for how many rows were actually returned by the subquery. Since the EXISTS operator returns TRUE or FALSE depending only on whether the subquery returns any rows, the actual columns returned by the subquery are irrelevant. The SELECT clause requires at least one column, however, so it is common practice to use either the literal "1" or the wildcard "*".
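
The point about the subquery's select list can be checked with a short sketch. It uses Python's sqlite3 module as a stand-in for Oracle; the part and line_item rows are hypothetical, and `parts_with_orders` is an illustrative helper name.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (part_nbr TEXT, name TEXT);
    CREATE TABLE line_item (order_nbr INTEGER, part_nbr TEXT);
    INSERT INTO part VALUES ('P1', 'Bolt'), ('P2', 'Nut'), ('P3', 'Washer');
    INSERT INTO line_item VALUES (100, 'P1'), (101, 'P1'), (102, 'P3');
""")

def parts_with_orders(select_list):
    # select_list is only ever one of the fixed literals used below.
    sql = f"""
        SELECT p.name
          FROM part p
         WHERE EXISTS (SELECT {select_list}
                         FROM line_item li
                        WHERE li.part_nbr = p.part_nbr)  -- correlated reference
         ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql)]

with_literal = parts_with_orders("1")
with_star = parts_with_orders("*")
print(with_literal, with_star)
```

Both variants return the same parts, and Bolt is listed once even though two line items reference it: EXISTS only asks whether at least one row comes back.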
 
4. Inline views:
The FROM clause contains a list of data sets. In this light, it is easy to see how the FROM clause can contain tables (permanent data sets), views (virtual data sets), and SELECT statements (temporary data sets). A SELECT statement in the FROM clause of a containing SELECT statement is referred to as an inline view. Since it is a subquery that executes prior to the containing query, a more palatable name might have been a "pre-query."
 
Because the result set from an inline view is referenced by other elements of the containing query, we must give our inline view a name and provide aliases for all ambiguous columns. Similar to other types of subqueries, inline views may join multiple tables, call built-in and user-defined functions, specify optimizer hints, and include GROUP BY, HAVING, and CONNECT BY clauses. Unlike other types of subqueries, an inline view may also contain an ORDER BY clause.
 
Inline views are particularly useful when we need to combine data at different levels of aggregation, for example when one data set holds the detail rows and the other holds summarized results.
 
When considering using an inline view, ask the following questions:
1).What value does the inline view add to the readability and, more importantly, the performance of the containing query?
2).How large will the result set generated by the inline view be?
3).How often, if ever, will I have need of this particular data set?
 
In general, using an inline view should enhance the readability and performance of the query, and it should generate a manageable data set that is of no value to other statements or sessions; otherwise, we may want to consider building a permanent or temporary table so that we can share the data between sessions and build additional indexes as needed.
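
As an illustration of combining data at different levels of aggregation, the sketch below joins detail rows to a per-department average computed in an inline view. It uses Python's sqlite3 module as a stand-in for Oracle; the employee rows and the alias `dept_avg` are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (lname TEXT, dept_id INTEGER, salary INTEGER);
    INSERT INTO employee VALUES
        ('Blake', 1, 3000), ('Jones', 1, 1000),
        ('Ward', 2, 1500), ('Allen', 2, 2500), ('Smith', 2, 2000);
""")

# The inline view dept_avg is aggregated per department; the outer query
# joins it back to the unaggregated employee rows.
rows = conn.execute("""
    SELECT e.lname, e.salary, dept_avg.avg_salary
      FROM employee e,
           (SELECT dept_id, AVG(salary) AS avg_salary
              FROM employee
             GROUP BY dept_id) dept_avg
     WHERE e.dept_id = dept_avg.dept_id
       AND e.salary > dept_avg.avg_salary
     ORDER BY e.lname
""").fetchall()
print(rows)
```

The inline view is named and its computed column is aliased, which is what allows the containing query to reference `dept_avg.avg_salary`.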
 
5. Top-N queries:
Certain queries that are easily described in English have traditionally been difficult to formulate in SQL. One common example is the "Find the top five salespeople" query. The complexity stems from the fact that data from a table must first be aggregated, and then the aggregated values must be sorted and compared to one another in order to identify the top or bottom performers.

  

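Requirement: find the five salespeople with the highest total sales, together with their sales totals, and compute each one's bonus (1% of total sales).
 
Preparation: total last year's sales for every salesperson: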
SELECT e.lname employee, SUM(co.sale_price) total_sales
FROM cust_order co, employee e
WHERE co.order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
   AND co.order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
   AND co.ship_dt IS NOT NULL AND co.cancelled_dt IS NULL
   AND co.sales_emp_id = e.emp_id
GROUP BY e.lname
ORDER BY 2 DESC;
 
Result:
EMPLOYEE             TOTAL_SALES
-------------------- -----------
Blake                    1927580
Houseman                 1814327
Russell                  1784596
Boorman                  1768813
Isaacs                   1761814
McGowan                  1761814
Anderson                 1757883
Evans                    1737093
Fletcher                 1735575
Dunn                     1723305
Note: two salespeople have the same sales total, so the final result should contain 6 rows rather than 5.
 
Approach 1:
SELECT e.lname employee, top5_emp_orders.tot_sales total_sales,
       ROUND(top5_emp_orders.tot_sales * 0.01) bonus
  FROM (SELECT all_emp_orders.sales_emp_id emp_id, all_emp_orders.tot_sales tot_sales
          FROM (SELECT sales_emp_id, SUM(sale_price) tot_sales
                  FROM cust_order
                 WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
                   AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
                   AND ship_dt IS NOT NULL AND cancelled_dt IS NULL
                 GROUP BY sales_emp_id
                 ORDER BY 2 DESC
               ) all_emp_orders            -- total sales per salesperson ID
         WHERE ROWNUM <= 5
       ) top5_emp_orders, employee e       -- keep only the first 5 rows
 WHERE top5_emp_orders.emp_id = e.emp_id;  -- look up names and compute the bonus
 
Result:
EMPLOYEE             TOTAL_SALES      BONUS
-------------------- ----------- ----------
Blake                    1927580      19276
Houseman                 1814327      18143
Russell                  1784596      17846
Boorman                  1768813      17688
McGowan                  1761814      17618
As you can see, this query left out the row "Isaacs 1761814". Because the statement filters on ROWNUM, it keeps exactly five rows and drops the other salesperson tied for fifth place.
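
The tie problem is easy to reproduce. The sketch below uses Python's sqlite3 module, with `LIMIT 5` standing in for Oracle's `ROWNUM <= 5` (an assumption: the two are not identical features, but both cut the sorted result off after five rows). The `emp_sales` table is a hypothetical pre-aggregated table loaded with the totals from the listing above.

```python
import sqlite3

# Hypothetical pre-aggregated table; SQLite's LIMIT stands in for ROWNUM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_sales (lname TEXT, tot_sales INTEGER)")
conn.executemany("INSERT INTO emp_sales VALUES (?, ?)", [
    ("Blake", 1927580), ("Houseman", 1814327), ("Russell", 1784596),
    ("Boorman", 1768813), ("Isaacs", 1761814), ("McGowan", 1761814),
    ("Anderson", 1757883),
])

# A fixed row-count cutoff keeps exactly five rows, ties or not.
top5 = conn.execute(
    "SELECT lname, tot_sales FROM emp_sales ORDER BY tot_sales DESC LIMIT 5"
).fetchall()

names = {name for name, _ in top5}
tied = {"Isaacs", "McGowan"}
print(top5)
print("tied salespeople kept:", names & tied)
```

Only one of the two tied salespeople survives the cutoff, and which one is not something the query controls.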
 
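Approach 2:
This will require two logical steps: find the fifth highest sales total last year, and then find all salespeople whose total sales meet or exceed that figure.
As implemented in the query below, this breaks down into four steps:
1). total last year's sales per salesperson ID
2). find the five highest sales totals
3). find every salesperson whose total equals any of those five figures
4). compute the bonus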
 
SELECT e.lname employee,
       top5_emp_orders.tot_sales total_sales,
       ROUND(top5_emp_orders.tot_sales * 0.01) bonus
 FROM employee e,
       (SELECT sales_emp_id, SUM(sale_price) tot_sales        
          FROM cust_order
         WHERE order_dt >= TO_DATE('01-JAN-2001', 'DD-MON-YYYY')
           AND order_dt < TO_DATE('01-JAN-2002', 'DD-MON-YYYY')
           AND ship_dt IS NOT NULL
           AND cancelled_dt IS NULL
         GROUP BY sales_emp_id
         -- keep every salesperson whose total equals one of the top 5 totals
         HAVING SUM(sale_price) IN (SELECT all_emp_orders.tot_sales
                                    FROM (SELECT SUM(sale_price) tot_sales
                                            FROM cust_order
                                           WHERE order_dt >=
                                                 TO_DATE('01-JAN-2001',
                                                         'DD-MON-YYYY')
                                             AND order_dt <
                                                 TO_DATE('01-JAN-2002',
                                                         'DD-MON-YYYY')
                                             AND ship_dt IS NOT NULL
                                             AND cancelled_dt IS NULL
                                           GROUP BY sales_emp_id
                                           ORDER BY 1 DESC) all_emp_orders
                                   WHERE ROWNUM <= 5)) top5_emp_orders
 WHERE top5_emp_orders.sales_emp_id = e.emp_id
 ORDER BY 2 DESC;
 
Result:
EMPLOYEE             TOTAL_SALES      BONUS
-------------------- ----------- ----------
Blake                    1927580      19276
Houseman                 1814327      18143
Russell                  1784596      17846
Boorman                  1768813      17688
McGowan                  1761814      17618
Isaacs                   1761814      17618
The result now contains all the correct rows, but this approach has two drawbacks:
1). the aggregation is executed twice
2). the query is hard to read
 
Approach 3:
The RANK function may be used to assign a ranking to each element of a set. The RANK function understands that there may be ties in the set of values being ranked and leaves gaps in the ranking to compensate. The following query illustrates how rankings would be assigned to the entire set of salespeople; notice how the RANK function leaves a gap between the fifth and seventh rankings to compensate for the fact that two rows share the fifth spot in the ranking.
 
Preparation: using the RANK function:
SELECT sales_emp_id, SUM(sale_price) tot_sales,
 RANK( ) OVER (ORDER BY SUM(sale_price) DESC) sales_rank
FROM cust_order
WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
 AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
 AND ship_dt IS NOT NULL AND cancelled_dt IS NULL
GROUP BY sales_emp_id;
 
Result:
SALES_EMP_ID  TOT_SALES SALES_RANK
------------ ---------- ----------
          11    1927580          1
          24    1814327          2
          34    1784596          3
          18    1768813          4
          25    1761814          5
          26    1761814          5
          30    1757883          7
          21    1737093          8
          19    1735575          9
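As you can see, the ranks are correct as well as the totals: RANK automatically leaves a gap between 5 and 7. For this data, a top-5 and a top-6 query return the same rows.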
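
The gap-leaving behavior can be modeled in a few lines of plain Python. This is a sketch of the ranking rule only (rank = 1 + the number of strictly larger values), not of how Oracle implements RANK; `rank_desc` is a hypothetical helper name and the totals are the ones from the listing above.

```python
# Totals per sales_emp_id, taken from the result listing above.
totals = {11: 1927580, 24: 1814327, 34: 1784596, 18: 1768813,
          25: 1761814, 26: 1761814, 30: 1757883, 21: 1737093, 19: 1735575}

def rank_desc(values):
    """Return {key: rank}, where rank = 1 + number of strictly larger values."""
    return {k: 1 + sum(1 for other in values.values() if other > v)
            for k, v in values.items()}

ranks = rank_desc(totals)

# Filtering on rank <= 5 keeps both salespeople tied for fifth place.
top5_ids = sorted(k for k, r in ranks.items() if r <= 5)
print(ranks)
print(top5_ids)
```

Employees 25 and 26 both get rank 5, employee 30 gets rank 7, and a `rank <= 5` filter returns six rows, which is the behavior the top-N requirement calls for.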
 
Final solution:
SELECT e.lname employee, top5_emp_orders.tot_sales total_sales,
       ROUND(top5_emp_orders.tot_sales * 0.01) bonus
  FROM (SELECT all_emp_orders.sales_emp_id emp_id,
               all_emp_orders.tot_sales tot_sales
          FROM (SELECT sales_emp_id, SUM(sale_price) tot_sales,
                       RANK( ) OVER (ORDER BY SUM(sale_price) DESC) sales_rank
                  FROM cust_order
                 WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
                   AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
                   AND ship_dt IS NOT NULL AND cancelled_dt IS NULL
                 GROUP BY sales_emp_id
               ) all_emp_orders  -- total and rank sales per salesperson using RANK
         WHERE all_emp_orders.sales_rank <= 5
       ) top5_emp_orders, employee e -- keep the rows ranked 5 or better
 WHERE top5_emp_orders.emp_id = e.emp_id -- compute the bonus for every top-5 salesperson
 ORDER BY 2 DESC;