1. Subquery concepts:
A subquery is a SELECT statement that is nested within another SQL statement. For the purpose of this discussion, we will call the SQL statement that contains a subquery the containing statement.
Subqueries are executed prior to execution of the containing SQL statement, and the result set generated by the subquery is discarded after the containing SQL statement has finished execution. Thus, a subquery can be thought of as a temporary table with statement scope.
A subquery may either be correlated with its containing SQL statement, meaning that it references one or more columns from the containing statement, or it might reference nothing outside itself, in which case it is called a noncorrelated subquery.
A less-commonly-used but powerful variety of subquery, called the inline view, occurs in the FROM clause of a select statement. Inline views are always noncorrelated; they are evaluated first and behave like unindexed tables cached in memory for the remainder of the query.
2. Noncorrelated subqueries:
Noncorrelated subqueries allow each row from the containing SQL statement to be compared to a set of values. They can be divided into the following three categories, depending on the number of rows and columns returned in their result set:
A. Single-row, single-column subqueries
B. Multiple-row, single-column subqueries
C. Multiple-column subqueries
Depending on the category, different sets of operators may be employed by the containing SQL statement to interact with the subquery.
A. Single-Row, Single-Column Subqueries:
A subquery that returns a single row with a single column is treated like a scalar by the containing statement; not surprisingly, these types of subqueries are known as scalar subqueries. The subquery may appear on either side of a condition, and the usual comparison operators (=, <, >, !=, <=, >=) are employed.
SELECT lname
FROM employee
WHERE salary > (SELECT AVG(salary)
                FROM employee);
As this query demonstrates, it can be perfectly reasonable for a subquery to reference the same tables as the containing query. In fact, subqueries are frequently used to isolate a subset of records within a table.
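The scalar subquery above can be tried out with a minimal sketch that runs the same statement against an in-memory SQLite database through Python's sqlite3 module, used here only as a stand-in for Oracle. The function name and the sample rows are invented for illustration.

```python
import sqlite3


def above_average_earners(rows):
    """Return the last names of employees paid more than the average salary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (lname TEXT, salary REAL)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    # The subquery yields one row with one column, so it acts as a scalar.
    result = conn.execute(
        """
        SELECT lname
        FROM employee
        WHERE salary > (SELECT AVG(salary) FROM employee)
        ORDER BY lname
        """
    ).fetchall()
    conn.close()
    return [lname for (lname,) in result]


# Hypothetical sample data: the average salary is 2000.
sample = [("Smith", 1000), ("Jones", 2000), ("Blake", 3000)]
print(above_average_earners(sample))
```

Both the containing query and the subquery read the same employee table, which is the point the paragraph above makes.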
Some points to note about noncorrelated subqueries:
.The FROM clause may contain any type of noncorrelated subquery.
.The SELECT and ORDER BY clauses may contain scalar subqueries.
.The GROUP BY clause may not contain subqueries.
.The START WITH and CONNECT BY clauses, used for querying hierarchical data, may contain subqueries.
B. Multiple-Row, Single-Column Subqueries:
When a subquery returns more than one row, it is not possible to use only comparison operators, since a single value cannot be directly compared to a set of values. However, a single value can be compared to each value in a set. To accomplish this, the special keywords ANY and ALL may be used with comparison operators to determine if a value is equal to (or less than, greater than, etc.) any members of the set or all members of the set.
1). The ALL keyword:
SELECT fname, lname
FROM employee
WHERE dept_id = 3 AND salary >= ALL (SELECT salary
FROM employee
WHERE dept_id = 3);
FNAME LNAME
-------------------- --------------------
Mark Russell
The subquery returns the set of salaries for department 3, and the containing query checks each employee in the department to see if his or her salary is greater than or equal to every salary returned by the subquery. Thus, this query retrieves the name of the highest paid person in department 3. While everyone except the lowest paid employee has a salary >= some of the salaries in the department, only the highest paid employee has a salary >= all of the salaries in the department. If multiple employees tie for the highest salary in the department, multiple names will be returned.
2). The ANY keyword:
SELECT fname, lname
FROM employee
WHERE dept_id = 3 AND NOT salary < ANY (SELECT salary
FROM employee
WHERE dept_id = 3);
There are almost always multiple ways to phrase the same query. One of the challenges of writing SQL is striking the right balance between efficiency and readability. In this case, I might prefer using AND salary >= ALL over AND NOT salary < ANY because the first variation is easier to understand; however, the latter form might prove more efficient, since each evaluation of the subquery results requires from 1 to N comparisons when using ANY versus exactly N comparisons when using ALL.
If there are 100 people in the department, each of the 100 salaries needs to be compared to the entire set of 100. When using ANY, the comparison can stop as soon as a larger salary is identified in the set, whereas using ALL requires 100 comparisons to ensure that there are no larger salaries in the set.
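Note: comparing ALL and ANY:
A. Readability: ALL can be more readable than ANY; see the two preceding examples.
B. Performance: with ALL, every row returned by the subquery is fetched and compared, so if the subquery returns N rows, N comparisons are required. With ANY, the comparison stops as soon as one row satisfies the condition, so the number of comparisons ranges from 1 to N. When picking a small number of rows out of a large data set, ANY is the better choice for performance.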
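The logic of the two phrasings can be sketched in plain Python. This is only a model of the semantics, not of how Oracle evaluates the operators: the function names and the salary list are invented, and how many comparisons a real database performs depends on its optimizer.

```python
def ge_all(value, values):
    """Model of `value >= ALL (set)`: no member of the set is larger."""
    return all(value >= v for v in values)


def not_lt_any(value, values):
    """Model of `NOT value < ANY (set)`: no member of the set is larger."""
    return not any(value < v for v in values)


def comparisons_until_larger(value, values):
    """Count the comparisons an ANY scan makes before a larger member stops it."""
    count = 0
    for v in values:
        count += 1
        if value < v:
            break  # a larger salary was found, so the scan can stop here
    return count


salaries = [900, 1500, 1200, 2000]  # hypothetical salaries in one department

# Both phrasings pick out the same rows: only the highest salary qualifies.
top = [s for s in salaries if ge_all(s, salaries)]
print(top)
print([comparisons_until_larger(s, salaries) for s in salaries])
```

For the lowest salary the scan stops after two comparisons, while the highest salary has to be checked against all four values.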
If a subquery that returns more than one row is compared using a plain comparison operator (without ANY or ALL), Oracle raises the following error:
ORA-01427: single-row subquery returns more than one row
What the error message is trying to convey is that a multiple-row subquery has been identified where only a single-row subquery is allowed. If we are not absolutely certain that our subquery will return exactly one row, we must include ANY or ALL to ensure our code doesn't fail in the future.
3). The IN keyword:
Using IN with a subquery is functionally equivalent to using = ANY, and returns TRUE if a match is found in the set returned by the subquery.
Finding members of one set that do not exist in another set is referred to as an anti-join. As the name implies, an anti-join is the opposite of a join; rows from table A are returned if the specified data is not found in table B.
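One common way to write an anti-join is NOT IN with a subquery. The sketch below shows IN and NOT IN side by side on an in-memory SQLite database (a stand-in for Oracle). The table names echo the ones used in these notes, but the columns and rows are simplified and invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE part (part_nbr TEXT PRIMARY KEY);
    CREATE TABLE line_item (order_nbr INTEGER, part_nbr TEXT NOT NULL);
    INSERT INTO part VALUES ('P1'), ('P2'), ('P3');
    INSERT INTO line_item VALUES (1, 'P1'), (2, 'P1'), (3, 'P3');
    """
)

# IN: parts that appear in at least one line item.
ordered = [r[0] for r in conn.execute(
    "SELECT part_nbr FROM part "
    "WHERE part_nbr IN (SELECT part_nbr FROM line_item) ORDER BY part_nbr"
)]

# NOT IN: the anti-join, parts that appear in no line item at all.
never_ordered = [r[0] for r in conn.execute(
    "SELECT part_nbr FROM part "
    "WHERE part_nbr NOT IN (SELECT part_nbr FROM line_item) ORDER BY part_nbr"
)]

print(ordered, never_ordered)
```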
C. Multiple-Column Subqueries:
Example:
UPDATE monthly_orders SET (tot_orders, max_order_amt, min_order_amt, tot_amt) =
(SELECT COUNT(*), MAX(sale_price), MIN(sale_price), SUM(sale_price)
FROM cust_order
WHERE order_dt >= TO_DATE('01-NOV-2001','DD-MON-YYYY')
AND order_dt < TO_DATE('01-DEC-2001','DD-MON-YYYY')
AND cancelled_dt IS NULL)
WHERE month = 11 and year = 2001;
Such subqueries may also be utilized in the WHERE clause of a SELECT, UPDATE, or DELETE statement.
Example:
DELETE FROM line_item WHERE (order_nbr, part_nbr) IN
(SELECT c.order_nbr, p.part_nbr
FROM cust_order c, line_item li, part p
WHERE c.ship_dt IS NULL AND c.cancelled_dt IS NULL
AND c.order_nbr = li.order_nbr
AND li.part_nbr = p.part_nbr
AND p.status = 'DISCONTINUED');
Note the use of the IN operator in the WHERE clause. Two columns are listed together in parentheses prior to the IN keyword. Values in these two columns are compared to the set of two values returned by each row of the subquery. If a match is found, the row is removed from the line_item table.
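The pairwise matching described above can be checked with a small sketch on SQLite, which also accepts a parenthesized column list before IN (row values, SQLite 3.15 or later). The `discontinued` table is a hypothetical simplification of the three-table join in the original DELETE, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE line_item (order_nbr INTEGER, part_nbr TEXT);
    CREATE TABLE discontinued (order_nbr INTEGER, part_nbr TEXT);
    INSERT INTO line_item VALUES (1, 'A'), (1, 'B'), (2, 'A'), (2, 'B');
    INSERT INTO discontinued VALUES (1, 'B'), (2, 'A');
    """
)

# Each (order_nbr, part_nbr) pair is compared as a unit with the pairs
# returned by the subquery; a row is deleted only when both columns match.
conn.execute(
    """
    DELETE FROM line_item
    WHERE (order_nbr, part_nbr) IN
          (SELECT order_nbr, part_nbr FROM discontinued)
    """
)

remaining = conn.execute(
    "SELECT order_nbr, part_nbr FROM line_item ORDER BY order_nbr, part_nbr"
).fetchall()
print(remaining)
```

The row (1, 'A') survives even though order 1 and part 'A' each appear somewhere in the subquery result, because the pair (1, 'A') itself does not.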
3. Correlated subqueries:
A subquery that references one or more columns from its containing SQL statement is called a correlated subquery. Unlike noncorrelated subqueries, which are executed exactly once prior to execution of the containing statement, a correlated subquery is executed once for each candidate row in the intermediate result set of the containing query.
Example:
SELECT p.part_nbr, p.name
FROM supplier s, part p
WHERE s.name = 'Acme Industries'
AND s.supplier_id = p.supplier_id
AND 10 <= (SELECT COUNT(*) FROM cust_order co, line_item li
WHERE li.part_nbr = p.part_nbr
AND li.order_nbr = co.order_nbr
AND co.order_dt >= TO_DATE('01-DEC-2001','DD-MON-YYYY'));
The reference to p.part_nbr is what makes the subquery correlated; values for p.part_nbr must be supplied by the containing query before the subquery can execute. If there are 10,000 parts in the part table, but only 100 are supplied by Acme Industries, the subquery will be executed once for each of the 100 rows in the intermediate result set created by joining the part and supplier tables.
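A reduced version of this correlated query can be run on SQLite as a sketch. The schema is cut down to the columns the correlation needs, the supplier is identified by a plain ID, the threshold is lowered from 10 to 2, and every row is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE part (part_nbr TEXT, supplier_id INTEGER);
    CREATE TABLE line_item (order_nbr INTEGER, part_nbr TEXT);
    INSERT INTO part VALUES ('P1', 1), ('P2', 1), ('P3', 2);
    INSERT INTO line_item VALUES
        (10, 'P1'), (11, 'P1'), (12, 'P2'), (13, 'P3'), (14, 'P3');
    """
)

# The subquery refers to p.part_nbr from the containing query, so its
# result depends on which candidate part row is being tested.
popular = [r[0] for r in conn.execute(
    """
    SELECT p.part_nbr
    FROM part p
    WHERE p.supplier_id = 1
      AND 2 <= (SELECT COUNT(*)
                FROM line_item li
                WHERE li.part_nbr = p.part_nbr)
    ORDER BY p.part_nbr
    """
)]
print(popular)
```

P3 also has two line items, but it is filtered out by the supplier condition before the count matters.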
Correlated subqueries are often used to test whether relationships exist without regard to cardinality. The EXISTS operator is used for these types of queries.
SELECT p.part_nbr, p.name, p.unit_cost
FROM part p
WHERE EXISTS (SELECT 1 FROM line_item li, cust_order co
WHERE li.part_nbr = p.part_nbr
AND li.order_nbr = co.order_nbr
AND co.ship_dt >= TO_DATE('01-JAN-2002','DD-MON-YYYY'));
As long as the subquery returns one or more rows, the EXISTS condition is satisfied without regard for how many rows were actually returned by the subquery. Since the EXISTS operator returns TRUE or FALSE depending on the number of rows returned by the subquery, the actual columns returned by the subquery are irrelevant. The SELECT clause requires at least one column, however, so it is common practice to use either the literal "1" or the wildcard " * ".
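The following sketch checks both claims on SQLite: a part with several matching line items is returned once, and the subquery's select list makes no difference. The helper name `parts_with_orders` and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE part (part_nbr TEXT);
    CREATE TABLE line_item (order_nbr INTEGER, part_nbr TEXT);
    INSERT INTO part VALUES ('P1'), ('P2');
    INSERT INTO line_item VALUES (10, 'P1'), (11, 'P1'), (12, 'P1');
    """
)


def parts_with_orders(select_list):
    """Run the EXISTS query with the given select list in the subquery.

    select_list is a fixed literal chosen by this script ('1' or '*'),
    so building the statement by concatenation is safe here.
    """
    sql = (
        "SELECT p.part_nbr FROM part p "
        "WHERE EXISTS (SELECT " + select_list + " FROM line_item li "
        "WHERE li.part_nbr = p.part_nbr) ORDER BY p.part_nbr"
    )
    return [r[0] for r in conn.execute(sql)]


with_literal = parts_with_orders("1")
with_star = parts_with_orders("*")
print(with_literal, with_star)
```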
4. Inline views:
The FROM clause contains a list of data sets. In this light, it is easy to see how the FROM clause can contain tables (permanent data sets), views (virtual data sets), and SELECT statements (temporary data sets). A SELECT statement in the FROM clause of a containing SELECT statement is referred to as an inline view. Since it is a subquery that executes prior to the containing query, a more palatable name might have been a "pre-query."
Because the result set from an inline view is referenced by other elements of the containing query, we must give our inline view a name and provide aliases for all ambiguous columns. Similar to other types of subqueries, inline views may join multiple tables, call built-in and user-defined functions, specify optimizer hints, and include GROUP BY, HAVING, and CONNECT BY clauses. Unlike other types of subqueries, an inline view may also contain an ORDER BY clause.
Inline views are particularly useful when we need to combine data at different levels of aggregation, for example when one data set holds detail rows and the other holds totals computed from them.
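As a sketch of combining two levels of aggregation, the query below joins individual orders to per-salesperson totals produced by an inline view. It runs on SQLite as a stand-in for Oracle; the alias `emp_totals` and the order rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cust_order (order_nbr INTEGER, sales_emp_id INTEGER,
                             sale_price INTEGER);
    INSERT INTO cust_order VALUES
        (1, 7, 100), (2, 7, 300), (3, 8, 50), (4, 8, 150);
    """
)

# Detail rows (one per order) are joined to aggregated rows (one per
# salesperson) that exist only for the duration of this statement.
rows = conn.execute(
    """
    SELECT co.order_nbr, co.sale_price, emp_totals.tot_sales
    FROM cust_order co,
         (SELECT sales_emp_id, SUM(sale_price) AS tot_sales
          FROM cust_order
          GROUP BY sales_emp_id) emp_totals
    WHERE co.sales_emp_id = emp_totals.sales_emp_id
    ORDER BY co.order_nbr
    """
).fetchall()
print(rows)
```

The inline view has a name (`emp_totals`) and an alias for its computed column, as the text requires, so the containing query can refer to both.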
When considering using an inline view, ask the following questions:
1).What value does the inline view add to the readability and, more importantly, the performance of the containing query?
2).How large will the result set generated by the inline view be?
3).How often, if ever, will I have need of this particular data set?
In general, using an inline view should enhance the readability and performance of the query, and it should generate a manageable data set that is of no value to other statements or sessions; otherwise, we may want to consider building a permanent or temporary table so that we can share the data between sessions and build additional indexes as needed.
5. Top-N queries:
Certain queries that are easily described in English have traditionally been difficult to formulate in SQL. One common example is the "Find the top five salespeople" query. The complexity stems from the fact that data from a table must first be aggregated, and then the aggregated values must be sorted and compared to one another in order to identify the top or bottom performers.
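Requirement: find the five salespeople with the highest total sales, together with their sales totals, and compute each one's bonus (1% of total sales).
Preparation: total last year's sales for each salesperson: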
SELECT e.lname employee, SUM(co.sale_price) total_sales
FROM cust_order co, employee e
WHERE co.order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
AND co.order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
AND co.ship_dt IS NOT NULL AND co.cancelled_dt IS NULL
AND co.sales_emp_id = e.emp_id
GROUP BY e.lname
ORDER BY 2 DESC;
Result:
EMPLOYEE             TOTAL_SALES
-------------------- -----------
Blake                    1927580
Houseman                 1814327
Russell                  1784596
Boorman                  1768813
Isaacs                   1761814
McGowan                  1761814
Anderson                 1757883
Evans                    1737093
Fletcher                 1735575
Dunn                     1723305
Note: two salespeople (Isaacs and McGowan) have the same sales total, so the final result should contain six rows rather than five.
Solution 1:
SELECT e.lname employee, top5_emp_orders.tot_sales total_sales,
       ROUND(top5_emp_orders.tot_sales * 0.01) bonus
FROM (SELECT all_emp_orders.sales_emp_id emp_id, all_emp_orders.tot_sales tot_sales
      FROM (SELECT sales_emp_id, SUM(sale_price) tot_sales
            FROM cust_order
            WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
              AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
              AND ship_dt IS NOT NULL AND cancelled_dt IS NULL
            GROUP BY sales_emp_id
            ORDER BY 2 DESC
           ) all_emp_orders                -- total sales per salesperson ID
      WHERE ROWNUM <= 5
     ) top5_emp_orders, employee e         -- keep only the first five rows
WHERE top5_emp_orders.emp_id = e.emp_id;   -- join to employee to compute each bonus
Result:
EMPLOYEE             TOTAL_SALES      BONUS
-------------------- ----------- ----------
Blake                    1927580      19276
Houseman                 1814327      18143
Russell                  1784596      17846
Boorman                  1768813      17688
McGowan                  1761814      17618
As you can see, this query drops the row Isaacs 1761814. Because it filters on ROWNUM, only the first five rows survive, and one of the two tied salespeople is left out.
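The difference between cutting by row count and cutting by value can be sketched on SQLite, where LIMIT plays roughly the role that WHERE ROWNUM <= 5 plays over the sorted inline view. The `emp_sales` table is a hypothetical pre-aggregated copy of the first seven rows of the result listed earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_sales (lname TEXT, tot_sales INTEGER)")
conn.executemany(
    "INSERT INTO emp_sales VALUES (?, ?)",
    [("Blake", 1927580), ("Houseman", 1814327), ("Russell", 1784596),
     ("Boorman", 1768813), ("Isaacs", 1761814), ("McGowan", 1761814),
     ("Anderson", 1757883)],
)

# A row-count cut: exactly five rows come back, so one of the two
# salespeople tied at 1761814 is dropped (which one is not defined).
first_five = conn.execute(
    "SELECT lname, tot_sales FROM emp_sales ORDER BY tot_sales DESC LIMIT 5"
).fetchall()

# A value-based cut: keep everyone whose total reaches the fifth-highest figure.
fifth_total = first_five[-1][1]
with_ties = [r[0] for r in conn.execute(
    "SELECT lname FROM emp_sales WHERE tot_sales >= ? "
    "ORDER BY tot_sales DESC, lname",
    (fifth_total,),
)]
print(len(first_five), with_ties)
```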
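Solution 2:
This requires two steps: find the fifth-highest sales total last year, and then find all salespeople whose total sales meet or exceed that figure.
The query below carries this out as follows:
1). Total last year's sales, grouped by salesperson ID
2). Find the five highest sales totals
3). Keep every salesperson whose sales total equals one of those five figures
4). Compute the bonus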
SELECT e.lname employee,
top5_emp_orders.tot_sales total_sales,
ROUND(top5_emp_orders.tot_sales * 0.01) bonus
FROM employee e,
(SELECT sales_emp_id, SUM(sale_price) tot_sales
FROM cust_order
WHERE order_dt >= TO_DATE('01-JAN-2001', 'DD-MON-YYYY')
AND order_dt < TO_DATE('01-JAN-2002', 'DD-MON-YYYY')
AND ship_dt IS NOT NULL
AND cancelled_dt IS NULL
GROUP BY sales_emp_id
-- keep every salesperson whose total equals one of the five highest totals
HAVING SUM(sale_price) IN (SELECT all_emp_orders.tot_sales
FROM (SELECT SUM(sale_price) tot_sales
FROM cust_order
WHERE order_dt >=
TO_DATE('01-JAN-2001',
'DD-MON-YYYY')
AND order_dt <
TO_DATE('01-JAN-2002',
'DD-MON-YYYY')
AND ship_dt IS NOT NULL
AND cancelled_dt IS NULL
GROUP BY sales_emp_id
ORDER BY 1 DESC) all_emp_orders
WHERE ROWNUM <= 5)) top5_emp_orders
WHERE top5_emp_orders.sales_emp_id = e.emp_id
ORDER BY 2 DESC;
Result:
EMPLOYEE             TOTAL_SALES      BONUS
-------------------- ----------- ----------
Blake                    1927580      19276
Houseman                 1814327      18143
Russell                  1784596      17846
Boorman                  1768813      17688
McGowan                  1761814      17618
Isaacs                   1761814      17618
As you can see, the query now returns all of the correct rows, but it has the following drawbacks:
1). The aggregation is performed twice
2). Readability is poor
Solution 3:
The RANK function may be used to assign a ranking to each element of a set. The RANK function understands that there may be ties in the set of values being ranked and leaves gaps in the ranking to compensate. The following query illustrates how rankings would be assigned to the entire set of salespeople; notice how the RANK function leaves a gap between the fifth and seventh rankings to compensate for the fact that two rows share the fifth spot in the ranking.
Preparation: using the RANK function:
SELECT sales_emp_id, SUM(sale_price) tot_sales,
RANK( ) OVER (ORDER BY SUM(sale_price) DESC) sales_rank
FROM cust_order
WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
AND ship_dt IS NOT NULL AND cancelled_dt IS NULL
GROUP BY sales_emp_id;
Result:
SALES_EMP_ID  TOT_SALES SALES_RANK
------------ ---------- ----------
          11    1927580          1
          24    1814327          2
          34    1784596          3
          18    1768813          4
          25    1761814          5
          26    1761814          5
          30    1757883          7
          21    1737093          8
          19    1735575          9
As you can see, the rows are correct and so are the rankings: RANK automatically leaves a gap between 5 and 7. For this query, a top-5 and a top-6 request return the same rows.
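The gap behaviour of RANK can be reproduced on SQLite, which has supported the RANK() window function since version 3.25. This is a sketch under assumptions: `emp_sales` is a hypothetical pre-aggregated table holding the first seven rows of the result above, and the string constant `RANKED` is only a convenience for reusing the ranking query as an inline view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_sales (sales_emp_id INTEGER, tot_sales INTEGER)")
conn.executemany(
    "INSERT INTO emp_sales VALUES (?, ?)",
    [(11, 1927580), (24, 1814327), (34, 1784596), (18, 1768813),
     (25, 1761814), (26, 1761814), (30, 1757883)],
)

# Ranking query, reused below as an inline view (requires SQLite >= 3.25).
RANKED = """
    SELECT sales_emp_id,
           RANK() OVER (ORDER BY tot_sales DESC) AS sales_rank
    FROM emp_sales
"""

# All rankings: the two tied rows share rank 5 and rank 6 is skipped.
ranked = conn.execute(
    "SELECT sales_emp_id, sales_rank FROM (" + RANKED + ") "
    "ORDER BY sales_rank, sales_emp_id"
).fetchall()

# Filtering on the rank value keeps both tied salespeople.
top5 = [r[0] for r in conn.execute(
    "SELECT sales_emp_id FROM (" + RANKED + ") "
    "WHERE sales_rank <= 5 ORDER BY sales_rank, sales_emp_id"
)]
print(ranked)
print(top5)
```

Filtering on the rank rather than on a row count is what lets a "top five" request return six rows when two salespeople tie for fifth place.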
Final solution:
SELECT e.lname employee, top5_emp_orders.tot_sales total_sales,
       ROUND(top5_emp_orders.tot_sales * 0.01) bonus
FROM (SELECT all_emp_orders.sales_emp_id emp_id,
             all_emp_orders.tot_sales tot_sales
      FROM (SELECT sales_emp_id, SUM(sale_price) tot_sales,
                   RANK( ) OVER (ORDER BY SUM(sale_price) DESC) sales_rank
            FROM cust_order
            WHERE order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
              AND order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY')
              AND ship_dt IS NOT NULL AND cancelled_dt IS NULL
            GROUP BY sales_emp_id
           ) all_emp_orders                -- total sales per salesperson, ranked with RANK
      WHERE all_emp_orders.sales_rank <= 5
     ) top5_emp_orders, employee e         -- keep the rows ranked 5 or better
WHERE top5_emp_orders.emp_id = e.emp_id    -- join to employee to compute each bonus
ORDER BY 2 DESC;