/*
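Disclaimer: I am not a database specialist or a professional translator. I ran into a problem (in SQL Cookbook), looked around, and found an English site that explains it very clearly, so I translated it here as a bookmark for myself. If you don't like it, please go easy on me. Thanks!
Original article: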
ROLLUP, CUBE, GROUPING Functions and GROUPING SETS
*/
Setup:
DROP TABLE dimension_tab;

CREATE TABLE dimension_tab (
fact_1_id NUMBER NOT NULL,
fact_2_id NUMBER NOT NULL,
fact_3_id NUMBER NOT NULL,
fact_4_id NUMBER NOT NULL,
sales_value NUMBER(10,2) NOT NULL
);

INSERT INTO dimension_tab
SELECT TRUNC(DBMS_RANDOM.value(low => 1, high => 3)) AS fact_1_id,
TRUNC(DBMS_RANDOM.value(low => 1, high => 6)) AS fact_2_id,
TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_3_id,
TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_4_id,
ROUND(DBMS_RANDOM.value(low => 1, high => 100), 2) AS sales_value
FROM dual
CONNECT BY level <= 1000;

COMMIT;
1. GROUP BY basics
To make things easier to follow, start with a single aggregate function, SUM, without any GROUP BY.
SELECT SUM(sales_value) AS sales_value
FROM dimension_tab;
SALES_VALUE
-----------
50528.39
1 row selected.
SQL>
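Put the columns you want to group by after GROUP BY. The result has one row for each distinct value of the grouped column.
SELECT fact_1_id,
COUNT(*) AS num_rows,
SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY fact_1_id
ORDER BY fact_1_id;
FACT_1_ID NUM_ROWS SALES_VALUE
---------- ---------- -----------
1 478 24291.35
2 522 26237.04
2 rows selected.
SQL>
Grouping by two columns returns one row for each combination of values that actually occurs in the data (the same as a Cartesian product of the distinct values when every combination is present). Here that is 10 rows (2*5).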
SELECT fact_1_id,
fact_2_id,
COUNT(*) AS num_rows,
SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY fact_1_id, fact_2_id
ORDER BY fact_1_id, fact_2_id;
FACT_1_ID FACT_2_ID NUM_ROWS SALES_VALUE
---------- ---------- ---------- -----------
1 1 83 4363.55
1 2 96 4794.76
1 3 93 4718.25
1 4 105 5387.45
1 5 101 5027.34
2 1 109 5652.84
2 2 96 4583.02
2 3 110 5555.77
2 4 113 5936.67
2 5 94 4508.74
10 rows selected.
SQL>
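To make the "one row per combination that occurs" point concrete, here is a rough Python sketch of my own (not Oracle code, and not from the original article). The `ROWS` data and the `group_by` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for dimension_tab (values are made up).
ROWS = [
    {"fact_1_id": 1, "fact_2_id": 1, "sales_value": 10},
    {"fact_1_id": 1, "fact_2_id": 1, "sales_value": 5},
    {"fact_1_id": 1, "fact_2_id": 2, "sales_value": 7},
    {"fact_1_id": 2, "fact_2_id": 2, "sales_value": 3},
]

def group_by(rows, keys, measure):
    """Return {key_tuple: (num_rows, total)}, like COUNT(*) and SUM(measure)."""
    groups = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[k] for k in keys)
        groups[key][0] += 1          # COUNT(*)
        groups[key][1] += row[measure]  # SUM(measure)
    return {key: tuple(groups[key]) for key in sorted(groups)}

by_one = group_by(ROWS, ["fact_1_id"], "sales_value")
by_two = group_by(ROWS, ["fact_1_id", "fact_2_id"], "sales_value")
print(by_one)
print(by_two)
```

In this sample the two-column grouping has 3 rows rather than 2*2 = 4, because the combination (2, 1) never appears in the data.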
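2. ROLLUP
In addition to the regular grouping rows, ROLLUP also returns subtotals for partial groupings. It drops columns from right to left, one at a time, until only the grand total is left. With n columns in the ROLLUP there are n+1 levels of grouping.
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY ROLLUP (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;
 FACT_1_ID  FACT_2_ID SALES_VALUE
---------- ---------- -----------
         1          1     4363.55
         1          2     4794.76
         1          3     4718.25
         1          4     5387.45
         1          5     5027.34
         1               24291.35
         2          1     5652.84
         2          2     4583.02
         2          3     5555.77
         2          4     5936.67
         2          5     4508.74
         2               26237.04
                         50528.39
13 rows selected.
SQL>
The next query rolls up three columns. Spotting subtotal rows by their null values is not reliable when the data itself contains nulls; this is discussed later.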
SELECT fact_1_id,
fact_2_id,
fact_3_id,
SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY ROLLUP (fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
ROLLUP can also be applied to only part of the GROUP BY list, which gives a partial rollup.
SELECT fact_1_id,
fact_2_id,
fact_3_id,
SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY fact_1_id, ROLLUP (fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
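The right-to-left rule is easy to spell out in code. Below is a small Python sketch of my own (not Oracle code) that lists which groupings a ROLLUP expands to; `rollup_sets` and `partial_rollup_sets` are hypothetical helper names used only for illustration.

```python
def rollup_sets(columns):
    """Return the groupings ROLLUP(columns) expands to, most detailed first."""
    cols = tuple(columns)
    # Drop one column at a time from the right, ending with the empty
    # grouping (), which stands for the grand total.
    return [cols[:k] for k in range(len(cols), -1, -1)]

def partial_rollup_sets(fixed, rolled):
    """Groupings for GROUP BY fixed..., ROLLUP(rolled...)."""
    # The fixed columns stay in every grouping, so there is no grand total.
    return [tuple(fixed) + s for s in rollup_sets(rolled)]

for grouping in rollup_sets(["fact_1_id", "fact_2_id", "fact_3_id"]):
    print(grouping)
for grouping in partial_rollup_sets(["fact_1_id"], ["fact_2_id", "fact_3_id"]):
    print(grouping)
```

Three rolled-up columns give 3+1 = 4 groupings. In the partial rollup only two columns are rolled up, so there are 2+1 = 3 groupings, each of which still contains fact_1_id.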
3. CUBE
CUBE is an extension of ROLLUP: it returns subtotals for every possible combination of the listed columns. With n columns in the CUBE there are 2^n grouping combinations.
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;
 FACT_1_ID  FACT_2_ID SALES_VALUE
---------- ---------- -----------
         1          1     4363.55
         1          2     4794.76
         1          3     4718.25
         1          4     5387.45
         1          5     5027.34
         1               24291.35
         2          1     5652.84
         2          2     4583.02
         2          3     5555.77
         2          4     5936.67
         2          5     4508.74
         2               26237.04
                    1    10016.39
                    2     9377.78
                    3    10274.02
                    4    11324.12
                    5     9536.08
                         50528.39
18 rows selected.
SQL>
With three columns (or n columns) in the CUBE, the number of subtotal combinations grows accordingly:
SELECT fact_1_id,
fact_2_id,
fact_3_id,
SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
CUBE can also be applied to only part of the GROUP BY list, which gives a partial cube.
SELECT fact_1_id,
fact_2_id,
fact_3_id,
SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY fact_1_id, CUBE (fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
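The 2^n count comes from CUBE taking every subset of its columns. Here is a short Python sketch of my own (not Oracle code) that enumerates them; `cube_sets` and its `fixed` parameter are hypothetical names for illustration, and the order of the groupings is just the order this sketch happens to generate.

```python
from itertools import combinations

def cube_sets(columns, fixed=()):
    """Groupings for GROUP BY fixed..., CUBE(columns...)."""
    cols = tuple(columns)
    sets = []
    # Take every subset of the cubed columns, largest subsets first.
    for size in range(len(cols), -1, -1):
        for subset in combinations(cols, size):
            sets.append(tuple(fixed) + subset)
    return sets

full = cube_sets(["fact_1_id", "fact_2_id", "fact_3_id"])
partial = cube_sets(["fact_2_id", "fact_3_id"], fixed=["fact_1_id"])
print(len(full), len(partial))  # prints: 8 4
```

The full cube over three columns has 2^3 = 8 groupings, while the partial cube only cubes two columns and so has 2^2 = 4, all of which keep fact_1_id.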
4. GROUPING function
As mentioned above, if a column contains real null values and ROLLUP or CUBE also puts nulls in that column, how do we tell whether a null comes from the data or was generated by ROLLUP or CUBE? The GROUPING function solves this. GROUPING(column) returns 0 on a row where the column holds a value from the data (null or otherwise), and 1 on a row where the column's null was generated by ROLLUP or CUBE as part of a subtotal.
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
GROUPING(fact_1_id) AS f1g,
GROUPING(fact_2_id) AS f2g
FROM dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;
 FACT_1_ID  FACT_2_ID SALES_VALUE        F1G        F2G
---------- ---------- ----------- ---------- ----------
         1          1     4363.55          0          0
         1          2     4794.76          0          0
         1          3     4718.25          0          0
         1          4     5387.45          0          0
         1          5     5027.34          0          0
         1               24291.35          0          1
         2          1     5652.84          0          0
         2          2     4583.02          0          0
         2          3     5555.77          0          0
         2          4     5936.67          0          0
         2          5     4508.74          0          0
         2               26237.04          0          1
                    1    10016.39          1          0
                    2     9377.78          1          0
                    3    10274.02          1          0
                    4    11324.12          1          0
                    5     9536.08          1          0
                         50528.39          1          1
18 rows selected.
SQL>
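From the output we can see the following (every row with a flag of 1 was produced by ROLLUP or CUBE):
- F1G=0,F2G=0 : a regular GROUP BY row
- F1G=0,F2G=1 : a subtotal row for one value of the FACT_1_ID column
- F1G=1,F2G=0 : a subtotal row for one value of the FACT_2_ID column
- F1G=1,F2G=1 : the grand total across both FACT_1_ID and FACT_2_ID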
The GROUPING function can also be used to filter and order the results.
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
GROUPING(fact_1_id) AS f1g,
GROUPING(fact_2_id) AS f2g
FROM dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
HAVING GROUPING(fact_1_id) = 1 OR GROUPING(fact_2_id) = 1
ORDER BY GROUPING(fact_1_id), GROUPING(fact_2_id);
 FACT_1_ID  FACT_2_ID SALES_VALUE        F1G        F2G
---------- ---------- ----------- ---------- ----------
         1               24291.35          0          1
         2               26237.04          0          1
                    4    11324.12          1          0
                    3    10274.02          1          0
                    2     9377.78          1          0
                    1    10016.39          1          0
                    5     9536.08          1          0
                         50528.39          1          1
8 rows selected.
SQL>
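The difference between a null in the data and a null generated by a subtotal may be easier to see on a tiny data set. The following is a Python emulation of my own (not Oracle code): the `SAMPLE` rows and the `cube_with_grouping` helper are made up, and `None` plays the role of NULL.

```python
from itertools import combinations

# Hypothetical data. The second row has a real NULL in the "product" column.
SAMPLE = [
    {"region": "east", "product": "x", "sales": 10},
    {"region": "east", "product": None, "sales": 5},
    {"region": "west", "product": "x", "sales": 7},
]

def cube_with_grouping(rows, columns, measure):
    """Emulate GROUP BY CUBE(columns) and return (key, total, grouping_flags)."""
    out = []
    for size in range(len(columns), -1, -1):
        for kept in combinations(columns, size):
            totals = {}
            for row in rows:
                # Columns that are not part of this grouping show up as None.
                key = tuple(row[c] if c in kept else None for c in columns)
                totals[key] = totals.get(key, 0) + row[measure]
            # Flag is 1 when the column was aggregated away, 0 otherwise.
            flags = tuple(0 if c in kept else 1 for c in columns)
            for key, total in totals.items():
                out.append((key, total, flags))
    return out

result = cube_with_grouping(SAMPLE, ["region", "product"], "sales")
for line in result:
    print(line)
```

Two of the printed rows share the key ('east', None). The one with flags (0, 0) is the group for the real NULL product (total 5), and the one with flags (0, 1) is the subtotal for the east region (total 15). Only the flags tell them apart.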
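5. GROUPING_ID function
The GROUPING_ID function is another, more compact way to identify subtotal rows. It returns a number that identifies the GROUP BY level of the row: the GROUPING flags of its arguments, read from left to right as a binary number.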
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id
FROM dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;
 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID
---------- ---------- ----------- -----------
         1          1     4363.55           0
         1          2     4794.76           0
         1          3     4718.25           0
         1          4     5387.45           0
         1          5     5027.34           0
         1               24291.35           1
         2          1     5652.84           0
         2          2     4583.02           0
         2          3     5555.77           0
         2          4     5936.67           0
         2          5     4508.74           0
         2               26237.04           1
                    1    10016.39           2
                    2     9377.78           2
                    3    10274.02           2
                    4    11324.12           2
                    5     9536.08           2
                         50528.39           3
18 rows selected.
SQL>
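The numbers 0, 1, 2 and 3 above are the two GROUPING flags read as a binary number, with the first argument as the most significant bit. A minimal Python sketch of that arithmetic (my own illustration; `grouping_id` here is a plain Python function, not the Oracle one):

```python
def grouping_id(flags):
    """Combine GROUPING flags into one number, leftmost flag most significant."""
    value = 0
    for flag in flags:
        value = value * 2 + flag
    return value

# (GROUPING(fact_1_id), GROUPING(fact_2_id)) for each kind of row.
for flags in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(flags, grouping_id(flags))
```

So a row where only fact_2_id was aggregated away has flags (0, 1) and GROUPING_ID 1, and the grand total has flags (1, 1) and GROUPING_ID 3.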
6. GROUP_ID
Combining GROUP BY extensions can produce the same grouping more than once. GROUP_ID tells the duplicates apart: the first occurrence of a grouping gets 0, and each further duplicate of the same grouping gets 1, 2 and so on.
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id,
GROUP_ID() AS group_id
FROM dimension_tab
GROUP BY GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id))
ORDER BY fact_1_id, fact_2_id;
 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID   GROUP_ID
---------- ---------- ----------- ----------- ----------
         1          1     4363.55           0          0
         1          2     4794.76           0          0
         1          3     4718.25           0          0
         1          4     5387.45           0          0
         1          5     5027.34           0          0
         1               24291.35           1          1
         1               24291.35           1          0
         2          1     5652.84           0          0
         2          2     4583.02           0          0
         2          3     5555.77           0          0
         2          4     5936.67           0          0
         2          5     4508.74           0          0
         2               26237.04           1          1
         2               26237.04           1          0
                    1    10016.39           2          0
                    2     9377.78           2          0
                    3    10274.02           2          0
                    4    11324.12           2          0
                    5     9536.08           2          0
                         50528.39           3          0
20 rows selected.
SQL>
The results can also be filtered on GROUP_ID, for example to remove the duplicate rows.
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id,
GROUP_ID() AS group_id
FROM dimension_tab
GROUP BY GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id))
HAVING GROUP_ID() = 0
ORDER BY fact_1_id, fact_2_id;
 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID   GROUP_ID
---------- ---------- ----------- ----------- ----------
         1          1     4363.55           0          0
         1          2     4794.76           0          0
         1          3     4718.25           0          0
         1          4     5387.45           0          0
         1          5     5027.34           0          0
         1               24291.35           1          0
         2          1     5652.84           0          0
         2          2     4583.02           0          0
         2          3     5555.77           0          0
         2          4     5936.67           0          0
         2          5     4508.74           0          0
         2               26237.04           1          0
                    1    10016.39           2          0
                    2     9377.78           2          0
                    3    10274.02           2          0
                    4    11324.12           2          0
                    5     9536.08           2          0
                         50528.39           3          0
18 rows selected.
SQL>
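In the queries above, GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id)) asks for the (fact_1_id) grouping twice: once directly and once through the CUBE. The Python sketch below (my own, not Oracle code) numbers the duplicates the way the text describes. `assign_group_ids` is a hypothetical helper, and which of the two physical copies Oracle labels 0 or 1 is not something this sketch tries to reproduce.

```python
from collections import defaultdict

def assign_group_ids(grouping_sets):
    """Pair each grouping with 0 for its first occurrence, then 1, 2, ..."""
    seen = defaultdict(int)
    numbered = []
    for grouping in grouping_sets:
        key = frozenset(grouping)  # same columns means same grouping
        numbered.append((tuple(grouping), seen[key]))
        seen[key] += 1
    return numbered

# The explicit (fact_1_id) grouping, followed by the four CUBE groupings.
requested = [
    ("fact_1_id",),
    ("fact_1_id", "fact_2_id"),
    ("fact_1_id",),
    ("fact_2_id",),
    (),
]
numbered = assign_group_ids(requested)
# Same idea as HAVING GROUP_ID() = 0: keep only first occurrences.
unique = [g for g, group_id in numbered if group_id == 0]
print(numbered)
print(unique)
```

Filtering on group id 0 leaves four distinct groupings, which matches the 18 rows of the plain CUBE output.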
7. GROUPING SETS
CUBE produces a lot of groupings, especially with several columns. The following example has 8 grouping levels.
SELECT fact_1_id,
fact_2_id,
fact_3_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM dimension_tab
GROUP BY CUBE(fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
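If only some of those groupings are needed, GROUPING SETS lets you list them explicitly. The following query returns only the subtotals for the "FACT_1_ID, FACT_2_ID" and "FACT_1_ID, FACT_3_ID" groupings.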
SELECT fact_1_id,
fact_2_id,
fact_3_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM dimension_tab
GROUP BY GROUPING SETS((fact_1_id, fact_2_id), (fact_1_id, fact_3_id))
ORDER BY fact_1_id, fact_2_id, fact_3_id;
 FACT_1_ID  FACT_2_ID  FACT_3_ID SALES_VALUE GROUPING_ID
---------- ---------- ---------- ----------- -----------
         1          1                4363.55           1
         1          2                4794.76           1
         1          3                4718.25           1
         1          4                5387.45           1
         1          5                5027.34           1
         1                     1      2737.4           2
         1                     2     1854.29           2
         1                     3     2090.96           2
         1                     4     2605.17           2
         1                     5     2590.93           2
         1                     6      2506.9           2
         1                     7     1839.85           2
         1                     8     2953.04           2
         1                     9     2778.75           2
         1                    10     2334.06           2
         2          1                5652.84           1
         2          2                4583.02           1
         2          3                5555.77           1
         2          4                5936.67           1
         2          5                4508.74           1
         2                     1     3512.69           2
         2                     2     2847.94           2
         2                     3      2972.5           2
         2                     4     2534.06           2
         2                     5     3115.99           2
         2                     6     2775.85           2
         2                     7     2208.19           2
         2                     8     2358.55           2
         2                     9     1884.11           2
         2                    10     2027.16           2
30 rows selected.
SQL>
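To see why the rows above carry GROUPING_ID 1 and 2, here is a small Python emulation of my own (not Oracle code) that aggregates only the groupings it is given. The `DATA` rows and the `aggregate_grouping_sets` helper are made up for illustration.

```python
# Hypothetical rows; f1, f2, f3 stand in for fact_1_id, fact_2_id, fact_3_id.
DATA = [
    {"f1": 1, "f2": 1, "f3": 1, "sales": 10},
    {"f1": 1, "f2": 1, "f3": 2, "sales": 20},
    {"f1": 1, "f2": 2, "f3": 1, "sales": 5},
    {"f1": 2, "f2": 1, "f3": 1, "sales": 7},
]

def aggregate_grouping_sets(rows, columns, grouping_sets, measure):
    """Emulate GROUP BY GROUPING SETS(...): return (key, total, grouping_id)."""
    out = []
    for kept in grouping_sets:
        totals = {}
        for row in rows:
            # Columns outside the current grouping are reported as None.
            key = tuple(row[c] if c in kept else None for c in columns)
            totals[key] = totals.get(key, 0) + row[measure]
        # Binary number built from the per-column grouping flags.
        gid = 0
        for c in columns:
            gid = gid * 2 + (0 if c in kept else 1)
        out.extend((key, total, gid) for key, total in totals.items())
    return out

wanted = [("f1", "f2"), ("f1", "f3")]
summary = aggregate_grouping_sets(DATA, ["f1", "f2", "f3"], wanted, "sales")
for line in summary:
    print(line)
```

The (f1, f2) grouping leaves f3 out, so its flags are 0, 0, 1 and its id is 1. The (f1, f3) grouping leaves f2 out, so its flags are 0, 1, 0 and its id is 2.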
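8. Composite columns
ROLLUP and CUBE expand into the following groupings (subtotals).
ROLLUP (a, b, c)
(a, b, c)
(a, b)
(a)
()
CUBE (a, b, c)
(a, b, c)
(a, b)
(a, c)
(a)
(b, c)
(b)
(c)
()
Columns can be wrapped in parentheses to form a composite column. ROLLUP, CUBE and GROUPING SETS then treat the composite column as a single unit and do not split the columns inside the parentheses when building the groupings.
ROLLUP ((a, b), c)
(a, b, c)
(a, b)
()
Not considered:
(a)
CUBE ((a, b), c)
(a, b, c)
(a, b)
(c)
()
Not considered:
(a, c)
(a)
(b, c)
(b)
The following queries compare a regular CUBE with a CUBE that uses a composite column.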
-- Regular Cube.
SELECT fact_1_id,
fact_2_id,
fact_3_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM dimension_tab
GROUP BY CUBE(fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;

-- Cube with composite column.
SELECT fact_1_id,
fact_2_id,
fact_3_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM dimension_tab
GROUP BY CUBE((fact_1_id, fact_2_id), fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
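The effect of a composite column can be checked by treating each item in the list as one unit. This is a Python sketch of my own (not Oracle code); a Python tuple stands for a parenthesised composite column, and `flatten`, `rollup_sets` and `cube_sets` are hypothetical helper names.

```python
from itertools import combinations

def flatten(items):
    """Expand composite columns (tuples) into a flat tuple of column names."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            flat.extend(item)  # a composite column contributes all its columns
        else:
            flat.append(item)
    return tuple(flat)

def rollup_sets(items):
    """ROLLUP over items, where an item is a column or a composite column."""
    return [flatten(items[:k]) for k in range(len(items), -1, -1)]

def cube_sets(items):
    """CUBE over items, where an item is a column or a composite column."""
    sets = []
    for size in range(len(items), -1, -1):
        for subset in combinations(items, size):
            sets.append(flatten(subset))
    return sets

print(rollup_sets([("a", "b"), "c"]))
print(cube_sets([("a", "b"), "c"]))
```

Because (a, b) is never split, the CUBE has only two items and therefore 2^2 = 4 groupings instead of 8.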
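9. Concatenated Groupings (my understanding: a Cartesian product between the groups)
When a GROUP BY clause lists several GROUPING SETS, CUBE or ROLLUP clauses separated by commas, the groupings of the different clauses are combined as a cross product.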
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id
FROM dimension_tab
GROUP BY GROUPING SETS(fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;
 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID
---------- ---------- ----------- -----------
         1               24291.35           1
         2               26237.04           1
                    1    10016.39           2
                    2     9377.78           2
                    3    10274.02           2
                    4    11324.12           2
                    5     9536.08           2
7 rows selected.
SQL>
SELECT fact_3_id,
fact_4_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_3_id, fact_4_id) AS grouping_id
FROM dimension_tab
GROUP BY GROUPING SETS(fact_3_id, fact_4_id)
ORDER BY fact_3_id, fact_4_id;
 FACT_3_ID  FACT_4_ID SALES_VALUE GROUPING_ID
---------- ---------- ----------- -----------
         1                6250.09           1
         2                4702.23           1
         3                5063.46           1
         4                5139.23           1
         5                5706.92           1
         6                5282.75           1
         7                4048.04           1
         8                5311.59           1
         9                4662.86           1
        10                4361.22           1
                    1     4718.55           2
                    2      5439.1           2
                    3      4643.4           2
                    4      4515.3           2
                    5     5110.27           2
                    6     5910.78           2
                    7     4987.22           2
                    8     4846.25           2
                    9     5458.82           2
                   10      4898.7           2
20 rows selected.
SQL>
SELECT fact_1_id,
fact_2_id,
fact_3_id,
fact_4_id,
SUM(sales_value) AS sales_value,
GROUPING_ID(fact_1_id, fact_2_id, fact_3_id, fact_4_id) AS grouping_id
FROM dimension_tab
GROUP BY GROUPING SETS(fact_1_id, fact_2_id), GROUPING SETS(fact_3_id, fact_4_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id, fact_4_id;
The groupings produced by each clause are:
GROUPING SETS(fact_1_id, fact_2_id)
(fact_1_id)
(fact_2_id)
GROUPING SETS(fact_3_id, fact_4_id)
(fact_3_id)
(fact_4_id)
GROUPING SETS(fact_1_id, fact_2_id), GROUPING SETS(fact_3_id, fact_4_id)
(fact_1_id, fact_3_id)
(fact_1_id, fact_4_id)
(fact_2_id, fact_3_id)
(fact_2_id, fact_4_id)
The two GROUPING SETS clauses are combined as a cross product. In general:
GROUPING SETS(a, b), GROUPING SETS(c, d)
(a, c)
(a, d)
(b, c)
(b, d)
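The cross product of two GROUPING SETS clauses can be reproduced with `itertools.product`. This is a Python sketch of my own (not Oracle code); `concatenate` is a hypothetical helper, and each clause is written as a list of groupings.

```python
from itertools import product

def concatenate(*clauses):
    """Cross product of the groupings of several GROUP BY clauses."""
    combined = []
    for combo in product(*clauses):
        # Join one grouping from each clause into a single grouping.
        combined.append(tuple(col for grouping in combo for col in grouping))
    return combined

first = [("a",), ("b",)]    # GROUPING SETS(a, b)
second = [("c",), ("d",)]   # GROUPING SETS(c, d)
print(concatenate(first, second))
```

Two clauses with two groupings each give 2*2 = 4 combined groupings, matching the list above.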
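[My take]
When using GROUP BY, and especially with aggregate functions, CUBE, ROLLUP and GROUPING SETS are very useful for producing subtotals per grouping. Alongside them, GROUPING_ID (returns a number identifying the grouping level) and GROUPING (tells whether a null in a row was generated by CUBE or ROLLUP) can be used for further filtering and sorting.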