Oracle GROUP BY: Basics and the Extensions ROLLUP, CUBE, GROUPING Functions and GROUPING SETS


/*

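 Disclaimer: I am not a database specialist, nor a professional translator. I ran into a problem (in SQL Cookbook), searched around, and found an English site that explains it very clearly, so I translated it here as a bookmark. Please go easy on me, thanks!
Original article: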

ROLLUP, CUBE, GROUPING Functions and GROUPING SETS

*/

Setup:
DROP TABLE dimension_tab;
CREATE TABLE dimension_tab (
  fact_1_id   NUMBER NOT NULL,
  fact_2_id   NUMBER NOT NULL,
  fact_3_id   NUMBER NOT NULL, 
  fact_4_id   NUMBER NOT NULL,
  sales_value NUMBER(10,2) NOT NULL
);

INSERT INTO dimension_tab
SELECT TRUNC(DBMS_RANDOM.value(low => 1, high => 3)) AS fact_1_id,
       TRUNC(DBMS_RANDOM.value(low => 1, high => 6)) AS fact_2_id,
       TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_3_id,
       TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_4_id,
       ROUND(DBMS_RANDOM.value(low => 1, high => 100), 2) AS sales_value
FROM   dual
CONNECT BY level <= 1000;
COMMIT;


1.GROUP BY Basics

To make things easier to follow, start with a single aggregate function, SUM, without any GROUP BY.
SELECT SUM(sales_value) AS sales_value
FROM   dimension_tab;

SALES_VALUE
-----------
   50528.39

1 row selected.

SQL>

Put the columns you want to group by after GROUP BY. The number of rows returned equals the number of distinct values in the grouped column.
SELECT fact_1_id,
       COUNT(*) AS num_rows,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id
ORDER BY fact_1_id;

 FACT_1_ID   NUM_ROWS SALES_VALUE
---------- ---------- -----------
         1        478    24291.35
         2        522    26237.04

2 rows selected.

SQL>

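With two columns in the GROUP BY, one row is returned for each distinct combination of the two columns that occurs in the data (at most the Cartesian product of their distinct values). Here every combination occurs, so 10 rows (2*5) are returned.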

SELECT fact_1_id,
       fact_2_id,
       COUNT(*) AS num_rows,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id, fact_2_id
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID   NUM_ROWS SALES_VALUE
---------- ---------- ---------- -----------
         1          1         83     4363.55
         1          2         96     4794.76
         1          3         93     4718.25
         1          4        105     5387.45
         1          5        101     5027.34
         2          1        109     5652.84
         2          2         96     4583.02
         2          3        110     5555.77
         2          4        113     5936.67
         2          5         94     4508.74

10 rows selected.

SQL>

2.ROLLUP

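In addition to the regular grouped rows, ROLLUP returns subtotals, dropping columns from right to left until it reaches a grand total. If ROLLUP lists n columns, there are n+1 grouping levels.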
SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY ROLLUP (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE
---------- ---------- -----------
         1          1     4363.55
         1          2     4794.76
         1          3     4718.25
         1          4     5387.45
         1          5     5027.34
         1               24291.35
         2          1     5652.84
         2          2     4583.02
         2          3     5555.77
         2          4     5936.67
         2          5     4508.74
         2               26237.04
                         50528.39

13 rows selected.

SQL>
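
The subtotal rows are easy to spot because the ROLLUP columns are null in them; the following three-column query shows the same pattern (output not shown). If the data itself contains nulls, this visual check is unreliable, a problem discussed later.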

SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY ROLLUP (fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;

ROLLUP can also be applied to only some of the GROUP BY columns, giving a partial rollup (output not shown).
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id, ROLLUP (fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
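
The right-to-left rule can be checked without a database. The following is a minimal Python sketch, not Oracle code: it only lists which column combinations a full or partial ROLLUP would aggregate over. The helper names `rollup_sets` and `partial_rollup_sets` are made up for this illustration.

```python
def rollup_sets(columns):
    """Grouping sets of ROLLUP(columns): drop columns from right to left."""
    cols = tuple(columns)
    # n columns give n+1 levels, ending with the empty grand-total set.
    return [cols[:k] for k in range(len(cols), -1, -1)]


def partial_rollup_sets(fixed, rolled):
    """Grouping sets of GROUP BY fixed..., ROLLUP(rolled...)."""
    # The fixed columns stay in every grouping, so there is no grand total.
    return [tuple(fixed) + s for s in rollup_sets(rolled)]


full = rollup_sets(["fact_1_id", "fact_2_id", "fact_3_id"])
partial = partial_rollup_sets(["fact_1_id"], ["fact_2_id", "fact_3_id"])

for s in full:
    print("full   ", s)
for s in partial:
    print("partial", s)
```

The full rollup over three columns lists four groupings (n+1), while the partial rollup lists three and never drops fact_1_id.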

3.CUBE

CUBE extends ROLLUP: it produces subtotals for every combination of the listed columns. If CUBE lists n columns, there are 2^n grouping combinations.
SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE
---------- ---------- -----------
         1          1     4363.55
         1          2     4794.76
         1          3     4718.25
         1          4     5387.45
         1          5     5027.34
         1               24291.35
         2          1     5652.84
         2          2     4583.02
         2          3     5555.77
         2          4     5936.67
         2          5     4508.74
         2               26237.04
                    1    10016.39
                    2     9377.78
                    3    10274.02
                    4    11324.12
                    5     9536.08
                         50528.39

18 rows selected.

SQL>

With three columns in the CUBE there are 2^3 = 8 combinations, and in general 2^n for n columns (output not shown).
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;

CUBE can also be applied to only some of the columns, giving a partial cube (output not shown).
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id, CUBE (fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
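
The 2^n count for CUBE can be illustrated the same way. This is a rough Python sketch under the assumption that a cube is simply "every subset of the listed columns"; `cube_sets` and `partial_cube_sets` are hypothetical helper names, and the order in which Oracle actually returns the groupings is not modelled.

```python
from itertools import combinations


def cube_sets(columns):
    """Grouping sets of CUBE(columns): every subset of the columns."""
    cols = tuple(columns)
    sets = []
    for size in range(len(cols), -1, -1):
        # combinations() keeps the original column order inside each subset.
        sets.extend(combinations(cols, size))
    return sets


def partial_cube_sets(fixed, cubed):
    """Grouping sets of GROUP BY fixed..., CUBE(cubed...)."""
    return [tuple(fixed) + s for s in cube_sets(cubed)]


full = cube_sets(["fact_1_id", "fact_2_id", "fact_3_id"])
partial = partial_cube_sets(["fact_1_id"], ["fact_2_id", "fact_3_id"])

print(len(full), "groupings in the full cube")
for s in partial:
    print("partial", s)
```

For the partial cube only the two cubed columns are combined, so it yields 2^2 = 4 groupings, each of which still contains fact_1_id.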

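4.GROUPING Function

As mentioned above, a column may contain nulls in the data, and ROLLUP or CUBE also puts nulls in that column on subtotal rows. How can we tell whether a null comes from the data or was generated by ROLLUP or CUBE? The GROUPING function answers this. GROUPING(column) returns 0 when the value in that row is real data (null or otherwise) and 1 when the null was generated by ROLLUP or CUBE for a subtotal.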
SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING(fact_1_id) AS f1g, 
       GROUPING(fact_2_id) AS f2g
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE        F1G        F2G
---------- ---------- ----------- ---------- ----------
         1          1     4363.55          0          0
         1          2     4794.76          0          0
         1          3     4718.25          0          0
         1          4     5387.45          0          0
         1          5     5027.34          0          0
         1               24291.35          0          1
         2          1     5652.84          0          0
         2          2     4583.02          0          0
         2          3     5555.77          0          0
         2          4     5936.67          0          0
         2          5     4508.74          0          0
         2               26237.04          0          1
                    1    10016.39          1          0
                    2     9377.78          1          0
                    3    10274.02          1          0
                    4    11324.12          1          0
                    5     9536.08          1          0
                         50528.39          1          1

18 rows selected.

SQL>

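From this output we can see:

  • F1G=0,F2G=0 : a regular GROUP BY result row
  • F1G=0,F2G=1 : a subtotal for one FACT_1_ID value, across all FACT_2_ID values
  • F1G=1,F2G=0 : a subtotal for one FACT_2_ID value, across all FACT_1_ID values
  • F1G=1,F2G=1 : the grand total across both FACT_1_ID and FACT_2_ID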

The GROUPING function can also be used to filter and order the results.

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING(fact_1_id) AS f1g, 
       GROUPING(fact_2_id) AS f2g
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
HAVING GROUPING(fact_1_id) = 1 OR GROUPING(fact_2_id) = 1
ORDER BY GROUPING(fact_1_id), GROUPING(fact_2_id);

 FACT_1_ID  FACT_2_ID SALES_VALUE        F1G        F2G
---------- ---------- ----------- ---------- ----------
         1               24291.35          0          1
         2               26237.04          0          1
                    4    11324.12          1          0
                    3    10274.02          1          0
                    2     9377.78          1          0
                    1    10016.39          1          0
                    5     9536.08          1          0
                         50528.39          1          1

8 rows selected.

SQL>

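5.GROUPING_ID Function

The GROUPING_ID function offers a more compact way to identify subtotal rows. Given the grouped columns as arguments, it returns a single number that identifies the GROUP BY level of each row.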

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID
---------- ---------- ----------- -----------
         1          1     4363.55           0
         1          2     4794.76           0
         1          3     4718.25           0
         1          4     5387.45           0
         1          5     5027.34           0
         1               24291.35           1
         2          1     5652.84           0
         2          2     4583.02           0
         2          3     5555.77           0
         2          4     5936.67           0
         2          5     4508.74           0
         2               26237.04           1
                    1    10016.39           2
                    2     9377.78           2
                    3    10274.02           2
                    4    11324.12           2
                    5     9536.08           2
                         50528.39           3

18 rows selected.

SQL>
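
The values 0 to 3 in the GROUPING_ID column are the GROUPING results of the arguments read as a binary number, leftmost argument first. A small Python sketch of that arithmetic follows; it is a model of the rule, not Oracle code, and the function names simply mirror the SQL functions for readability.

```python
def grouping(column, grouping_set):
    """Model of GROUPING(column): 1 if the column was aggregated away."""
    return 0 if column in grouping_set else 1


def grouping_id(columns, grouping_set):
    """Model of GROUPING_ID(columns...): GROUPING bits read as binary."""
    value = 0
    for column in columns:
        value = value * 2 + grouping(column, grouping_set)
    return value


cols = ["fact_1_id", "fact_2_id"]
for gs in [("fact_1_id", "fact_2_id"), ("fact_1_id",), ("fact_2_id",), ()]:
    bits = [grouping(c, gs) for c in cols]
    print(gs, bits, grouping_id(cols, gs))
```

With two arguments, a subtotal per fact_1_id has bits (0, 1) and therefore ID 1, a subtotal per fact_2_id has bits (1, 0) and ID 2, and the grand total has bits (1, 1) and ID 3, which matches the output above.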

6.GROUP_ID

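Some queries return duplicate subtotals. GROUP_ID tells the duplicates apart: the first occurrence of a grouping gets 0, and each further duplicate of the same grouping gets the next number, starting from 1. The following query deliberately produces duplicates to show GROUP_ID at work.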

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id,
       GROUP_ID() AS group_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id))
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID   GROUP_ID
---------- ---------- ----------- ----------- ----------
         1          1     4363.55           0          0
         1          2     4794.76           0          0
         1          3     4718.25           0          0
         1          4     5387.45           0          0
         1          5     5027.34           0          0
         1               24291.35           1          1
         1               24291.35           1          0
         2          1     5652.84           0          0
         2          2     4583.02           0          0
         2          3     5555.77           0          0
         2          4     5936.67           0          0
         2          5     4508.74           0          0
         2               26237.04           1          1
         2               26237.04           1          0
                    1    10016.39           2          0
                    2     9377.78           2          0
                    3    10274.02           2          0
                    4    11324.12           2          0
                    5     9536.08           2          0
                         50528.39           3          0

20 rows selected.

SQL>

GROUP_ID can also be used in a HAVING clause to filter out the duplicates.

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id,
       GROUP_ID() AS group_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id))
HAVING GROUP_ID() = 0
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID   GROUP_ID
---------- ---------- ----------- ----------- ----------
         1          1     4363.55           0          0
         1          2     4794.76           0          0
         1          3     4718.25           0          0
         1          4     5387.45           0          0
         1          5     5027.34           0          0
         1               24291.35           1          0
         2          1     5652.84           0          0
         2          2     4583.02           0          0
         2          3     5555.77           0          0
         2          4     5936.67           0          0
         2          5     4508.74           0          0
         2               26237.04           1          0
                    1    10016.39           2          0
                    2     9377.78           2          0
                    3    10274.02           2          0
                    4    11324.12           2          0
                    5     9536.08           2          0
                         50528.39           3          0

18 rows selected.

SQL>
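
To see where the duplicates come from, it helps to expand GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id)) by hand: the (fact_1_id) grouping appears twice. The Python sketch below models the numbering only; `group_ids` is a hypothetical helper, and which of the two duplicate row sets Oracle labels 0 or 1 is an assumption here.

```python
from collections import Counter


def group_ids(grouping_sets):
    """Number repeated grouping sets 0, 1, 2, ... in order of appearance."""
    seen = Counter()
    numbered = []
    for gs in grouping_sets:
        key = frozenset(gs)
        numbered.append((gs, seen[key]))
        seen[key] += 1
    return numbered


# GROUPING SETS(fact_1_id, CUBE(fact_1_id, fact_2_id)) expanded by hand.
expanded = [
    ("fact_1_id",),              # listed explicitly
    ("fact_1_id", "fact_2_id"),  # from the CUBE
    ("fact_1_id",),              # from the CUBE: duplicate of the first
    ("fact_2_id",),              # from the CUBE
    (),                          # from the CUBE: grand total
]

numbered = group_ids(expanded)
for gs, gid in numbered:
    print(gid, gs)

# Equivalent of HAVING GROUP_ID() = 0: keep one copy of each grouping.
unique = [gs for gs, gid in numbered if gid == 0]
```

Filtering on ID 0 leaves four distinct groupings, which is why the filtered query returns 18 rows instead of 20.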

7.GROUPING SETS

CUBE generates many groupings, especially with several columns. The following example produces 8 grouping levels (output not shown).

SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM   dimension_tab
GROUP BY CUBE(fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
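
If only some of those groupings are needed, GROUPING SETS lets you list exactly the ones you want. The following query returns only the subtotals for the "FACT_1_ID, FACT_2_ID" and "FACT_1_ID, FACT_3_ID" groupings.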

SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM   dimension_tab
GROUP BY GROUPING SETS((fact_1_id, fact_2_id), (fact_1_id, fact_3_id))
ORDER BY fact_1_id, fact_2_id, fact_3_id;

 FACT_1_ID  FACT_2_ID  FACT_3_ID SALES_VALUE GROUPING_ID
---------- ---------- ---------- ----------- -----------
         1          1                4363.55           1
         1          2                4794.76           1
         1          3                4718.25           1
         1          4                5387.45           1
         1          5                5027.34           1
         1                     1      2737.4           2
         1                     2     1854.29           2
         1                     3     2090.96           2
         1                     4     2605.17           2
         1                     5     2590.93           2
         1                     6      2506.9           2
         1                     7     1839.85           2
         1                     8     2953.04           2
         1                     9     2778.75           2
         1                    10     2334.06           2
         2          1                5652.84           1
         2          2                4583.02           1
         2          3                5555.77           1
         2          4                5936.67           1
         2          5                4508.74           1
         2                     1     3512.69           2
         2                     2     2847.94           2
         2                     3      2972.5           2
         2                     4     2534.06           2
         2                     5     3115.99           2
         2                     6     2775.85           2
         2                     7     2208.19           2
         2                     8     2358.55           2
         2                     9     1884.11           2
         2                    10     2027.16           2

30 rows selected.

SQL>
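
The blank cells in the output above are the columns left out of each grouping set. The following Python sketch imitates that behaviour on a few made-up rows (the data and the `aggregate` helper are hypothetical and unrelated to dimension_tab): each grouping set is aggregated separately, and columns outside the set are reported as None.

```python
from collections import defaultdict


def aggregate(rows, columns, grouping_sets, measure):
    """Sum `measure` once per grouping set; omitted columns become None."""
    out = []
    for gs in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[c] if c in gs else None for c in columns)
            totals[key] += row[measure]
        # Sort within one grouping set; None is mapped to 0 so keys compare.
        ordered = sorted(
            totals, key=lambda k: tuple(0 if v is None else v for v in k)
        )
        out.extend(key + (totals[key],) for key in ordered)
    return out


# Made-up sample rows, for illustration only.
rows = [
    {"f1": 1, "f2": 1, "f3": 1, "sales": 10},
    {"f1": 1, "f2": 1, "f3": 2, "sales": 20},
    {"f1": 1, "f2": 2, "f3": 1, "sales": 30},
    {"f1": 2, "f2": 1, "f3": 1, "sales": 40},
]

result = aggregate(rows, ["f1", "f2", "f3"], [("f1", "f2"), ("f1", "f3")], "sales")
for line in result:
    print(line)
```

Each input row contributes once to every grouping set, so the sales total of each set equals the overall total.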

8.Composite Columns

ROLLUP produces the following column combinations (subtotals):

ROLLUP (a, b, c)
(a, b, c)
(a, b)
(a)
()

CUBE produces the following column combinations (subtotals):

CUBE (a, b, c)
(a, b, c)
(a, b)
(a, c)
(a)
(b, c)
(b)
(c)
()
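
Columns can be wrapped in parentheses to form a composite column. ROLLUP, CUBE and GROUPING SETS treat a composite column as a single unit and never split the columns inside the parentheses when forming combinations.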

ROLLUP ((a, b), c)
(a, b, c)
(a, b)
()

Not considered:
(a)

CUBE ((a, b), c)
(a, b, c)
(a, b)
(c)
()

Not considered:
(a, c)
(a)
(b, c)
(b)

The following queries compare a regular CUBE with a CUBE that uses a composite column (outputs not shown).

-- Regular Cube.
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM   dimension_tab
GROUP BY CUBE(fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;

-- Cube with composite column.
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM   dimension_tab
GROUP BY CUBE((fact_1_id, fact_2_id), fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;


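9.Concatenated Groupings (translator's reading: a Cartesian product between the groupings)

When a GROUP BY lists several GROUPING SETS, CUBE or ROLLUP clauses separated by commas, the resulting groupings are the Cartesian product of the groupings each clause produces. Consider first two separate queries, each with a single GROUPING SETS clause.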

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID
---------- ---------- ----------- -----------
         1               24291.35           1
         2               26237.04           1
                    1    10016.39           2
                    2     9377.78           2
                    3    10274.02           2
                    4    11324.12           2
                    5     9536.08           2

7 rows selected.

SQL>


SELECT fact_3_id,
       fact_4_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_3_id, fact_4_id) AS grouping_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_3_id, fact_4_id)
ORDER BY fact_3_id, fact_4_id;

 FACT_3_ID  FACT_4_ID SALES_VALUE GROUPING_ID
---------- ---------- ----------- -----------
         1                6250.09           1
         2                4702.23           1
         3                5063.46           1
         4                5139.23           1
         5                5706.92           1
         6                5282.75           1
         7                4048.04           1
         8                5311.59           1
         9                4662.86           1
        10                4361.22           1
                    1     4718.55           2
                    2      5439.1           2
                    3      4643.4           2
                    4      4515.3           2
                    5     5110.27           2
                    6     5910.78           2
                    7     4987.22           2
                    8     4846.25           2
                    9     5458.82           2
                   10      4898.7           2

20 rows selected.

SQL>


Now put both GROUPING SETS clauses into the same GROUP BY (output not shown).

SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       fact_4_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id, fact_4_id) AS grouping_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_1_id, fact_2_id), GROUPING SETS(fact_3_id, fact_4_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id, fact_4_id;


This produces the following groupings:

GROUPING SETS(fact_1_id, fact_2_id) 
(fact_1_id)
(fact_2_id)

GROUPING SETS(fact_3_id, fact_4_id) 
(fact_3_id)
(fact_4_id)

GROUPING SETS(fact_1_id, fact_2_id), GROUPING SETS(fact_3_id, fact_4_id) 
(fact_1_id, fact_3_id)
(fact_1_id, fact_4_id)
(fact_2_id, fact_3_id)
(fact_2_id, fact_4_id)

In general, two concatenated GROUPING SETS produce the Cartesian product of their groupings:

GROUPING SETS(a, b), GROUPING SETS(c, d) 
(a, c)
(a, d)
(b, c)
(b, d)
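
The cross product can be sketched in a few lines of Python. This is an illustrative model only: each clause is written as the list of grouping sets it produces, and the hypothetical `concatenate` helper pairs every set from one clause with every set from the next.

```python
from itertools import product


def concatenate(*clauses):
    """Cross product of the grouping sets produced by each clause."""
    result = []
    for combo in product(*clauses):
        merged = []
        for grouping_set in combo:
            for col in grouping_set:
                if col not in merged:  # a column shared by two sets appears once
                    merged.append(col)
        result.append(tuple(merged))
    return result


first = [("a",), ("b",)]   # GROUPING SETS(a, b)
second = [("c",), ("d",)]  # GROUPING SETS(c, d)

for grouping_set in concatenate(first, second):
    print(grouping_set)
```

The same helper accepts the expanded sets of a ROLLUP or CUBE as a clause, so two clauses with m and n grouping sets give m*n combined groupings.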


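[Translator's notes]

   When using GROUP BY, especially with aggregate functions, CUBE, ROLLUP and GROUPING SETS are very useful for producing subtotals per grouping. Alongside them, GROUPING_ID (which returns a number identifying the grouping level) and GROUPING (which tells whether a null in a row was generated by CUBE or ROLLUP) make it possible to filter and order the results further.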