GROUPING__ID Function
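When a column is not grouped on in a result row (it has been rolled up), Hive shows its value as NULL. That clashes with NULL values the column may already contain, so you need a way to tell whether a NULL means "aggregated over" or "the value really is NULL". GROUPING__ID provides this. If you work through the grouping combinations, it becomes clear at once that GROUPING__ID is a bitmask: the sum of the bit values of the columns that are grouped on in that row.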
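The "sum of bits" idea can be made concrete with a minimal Python sketch (this is not Hive code). The helper `grouping_id` is hypothetical. It assumes the pre-Hive-2.3.0 convention used in the result table below: the first GROUP BY column is bit 0 (value 1), the second is bit 1 (value 2), and a set bit means the column is grouped on.

```python
from typing import Sequence


def grouping_id(group_by: Sequence[str], grouped: Sequence[str]) -> int:
    """Hypothetical helper: bitmask of the columns in `grouped`.

    Assumes the pre-Hive-2.3.0 convention: GROUP BY column i contributes
    1 << i, and a set bit means that column is grouped on in the row.
    """
    gid = 0
    for i, col in enumerate(group_by):
        if col in grouped:
            gid |= 1 << i  # add this column's bit value
    return gid


# GROUP BY key, value WITH ROLLUP produces three grouping sets.
for grouping_set in [(), ("key",), ("key", "value")]:
    print(grouping_set, grouping_id(["key", "value"], grouping_set))
```

The three grouping sets map to 0, 1 and 3, which are exactly the GROUPING__ID values that appear in the result table.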
Sample table T1:
Column1 (key) | Column2 (value) |
---|---|
1 | NULL |
1 | 1 |
2 | 2 |
3 | 3 |
3 | NULL |
4 | 5 |
HQL query:
SELECT key, value, GROUPING__ID, count(*) FROM T1 GROUP BY key, value WITH ROLLUP;
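Result. GROUPING__ID is shown in decimal, followed by its bits listed in GROUP BY order (key, then value). A 1 bit means the column is grouped on in that row, and a 0 bit means it was rolled up. These values follow the convention of Hive releases before 2.3.0; the Hive wiki notes that the function's behavior changed in 2.3.0.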
key | value | GROUPING__ID | bits (key, value) | count(*) |
---|---|---|---|---|
NULL | NULL | 0 | 00 | 6 |
1 | NULL | 1 | 10 | 2 |
1 | NULL | 3 | 11 | 1 |
1 | 1 | 3 | 11 | 1 |
2 | NULL | 1 | 10 | 1 |
2 | 2 | 3 | 11 | 1 |
3 | NULL | 1 | 10 | 2 |
3 | NULL | 3 | 11 | 1 |
3 | 3 | 3 | 11 | 1 |
4 | NULL | 1 | 10 | 1 |
4 | 5 | 3 | 11 | 1 |
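The result can be reproduced with a small Python emulation of `GROUP BY key, value WITH ROLLUP`. This is a sketch under stated assumptions: `rollup_key_value` is a hypothetical name, None stands for SQL NULL, and the pre-2.3.0 bit convention is used (key = 1, value = 2). The rows (1, NULL, 3) and (1, NULL, 1) stay separate because their GROUPING__ID differs, even though key and value look identical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Sample data from table T1 above (None stands for SQL NULL).
T1: List[Tuple[Optional[int], Optional[int]]] = [
    (1, None), (1, 1), (2, 2), (3, 3), (3, None), (4, 5),
]


def rollup_key_value(rows) -> List[Tuple[Optional[int], Optional[int], int, int]]:
    """Hypothetical emulation of GROUP BY key, value WITH ROLLUP.

    Returns (key, value, GROUPING__ID, count) tuples using the
    pre-Hive-2.3.0 bits: key -> 1, value -> 2, set bit = grouped on.
    """
    counts: Counter = Counter()
    for key, value in rows:
        counts[(None, None, 0)] += 1    # grouping set ()
        counts[(key, None, 1)] += 1     # grouping set (key)
        counts[(key, value, 3)] += 1    # grouping set (key, value)

    def order(group):
        key, value, gid = group
        # NULLs first, as in the listing above; ties broken by GROUPING__ID.
        return (key is not None, key or 0, value is not None, value or 0, gid)

    return [(g[0], g[1], g[2], counts[g]) for g in sorted(counts, key=order)]


for row in rollup_key_value(T1):
    print(row)
```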
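If the grouping columns contain no NULL values of their own, a plain IS NULL test is enough (f1g/f2g = 1 means the column was rolled up in that row):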
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
(case when fact_1_id is null then 1 else 0 end) as f1g,
(case when fact_2_id is null then 1 else 0 end) as f2g
FROM dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;
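A quick Python check shows why the IS NULL test is only safe when the data has no NULLs. The rows are the T1 ROLLUP result listed above, with None for NULL. The function `naive_value_rolled_up` is a hypothetical stand-in for `CASE WHEN value IS NULL THEN 1 ELSE 0 END`. Applied to T1, the test flags two detail rows as rolled up even though their GROUPING__ID (3 under the pre-2.3.0 convention) says both columns are grouped on.

```python
from typing import List, Optional, Tuple

# (key, value, GROUPING__ID, count) rows copied from the T1 ROLLUP result above.
RESULT: List[Tuple[Optional[int], Optional[int], int, int]] = [
    (None, None, 0, 6), (1, None, 1, 2), (1, None, 3, 1), (1, 1, 3, 1),
    (2, None, 1, 1), (2, 2, 3, 1), (3, None, 1, 2), (3, None, 3, 1),
    (3, 3, 3, 1), (4, None, 1, 1), (4, 5, 3, 1),
]


def naive_value_rolled_up(row) -> int:
    """Hypothetical mirror of CASE WHEN value IS NULL THEN 1 ELSE 0 END."""
    return 1 if row[1] is None else 0


# GROUPING__ID 3 means key and value are both grouped on, so a flag of 1
# on such a row is a false positive caused by a NULL that was in the data.
false_positives = [r for r in RESULT if naive_value_rolled_up(r) == 1 and r[2] == 3]
print(false_positives)  # [(1, None, 3, 1), (3, None, 3, 1)]
```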
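If the columns may already contain NULL values, test the GROUPING__ID bits instead. The flags have the same meaning as before (1 = the column was rolled up in that row):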
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
(case when (CAST (GROUPING__ID AS INT) & 1) = 0 then 1 else 0 end) as f1g,
(case when (CAST (GROUPING__ID AS INT) & 2) = 0 then 1 else 0 end) as f2g
FROM dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;
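The two CASE expressions can be mirrored in Python to show that the flags depend only on GROUPING__ID and never on the column values, which is why genuine NULLs no longer interfere. This is a sketch only. `flags` is a hypothetical helper, and it assumes the pre-Hive-2.3.0 convention: bit 0 (`& 1`) is fact_1_id, bit 1 (`& 2`) is fact_2_id, and a 0 bit means the column was aggregated away.

```python
from typing import Tuple


def flags(gid: int) -> Tuple[int, int]:
    """Hypothetical mirror of the f1g / f2g CASE expressions.

    Returns (f1g, f2g); 1 means the column was rolled up in that row.
    Pre-Hive-2.3.0 convention assumed: a 0 bit = column aggregated away.
    """
    f1g = 1 if (gid & 1) == 0 else 0  # fact_1_id -> bit 0
    f2g = 1 if (gid & 2) == 0 else 0  # fact_2_id -> bit 1
    return f1g, f2g


# WITH CUBE on two columns yields the four grouping sets, GROUPING__ID 0..3.
for gid in range(4):
    print(gid, flags(gid))
```

Here GROUPING__ID 0 is the grand total, so both flags are 1. GROUPING__ID 3 is a fully grouped detail row, so both flags are 0, whether or not its values happen to be NULL.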
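The same test generalizes to three GROUP BY columns using the masks 1, 2 and 4. In this select-list fragment, each expression returns the column's position if that column is grouped on, and '0' otherwise: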
(case when (CAST (GROUPING__ID AS INT) & 1) = 1 then '1' else '0' end),
(case when (CAST (GROUPING__ID AS INT) & 2) = 2 then '2' else '0' end),
(case when (CAST (GROUPING__ID AS INT) & 4) = 4 then '3' else '0' end),
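If GROUPING__ID & 1 = 1, the first bit is set. That means the first column is part of the grouping in that row, so any NULL in it is a NULL that was actually in the data. Checking each bit this way tells you, for every GROUP BY column, whether a NULL is an original NULL or a placeholder NULL produced by the grouping set.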
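The per-cell decision described above can be sketched as a small Python classifier. `null_origin` is a hypothetical helper. It assumes the pre-Hive-2.3.0 convention, where `col_index` is the 0-based position in the GROUP BY list and a set bit means the column is grouped on. With three columns, this is the same bit test as the masks 1, 2 and 4 in the fragment above.

```python
from typing import Optional


def null_origin(value: Optional[object], gid: int, col_index: int) -> str:
    """Hypothetical helper: classify one cell of a ROLLUP/CUBE result row.

    Pre-Hive-2.3.0 convention assumed: bit col_index of GROUPING__ID is 1
    when that GROUP BY column is grouped on in the row.
    """
    grouped = ((gid >> col_index) & 1) == 1
    if not grouped:
        return "grouping-set NULL"  # column was aggregated away in this row
    if value is None:
        return "original NULL"      # NULL that was really in the data
    return "value"


# Rows from the T1 result: value is GROUP BY column 1.
print(null_origin(None, 1, 1))  # row (1, NULL, 1): rolled-up value
print(null_origin(None, 3, 1))  # row (1, NULL, 3): NULL from the data
```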
References
http://stackoverflow.com/questions/29577887/grouping-in-hive
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup