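Overview
GROUPING SETS, GROUPING__ID, CUBE, ROLLUP
These analytic constructs are typically used in OLAP scenarios for metrics that are not additive and that must be computed while drilling up and down across different dimensions, for example UV (unique visitor) counts by hour, day, and month.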
Data preparation
2015-03,2015-03-10,cookie1
2015-03,2015-03-10,cookie5
2015-03,2015-03-12,cookie7
2015-04,2015-04-12,cookie3
2015-04,2015-04-13,cookie2
2015-04,2015-04-13,cookie4
2015-04,2015-04-16,cookie4
2015-03,2015-03-10,cookie2
2015-03,2015-03-10,cookie3
2015-04,2015-04-12,cookie5
2015-04,2015-04-13,cookie6
2015-04,2015-04-15,cookie3
2015-04,2015-04-15,cookie2
2015-04,2015-04-16,cookie1
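To load the rows above into Hive, something like the following could be used. This is only a sketch: the table name cookie5 and the column names month, day, and cookieid are taken from the queries in this article, while the STRING column types, the comma delimiter, and the file path /tmp/cookie5.txt are assumptions made for illustration.

```sql
-- Hypothetical DDL: the column types and delimiter are assumed, not taken from the article.
CREATE TABLE IF NOT EXISTS cookie5 (
  month    STRING,
  day      STRING,
  cookieid STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical path: point this at wherever the 14 sample rows were saved.
LOAD DATA LOCAL INPATH '/tmp/cookie5.txt' INTO TABLE cookie5;
```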
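Playing with GROUPING SETS and GROUPING__ID
Description
Within a single GROUP BY query, GROUPING SETS aggregates by several different combinations of dimensions. It is equivalent to running a separate GROUP BY for each combination and combining the result sets with UNION ALL.
GROUPING__ID indicates which grouping set a result row belongs to.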
select month, day, count(distinct cookieid) as uv, GROUPING__ID from cookie.cookie5 group by month,day grouping sets (month,day) order by GROUPING__ID;
Equivalent to
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month UNION ALL SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day
2015-04 NULL 6 1
2015-03 NULL 5 1
NULL 2015-04-16 2 2
NULL 2015-04-15 2 2
NULL 2015-04-13 3 2
NULL 2015-04-12 2 2
NULL 2015-03-12 1 2
NULL 2015-03-10 4 2
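Result explanation
The first column is month, populated for rows grouped by month.
The second column is day, populated for rows grouped by day.
The third column is the number of distinct cookieid values in each group, whether the group is by month or by day.
The fourth column, GROUPING__ID, indicates which grouping set the row belongs to. Following the order of the grouping conditions month, day in GROUPING SETS, 1 stands for month and 2 stands for day.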
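If remembering which number maps to which grouping set is error-prone, the GROUPING() function can flag each column explicitly. The sketch below assumes Hive 2.3.0 or later, where GROUPING() should be available. The aliases month_aggregated and day_aggregated are hypothetical names chosen for this example.

```sql
-- Assumes Hive 2.3.0+; GROUPING(col) is expected to return 1 when col has been
-- aggregated away in that row and 0 when col is part of the row's grouping set.
SELECT month,
       day,
       COUNT(DISTINCT cookieid) AS uv,
       GROUPING__ID,
       GROUPING(month) AS month_aggregated,  -- hypothetical alias
       GROUPING(day)   AS day_aggregated     -- hypothetical alias
FROM cookie5
GROUP BY month, day
GROUPING SETS (month, day)
ORDER BY GROUPING__ID;
```

This also makes it possible to tell a NULL that means "this dimension was rolled up" from a NULL that is really present in the data.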
Playing with CUBE
hive> select
> month,
> day,
> count(distinct cookieid) as uv,
> GROUPING__ID
> from cookie5
> group by month,day
> with cube
> order by grouping__id;
Total MapReduce CPU Time Spent: 7 seconds 500 msec
Result
OK
NULL NULL 7 0
2015-03 NULL 5 1
2015-04 NULL 6 1
NULL 2015-04-16 2 2
NULL 2015-04-15 2 2
NULL 2015-04-13 3 2
NULL 2015-04-12 2 2
NULL 2015-03-12 1 2
NULL 2015-03-10 4 2
2015-04 2015-04-12 2 3
2015-04 2015-04-16 2 3
2015-03 2015-03-12 1 3
2015-03 2015-03-10 4 3
2015-04 2015-04-15 2 3
2015-04 2015-04-13 3 3
Time taken: 44.156 seconds, Fetched: 15 row(s)
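The CUBE query above aggregates over every combination of the GROUP BY dimensions, so it can also be spelled out with explicit GROUPING SETS. A minimal sketch of the equivalent form, using the same cookie5 table:

```sql
-- WITH CUBE over (month, day) expands to all four combinations:
-- (month, day), (month), (day), and the empty set () for the grand total.
SELECT month,
       day,
       COUNT(DISTINCT cookieid) AS uv,
       GROUPING__ID
FROM cookie5
GROUP BY month, day
GROUPING SETS ((month, day), month, day, ())
ORDER BY GROUPING__ID;
```

On the sample data this gives 6 + 2 + 6 + 1 = 15 rows, matching the row count fetched above. One caveat: the GROUPING__ID values shown here (0 for the grand total, 3 for month plus day) look like the behavior of Hive releases before 2.3.0. As far as I know, the numbering was changed in 2.3.0 so that a bit is set when a column is aggregated away, which would swap 0 and 3, so it is worth checking against the Hive version in use.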
Another example
SELECT month, day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM cookie5
GROUP BY month,day
GROUPING SETS (month,day,(month,day))
ORDER BY GROUPING__ID;
Equivalent to
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day
UNION ALL
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM cookie5 GROUP BY month,day
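Playing with ROLLUP
Description
ROLLUP is a subset of CUBE: it takes the leftmost dimension as the primary one and aggregates hierarchically starting from that dimension.
Query
-- For example, hierarchical aggregation along the month dimension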
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID FROM cookie5 GROUP BY month,day WITH ROLLUP ORDER BY GROUPING__ID;
This implements the following drill-up path:
UV per month and day -> UV per month -> total UV
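The drill-up path above can be made explicit by writing the ROLLUP as GROUPING SETS. A minimal sketch of the equivalent query, under the assumption that the same cookie5 table is used:

```sql
-- WITH ROLLUP over (month, day) keeps only the hierarchical prefixes:
-- (month, day), then (month), then the empty set () for the grand total.
-- Unlike CUBE, there is no day-only grouping set.
SELECT month,
       day,
       COUNT(DISTINCT cookieid) AS uv,
       GROUPING__ID
FROM cookie5
GROUP BY month, day
GROUPING SETS ((month, day), month, ())
ORDER BY GROUPING__ID;
```

On the sample data this should yield 6 + 2 + 1 = 9 rows: six month and day pairs, two months, and one grand total.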
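-- Swapping the order of month and day aggregates hierarchically along the day dimension instead:
This implements the following drill-up path:
UV per day and month -> UV per day -> total UV
(Here, aggregating by day and month gives the same result as aggregating by day alone, because the two dimensions have a parent-child relationship: each day belongs to exactly one month. With other dimension combinations the results would differ.)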
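The swapped query itself is not shown in the text. Presumably it looks like the following sketch, which only reverses the column order in GROUP BY so that day becomes the leftmost dimension.

```sql
-- Assumed form of the swapped query: day is now the primary dimension,
-- so the levels are (day, month), then (day), then the grand total.
SELECT day,
       month,
       COUNT(DISTINCT cookieid) AS uv,
       GROUPING__ID
FROM cookie5
GROUP BY day, month WITH ROLLUP
ORDER BY GROUPING__ID;
```

On the sample data this should return 6 + 6 + 1 = 13 rows, and the six (day, month) rows carry the same uv values as the six day-only rows, as the remark above points out.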
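Note that the output below (14 rows, with GROUPING__ID values 1, 2 and 3 and no grand-total row) corresponds to the GROUPING SETS (month,day,(month,day)) query shown earlier, not to a ROLLUP query:
Total MapReduce CPU Time Spent: 7 seconds 500 msec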
OK
2015-04 NULL 6 1
2015-03 NULL 5 1
NULL 2015-03-10 4 2
NULL 2015-04-16 2 2
NULL 2015-04-15 2 2
NULL 2015-04-13 3 2
NULL 2015-04-12 2 2
NULL 2015-03-12 1 2
2015-04 2015-04-16 2 3
2015-04 2015-04-12 2 3
2015-04 2015-04-13 3 3
2015-03 2015-03-12 1 3
2015-03 2015-03-10 4 3
2015-04 2015-04-15 2 3
Time taken: 44.196 seconds, Fetched: 14 row(s)