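Hive: finding the players who put their team ahead (with times) and the players who scored three times in a row

Basketball game: lead-change players and consecutive-scoring players

Problem: Two basketball teams play a close game in which the lead changes hands repeatedly. After the game you have a detail table of every score by either team, recording the team (team), player number (number), player name (name), points scored (score), and the time of the score (score_time, string, in seconds). The sample table built below omits name and stores score_time as an int, so players are identified by number. The teams want to reward the players who stood out, so use SQL to find:
1) The name and time of each player whose score put his team ahead.
2) The players who scored three or more times in a row for their team, and the points they scored over that stretch.

I. Data preparation

1. Create the table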
hdfs dfs -mkdir '/tmp/yj_ext_csv_unc_basketball_game'
use tmp;
create external table tmp.yj_ext_csv_unc_basketball_game
(
team string,
number int,
score int,
score_time int
)
partitioned by (dt STRING)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as textfile
LOCATION '/tmp/yj_ext_csv_unc_basketball_game/';

2. Load the data

Sample data:
a,2,1,0
a,1,2,1
a,1,3,2
b,1,4,3
a,1,2,4
b,2,5,5
b,6,3,6
a,1,5,7
a,3,2,8
b,2,4,9
b,5,3,10
a,1,2,11
a,1,3,12
b,5,4,13
a,4,3,14
a,4,2,15
a,4,3,16
b,6,7,17
hdfs dfs -mkdir -p /tmp/yj_ext_csv_unc_basketball_game/20190527
hdfs dfs -put basketball.csv /tmp/yj_ext_csv_unc_basketball_game/20190527/
alter table tmp.yj_ext_csv_unc_basketball_game add partition (dt='20190527') location '/tmp/yj_ext_csv_unc_basketball_game/20190527';

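II. Analysis

  • 1 Find the name and time of each player whose score put his team ahead
 First compute the running totals for teams a and b. Only one team scores at any given time, so whenever team a scores, team b scores zero at that time (and vice versa). Each row is therefore padded with the opposing team's information:

select *, case when team='a' then 'b' else 'a' end team_2 , 0 number_2, 0 score_2 from tmp.yj_ext_csv_unc_basketball_game where dt='20190527';
Result: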
team	number	score	score_time	dt	team_2	number_2	score_2
a	2	1	0	20190527	b	0	0
a	1	2	1	20190527	b	0	0
a	1	3	2	20190527	b	0	0
b	1	4	3	20190527	a	0	0
a	1	2	4	20190527	b	0	0
b	2	5	5	20190527	a	0	0
b	6	3	6	20190527	a	0	0
a	1	5	7	20190527	b	0	0
a	3	2	8	20190527	b	0	0
b	2	4	9	20190527	a	0	0
b	5	3	10	20190527	a	0	0
a	1	2	11	20190527	b	0	0
a	1	3	12	20190527	b	0	0
b	5	4	13	20190527	a	0	0
a	4	3	14	20190527	b	0	0
a	4	2	15	20190527	b	0	0
a	4	3	16	20190527	b	0	0
b	6	7	17	20190527	a	0	0
-- Compute the running totals for teams a and b
create table tmp.yj_bask_tmp as
select *, sum(case when team='a' then score else score_2 end) over(order by score_time) sum_a,
sum(case when team='b' then score else score_2 end) over(order by score_time) sum_b
from (select *, case when team='a' then 'b' else 'a' end team_2 , 0 number_2, 0 score_2 from tmp.yj_ext_csv_unc_basketball_game where dt='20190527') a;
Result:
team	number	score	score_time	dt	team_2	number_2	score_2	sum_a	sum_b
a	2	1	0	20190527	b	0	0	1	0
a	1	2	1	20190527	b	0	0	3	0
a	1	3	2	20190527	b	0	0	6	0
b	1	4	3	20190527	a	0	0	6	4
a	1	2	4	20190527	b	0	0	8	4
b	2	5	5	20190527	a	0	0	8	9
b	6	3	6	20190527	a	0	0	8	12
a	1	5	7	20190527	b	0	0	13	12
a	3	2	8	20190527	b	0	0	15	12
b	2	4	9	20190527	a	0	0	15	16
b	5	3	10	20190527	a	0	0	15	19
a	1	2	11	20190527	b	0	0	17	19
a	1	3	12	20190527	b	0	0	20	19
b	5	4	13	20190527	a	0	0	20	23
a	4	3	14	20190527	b	0	0	23	23
a	4	2	15	20190527	b	0	0	25	23
a	4	3	16	20190527	b	0	0	28	23
b	6	7	17	20190527	a	0	0	28	30
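
The running totals can be cross-checked outside Hive. The following is a minimal Python sketch, assuming the 18 sample rows from the data section are held in memory; the names ROWS and running_totals are made up for this illustration and are not part of the original solution.

```python
# Sample rows from the data section: (team, number, score, score_time).
ROWS = [
    ("a", 2, 1, 0), ("a", 1, 2, 1), ("a", 1, 3, 2), ("b", 1, 4, 3),
    ("a", 1, 2, 4), ("b", 2, 5, 5), ("b", 6, 3, 6), ("a", 1, 5, 7),
    ("a", 3, 2, 8), ("b", 2, 4, 9), ("b", 5, 3, 10), ("a", 1, 2, 11),
    ("a", 1, 3, 12), ("b", 5, 4, 13), ("a", 4, 3, 14), ("a", 4, 2, 15),
    ("a", 4, 3, 16), ("b", 6, 7, 17),
]


def running_totals(rows):
    """Return one (score_time, sum_a, sum_b) tuple per row, ordered by score_time."""
    sum_a = sum_b = 0
    out = []
    for team, _number, score, score_time in sorted(rows, key=lambda r: r[3]):
        # The scoring team adds `score`; the other team adds 0 (the score_2 column).
        if team == "a":
            sum_a += score
        else:
            sum_b += score
        out.append((score_time, sum_a, sum_b))
    return out


totals = running_totals(ROWS)
print(totals[5])   # (5, 8, 9)
print(totals[-1])  # (17, 28, 30)
```

The loop mirrors sum(...) over(order by score_time): each row sees the totals of all rows up to and including itself.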
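Team b players whose score put team b ahead of team a. Each row is joined to the row before it (a.rank = b.rank+1); a row is kept when team a trails after the score (a_lose < 0) but led before it (float_a_lost > 0). Swapping in the conditions b_lose < 0 and float_b_lose > 0 gives the team a players who put team a ahead.
select * from (select a.*, b.a_lose float_a_lost, b.b_lose float_b_lose from
(select *, (sum_a-sum_b) a_lose,  (sum_b-sum_a) b_lose, row_number() over (order by  score_time) rank  from tmp.yj_bask_tmp) a
left join
(select *, (sum_a-sum_b) a_lose,  (sum_b-sum_a) b_lose, row_number() over (order by  score_time) rank  from tmp.yj_bask_tmp)  b
on a.rank = (b.rank+1)) c
where a_lose < 0 and float_a_lost > 0;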
Result:
team	number	score	score_time	dt	team_2	number_2	score_2	sum_a	sum_b	a_lose	b_lose	rank	float_a_lost	float_b_lose
b	2	5	5	20190527	a	0	0	8	9	-1	1	6	4	-4
b	2	4	9	20190527	a	0	0	15	16	-1	1	10	3	-3
b	5	4	13	20190527	a	0	0	20	23	-3	3	14	1	-1
b	6	7	17	20190527	a	0	0	28	30	-2	2	18	5	-5
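
The problem asks for lead changes by both teams, so it can help to check the logic for a and b together. The following is a minimal Python sketch over the 18 sample rows from the data section; ROWS and lead_changes are hypothetical names for this illustration. It assumes the same strict rule as the query: a score counts only if the other team was strictly ahead before it and the scoring team is strictly ahead after it, so taking the lead from a tie is not counted.

```python
# Sample rows from the data section: (team, number, score, score_time).
ROWS = [
    ("a", 2, 1, 0), ("a", 1, 2, 1), ("a", 1, 3, 2), ("b", 1, 4, 3),
    ("a", 1, 2, 4), ("b", 2, 5, 5), ("b", 6, 3, 6), ("a", 1, 5, 7),
    ("a", 3, 2, 8), ("b", 2, 4, 9), ("b", 5, 3, 10), ("a", 1, 2, 11),
    ("a", 1, 3, 12), ("b", 5, 4, 13), ("a", 4, 3, 14), ("a", 4, 2, 15),
    ("a", 4, 3, 16), ("b", 6, 7, 17),
]


def lead_changes(rows):
    """Return (team, number, score_time) for every score that flips the lead."""
    sum_a = sum_b = 0
    prev_diff = None  # team a's lead before the current score; None for the first row
    out = []
    for team, number, score, score_time in sorted(rows, key=lambda r: r[3]):
        if team == "a":
            sum_a += score
        else:
            sum_b += score
        diff = sum_a - sum_b  # same value as the a_lose column
        # A strict sign flip means the scoring team went from behind to ahead.
        if prev_diff is not None and prev_diff * diff < 0:
            out.append((team, number, score_time))
        prev_diff = diff
    return out


changes = lead_changes(ROWS)
print(changes)
# [('b', 2, 5), ('a', 1, 7), ('b', 2, 9), ('a', 1, 12), ('b', 5, 13), ('b', 6, 17)]
```

Keeping the previous lead in a variable plays the role of the self-join on rank: each row is compared only with the row immediately before it.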

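  • 2 Find the players who scored three or more times in a row for their team, and the points they scored over that stretch
-- Team a players who scored three or more times in a row (use team='b' for team b)
-- Number team a's scoring rows by time (rank). Within each player, number that player's rows again in rank order. Subtracting the per-player row number from rank gives the same value (diff) for every row in an unbroken run by that player, so grouping by number and diff yields one group per run.
-- Runs shorter than three scores are flagged with -100 in place of the point total.

select number, case when count(1) >=3 then sum(score) else -100 end,count(1) from
(select * , (rank-row_number() over (partition by number order by rank)) diff from
(select *, row_number() over (order by score_time) rank from tmp.yj_ext_csv_unc_basketball_game where dt='20190527' and team='a') a) b
group by number, diff;
Result: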
number	_c1	_c2
1	12	4
1	-100	2
2	-100	1
3	-100	1
4	8	3
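
The rank-minus-row-number trick can be replayed step by step outside Hive. The following is a minimal Python sketch over the 18 sample rows from the data section; ROWS, streaks and min_len are hypothetical names for this illustration. Unlike the query, it also returns the first and last score_time of each run, since the problem statement asks for the corresponding time period.

```python
from collections import defaultdict

# Sample rows from the data section: (team, number, score, score_time).
ROWS = [
    ("a", 2, 1, 0), ("a", 1, 2, 1), ("a", 1, 3, 2), ("b", 1, 4, 3),
    ("a", 1, 2, 4), ("b", 2, 5, 5), ("b", 6, 3, 6), ("a", 1, 5, 7),
    ("a", 3, 2, 8), ("b", 2, 4, 9), ("b", 5, 3, 10), ("a", 1, 2, 11),
    ("a", 1, 3, 12), ("b", 5, 4, 13), ("a", 4, 3, 14), ("a", 4, 2, 15),
    ("a", 4, 3, 16), ("b", 6, 7, 17),
]


def streaks(rows, team, min_len=3):
    """Return (number, first_time, last_time, points) for each run of >= min_len scores."""
    team_rows = sorted((r for r in rows if r[0] == team), key=lambda r: r[3])
    per_player = defaultdict(int)  # row_number() over (partition by number order by rank)
    groups = defaultdict(list)     # (number, diff) -> rows of one unbroken run
    for rank, row in enumerate(team_rows, start=1):
        number = row[1]
        per_player[number] += 1
        diff = rank - per_player[number]  # constant while the same player keeps scoring
        groups[(number, diff)].append(row)
    return [
        (number, run[0][3], run[-1][3], sum(r[2] for r in run))
        for (number, _diff), run in groups.items()
        if len(run) >= min_len
    ]


print(streaks(ROWS, "a"))  # [(1, 1, 7, 12), (4, 14, 16, 8)]
print(streaks(ROWS, "b"))  # []
```

As in the query, a run is counted within one team's own scoring sequence, so scores by the other team in between do not break it.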
