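Basketball game: finding the players who retook the lead and the players who scored consecutively
Problem: Two basketball teams played a hard-fought game in which the lead changed hands repeatedly. After the game you have a detail table of every basket scored by both teams, recording the team (team), player number (number), player name (name), points scored (score), and scoring time (score_time, a string with second-level precision). The teams want to reward the players who stood out, so use SQL to find:
1) The name of each player who helped their team retake the lead, and the time at which it happened.
2) The players who scored three or more times in a row for their team, and the points they scored during that stretch.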
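I. Data preparation
1. Create the table
The sample table omits the name column, so players are identified by number below, and score_time is stored as an int.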
hdfs dfs -mkdir '/tmp/yj_ext_csv_unc_basketball_game'
use tmp;
create external table tmp.yj_ext_csv_unc_basketball_game
(
team string,
number int,
score int,
score_time int
)
partitioned by (dt STRING)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as textfile
LOCATION '/tmp/yj_ext_csv_unc_basketball_game/';
2. Load the data
Sample data (columns: team, number, score, score_time):
a,2,1,0
a,1,2,1
a,1,3,2
b,1,4,3
a,1,2,4
b,2,5,5
b,6,3,6
a,1,5,7
a,3,2,8
b,2,4,9
b,5,3,10
a,1,2,11
a,1,3,12
b,5,4,13
a,4,3,14
a,4,2,15
a,4,3,16
b,6,7,17
hdfs dfs -mkdir -p /tmp/yj_ext_csv_unc_basketball_game/20190527
hdfs dfs -put basketball.csv /tmp/yj_ext_csv_unc_basketball_game/20190527/
alter table tmp.yj_ext_csv_unc_basketball_game add partition (dt='20190527') location '/tmp/yj_ext_csv_unc_basketball_game/20190527';
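II. Problem analysis
- 1 Find the players who helped their team retake the lead, and the time of each lead change
First compute the running score total for teams a and b. Only one team scores at any given time point, so whenever team a scores, team b scores zero at that time (and vice versa). Each row is therefore extended with the opposing team's information:
select *, case when team='a' then 'b' else 'a' end team_2, 0 number_2, 0 score_2 from tmp.yj_ext_csv_unc_basketball_game where dt='20190527';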
Result:
team number score score_time dt team_2 number_2 score_2
a 2 1 0 20190527 b 0 0
a 1 2 1 20190527 b 0 0
a 1 3 2 20190527 b 0 0
b 1 4 3 20190527 a 0 0
a 1 2 4 20190527 b 0 0
b 2 5 5 20190527 a 0 0
b 6 3 6 20190527 a 0 0
a 1 5 7 20190527 b 0 0
a 3 2 8 20190527 b 0 0
b 2 4 9 20190527 a 0 0
b 5 3 10 20190527 a 0 0
a 1 2 11 20190527 b 0 0
a 1 3 12 20190527 b 0 0
b 5 4 13 20190527 a 0 0
a 4 3 14 20190527 b 0 0
a 4 2 15 20190527 b 0 0
a 4 3 16 20190527 b 0 0
b 6 7 17 20190527 a 0 0
-- Compute the running score totals for teams a and b
create table tmp.yj_bask_tmp as
select *,
       sum(case when team='a' then score else score_2 end) over (order by score_time) sum_a,
       sum(case when team='b' then score else score_2 end) over (order by score_time) sum_b
from (select *, case when team='a' then 'b' else 'a' end team_2, 0 number_2, 0 score_2 from tmp.yj_ext_csv_unc_basketball_game where dt='20190527') a;
Result:
team number score score_time dt team_2 number_2 score_2 sum_a sum_b
a 2 1 0 20190527 b 0 0 1 0
a 1 2 1 20190527 b 0 0 3 0
a 1 3 2 20190527 b 0 0 6 0
b 1 4 3 20190527 a 0 0 6 4
a 1 2 4 20190527 b 0 0 8 4
b 2 5 5 20190527 a 0 0 8 9
b 6 3 6 20190527 a 0 0 8 12
a 1 5 7 20190527 b 0 0 13 12
a 3 2 8 20190527 b 0 0 15 12
b 2 4 9 20190527 a 0 0 15 16
b 5 3 10 20190527 a 0 0 15 19
a 1 2 11 20190527 b 0 0 17 19
a 1 3 12 20190527 b 0 0 20 19
b 5 4 13 20190527 a 0 0 20 23
a 4 3 14 20190527 b 0 0 23 23
a 4 2 15 20190527 b 0 0 25 23
a 4 3 16 20190527 b 0 0 28 23
b 6 7 17 20190527 a 0 0 28 30
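The running totals can be cross-checked outside Hive. The following is a minimal Python sketch, not part of the original solution: the names `EVENTS` and `running_totals` are made up for illustration, and the rows are the sample data from section I.

```python
# Sample rows from section I: (team, number, score, score_time).
EVENTS = [
    ("a", 2, 1, 0), ("a", 1, 2, 1), ("a", 1, 3, 2), ("b", 1, 4, 3),
    ("a", 1, 2, 4), ("b", 2, 5, 5), ("b", 6, 3, 6), ("a", 1, 5, 7),
    ("a", 3, 2, 8), ("b", 2, 4, 9), ("b", 5, 3, 10), ("a", 1, 2, 11),
    ("a", 1, 3, 12), ("b", 5, 4, 13), ("a", 4, 3, 14), ("a", 4, 2, 15),
    ("a", 4, 3, 16), ("b", 6, 7, 17),
]


def running_totals(events):
    """Return (score_time, sum_a, sum_b) after each basket, in time order."""
    totals = {"a": 0, "b": 0}
    rows = []
    for team, _number, score, score_time in sorted(events, key=lambda e: e[3]):
        # Only the scoring team's total moves; the other team adds 0,
        # which is what the score_2 column expresses in the SQL.
        totals[team] += score
        rows.append((score_time, totals["a"], totals["b"]))
    return rows


if __name__ == "__main__":
    for row in running_totals(EVENTS):
        print(row)
```

Each printed tuple should line up with the sum_a and sum_b columns of the matching score_time in the result above.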
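Team b players whose basket put team b ahead of team a. The query joins each row to the row immediately before it and keeps the rows where team a led before the basket and trails after it. Swapping the filter to b_lose < 0 and float_b_lose > 0 returns the team a players who overtook team b.
-- a_lose / b_lose: margin of team a / team b after this basket
-- float_a_lost / float_b_lose: the same margins one row earlier
select * from (
    select a.*, b.a_lose float_a_lost, b.b_lose float_b_lose
    from (select *, (sum_a-sum_b) a_lose, (sum_b-sum_a) b_lose, row_number() over (order by score_time) rank from tmp.yj_bask_tmp) a
    left join
         (select *, (sum_a-sum_b) a_lose, (sum_b-sum_a) b_lose, row_number() over (order by score_time) rank from tmp.yj_bask_tmp) b
    on a.rank = (b.rank+1)
) c
where a_lose < 0 and float_a_lost > 0;
Result:
team number score score_time dt team_2 number_2 score_2 sum_a sum_b a_lose b_lose rank float_a_lost float_b_lose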
b 2 5 5 20190527 a 0 0 8 9 -1 1 6 4 -4
b 2 4 9 20190527 a 0 0 15 16 -1 1 10 3 -3
b 5 4 13 20190527 a 0 0 20 23 -3 3 14 1 -1
b 6 7 17 20190527 a 0 0 28 30 -2 2 18 5 -5
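The lead-change test can also be sketched in a few lines of Python, as a way to reason about the condition rather than as a replacement for the query. The names `EVENTS` and `lead_changes` are hypothetical. The sketch assumes the same strict rule as the SQL filter, applied to both teams: the scoring team must be behind immediately before the basket and ahead immediately after it, so a lead taken from a tied score is not counted.

```python
# Sample rows from section I: (team, number, score, score_time).
EVENTS = [
    ("a", 2, 1, 0), ("a", 1, 2, 1), ("a", 1, 3, 2), ("b", 1, 4, 3),
    ("a", 1, 2, 4), ("b", 2, 5, 5), ("b", 6, 3, 6), ("a", 1, 5, 7),
    ("a", 3, 2, 8), ("b", 2, 4, 9), ("b", 5, 3, 10), ("a", 1, 2, 11),
    ("a", 1, 3, 12), ("b", 5, 4, 13), ("a", 4, 3, 14), ("a", 4, 2, 15),
    ("a", 4, 3, 16), ("b", 6, 7, 17),
]


def lead_changes(events):
    """Return (team, number, score_time) for baskets that turn a deficit into a lead."""
    totals = {"a": 0, "b": 0}
    changes = []
    for team, number, score, score_time in sorted(events, key=lambda e: e[3]):
        rival = "b" if team == "a" else "a"
        margin_before = totals[team] - totals[rival]
        totals[team] += score
        margin_after = totals[team] - totals[rival]
        # Strict on both sides, mirroring "a_lose < 0 and float_a_lost > 0".
        if margin_before < 0 < margin_after:
            changes.append((team, number, score_time))
    return changes


if __name__ == "__main__":
    for change in lead_changes(EVENTS):
        print(change)
```

Whether a basket that breaks a tie should count as retaking the lead is a design choice the problem statement leaves open; relaxing `margin_before < 0` to `margin_before <= 0` would include those baskets.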
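- 2 Find the players who scored three or more times in a row for their team, and the points they scored during that stretch
-- Team a players who scored three or more times in a row
-- Number team a's baskets in time order (rank), number each player's own baskets in the same order,
-- then subtract the per-player number from rank. Baskets in the same unbroken run share the same
-- difference (diff), so grouping by number and diff yields one row per run.
-- Runs shorter than three baskets are marked with -100 instead of a score.
select number,
       case when count(1) >= 3 then sum(score) else -100 end,
       count(1)
from (
    select *, (rank - row_number() over (partition by number order by rank)) diff
    from (
        select *, row_number() over (order by score_time) rank
        from tmp.yj_ext_csv_unc_basketball_game
        where dt='20190527' and team='a'
    ) a
) b
group by number, diff;
Result: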
number _c1 _c2
1 12 4
1 -100 2
2 -100 1
3 -100 1
4 8 3
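The run detection can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the article's method: it uses `itertools.groupby`, which groups adjacent equal keys, in place of the rank-difference trick, and the names `EVENTS` and `scoring_streaks` are hypothetical. Like the query, it treats "in a row" as consecutive among one team's own baskets, so an opponent's basket in between does not break a run. It additionally covers both teams and reports the first and last score_time of each run.

```python
from itertools import groupby

# Sample rows from section I: (team, number, score, score_time).
EVENTS = [
    ("a", 2, 1, 0), ("a", 1, 2, 1), ("a", 1, 3, 2), ("b", 1, 4, 3),
    ("a", 1, 2, 4), ("b", 2, 5, 5), ("b", 6, 3, 6), ("a", 1, 5, 7),
    ("a", 3, 2, 8), ("b", 2, 4, 9), ("b", 5, 3, 10), ("a", 1, 2, 11),
    ("a", 1, 3, 12), ("b", 5, 4, 13), ("a", 4, 3, 14), ("a", 4, 2, 15),
    ("a", 4, 3, 16), ("b", 6, 7, 17),
]


def scoring_streaks(events, min_len=3):
    """Return (team, number, start_time, end_time, points, baskets) per qualifying run."""
    streaks = []
    for team in sorted({e[0] for e in events}):
        # Keep only this team's baskets, in time order.
        team_events = sorted((e for e in events if e[0] == team), key=lambda e: e[3])
        # groupby starts a new group whenever the player number changes.
        for number, run in groupby(team_events, key=lambda e: e[1]):
            run = list(run)
            if len(run) >= min_len:
                points = sum(e[2] for e in run)
                streaks.append((team, number, run[0][3], run[-1][3], points, len(run)))
    return streaks


if __name__ == "__main__":
    for streak in scoring_streaks(EVENTS):
        print(streak)
```

On the sample data this reports player 1 and player 4 of team a, with the same point totals (12 and 8) as the rows in the result above that are not marked -100.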