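Hive window functions can aggregate over a specified number of rows before or after the current row. The keywords for this are preceding and following; the example below shows how to use them.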

1. Load the test data

Create a test table in Hive:

create table tmp_student
(
   name           string,
   class          tinyint,
   cooperator_name   string,
   score          tinyint
)
row format delimited fields terminated by '|';

Then load the test data:

load data local inpath 'text.txt' into table tmp_student;

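The contents of text.txt are as follows. Note that the score on the fourth line is n, which cannot be parsed as a tinyint, so Hive loads it as NULL: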

adf|3|測試公司1|45
asdf|3|測試公司2|55
cfe|2|測試公司2|74
3dd|3|測試公司5|n
fda|1|測試公司7|80
gds|2|測試公司9|92
ffd|1|測試公司10|95
dss|1|測試公司4|95
ddd|3|測試公司3|99
gf|3|測試公司9|99

Check that the load succeeded:

hive> select * from tmp_student;
OK
adf	3	測試公司1	45
asdf	3	測試公司2	55
cfe	2	測試公司2	74
3dd	3	測試公司5	NULL
fda	1	測試公司7	80
gds	2	測試公司9	92
ffd	1	測試公司10	95
dss	1	測試公司4	95
ddd	3	測試公司3	99
gf	3	測試公司9	99
Time taken: 1.314 seconds, Fetched: 10 row(s)
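
As an additional sanity check (a small sketch, not part of the original walkthrough), comparing count(*) with count(score) confirms that exactly one score failed to parse: count(*) counts every row, whereas count(score) skips NULL values. The aliases total_rows and scored_rows are illustrative names only.

```sql
select
    count(*)     as total_rows,   -- counts all rows, including the one with a NULL score
    count(score) as scored_rows   -- counts only rows whose score is not NULL
from
    tmp_student;
```

With the ten rows loaded above, this should report 10 and 9.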

2. Test the window functions

Run the following SQL:

select
    name,
    score,
    sum(score) over(order by score range between 2 preceding and 2 following) s1, -- all rows whose score is within ±2 of the current row's score
    sum(score) over(order by score rows between 2 preceding and 2 following) s2, -- current row plus 2 rows before and 2 rows after, up to 5 rows
    sum(score) over(order by score range between unbounded preceding and unbounded following) s3, -- all rows, no restriction
    sum(score) over(order by score rows between unbounded preceding and unbounded following) s4, -- all rows, no restriction
    sum(score) over(order by score) s5, -- first row through current row (default frame; every row with the same score as the current row is included)
    sum(score) over(order by score rows between unbounded preceding and current row) s6, -- first row through current row (rows with the same score that sort after the current row are excluded, unlike s5)
    sum(score) over(order by score rows between 3 preceding and current row) s7, -- current row plus the 3 rows before it
    sum(score) over(order by score rows between 3 preceding and 1 following) s8, -- current row plus the 3 rows before it and 1 row after it
    sum(score) over(order by score rows between current row and unbounded following) s9 -- current row plus all rows after it
from
    tmp_student -- same table as created and loaded above
order by
    score;

The result is as follows:

name	score	s1	s2	s3	s4	s5	s6	s7	s8	s9
3dd	NULL	NULL	100	734	734	NULL	NULL	NULL	45	734
adf	45	45	174	734	734	45	45	45	100	734
asdf	55	55	254	734	734	100	100	100	174	689
cfe	74	74	346	734	734	174	174	174	254	634
fda	80	80	396	734	734	254	254	254	346	560
gds	92	92	436	734	734	346	346	301	396	480
ffd	95	190	480	734	734	536	536	362	461	293
dss	95	190	461	734	734	536	441	341	436	388
ddd	99	198	293	734	734	734	734	388	388	99
gf	99	198	388	734	734	734	635	381	480	198
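
Column notes:
score: sorted in ascending order
s1: rows whose score is within ±2 of the current score; each row with score 95 also covers the other 95 row, so both get 95 + 95 = 190
s2: current row plus the 2 rows before it and the 2 rows after it
s3: all rows
s4: all rows
s5: first row through the current row, including every row with the same score as the current row
s6: first row through the current row
s7: current row plus the 3 rows before it
s8: current row plus the 3 rows before it and 1 row after it
s9: current row plus all rows after it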
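
To see which rows a frame actually covers, it can help to replace sum with count, min, and max over the same frames. The following is a minimal sketch against the same tmp_student table; the aliases (rows_cnt, rows_nonnull, range_cnt, range_min, range_max) are illustrative names, not part of the original query.

```sql
select
    name,
    score,
    count(*)     over(order by score rows  between 2 preceding and 2 following) rows_cnt,     -- physical rows inside the rows frame
    count(score) over(order by score rows  between 2 preceding and 2 following) rows_nonnull, -- rows in that frame with a non-NULL score
    count(*)     over(order by score range between 2 preceding and 2 following) range_cnt,    -- rows whose score falls in the range frame
    min(score)   over(order by score range between 2 preceding and 2 following) range_min,    -- lowest score inside the range frame
    max(score)   over(order by score range between 2 preceding and 2 following) range_max     -- highest score inside the range frame
from
    tmp_student
order by
    score;
```

For the first row (the NULL score), rows_cnt should be 3 while rows_nonnull is 2, which is why s2 there is 45 + 55 = 100: sum skips the NULL. For the two rows with score 95, range_cnt should be 2, matching the s1 value of 190.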

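Notes:
1. Rows with the same score have no guaranteed relative order after order by, so the results of the rows-based frames (s2, s6, s7, s8, s9) for two tied rows may appear swapped. For example, in the last two rows of the s9 column one would expect 198 on the second-to-last row and 99 on the last row, but the output shows 99 and then 198. This indicates that the window sorted gf before ddd, while the final order by displayed ddd first. The range-based columns (s1, s3, s5) are not affected, because tied rows always fall into the same range frame.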
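
One way to make the rows-based results reproducible is to add a tie-breaker column to both the window's order by and the query's final order by. This is a sketch under the assumption that name is unique in the table, which holds for the test data here; it is not something the original query does.

```sql
select
    name,
    score,
    -- score, name gives every row a unique sort position, so the rows frame is deterministic
    sum(score) over(order by score, name rows between current row and unbounded following) s9
from
    tmp_student
order by
    score, name; -- same ordering as the window, so the displayed rows line up with the frames
```

With this ordering, dss sorts before ffd and ddd before gf, so s9 should be 198 for ddd and 99 for gf. Note that a range frame with a numeric offset, as in s1, still needs a single numeric order by column, so this tie-breaker applies only to the rows-based frames.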
