Hive: Top N per Group

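Starting with version 0.11.0, Hive provides the analytic (window) functions row_number, rank and dense_rank. You can use them to sort the rows of each group and then pick the top values of every group.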

Syntax:

row_number() over ([partition by col1] [order by col2])

rank() over ([partition by col1] [order by col2])

dense_rank() over ([partition by col1] [order by col2])

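All three functions partition the rows by col1, sort the rows inside each partition by col2, and assign every sorted row a number starting from 1. Numbering restarts at 1 in each partition.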

Both col1 and col2 can list several columns, separated by commas.
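Ordering by several columns is a common way to break ties, for example row_number() over (partition by sub order by score desc, name). The plain-Python sketch below mimics that call on the english rows of the sample file a.txt used later in this post. It only imitates the documented semantics; the function name row_number_multi is hypothetical and this is not how Hive implements it.

```python
# The english rows of the sample data, as (name, sub, score).
ROWS = [("a", "english", 90), ("z", "english", 90), ("d", "english", 90),
        ("e", "english", 87), ("c", "english", 82)]


def row_number_multi(rows):
    """Mimic row_number() over (partition by sub order by score desc, name)."""
    # Composite sort key: the partition column first, then score descending,
    # then name ascending as a tie-breaker for equal scores.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2], r[0]))
    numbered = []
    current_sub = None
    n = 0
    for name, sub, score in ordered:
        if sub != current_sub:  # a new partition restarts numbering at 1
            current_sub, n = sub, 0
        n += 1
        numbered.append((name, sub, score, n))
    return numbered


if __name__ == "__main__":
    for row in row_number_multi(ROWS):
        print(row)
```

Because name is part of the sort key, the three students tied at 90 always come out in the same order (a, d, z), so the numbering is repeatable from run to run.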

Differences

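1) row_number: the number increases by 1 on every row, whether or not the col2 values are equal. For example, if two rows have the same value, one is numbered 1 and the other 2 (which of the two gets 1 is not guaranteed).

2) rank: rows with equal col2 values get the same number, and the next distinct col2 value jumps ahead by N, where N is the number of tied rows. For example, if two rows tie for first, the next row is third; there is no second.

3) dense_rank: rows with equal col2 values get the same number, and the next distinct col2 value gets the next consecutive number. For example, if two rows tie for first, the next row is second.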
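The three rules above can be sketched in plain Python for a single partition that is already sorted by the order-by key. This is an illustration of the documented behavior under that assumption, not Hive's implementation; the function name number_rows is hypothetical.

```python
def number_rows(sorted_keys):
    """Return (row_number, rank, dense_rank) for each key of one pre-sorted partition."""
    result = []
    rank = dense = 0
    prev = object()  # sentinel that never equals a real key
    for position, key in enumerate(sorted_keys, start=1):
        if key != prev:
            rank = position  # rank jumps to the current position after a run of ties
            dense += 1       # dense_rank only ever advances by one
            prev = key
        # row_number is simply the position within the partition
        result.append((position, rank, dense))
    return result


if __name__ == "__main__":
    # Two rows tie for first place, as in the examples above.
    scores = [98, 98, 90, 85]
    for score, numbers in zip(scores, number_rows(scores)):
        print(score, numbers)
```

For the scores 98, 98, 90, 85 the row numbers are 1, 2, 3, 4, the ranks are 1, 1, 3, 4 and the dense ranks are 1, 1, 2, 3.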

Because row_number gives every row a distinct number, it can also be used to implement paginated queries.

Walkthrough

Create the table

create table t(name string, sub string, score int) row format delimited fields terminated by '\t';

The data is in the attached file a.txt (tab-separated):

a    chinese    98
a    english    90
d    chinese    88
c    english    82
c    math    98
b    math    89
b    chinese    79
z    english    90
z    math    89
z    chinese    80
e    math    99
e    english    87
d    english    90

Load the data
load data local inpath '/home/hadoop/hive-example/a.txt' into table t;

Ranking within groups

--row_number
select *, row_number() over (partition by sub order by score) as od from t; 

--rank
select *, rank() over (partition by sub order by score) as od from t; 

--dense_rank
select *, dense_rank() over (partition by sub order by score desc) as od from t;
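To see what the rank and dense_rank queries above return on the sample data, here is a plain-Python sketch that imitates rank()/dense_rank() over (partition by sub order by score [desc]). The helper name ranked and its parameters are hypothetical; the sketch only reproduces the numbering rules and says nothing about how Hive orders the output rows.

```python
from itertools import groupby

# The 13 sample rows from a.txt, as (name, sub, score).
ROWS = [
    ("a", "chinese", 98), ("a", "english", 90), ("d", "chinese", 88),
    ("c", "english", 82), ("c", "math", 98), ("b", "math", 89),
    ("b", "chinese", 79), ("z", "english", 90), ("z", "math", 89),
    ("z", "chinese", 80), ("e", "math", 99), ("e", "english", 87),
    ("d", "english", 90),
]


def ranked(rows, descending=False, dense=False):
    """Imitate rank() (dense=False) or dense_rank() (dense=True) per subject."""
    out = []
    # "partition by sub": sort by subject, then group consecutive rows
    by_sub = sorted(rows, key=lambda r: r[1])
    for sub, group in groupby(by_sub, key=lambda r: r[1]):
        # "order by score [desc]" inside the partition
        part = sorted(group, key=lambda r: r[2], reverse=descending)
        number = 0
        prev = None
        for position, (name, _, score) in enumerate(part, start=1):
            if score != prev:
                # rank jumps to the position; dense_rank advances by one
                number = number + 1 if dense else position
                prev = score
            out.append((name, sub, score, number))
    return out


if __name__ == "__main__":
    for row in ranked(ROWS):
        print(row)
```

With ascending order, the three english scores of 90 all get rank 3, and in math the two 89s share rank 1 so the 98 gets rank 3. With ranked(ROWS, descending=True, dense=True), the english 90s share dense rank 1, the 87 gets 2 and the 82 gets 3.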

Practical examples

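--Top three students in each subject
--(row_number keeps exactly three rows per subject and breaks ties arbitrarily; use rank to keep every row tied at third place)
select * from (select *, row_number() over (partition by sub order by score desc) as od from t) t1 where od<=3;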
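The choice of function matters for top-N when scores tie at the cutoff. In the math partition of the sample data, b and z both scored 89, which is third place. The plain-Python sketch below contrasts filtering on row_number versus rank for that partition; the function names are hypothetical, and Python's stable sort stands in for whatever order Hive happens to produce among tied rows.

```python
# The math partition of the sample data, as (name, score).
MATH = [("c", 98), ("b", 89), ("z", 89), ("e", 99)]


def top_n_row_number(part, n):
    """Like row_number() ... where od <= n: exactly n rows, ties cut arbitrarily."""
    ordered = sorted(part, key=lambda r: r[1], reverse=True)
    return ordered[:n]


def top_n_rank(part, n):
    """Like rank() ... where od <= n: keeps every row tied at the cutoff."""
    ordered = sorted(part, key=lambda r: r[1], reverse=True)
    kept = []
    rank = 0
    prev = None
    for position, row in enumerate(ordered, start=1):
        if row[1] != prev:
            rank = position
            prev = row[1]
        if rank > n:  # ranks only grow, so nothing after this can qualify
            break
        kept.append(row)
    return kept


if __name__ == "__main__":
    print(top_n_row_number(MATH, 3))
    print(top_n_rank(MATH, 3))
```

Filtering on row_number returns three math rows and silently drops one of the two 89s, while filtering on rank returns four rows (e, c, b, z) because both 89s share rank 3.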

--What rank does a chinese score of 80 have?
select od from (select *, row_number() over (partition by sub order by score desc) as od from t) t1 where sub='chinese' and score=80;

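--Paginated query (rows 1 to 5); ordering by name, sub keeps the pages stable between runs
select * from (select *, row_number() over (order by name, sub) as rn from t) t1 where rn between 1 and 5;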
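For page p with page size n, the rn filter is rn between (p-1)*n+1 and p*n. A minimal plain-Python sketch of that arithmetic, with enumerate standing in for row_number(); the helper names page_bounds and paginate are hypothetical.

```python
def page_bounds(page, page_size):
    """Return the inclusive (low, high) rn range for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    low = (page - 1) * page_size + 1
    high = page * page_size
    return low, high


def paginate(rows, page, page_size):
    """Keep the rows whose 1-based position falls inside the page's rn range."""
    low, high = page_bounds(page, page_size)
    # enumerate(start=1) plays the role of row_number() over (order by ...)
    return [row for rn, row in enumerate(rows, start=1) if low <= rn <= high]


if __name__ == "__main__":
    print(page_bounds(2, 5))
```

With 13 sample rows and a page size of 5, page 1 is rn 1 to 5, page 2 is rn 6 to 10, and page 3 holds only the last three rows (rn 11 to 13).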
