Basic Hive ranking functions
Syntax:
rank()over([partition by col1] order by col2)
dense_rank()over([partition by col1] order by col2)
row_number()over([partition by col1] order by col2)
The [partition by col1] clause is optional.
Examples:
select name,score,rank() over(partition by name order by score) tt from t;
select name,score,dense_rank() over(partition by name order by score) tt from t;
select name,score,row_number() over(partition by name order by score) tt from t;
select name,score,rank()over(order by score) tt from t;
Top three in each partition:
select name,score from (select name,score,dense_rank() over(partition by name order by score desc) tt from t) x where x.tt<=3;
Rank of the score 70 within the '語文' (Chinese) partition:
select name,score,x.tt from (select name,score,rank() over(partition by name order by score desc) tt from t) x where x.name='語文' and x.score=70;
Paginated query:
select xx.* from (select t.*,row_number()over(order by score desc) rowno from t) xx where xx.rowno between 1 and 3;
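The pagination pattern above can be sketched outside Hive as well. The following Python snippet is only an illustration of the logic (number the rows after sorting, then keep a window of row numbers); the function name page_by_row_number and the sample scores are hypothetical and do not come from the original table t.

```python
def page_by_row_number(rows, key, first, last):
    """Mimic: row_number() over (order by key desc) rowno
    ... where rowno between first and last."""
    # Sort descending, as ORDER BY score DESC would.
    ordered = sorted(rows, key=key, reverse=True)
    # Number rows starting at 1, like row_number().
    numbered = enumerate(ordered, start=1)
    # Keep only rows whose number falls in the inclusive window.
    return [(rowno, row) for rowno, row in numbered if first <= rowno <= last]


# Hypothetical sample data: (name, score)
scores = [("a", 70), ("b", 95), ("c", 85), ("d", 60), ("e", 90)]

page1 = page_by_row_number(scores, key=lambda r: r[1], first=1, last=3)
page2 = page_by_row_number(scores, key=lambda r: r[1], first=4, last=6)

print(page1)
print(page2)
```

When several rows share the same score, the order among them is not fixed by the ORDER BY alone, so adding a tie-breaker column to the ORDER BY keeps pages stable between runs.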
Real-world example:
insert overwrite table otheravgrank_amt
select substr(bus_inst_no,0,5),xt_op_trl,canal,sa_tx_dt,dr_cr_cod,cr_tx_amt,f_fare,counts,count(bus_inst_no),
  dense_rank() over (order by cr_tx_amt desc) as cr_tx_amt_rank,
  dense_rank() over (order by f_fare desc) as f_fare_rank,
  dense_rank() over (order by counts desc) as counts_rank
from branch_amt
group by substr(bus_inst_no,0,5),xt_op_trl,canal,sa_tx_dt,dr_cr_cod,cr_tx_amt,f_fare,counts;
insert overwrite table denserank_amt
select bus_inst_no,xt_op_trl,canal,sa_tx_dt,dr_cr_cod,cr_tx_amt,f_fare,counts,count(bus_inst_no),
  dense_rank() over (partition by bus_inst_no order by cr_tx_amt desc) as cr_tx_amt_rank,
  dense_rank() over (partition by bus_inst_no order by f_fare desc) as f_fare_rank,
  dense_rank() over (partition by bus_inst_no order by counts desc) as counts_rank
from otheravgrank_amt
group by bus_inst_no,xt_op_trl,canal,sa_tx_dt,dr_cr_cod,cr_tx_amt,f_fare,counts;
insert overwrite table denserank_amt select * from denserank_amt sort by bus_inst_no;
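Background:
Differences between rank, dense_rank and row_number
row_number returns a unique value for every row; when rows tie, the number keeps increasing in the order the records appear in the record set.
rank returns the rank of a row within its partition; tied rows share a rank, and the following ranks are skipped, leaving gaps.
dense_rank returns the rank of a row within its partition; tied rows share a rank, and no gaps are left in the numbering.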
REGION_ID  CUSTOMER_ID  TOTAL    RANK  DENSE_RANK  ROW_NUMBER
---------  -----------  -------  ----  ----------  ----------
5          2            1224992  12    12          12
9          23           1224992  12    12          13
9          24           1224992  12    12          14
10         30           1216858  15    13          15
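The same pattern of ties can be reproduced with a few lines of Python. This is a minimal sketch of the semantics, not Hive's implementation; the function name rank_functions and the input values are made up for illustration.

```python
def rank_functions(values):
    """Return (value, rank, dense_rank, row_number) tuples for values
    sorted in descending order, mimicking ORDER BY value DESC."""
    ordered = sorted(values, reverse=True)
    result = []
    rank = 0
    dense = 0
    previous = object()  # sentinel that is unequal to every real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # rank jumps to the row position, leaving gaps after ties
            dense += 1         # dense_rank always advances by exactly one
            previous = value
        result.append((value, rank, dense, row_number))
    return result


# Hypothetical scores with a three-way tie at 80.
rows = rank_functions([90, 80, 80, 80, 70])
for row in rows:
    print(row)
```

The three tied rows share rank 2 and dense_rank 2 while row_number keeps counting (2, 3, 4); the next row gets rank 5 but dense_rank 3, the same shape as the 12/12/12 followed by 15/13/15 rows in the table.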
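Advanced ranking functions
percent_rank: percentage ranking function
Formula: PERCENT_RANK() = (RANK() - 1) / (Total Rows - 1)
Here RANK() is the rank of the current row based on the column(s) that follow ORDER BY, and Total Rows is the total number of rows in the current row's partition.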
· Analytic functions built into Hive 0.12.0, modeled on Oracle usage
· org.apache.hadoop.hive.ql.exec.FunctionRegistry
· registerHiveUDAFsAsWindowFunctions();
· registerWindowFunction("row_number", new GenericUDAFRowNumber()); // row_number implementation class
· registerWindowFunction("rank", new GenericUDAFRank());
· registerWindowFunction("dense_rank", new GenericUDAFDenseRank());
· registerWindowFunction("percent_rank", new GenericUDAFPercentRank());
· registerWindowFunction("cume_dist", new GenericUDAFCumeDist());
· registerWindowFunction("ntile", new GenericUDAFNTile());
· registerWindowFunction("first_value", new GenericUDAFFirstValue());
· registerWindowFunction("last_value", new GenericUDAFLastValue());
· registerWindowFunction(LEAD_FUNC_NAME, new GenericUDAFLead(), false);
· registerWindowFunction(LAG_FUNC_NAME, new GenericUDAFLag(), false);
Example:
SELECT DepartmentID, Surname,Salary, Sex,
PERCENT_RANK( ) OVER ( PARTITION BY Sex
ORDER BY Salary DESC ) AS PctRank
FROM Employees
WHERE State IN ( 'NY' );
Because the input is partitioned by Sex, the PERCENT_RANK calculation is performed separately for male and female employees.
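To make the per-partition calculation concrete, here is a small Python sketch that applies (RANK() - 1) / (Total Rows - 1) separately inside each partition. The employee rows, the helper name percent_rank_by_partition, and the choice to return 0.0 for a single-row partition are assumptions made for this illustration, not taken from the Employees table or from Hive's source.

```python
from collections import defaultdict


def percent_rank_by_partition(rows, partition_key, order_key):
    """Mimic PERCENT_RANK() OVER (PARTITION BY ... ORDER BY ... DESC)."""
    # Group the rows by partition value.
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    result = []
    for members in partitions.values():
        ordered = sorted(members, key=order_key, reverse=True)
        total = len(ordered)
        rank = 0
        previous = object()  # sentinel that is unequal to every real value
        for position, row in enumerate(ordered, start=1):
            value = order_key(row)
            if value != previous:
                rank = position  # same tie handling as rank()
                previous = value
            # Assumption: a single-row partition is given 0.0 to avoid dividing by zero.
            pct = (rank - 1) / (total - 1) if total > 1 else 0.0
            result.append((row, pct))
    return result


# Hypothetical rows: (Surname, Sex, Salary)
employees = [
    ("Adams", "F", 6000),
    ("Baker", "F", 5000),
    ("Clark", "F", 5000),
    ("Davis", "M", 7000),
    ("Evans", "M", 4000),
]

ranked = percent_rank_by_partition(
    employees, partition_key=lambda r: r[1], order_key=lambda r: r[2]
)
pct_by_name = {row[0]: pct for row, pct in ranked}
print(pct_by_name)
```

In this sample the two tied salaries in the F partition both get (2 - 1) / (3 - 1) = 0.5, and the M partition is ranked independently, so its top earner starts again at 0.0.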