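Common Hive Table Queries: A Summary

Original

GrowthDiary007

2020-06-02 11:45

I have not used Hive much, and every time I need it I end up looking things up on the spot, so I am writing these notes now and will keep adding to them as I learn (the examples below are simple and go roughly from easy to hard) >_<.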

1、Basic query statements

Assume a database db with two tables, table1 and table2.

-- 1. Show the statement that created the table:
show create table db.table1;

-- 2. Show the table's partitions:
show partitions db.table1;

-- 3. Count the rows in one partition:
select count(*) from db.table1 where dt = '2019-03-21';

-- 4. A simple join:
select t1.userid, t1.name, t2.score from
(select userid, name from db.table1 where dt = '2019-03-21') t1
left join
(select userid, score from db.table2 where dt = '2019-03-21') t2
on t1.userid = t2.userid;

-- 5. Give a column an alias:
select userid as user_id from db.table1 where dt = '2019-03-19';

-- 6. Set difference of two tables: rows that appear in table A but not in table B, i.e. A - B:
select a.user_id from
       (select user_id from db.table1) a
        left outer join
       (select user_id from db.table2) b
        on a.user_id = b.user_id
where b.user_id is null;
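
The A - B pattern in query 6 can be tried without a Hive cluster. The following is a minimal sketch that runs the same left outer join plus `is null` filter against an in-memory SQLite database from Python. The function name `left_anti_join_ids` and the tables `a` and `b` are made up for this illustration, and SQLite stands in for Hive only to show the join logic.

```python
import sqlite3


def left_anti_join_ids(a_ids, b_ids):
    """Return the ids found in a_ids but not in b_ids, using LEFT JOIN ... IS NULL."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-ins for db.table1 (a) and db.table2 (b)
    conn.execute("create table a (user_id integer)")
    conn.execute("create table b (user_id integer)")
    conn.executemany("insert into a values (?)", [(i,) for i in a_ids])
    conn.executemany("insert into b values (?)", [(i,) for i in b_ids])
    rows = conn.execute(
        """
        select a.user_id
        from a
        left outer join b on a.user_id = b.user_id
        where b.user_id is null   -- keep only rows of a with no match in b
        order by a.user_id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(left_anti_join_ids([1, 2, 3, 4], [2, 4, 5]))
```

Rows of `a` without a partner in `b` come out of the left join with `b.user_id` set to null, which is what the `where` clause selects. Duplicate ids in `a` are kept as duplicates, so add `distinct` if each id should appear once.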

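2、Sorting queries in Hive

1. Basic sorting operations

Clause	What it does	Advantage	Drawback	Notes
order by	Same as in other SQL dialects: a total (global) ordering	Guarantees a global order	Slow on large data, because all rows go through a single reducer	`------`
sort by	Sorts rows within each reducer	Faster, since each reducer sorts only its own rows	No global order unless there is only one reducer	A global sort run on data already processed by sort by tends to be faster
distribute by combined with sort by	Sends rows with the same key to the same reducer, then sorts within each reducer	`------`	`------`	`------`
cluster by	Shorthand for distribute by and sort by on the same column(s)	`------`	`------`	`------`

Note: the keywords asc and desc select ascending and descending order. cluster by accepts neither keyword, and its columns are always sorted in ascending order.
Usage examples: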

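-- 1. Sort by a single column in descending order (to sort by several columns, append them after it).
   -- Sort by age, descending. sort by uses the same syntax as order by.
select user_id, age from db.table1 order by age desc;


-- 2. Distribute rows by class, then sort within each reducer by age and score, ascending
select class, age, score from db.table1 distribute by class sort by age asc, score asc;

-- 3. Distribute rows by class and sort by class. cluster by cannot be combined with sort by,
--    so to also sort by age, write distribute by and sort by explicitly
select class, age from db.table1 cluster by class;
select class, age from db.table1 distribute by class sort by class, age;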
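
To see why sort by on its own does not give a global order, the behaviour can be mimicked outside Hive. The sketch below is plain Python, not Hive code. The function names and the toy routing rule (sum of character codes modulo the reducer count) are assumptions made for illustration; Hive's real hashing differs.

```python
from collections import defaultdict


def distribute_and_sort(rows, key_col, sort_col, num_reducers):
    """Mimic `distribute by key_col sort by sort_col`.

    Rows with the same key land in the same simulated reducer,
    and each reducer sorts only its own rows.
    """
    reducers = defaultdict(list)
    for row in rows:
        # Deterministic stand-in for Hive's hash of the distribute key (assumption)
        reducer_id = sum(map(ord, str(row[key_col]))) % num_reducers
        reducers[reducer_id].append(row)
    return {
        reducer_id: sorted(part, key=lambda r: r[sort_col])
        for reducer_id, part in sorted(reducers.items())
    }


def order_by(rows, sort_col):
    """Mimic `order by sort_col`: one global sort over all rows."""
    return sorted(rows, key=lambda r: r[sort_col])


if __name__ == "__main__":
    data = [
        {"class": "A", "age": 18},
        {"class": "B", "age": 17},
        {"class": "A", "age": 16},
        {"class": "B", "age": 19},
    ]
    for reducer_id, part in distribute_and_sort(data, "class", "age", 2).items():
        print(reducer_id, part)
    print(order_by(data, "age"))
```

Each simulated reducer's output is sorted, but reading the reducers one after another does not yield a sorted list, which matches the drawback listed for sort by in the table.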

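2. Ranking within groups

There are generally two ways to write it:

(1) row_number() over (partition by group_columns order by sort_columns) as rank (the alias rank is arbitrary; it labels each row's position after sorting)
(2) row_number() over (distribute by group_columns sort by sort_columns) as rank (the alias rank is arbitrary, as above)

-- Notes:
-- 1. partition by is paired only with order by.
-- 2. distribute by is paired only with sort by.
-- 3. rank is just an alias and can be any name; it holds the sequence number after sorting, e.g. 1, 2, 3, 4, 5...
-- 4. Both the grouping columns and the sorting columns may list several columns.
-- 5. If the grouping column is a constant such as 1, all rows fall into one group, so the result is simply each row's sequence number in the order of the sorting columns.

Usage examples: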

-- 1. Select the top three students by score in each class:
select class, student, score from (
       select class, student, score, row_number() over (distribute by class sort by score desc) as rank from db.table1
       ) as t1
where t1.rank < 4;


-- 2. distribute by may be followed by a constant such as 1; this simply numbers all rows in the order of one column:
select class, student, score, row_number() over (distribute by 1 sort by score desc) as rank from db.table1;
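
The top-N-per-group query can also be checked locally. This is a rough sketch using Python's built-in sqlite3 module, which supports row_number() when the underlying SQLite is version 3.25 or newer. SQLite has no distribute by / sort by, so the equivalent partition by / order by form is used. The table name `scores`, the function `top_n_per_class`, the alias `rn`, and the extra tie-breaker on `student` are assumptions added for the example.

```python
import sqlite3


def top_n_per_class(rows, n):
    """Return the n highest-scoring (class, student, score) rows of each class."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for db.table1
    conn.execute("create table scores (class text, student text, score integer)")
    conn.executemany("insert into scores values (?, ?, ?)", rows)
    result = conn.execute(
        """
        select class, student, score from (
            select class, student, score,
                   -- number rows inside each class, highest score first;
                   -- student breaks ties so the numbering is deterministic
                   row_number() over (partition by class order by score desc, student) as rn
            from scores
        ) t
        where rn <= ?
        order by class, rn
        """,
        (n,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("c1", "a", 90), ("c1", "b", 80), ("c1", "c", 70), ("c1", "d", 60),
        ("c2", "e", 50), ("c2", "f", 95),
    ]
    for row in top_n_per_class(sample, 3):
        print(row)
```

Without a tie-breaker, students with equal scores receive row numbers in an unspecified order, so which of them makes the cut can vary between runs.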

Reference links: 1. basic sorting, 2. grouped sorting

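3. Getting incremental data from a full snapshot table

-- ids present in the 2020-05-27 partition but not in the 2020-05-26 partition
select a.id from
(select distinct id from db.table1 where dt='2020-05-27') a
left outer join
(select distinct id from db.table1 where dt='2020-05-26') b
on a.id=b.id
where b.id is null;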

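4. Getting the latest partition of a table

# Assumes a single partition column dt, so each output line looks like dt=2020-05-27
partition="show partitions db.table1;"
latest_info=$(hive -e "$partition" | sort | tail -n 1)
# Strip the leading "dt=" to keep only the date
latest_dt=${latest_info#dt=}
echo "$latest_dt"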
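
The same "sort, take the last line, strip the prefix" logic can be expressed in Python when the partition list is already in hand, for example captured from the hive command's output. This is a sketch under the assumption that the table has a single partition column named dt with values in YYYY-MM-DD form; the function name `latest_partition_date` is invented for the example.

```python
def latest_partition_date(show_partitions_output, column="dt"):
    """Return the newest partition value from `show partitions` output, or None.

    Assumes one partition column and zero-padded YYYY-MM-DD values,
    so plain string comparison matches chronological order.
    """
    prefix = column + "="
    values = [
        line.strip()[len(prefix):]
        for line in show_partitions_output.splitlines()
        if line.strip().startswith(prefix)  # skip blank lines and log noise
    ]
    return max(values) if values else None


if __name__ == "__main__":
    output = "dt=2020-05-25\ndt=2020-05-27\ndt=2020-05-26\n"
    print(latest_partition_date(output))
```

Filtering on the `dt=` prefix also drops any log lines mixed into the output, which a plain `sort | tail -n 1` would not do.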

5. Counting rows after deduplication

select count(*) from (select distinct id from db.table1 where dt='2020-05-26') a;

Disclaimer: these are study notes. If you spot a mistake or something that could be better, corrections are welcome. Thank you.
