Welcome to my GitHub
https://github.com/zq2599/blog_demos
Contents: a categorized index of all my original articles with companion source code, covering Java, Docker, Kubernetes, DevOps, and more.
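Overview
- This is the second article in the "Hive Study Notes" series. The previous article covered the primitive types; this one covers the complex data types.
- There are four complex data types:
- ARRAY: an array
- MAP: key-value pairs
- STRUCT: a collection of named fields
- UNION (declared as UNIONTYPE): holds a value of exactly one of several specified data types; the value must exactly match one of those types.
- Let's go through them one at a time.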
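Preparing the environment
- Make sure Hadoop is running.
- Enter the interactive mode of the Hive console.
- Run the following command so that query results include the column names: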
set hive.cli.print.header=true;
ARRAY
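- Create a table named <font color="blue">t2</font> with just two columns, person and friends: <font color="blue">person</font> is a string and <font color="blue">friends</font> is an array. When data is loaded from a text file, person and friends are separated by a <font color="red">vertical bar</font>, and the elements inside friends are separated by a <font color="blue">comma</font>. Note the syntax for declaring the delimiters: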
create table if not exists t2(
person string,
friends array<string>
)
row format delimited
fields terminated by '|'
collection items terminated by ',';
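- Create a text file named <font color="blue">002.txt</font> with the following content. It holds just two records; in the first, person is tom and friends contains three elements separated by commas: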
tom|tom_friend_0,tom_friend_1,tom_friend_2
jerry|jerry_friend_0,jerry_friend_1,jerry_friend_2,jerry_friend_3,jerry_friend_4,jerry_friend_5
- Run the following statement to load the local file 002.txt into table t2:
load data local inpath '/home/hadoop/temp/202010/25/002.txt' into table t2;
- View all the data:
hive> select * from t2;
OK
t2.person t2.friends
tom ["tom_friend_0","tom_friend_1","tom_friend_2"]
jerry ["jerry_friend_0","jerry_friend_1","jerry_friend_2","jerry_friend_3","jerry_friend_4","jerry_friend_5"]
Time taken: 0.052 seconds, Fetched: 2 row(s)
- To query individual elements of friends by index:
select person, friends[0], friends[3] from t2;
The result is shown below. The first record has no friends[3], so NULL is displayed:
hive> select person, friends[0], friends[3] from t2;
OK
person _c1 _c2
tom tom_friend_0 NULL
jerry jerry_friend_0 jerry_friend_3
Time taken: 0.052 seconds, Fetched: 2 row(s)
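Because an out-of-range index yields NULL rather than an error, it can help to check the array length before indexing. The following is a minimal sketch against the t2 table above, using Hive's built-in size function; the column aliases and the 'n/a' placeholder are illustrative choices that do not appear in the original example.

```sql
-- size(friends) returns the number of elements in the array
select person,
       size(friends) as friend_count,
       -- 'n/a' is an arbitrary placeholder for rows with fewer than four friends
       if(size(friends) > 3, friends[3], 'n/a') as fourth_friend
from t2;
```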
- To check whether an array contains a given value:
select person, array_contains(friends, 'tom_friend_0') from t2;
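The result is shown below. The friends array of the first record contains <font color="red">tom_friend_0</font>, so true is displayed; the second record does not contain it, so false is displayed: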
hive> select person, array_contains(friends, 'tom_friend_0') from t2;
OK
person _c1
tom true
jerry false
Time taken: 0.061 seconds, Fetched: 2 row(s)
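Since array_contains returns a boolean, it can also serve directly as a filter condition. A small sketch on the same t2 table; with the sample data above, only the tom row should satisfy the condition.

```sql
-- keep only the rows whose friends array contains the given value
select person
from t2
where array_contains(friends, 'tom_friend_0');
```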
- The friends array of the first record has three elements. With the <font color="blue">LATERAL VIEW</font> syntax, these three elements can be split into three rows:
select t.person, single_friend
from (
select person, friends
from t2 where person='tom'
) t LATERAL VIEW explode(t.friends) v as single_friend;
The result is shown below. Each element of the array becomes a separate row:
OK
t.person single_friend
tom tom_friend_0
tom tom_friend_1
tom tom_friend_2
Time taken: 0.058 seconds, Fetched: 3 row(s)
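If the position of each element is also needed, Hive provides posexplode, which emits a zero-based index alongside each value. This is a sketch that assumes a Hive version that ships posexplode (0.13 or later); the alias pos is an illustrative name.

```sql
-- posexplode yields two columns per array element: its index and its value
select person, pos, single_friend
from t2
lateral view posexplode(friends) v as pos, single_friend
where person = 'tom';
```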
- Those are the basic array operations. Next up: key-value pairs.
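MAP: creating the table and loading data
- Next, create a table named <font color="blue">t3</font> with just two columns, person and address: <font color="blue">person</font> is a string and <font color="blue">address</font> is a MAP. When data is loaded from a text file, the delimiters are defined as follows:
- person and address are separated by a <font color="red">vertical bar</font>;
- address contains multiple key-value pairs, which are separated by <font color="red">commas</font>;
- within each key-value pair, the key and value are separated by a <font color="red">colon</font>.
- The CREATE TABLE statement that meets these requirements is: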
create table if not exists t3(
person string,
address map<string, string>
)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':';
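- Create a text file named <font color="blue">003.txt</font>. Note that it uses three kinds of delimiters: between columns, between the entries of the MAP, and between the key and value of each entry: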
tom|province:guangdong,city:shenzhen
jerry|province:jiangsu,city:nanjing
- Load the data from 003.txt into table t3:
load data local inpath '/home/hadoop/temp/202010/25/003.txt' into table t3;
MAP: querying
- View all the data:
hive> select * from t3;
OK
t3.person t3.address
tom {"province":"guangdong","city":"shenzhen"}
jerry {"province":"jiangsu","city":"nanjing"}
Time taken: 0.075 seconds, Fetched: 2 row(s)
- To look up a single key in a MAP, use the syntax <font color="blue">field["xxx"]</font>:
hive> select person, address["province"] from t3;
OK
person _c1
tom guangdong
jerry jiangsu
Time taken: 0.075 seconds, Fetched: 2 row(s)
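The same key lookup works in a WHERE clause, which makes it possible to filter rows by a value stored inside the MAP. A minimal sketch on t3; the alias city is an illustrative name, and with the sample data above only the tom row has province set to guangdong.

```sql
-- filter on one map entry and project another
select person, address['city'] as city
from t3
where address['province'] = 'guangdong';
```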
- Using the <font color="blue">if</font> function: the SQL below checks whether the address column has a "street" key. If it does, the corresponding value is shown; otherwise <font color="blue">field street not exists</font> is shown:
select person,
if(address['street'] is null, "field street not exists", address['street'])
from t3;
The output is below. Since address has only the two keys <font color="blue">province</font> and <font color="blue">city</font>, <font color="blue">field street not exists</font> is displayed:
OK
tom field street not exists
jerry field street not exists
Time taken: 0.087 seconds, Fetched: 2 row(s)
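Note that a NULL check on address['street'] cannot tell a missing key apart from a key whose value happens to be NULL. One way to test for the key itself, sketched here with Hive's built-in map_keys and array_contains functions (the alias has_street is an illustrative name):

```sql
-- map_keys returns the keys as an array, which array_contains can then search
select person,
       array_contains(map_keys(address), 'street') as has_street
from t3;
```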
- Use <font color="blue">explode</font> to expand each key-value pair of the address column into its own row:
hive> select explode(address) from t3;
OK
province guangdong
city shenzhen
province jiangsu
city nanjing
Time taken: 0.081 seconds, Fetched: 4 row(s)
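- The <font color="blue">explode</font> call above can only output the address column. To show other columns as well, use the <font color="blue">LATERAL VIEW</font> syntax again, as below. Whereas the array earlier expanded into a single column, a MAP expands into two columns, the key and the value: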
select t.person, address_key, address_value
from (
select person, address
from t3 where person='tom'
) t LATERAL VIEW explode(t.address) v as address_key, address_value;
The result is:
OK
tom province guangdong
tom city shenzhen
Time taken: 0.118 seconds, Fetched: 2 row(s)
- The <font color="blue">size</font> function returns the number of key-value pairs in a MAP:
hive> select person, size(address) from t3;
OK
tom 2
jerry 2
Time taken: 0.082 seconds, Fetched: 2 row(s)
STRUCT
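- STRUCT is a record type that encapsulates a set of named fields. Create a table named <font color="blue">t4</font> whose info column is a <font color="blue">STRUCT</font> with two attributes, age and city. person and info are separated by a <font color="red">vertical bar</font>, and the elements inside info are separated by a <font color="red">comma</font>. Note the syntax for declaring the delimiters: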
create table if not exists t4(
person string,
info struct<age:int, city:string>
)
row format delimited
fields terminated by '|'
collection items terminated by ',';
- Prepare a text file named 004.txt with the following content:
tom|11,shenzhen
jerry|12,nanjing
- Load the data from 004.txt into table t4:
load data local inpath '/home/hadoop/temp/202010/25/004.txt' into table t4;
- View all the data in t4:
hive> select * from t4;
OK
tom {"age":11,"city":"shenzhen"}
jerry {"age":12,"city":"nanjing"}
Time taken: 0.063 seconds, Fetched: 2 row(s)
- To read a specific attribute, use the fieldname.xxx syntax:
hive> select person, info.city from t4;
OK
tom shenzhen
jerry nanjing
Time taken: 0.141 seconds, Fetched: 2 row(s)
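Struct attributes can be used anywhere an ordinary column can, including in a WHERE clause, and a struct value can be built in a query with the built-in named_struct function. The sketch below assumes a Hive version that accepts SELECT without a FROM clause for the second statement; the values 13 and 'guangzhou' are made up for illustration.

```sql
-- filter rows on an attribute inside the struct
select person, info.age, info.city
from t4
where info.age > 11;

-- build a struct value on the fly; arguments alternate between field name and value
select named_struct('age', 13, 'city', 'guangzhou');
```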
UNION
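- The last type is UNIONTYPE, which holds a value of exactly one of several specified data types. Creating UNIONTYPE data involves the UDF create_union, so we won't go into detail here; let's just look at a CREATE TABLE statement: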
CREATE TABLE union_test(foo UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>);
- Query result:
SELECT foo FROM union_test;
{0:1}
{1:2.0}
{2:["three","four"]}
{3:{"a":5,"b":"five"}}
{2:["six","seven"]}
{3:{"a":8,"b":"eight"}}
{0:9}
{1:10.0}
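For readers curious about create_union, here is a small, hedged sketch. It assumes the signature create_union(tag, value0, value1, ...), where the integer tag selects which of the following values the union holds, and it reuses the t2 table from the ARRAY section instead of populating union_test. The aliases are illustrative names.

```sql
-- each call builds a uniontype<string, array<string>> value;
-- the tag in the first argument picks which candidate is stored
select create_union(0, person, friends) as holds_person,   -- tag 0 selects person
       create_union(1, person, friends) as holds_friends   -- tag 1 selects friends
from t2;
```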
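- With that, we have worked hands-on with both the primitive and the complex data types of Hive. Upcoming articles will cover more Hive topics; I look forward to making progress together with you.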