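186. Deploying the Hive Data Warehouse

6. Deploying the Hive Data Warehouse

6.1 Deploying Hive

Open the main page of the Xiandian big data platform, click the Actions button on the left, and add the Hive service. The commands below create the hive database and user in MariaDB for the Hive metastore: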

# mysql -uroot -pbigdata
MariaDB [(none)]> create database hive;  
MariaDB [(none)]> grant all privileges on hive.* to 'hive'@'localhost' identified by 'bigdata';
MariaDB [(none)]> grant all privileges on hive.* to 'hive'@'%' identified by 'bigdata'; 
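Note: if the MariaDB instance on the master node is used as Hive's metastore database, the Hive MetaStore must be installed on the master node.

6.2 Hive User Guide

6.2.1 Verification test

Start the Hive client: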
# su hive
$ hive

6.2.2 hive command-line options

usage: hive
 -d,--define <key=value>          Variable substitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
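
The options above can be combined when calling hive from a script. The following Python sketch only assembles an argument list from the -S, --database, --hivevar and -e options listed in the usage text; it does not run Hive. The helper name build_hive_command and the variable name tbl are hypothetical, chosen for illustration.

```python
import shlex
from typing import Dict, List, Optional


def build_hive_command(query: str,
                       database: Optional[str] = None,
                       hivevars: Optional[Dict[str, str]] = None,
                       silent: bool = False) -> List[str]:
    """Assemble an argument list for the hive CLI (hypothetical helper)."""
    args = ["hive"]
    if silent:
        args.append("-S")                      # silent mode
    if database:
        args += ["--database", database]       # database to use
    for key, value in (hivevars or {}).items():
        args += ["--hivevar", f"{key}={value}"]  # variable substitution
    args += ["-e", query]                      # SQL from the command line
    return args


cmd = build_hive_command("select * from ${hivevar:tbl}",
                         database="default",
                         hivevars={"tbl": "userinfo"},
                         silent=True)
# shlex.join quotes the arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd) would execute it on a machine where the hive client is installed.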

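1. Hive interactive mode

hive> show tables; # list all tables

hive> show tables 'ad*'; # list tables whose names start with 'ad'

hive> set; # the set command displays variables, or sets one when given key=value

hive> set -v; # show all variables

hive> set hive.stats.atomic; # show the value of hive.stats.atomic

hive> set hive.stats.atomic=false; # set hive.stats.atomic

hive> dfs -ls; # list the current user's HDFS home directory

hive> dfs -ls /user/hive/warehouse/; # list all Hive warehouse files

hive> dfs -ls /user/hive/warehouse/ptest; # list the files of table ptest

hive> source file <filepath>; # run a Hive script file inside the client

hive> quit; # leave the interactive shell

hive> exit; # leave the interactive shell

hive> reset; # reset the configuration to its default values

hive> !ls; # run a shell command from the Hive shell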

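2. Operators and functions

List functions:

hive> show functions;

List functions whose names match a regular expression:

hive> show functions 'xpath.*';

Show the description of a specific function (describe and desc are equivalent):

hive> describe function xpath;

hive> desc function xpath;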

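3. Field types

Hive supports primitive and complex data types. The primitive types are mainly numeric (INT, FLOAT, DOUBLE), Boolean, and string; there are three complex types: ARRAY, MAP, and STRUCT.

4. Primitive data types

TINYINT: 1 byte

SMALLINT: 2 bytes

INT: 4 bytes

BIGINT: 8 bytes

BOOLEAN: TRUE/FALSE

FLOAT: 4 bytes, single-precision floating point

DOUBLE: 8 bytes, double-precision floating point

STRING: character string

5. Complex data types

ARRAY: an ordered collection of elements

MAP: an unordered collection of key-value pairs

STRUCT: a group of named fields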
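
As a rough analogy only, the three complex types behave like familiar Python containers. The sketch below is not Hive code; the column names and values are hypothetical and serve only to show how each kind of value is accessed.

```python
from typing import Dict, List, NamedTuple


class Address(NamedTuple):
    """Plays the role of a STRUCT: a fixed group of named fields."""
    city: str
    zip_code: str


# One hypothetical row with a column of each complex type.
hobbies: List[str] = ["chess", "hiking"]               # ARRAY: ordered, accessed by index
scores: Dict[str, int] = {"math": 90, "english": 85}   # MAP: accessed by key
address = Address(city="Beijing", zip_code="100000")   # STRUCT: accessed by field name

print(hobbies[0], scores["math"], address.city)
```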

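6.2.3 Table types

Hive tables fall roughly into three kinds: ordinary (managed) tables, external tables, and partitioned tables.

1. Ordinary tables

Create a table:

hive> create table tb_person(id int, name string);

Create a table with a partition column ds:

hive> create table tb_stu(id int, name string) partitioned by(ds string);

Show partitions:

hive> show partitions tb_stu;

Show all tables:

hive> show tables;

Show tables whose names match a pattern:

hive> show tables 'tb_*';

Add a column to a table:

hive> alter table tb_person add columns (new_col int);

Add a column together with a column comment:

hive> alter table tb_stu add columns (new_col2 int comment 'a comment');

Rename a table: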

hive> alter table tb_stu rename to tb_stu_new;

Drop a table (Hive can drop whole partitions, but cannot delete individual records or columns):

hive> drop table tb_stu;

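For a managed table, drop removes both the metadata and the data files; for an external table, only the metadata is removed. To delete only the data in a table while keeping the table itself, delete the data files on HDFS: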

hive> dfs -rm -r /user/hive/warehouse/mutill1/*;

Load the data in the local file /home/hadoop/ziliao/stu.txt into a table. stu.txt contains:

1 zhangsan

2 lisi

3 wangwu

Load the file into the table:

hive> load data local inpath '/home/hadoop/ziliao/stu.txt' overwrite into table tb_person;

Load local data and specify the partition at the same time:

hive> load data local inpath '/home/hadoop/ziliao/stu.txt' overwrite into table tb_stu partition (ds='2008-08-15');

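Note: if the data to import is already on HDFS, omit the local keyword. Data files loaded into a managed table can be seen under the warehouse directory "/user/hive/warehouse/<tablename>".

View the data files: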

hive> dfs -ls /user/hive/warehouse/tb_stu

hive> dfs -ls /user/hive/warehouse/tb_person

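2. External tables

The external keyword creates an external table whose location clause points to where the data actually resides. When data is loaded into an internal (managed) table, Hive moves it into the warehouse directory; for an external table, Hive only records the path of the data and leaves the data where it is. When the table is dropped, an internal table's metadata and data are both deleted, whereas an external table loses only its metadata and the data is kept.

For example, create an external table:

hive> create external table tb_record(col1 string, col2 string) row format delimited fields terminated by '\t' location '/user/hadoop/input';

The data of tb_record is then whatever is stored under hdfs:///user/hadoop/input/*.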
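
The difference in drop behaviour can be illustrated with a toy model. The Python sketch below is not how Hive is implemented: the ToyWarehouse class is hypothetical, a dictionary stands in for the metastore, and a set of paths stands in for HDFS.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ToyWarehouse:
    """Toy model: metastore maps table name -> (location, is_external)."""
    metastore: Dict[str, Tuple[str, bool]] = field(default_factory=dict)
    hdfs: Set[str] = field(default_factory=set)

    def create_table(self, name: str, location: str, external: bool) -> None:
        self.metastore[name] = (location, external)
        self.hdfs.add(location)

    def drop_table(self, name: str) -> None:
        location, external = self.metastore.pop(name)  # metadata is always removed
        if not external:
            self.hdfs.discard(location)                # data is removed only for managed tables


wh = ToyWarehouse()
wh.create_table("tb_person", "/user/hive/warehouse/tb_person", external=False)
wh.create_table("tb_record", "/user/hadoop/input", external=True)
wh.drop_table("tb_person")
wh.drop_table("tb_record")
print(sorted(wh.hdfs))  # only the external table's directory remains
```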

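3. Partitioned tables

A partition groups a table's rows by the value of one or more partition columns. Creating partitions for frequently queried data means that a query on one partition does not have to scan the whole table, which greatly improves lookup efficiency.

Create a partitioned table: create table log(ts bigint, line string) partitioned by(name string);

Insert into a partition: insert overwrite table log partition(name='xiapi') select id, name from userinfo where name='xiapi';

Show partitions: show partitions log;

Drop a partition: alter table log drop partition (name='xiapi');

Note: a partition normally has to be created before it can be used. Also, because the partition column value becomes part of the storage directory path, any special character in the value, such as '%', ':', '/', or '#', is escaped as % followed by its two-digit hexadecimal ASCII code.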
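
The escaping rule in the note can be sketched in a few lines of Python. This is a simplified illustration, not Hive's own code: it escapes only the four characters named above (Hive's real list is longer), it assumes uppercase hexadecimal digits, and the function names are hypothetical.

```python
# Simplified: only the four characters named in the note are escaped here.
SPECIAL_CHARS = set("%:/#")


def escape_partition_value(value: str) -> str:
    """Replace each special character with '%' plus its two-digit hex ASCII code."""
    return "".join(f"%{ord(ch):02X}" if ch in SPECIAL_CHARS else ch for ch in value)


def partition_dir(table_dir: str, column: str, value: str) -> str:
    """Build the directory that holds one partition of a table."""
    return f"{table_dir}/{column}={escape_partition_value(value)}"


print(partition_dir("/user/hive/warehouse/log", "name", "xiapi"))
print(partition_dir("/user/hive/warehouse/log", "name", "a/b:c"))
```

A plain value such as xiapi is left unchanged, while the slash and colon in the second value are replaced so that they cannot be mistaken for path separators.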

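6.2.4 SQL operations and buckets

1. Create tables

First create three test tables:

The userinfo table has two tab-separated columns that store the user's id and name;

The classinfo table has two tab-separated columns that store the course teacher (teacher) and the course name (classname);

The choice table has two tab-separated columns that store the user's userid and the chosen course name classname (similar to a junction table).

Create the test tables:

hive> create table userinfo(id int, name string) row format delimited fields terminated by '\t';

hive> create table classinfo(teacher string, classname string) row format delimited fields terminated by '\t';

hive> create table choice(userid int, classname string) row format delimited fields terminated by '\t';

Note: '\t' stands for a tab character.

Show the tables just created: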

hive> show tables;

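2. Import data

After the tables are created, data files can be imported from the local file system or from HDFS. Sample data follows.

userinfo.txt contains (fields separated by a tab):

1   xiapi

2   xiaoxue

3   qingqing

classinfo.txt contains (fields separated by a tab):

jack   math

sam   china

lucy   english

choice.txt contains (fields separated by a tab):

1   math

1   china

1   english

2   china

2   english

3   english

First create these three files under the local directory "/home/hadoop/ziliao" with the contents shown above.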
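
Because tabs are easy to lose when copying text from a web page, the three files can also be generated by a script. The Python sketch below writes the sample rows with exactly one tab between fields; the function name write_sample_files is hypothetical, and a temporary directory stands in for /home/hadoop/ziliao.

```python
import os
import tempfile

SAMPLE_FILES = {
    "userinfo.txt": [("1", "xiapi"), ("2", "xiaoxue"), ("3", "qingqing")],
    "classinfo.txt": [("jack", "math"), ("sam", "china"), ("lucy", "english")],
    "choice.txt": [("1", "math"), ("1", "china"), ("1", "english"),
                   ("2", "china"), ("2", "english"), ("3", "english")],
}


def write_sample_files(target_dir: str) -> None:
    """Write each sample file with exactly one tab between the two fields."""
    os.makedirs(target_dir, exist_ok=True)
    for filename, rows in SAMPLE_FILES.items():
        path = os.path.join(target_dir, filename)
        with open(path, "w", encoding="utf-8") as f:
            for row in rows:
                f.write("\t".join(row) + "\n")


# A temporary directory stands in for /home/hadoop/ziliao here.
demo_dir = tempfile.mkdtemp()
write_sample_files(demo_dir)
print(sorted(os.listdir(demo_dir)))
```

On the Hive node, calling write_sample_files("/home/hadoop/ziliao") would place the files where the load statements expect them.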

3. Load the data as follows

hive> load data local inpath '/home/hadoop/ziliao/userinfo.txt' overwrite into table userinfo;

hive> load data local inpath '/home/hadoop/ziliao/classinfo.txt' overwrite into table classinfo;

hive> load data local inpath '/home/hadoop/ziliao/choice.txt' overwrite into table choice;

Query the table data:

hive> select * from userinfo;

hive> select * from classinfo;

hive> select * from choice;

4. Partitions

Create a partitioned table:

hive> create table ptest(userid int) partitioned by (name string) row format delimited fields terminated by '\t';

Prepare the data to import.

xiapi.txt contains (fields separated by a tab):

1   

Load the data:

hive> load data local inpath '/home/hadoop/ziliao/xiapi.txt' overwrite into table ptest partition (name='xiapi');

View the partition directory:

hive> dfs -ls /user/hive/warehouse/ptest/name=xiapi;

Query the partition:

hive> select * from ptest where name='xiapi';

Show partitions:

hive> show partitions ptest;

Insert data into a partition (each insert overwrites the existing data):

hive> insert overwrite table ptest partition(name='xiapi') select id from userinfo where name='xiapi';

Drop the partition:

hive> alter table ptest drop partition (name='xiapi');

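5. Buckets

A table or partition can be further organized into buckets, which split rows by the value of a chosen column; each bucket corresponds to one reduce task. Before populating a bucketed table, set the "hive.enforce.bucketing" property to true so that Hive enforces bucketing on insert. A bucketed table is set up as follows:

hive> set hive.enforce.bucketing=true;

hive> set hive.enforce.bucketing;

hive.enforce.bucketing=true

hive> create table btest2(id int, name string) clustered by(id) into 3 buckets row format delimited fields terminated by '\t';

Insert data into the buckets. The table is split into three buckets by user id, so the insert runs three reduce tasks and writes three files.

hive> insert overwrite table btest2 select * from userinfo;

List the table's directory in the warehouse; the three buckets correspond to three files.

hive> dfs -ls /user/hive/warehouse/btest2;

Hive assigns a row to a bucket by hashing the value of the bucketing column and taking the remainder after dividing the hash by the number of buckets. This spreads rows across the buckets, but the buckets do not necessarily hold the same number of rows, as shown below.

hive> dfs -cat /user/hive/warehouse/btest2/*0_0;

hive> dfs -cat /user/hive/warehouse/btest2/*1_0;

hive> dfs -cat /user/hive/warehouse/btest2/*2_0;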
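
The hash-and-remainder rule can be modelled in Python as follows. This is a sketch under stated assumptions, not Hive's code: it assumes that the hash of an integer column value is the integer itself and that the sign bit is masked off before the remainder is taken. The function name bucket_for is hypothetical.

```python
NUM_BUCKETS = 3


def bucket_for(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Pick a bucket as hash(value) mod bucket count.

    Assumption: the hash of an integer is the integer itself, and the sign
    bit is masked off before taking the remainder.
    """
    hash_value = user_id  # assumed hash for int columns
    return (hash_value & 0x7FFFFFFF) % num_buckets


# Rows of the userinfo sample table.
userinfo = [(1, "xiapi"), (2, "xiaoxue"), (3, "qingqing")]

buckets = {b: [] for b in range(NUM_BUCKETS)}
for user_id, name in userinfo:
    buckets[bucket_for(user_id)].append((user_id, name))

for b, rows in buckets.items():
    print(b, rows)
```

Under these assumptions the three sample ids land in three different buckets; with other data some buckets could receive several rows and others none.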

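Bucketing can give more efficient queries than partitioning alone, and it also makes it easy to sample the whole data set. The following samples one bucket:

hive> select * from btest2 tablesample(bucket 1 out of 3 on id);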

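6. Multi-table insert

A multi-table insert writes the same source data into several tables within a single statement. The source data is scanned only once to complete all the inserts, which is very efficient. An example follows.

hive> create table mutill as select id, name from userinfo; # created with data

hive> create table mutil2 like mutill; # no data, table structure only

hive> from userinfo

   insert overwrite table mutill select id, name

   insert overwrite table mutil2 select count(distinct id), name group by name;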

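7. Joins

A join combines the rows of two tables that match on a common data item. HiveQL joins come in five kinds: inner join, left outer join, right outer join, full outer join, and semi join.

a. Inner join (equi-join)

An inner join uses a comparison operator to match rows of the two tables on the values of the columns they share.

For example, retrieve all rows of userinfo and choice that have the same id.

hive> select userinfo.*, choice.* from userinfo join choice on(userinfo.id=choice.userid);

b. Left join

The result of a left join includes all rows of the left table named in the "LEFT OUTER" clause, not only those matched on the join column. If a row of the left table has no matching row in the right table, all selected columns of the right table are NULL in that result row.

hive> select userinfo.*, choice.* from userinfo left outer join choice on(userinfo.id=choice.userid);

c. Right join

A right join is the mirror image of a left outer join: it returns all rows of the right table. If a row of the right table has no matching row in the left table, NULL is returned for the left table's columns.

hive> select userinfo.*, choice.* from userinfo right outer join choice on(userinfo.id=choice.userid);

d. Full join

A full join returns all rows of both the left and the right table. When a row has no match in the other table, the other table's selected columns contain NULL. Where rows do match, the result row contains the data values from both tables.

hive> select userinfo.*, choice.* from userinfo full outer join choice on(userinfo.id=choice.userid);

e. Semi join

The semi join is specific to Hive. Older Hive versions do not support IN subqueries, but they offer an alternative, left semi join, known as a semi join. Note that the right-hand table may not appear in the select list; it may appear only in the on clause.

hive> select userinfo.* from userinfo left semi join choice on (userinfo.id=choice.userid);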
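
To make the difference between the join kinds concrete, the Python sketch below reproduces inner, left outer, and left semi join on the sample rows using plain lists. It models the semantics only and is not how Hive executes joins. User 4 is a hypothetical extra row, added so that the left table has one unmatched row.

```python
from typing import List, Optional, Tuple

User = Tuple[int, str]
Choice = Tuple[int, str]

userinfo: List[User] = [(1, "xiapi"), (2, "xiaoxue"), (3, "qingqing"),
                        (4, "newuser")]  # user 4 is hypothetical, has no choice rows
choice: List[Choice] = [(1, "math"), (1, "china"), (1, "english"),
                        (2, "china"), (2, "english"), (3, "english")]


def inner_join(users: List[User], choices: List[Choice]) -> List[Tuple[User, Choice]]:
    """One output row per matching (user, choice) pair."""
    return [(u, c) for u in users for c in choices if u[0] == c[0]]


def left_outer_join(users: List[User],
                    choices: List[Choice]) -> List[Tuple[User, Optional[Choice]]]:
    """Like inner join, but unmatched users are kept with None on the right."""
    rows: List[Tuple[User, Optional[Choice]]] = []
    for u in users:
        matches = [c for c in choices if u[0] == c[0]]
        rows += [(u, c) for c in matches] if matches else [(u, None)]
    return rows


def left_semi_join(users: List[User], choices: List[Choice]) -> List[User]:
    """Each user at most once, and only if some choice row matches."""
    chosen_ids = {c[0] for c in choices}
    return [u for u in users if u[0] in chosen_ids]


print(len(inner_join(userinfo, choice)))
print(len(left_outer_join(userinfo, choice)))
print(left_semi_join(userinfo, choice))
```

The inner join repeats a user once per matching choice row, whereas the semi join returns each matching user only once and carries no columns from choice, which is why the right-hand table cannot appear in the select list.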

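8. Subqueries

Standard SQL allows nested select clauses as subqueries. HiveQL's support for subqueries is limited: a subquery may appear only in the from clause. The statement below nests a subquery in the from clause; the subquery counts the classes each teacher teaches, and the outer query reports that count per teacher.

hive> select teacher, MAX(class_num) from (select teacher, count(classname) as class_num from classinfo group by teacher) subq group by teacher;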
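
The two query levels can be traced in Python. The sketch below is an illustration of the logic only; the last classinfo row is hypothetical, added so that the counts differ between teachers. It also shows that grouping by teacher in the outer query returns every teacher, and that finding the single busiest teacher needs a maximum taken over all groups.

```python
from collections import Counter

classinfo = [("jack", "math"), ("sam", "china"), ("lucy", "english"),
             ("jack", "physics")]  # last row is hypothetical so counts differ

# Inner query: select teacher, count(classname) as class_num ... group by teacher
class_num = Counter(teacher for teacher, _ in classinfo)

# Outer query as written: grouped by teacher again, so every teacher is returned
per_teacher = {teacher: max([n]) for teacher, n in class_num.items()}

# To get only the busiest teacher, take the maximum over all groups instead
busiest, top_count = max(class_num.items(), key=lambda item: item[1])

print(per_teacher)
print(busiest, top_count)
```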

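9. Views

Views are supported only in Hive 0.6 and later.

Hive supports only logical views, not materialized views. After a view is created, it is visible in the MySQL metastore database, but there is no corresponding directory for it under Hive's warehouse directory.

When a query references a view, the view's definition is evaluated to supply the set of rows for the rest of the query. This is a conceptual description; in practice, as part of query optimization, Hive may combine the view's definition with the query's, for example by pushing the query's filters down into the view.

A view's schema is fixed when the view is created. Later changes to the underlying table (such as adding a column) are not reflected in the view's schema. If the underlying table is dropped or changed in an incompatible way, queries on the view fail.

Views are read-only and cannot be the target of LOAD/INSERT/ALTER.

A view may contain ORDER BY and LIMIT clauses. If a query that references the view also contains these clauses, the view's clauses are evaluated first, and the query's clauses are then applied to the result.

The following example creates a view:

hive> create view teacher_classnum as select teacher, count(classname) from classinfo group by teacher;

Drop the view:

hive> drop view teacher_classnum;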

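10. Functions

Create a function:

hive> create temporary function function_name as 'class_name';

This statement creates a function implemented by the class class_name. Any class on Hive's class path can be used; a jar containing the function class can be added to the class path with an add jar statement, and the function then remains usable for the rest of the session.

Drop a function

A user-defined function is unregistered as follows: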

hive> drop temporary function function_na
