Basic Hive shell operations

Creating databases and tables

Database operations

Create a database

create database if not exists myhive;

use  myhive;

Note: the default storage location for Hive tables is set by a property in hive-site.xml:

<name>hive.metastore.warehouse.dir</name>

<value>/user/hive/warehouse</value>
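To confirm which warehouse directory is actually in effect, the property can be printed from the Hive shell. This is a minimal sketch, assuming the myhive database created above exists; the path that is printed depends on your own hive-site.xml.

```sql
-- Print the current value of the warehouse directory property.
set hive.metastore.warehouse.dir;

-- Show where the database itself is stored. With the default setting
-- this is expected to be a myhive.db directory under the warehouse path.
desc database myhive;
```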

Create a database and specify its HDFS storage location

create database myhive2 location '/myhive2';

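Modify a database

The alter database command can set properties on a database. However, core metadata such as the database name and its storage location cannot be changed this way.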

alter  database  myhive2  set  dbproperties('createtime'='20180611');

View database details

View basic database information

desc  database  myhive2;

View extended database information

desc database extended  myhive2;

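Drop a database

Drop an empty database. If the database still contains tables, the statement fails with an error.

drop  database  myhive2;

Force-drop a database together with all the tables in it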

drop  database  myhive  cascade;

Do not actually run this statement: the myhive database is used in the rest of this tutorial.

Table operations

CREATE TABLE syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name

   [(col_name data_type [COMMENT col_comment], ...)]

   [COMMENT table_comment]

   [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]

   [CLUSTERED BY (col_name, col_name, ...)

   [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]

   [ROW FORMAT row_format]

   [STORED AS file_format]

   [LOCATION hdfs_path]

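Notes:

CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses it.
The EXTERNAL keyword creates an external table, and a path to the actual data (LOCATION) can be given at creation time. When data is loaded into a managed (internal) table, Hive moves it into the warehouse directory; for an external table, Hive only records where the data lives and leaves it in place. When a table is dropped, a managed table loses both its metadata and its data, whereas an external table loses only its metadata.
LIKE copies the structure of an existing table without copying its data.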
ROW FORMAT DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

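A table can use a custom SerDe or one of the built-in SerDes. If ROW FORMAT is omitted, or ROW FORMAT DELIMITED is used, the built-in SerDe is applied. The columns are declared when the table is created, and Hive uses the SerDe together with those column definitions to parse each row into column values.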
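The ROW FORMAT and LIKE options described above can be compared side by side. The following is only a sketch: the table names serde_demo_delimited, serde_demo_csv and serde_demo_copy are hypothetical, and the CSV variant assumes a Hive release that ships OpenCSVSerde (0.14 or later).

```sql
-- Built-in delimited format: fields split on tabs, lines on newlines.
create table if not exists serde_demo_delimited (id int, name string)
row format delimited
  fields terminated by '\t'
  lines terminated by '\n';

-- Explicit SerDe class. OpenCSVSerde reads every column as string,
-- so the columns are declared as string here.
create table if not exists serde_demo_csv (id string, name string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
stored as textfile;

-- LIKE copies only the table definition, not the rows.
create table if not exists serde_demo_copy like serde_demo_delimited;
```

Running desc formatted on each table shows which SerDe library Hive recorded for it.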

STORED AS

SEQUENCEFILE|TEXTFILE|RCFILE

If the files are plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
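As a rough illustration of the two storage choices, the sketch below creates one table of each kind and copies rows from the text table into the SequenceFile table with output compression switched on. The table names fmt_demo_text and fmt_demo_seq are hypothetical, and the codec actually used depends on the cluster configuration.

```sql
-- Plain-text storage (the usual default file format).
create table if not exists fmt_demo_text (id int, name string)
stored as textfile;

-- SequenceFile storage, for data that should be written compressed.
create table if not exists fmt_demo_seq (id int, name string)
stored as sequencefile;

-- Ask Hive to compress query output, then fill the SequenceFile table.
set hive.exec.compress.output=true;
insert overwrite table fmt_demo_seq
select id, name from fmt_demo_text;
```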

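CLUSTERED BY

Each table or partition can be further organized into buckets, which divide the data at a finer granularity. Bucketing is defined on a column: Hive hashes the column value and takes the remainder after dividing by the number of buckets to decide which bucket a record goes into.

There are two reasons to organize a table (or partition) into buckets:

(1) More efficient query processing. Buckets add extra structure to a table that Hive can exploit for certain queries. In particular, a join of two tables that are bucketed on the same columns, which include the join columns, can be implemented efficiently as a map-side join. When both tables are bucketed on the join column, only buckets holding the same column values need to be joined with each other, which greatly reduces the amount of data the JOIN has to process.

(2) More efficient sampling. When working with large data sets, it is very convenient during query development and tuning to be able to test a query on a small portion of the data.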
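The hash-and-remainder rule for bucket assignment can be tried out with Hive's built-in hash() and pmod() functions. This is an illustration only: the key values and the bucket count of 3 are made up, and the hash that Hive applies internally when writing bucket files may differ from hash() depending on the Hive version, so the numbers are not guaranteed to match real bucket file names.

```sql
-- Approximate the bucket number of a few sample keys for 3 buckets.
-- pmod keeps the remainder non-negative even if hash() is negative.
select pmod(hash('01'), 3) as bucket_of_01,
       pmod(hash('02'), 3) as bucket_of_02,
       pmod(hash('03'), 3) as bucket_of_03;
```

Equal keys always produce the same remainder, which is why two tables bucketed on the join column can be joined bucket by bucket.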

Managed tables

A first table in Hive

use myhive;

create table stu(id int,name string);

insert into stu values (1,"zhangsan");

select * from stu;

Column types for Hive tables

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types

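Category	Type	Description	Literal example
Primitive	BOOLEAN	true/false	TRUE
	TINYINT	1-byte signed integer, -128 to 127	1Y
	SMALLINT	2-byte signed integer, -32768 to 32767	1S
	INT	4-byte signed integer	1
	BIGINT	8-byte signed integer	1L
	FLOAT	4-byte single-precision floating point	1.0
	DOUBLE	8-byte double-precision floating point	1.0
	DECIMAL	Arbitrary-precision signed decimal	1.0
	STRING	Variable-length string	"a", 'b'
	VARCHAR	Variable-length string	"a", 'b'
	CHAR	Fixed-length string	"a", 'b'
	BINARY	Byte array	Cannot be expressed
	TIMESTAMP	Timestamp, with optional nanosecond precision	122327493795
	DATE	Date	'2016-03-29'
	INTERVAL	Time interval	
Complex	ARRAY	Ordered collection of elements of the same type	array(1,2)
	MAP	Key-value pairs; keys must be primitive, values can be any type	map('a',1,'b',2)
	STRUCT	Collection of fields, possibly of different types	struct('1',1,1.0), named_struct('col1','1','col2',1,'col3',1.0)
	UNION	A value that holds exactly one of a fixed set of types	create_union(1,'a',63)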
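To show how the complex types from the table are declared and read, here is a small sketch. The table type_demo and its columns are hypothetical; the delimiters are just one possible choice for a text file.

```sql
-- Build complex values directly with the constructor functions.
select array(1, 2)                                          as arr,
       map('a', 1, 'b', 2)                                  as mp,
       named_struct('col1', '1', 'col2', 1, 'col3', 1.0)    as st;

-- Declare complex columns and the delimiters used inside them.
create table if not exists type_demo (
  id    int,
  tags  array<string>,
  props map<string,int>,
  addr  struct<city:string,zip:string>
)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':';

-- Access an array element by index, a map value by key,
-- and a struct field by name.
select id, tags[0], props['a'], addr.city from type_demo;
```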

Create a table and specify the field delimiter

create  table if not exists stu2(id int ,name string) row format delimited fields terminated by '\t' stored as textfile location '/user/stu2';

Create a table from a query result

create table stu3 as select * from stu2;

Create a table from an existing table's structure

create table stu4 like stu2;

Check the table type

desc formatted  stu2;

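External tables

About external tables:

An external table points at data that lives in some other HDFS path, so Hive does not consider itself the sole owner of that data. When the table is dropped, the data stays in HDFS and is not deleted.

When to use managed and external tables:

Suppose website logs are collected every day and written to text files in HDFS on a schedule. The raw log table is defined as an external table, and heavy statistical analysis is built on top of it. The intermediate and result tables are stored as managed tables and populated with SELECT + INSERT.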
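The log scenario above could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the table names raw_logs and url_pv, the column layout, and the /logs/raw directory.

```sql
-- Raw log table: external, pointing at files written by another process.
create external table if not exists raw_logs (
  ip         string,
  url        string,
  visit_time string
)
row format delimited fields terminated by '\t'
location '/logs/raw';

-- Result table: managed by Hive, dropped together with its data.
create table if not exists url_pv (url string, pv bigint);

-- SELECT + INSERT writes the aggregated result into the managed table.
insert overwrite table url_pv
select url, count(*) as pv
from raw_logs
group by url;
```

Dropping url_pv removes the computed result, while dropping raw_logs leaves the collected log files untouched.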

Worked example

Create external tables for teachers and students and load data into them

Create the teacher table:

create external table techer (t_id string,t_name string) row format delimited fields terminated by '\t';

Create the student table:

create external table student (s_id string,s_name string,s_birth string , s_sex string ) row format delimited fields terminated by '\t';

Load data into a table from the local file system

load data local inpath '/export/servers/hivedatas/student.csv' into table student;

Load data and overwrite the existing data

load data local inpath '/export/servers/hivedatas/student.csv' overwrite  into table student;

Load data into a table from HDFS (the data must first be uploaded to HDFS; the load itself is just a file move)

cd /export/servers/hivedatas

hdfs dfs -mkdir -p /hivedatas

hdfs dfs -put techer.csv /hivedatas/

load data inpath '/hivedatas/techer.csv' into table techer;

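If the student table is dropped, its data remains in HDFS, and once the table is recreated the data is immediately visible again. This is because student is an external table: after drop table, the table's data is still kept in HDFS.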
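This behaviour can be checked with a short experiment. The sketch assumes the student table was created and loaded as shown earlier, and that it is recreated in the same database so that it resolves to the same directory.

```sql
-- Dropping an external table removes only the metadata.
drop table student;

-- Recreate the table with the same definition.
create external table student (s_id string, s_name string, s_birth string, s_sex string)
row format delimited fields terminated by '\t';

-- The rows loaded before the drop should be visible again.
select * from student limit 5;
```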

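Partitioned tables

Divide and conquer is one of the most common ideas in big data: split a large file into many small ones, so that each operation only has to touch a small file. Hive supports the same idea. A large data set can be split by day or by hour into smaller pieces, each stored in its own directory, which makes each piece much easier to work with.

Syntax for creating a partitioned table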

create table score(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

Create a table with multiple partition columns

create table score2 (s_id string,c_id string, s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';

Load data into a partitioned table

load data local inpath '/export/servers/hivedatas/score.csv' into table score partition (month='201806');

Load data into a table with multiple partition columns

load data local inpath '/export/servers/hivedatas/score.csv' into table score2 partition(year='2018',month='06',day='01');

Query several partitions together with union all

select * from score where month = '201806' union all select * from score where month = '201805';
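When the branches of the union differ only in the partition value, filtering on the partition column is a shorter alternative. A minimal sketch, with the two month values chosen purely as examples:

```sql
-- Read two partitions in one query by filtering on the partition column.
select * from score where month in ('201805', '201806');

-- Count rows per partition to see which partitions contributed data.
select month, count(*) as cnt
from score
where month in ('201805', '201806')
group by month;
```

Because month is a partition column, Hive only needs to read the directories of the partitions named in the filter.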

List partitions

show  partitions  score;

Add a partition

alter table score add partition(month='201805');

Add several partitions at once

alter table score add partition(month='201804') partition(month = '201803');

Note: after a partition is added, a new directory appears under the table's directory in HDFS.
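The new directories can be inspected without leaving the Hive shell by using its dfs command. The path below is an assumption: it presumes the default warehouse directory /user/hive/warehouse and the myhive database, so adjust it to your own setup.

```sql
-- List the table directory; each partition is expected to appear
-- as a sub-directory such as month=201805.
dfs -ls /user/hive/warehouse/myhive.db/score;

-- Cross-check against the partitions recorded in the metastore.
show partitions score;
```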

Drop a partition

alter table score drop partition(month = '201806');

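Exercise: external partitioned table

Requirement: a file named score.csv is stored on the cluster under /scoredatas/month=201806. A new file is generated every day and placed in the directory for the corresponding date. Other users share these files, so they must not be moved. Create a matching Hive table, load the data into it for statistical analysis, and make sure that dropping the table does not delete the data.

Implementation:

Prepare the data: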

hdfs dfs -mkdir -p /scoredatas/month=201806

hdfs dfs -put score.csv /scoredatas/month=201806/

Create an external partitioned table and specify the directory that holds the data files

create external table score4(s_id string, c_id string,s_score int) partitioned by (month string) row format delimited fields terminated by '\t' location '/scoredatas';

Repair the table, that is, register the mapping between the table and the data files that are already in place

msck  repair   table  score4;

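Once the repair succeeds, all the data is visible in the table.

Second approach: upload the data, then add the partition manually

Prepare the data:

hdfs dfs -mkdir -p /scoredatas/month=201805

hdfs dfs -put score.csv /scoredatas/month=201805

Alter the table to add the partition manually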

alter table score4 add partition(month='201805');
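The manual approach can also name the partition directory explicitly, which helps when the directory does not follow the month=value naming scheme. The sketch below reuses the path from the data preparation step and then checks the result.

```sql
-- Same partition, with the directory spelled out. "if not exists"
-- makes the statement safe to run again.
alter table score4 add if not exists partition (month='201805')
location '/scoredatas/month=201805';

-- Confirm that the partitions are registered and readable.
show partitions score4;
select month, count(*) as cnt from score4 group by month;
```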

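Bucketed tables

Bucketing distributes rows into a fixed number of buckets according to a chosen column. In other words, the data is split by that column's value into several files.

Enable bucketing enforcement in Hive (needed on Hive 1.x; from Hive 2.0 on, bucketing is always enforced and this property has been removed)

set hive.enforce.bucketing=true;

Set the number of reducers

set mapreduce.job.reduces=3;

Create the bucketed table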

create table course (c_id string,c_name string,t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';

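Loading data into a bucketed table: neither hdfs dfs -put nor load data distributes the rows into buckets, so the data has to be loaded with insert overwrite.

Create a regular table, then use insert overwrite with a query to load its data into the bucketed table.

Create the regular table: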

create table course_common (c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';

Load data into the regular table

load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;

Load data into the bucketed table with insert overwrite

insert overwrite table course select * from course_common cluster by(c_id);
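Once the insert has finished, the bucket layout can be inspected and sampled. This is a sketch under two assumptions: the table lives under the default warehouse path for the myhive database, and the insert ran with the three reducers configured above.

```sql
-- With 3 buckets the table directory is expected to hold 3 data files.
dfs -ls /user/hive/warehouse/myhive.db/course;

-- Read only the first of the three buckets, using the bucketing column.
select * from course tablesample(bucket 1 out of 3 on c_id);
```

Sampling on the bucketing column lets Hive read a single bucket file instead of scanning the whole table.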

Altering tables

Renaming a table

Basic syntax:

 alter  table  old_table_name  rename  to  new_table_name;

Rename table score4 to score5

 alter table score4 rename to score5;

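Adding and modifying columns

(1) View the table structure

desc score5;

(2) Add columns

alter table score5 add columns (mycol string, mysco string);

(3) View the table structure

desc score5;

(4) Change a column

alter table score5 change column mysco mysconew int;

(5) View the table structure

desc score5;

Drop a table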

drop table score5;

Loading data into Hive tables

Insert data directly into a partitioned table

create table score3 like score;

insert into table score3 partition(month ='201807') values ('001','002','100');

Inserting data via queries

Load data with load data

load data local inpath '/export/servers/hivedatas/score.csv' overwrite into table score partition(month='201806');

Load data with a query

create table score4 like score;

insert overwrite table score4 partition(month = '201806') select s_id,c_id,s_score from score;
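The statement above writes into one fixed partition. When the source table holds several months, a dynamic-partition insert lets Hive pick the target partition from the data itself. This is a sketch, not part of the original walkthrough; it assumes score4 was created like score, so both tables are partitioned by month.

```sql
-- Allow dynamic partitions without requiring any static partition key.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column comes last in the select list; each row lands
-- in the partition that matches its month value.
insert overwrite table score4 partition (month)
select s_id, c_id, s_score, month from score;
```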

Multi-insert

Commonly used in production to split one table into two or more parts

Load data into the score table

load data local inpath '/export/servers/hivedatas/score.csv' overwrite into table score partition(month='201806');

Create the first table:

create table score_first( s_id string,c_id  string) partitioned by (month string) row format delimited fields terminated by '\t' ;

Create the second table:

create table score_second(c_id string,s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

Load data into both tables in a single statement

from score insert overwrite table score_first partition(month='201806') select s_id,c_id insert overwrite table score_second partition(month = '201806')  select c_id,s_score;

Create a table and load data in one statement (as select)

Save the result of a query into a new table

create table score5 as select * from score;

Specify the data path with location when creating a table

1) Create the table and specify its location in HDFS

create external table score6 (s_id string,c_id string,s_score int) row format delimited fields terminated by '\t' location '/myscore6';

2) Upload the data to HDFS

hdfs dfs -mkdir -p /myscore6

hdfs dfs -put score.csv /myscore6

3) Query the data

select * from score6;

Exporting and importing Hive table data with export and import (managed-table operation)

create table techer2 like techer;

export table techer to  '/export/techer';

import table techer2 from '/export/techer';
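To see what export wrote and to try a second way of importing, the following sketch may help. The table name techer3 and the /import/techer3 directory are hypothetical, and the exact contents of the export directory can vary between Hive versions; typically it holds a metadata file alongside the data files.

```sql
-- Inspect the export directory from inside the Hive shell.
dfs -ls /export/techer;

-- Import the same export as a new external table at a chosen location.
import external table techer3 from '/export/techer'
location '/import/techer3';

-- Check that the imported rows are readable.
select * from techer3 limit 5;
```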
