Basic operations on Hive databases and tables

I. Database operations

1. Create a database

create database financials;

2. Create a database without raising an error if it already exists

create database if not exists financials;

3. List the databases in Hive

show databases;

4. Switch to a database (to display the current database instead, run select current_database();)

use financials;

5. Drop an empty database

drop database if exists financials;

6. Drop a database that still contains tables (CASCADE drops its tables first)

drop database if exists financials cascade;

7. Modify a database's properties

alter database financials set dbproperties ('edited-by'='job dba');
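The IF NOT EXISTS, IF EXISTS and CASCADE rules above can be summarized with a toy in-memory model. This is a minimal Python sketch and not Hive's metastore code. The `Catalog` class, its method names and its error messages are hypothetical. They only mirror the behaviour described in steps 1 to 7.

```python
class Catalog:
    """Toy stand-in for the metastore (hypothetical): db name -> {"tables": set, "props": dict}."""

    def __init__(self):
        self.dbs = {}

    def create_database(self, name, if_not_exists=False):
        if name in self.dbs:
            if if_not_exists:
                return  # IF NOT EXISTS: silently do nothing
            raise ValueError(f"Database {name} already exists")
        self.dbs[name] = {"tables": set(), "props": {}}

    def drop_database(self, name, if_exists=False, cascade=False):
        if name not in self.dbs:
            if if_exists:
                return  # IF EXISTS: silently do nothing
            raise ValueError(f"Database {name} does not exist")
        if self.dbs[name]["tables"] and not cascade:
            raise ValueError(f"Database {name} is not empty")
        del self.dbs[name]  # CASCADE: the tables go away with the database

    def set_dbproperties(self, name, props):
        self.dbs[name]["props"].update(props)


cat = Catalog()
cat.create_database("financials")
cat.create_database("financials", if_not_exists=True)  # no error
cat.dbs["financials"]["tables"].add("employees1")      # pretend a table was created
cat.set_dbproperties("financials", {"edited-by": "job dba"})
try:
    cat.drop_database("financials", if_exists=True)     # refused without CASCADE
except ValueError as err:
    print(err)  # Database financials is not empty
cat.drop_database("financials", if_exists=True, cascade=True)
print("financials" in cat.dbs)  # False
```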

II. Creating tables

1. Create a managed (internal) table

CREATE TABLE if not exists financials.employees1(
name STRING COMMENT 'Employee name',
salary FLOAT COMMENT 'Employee salary',
subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
deductions MAP<STRING,FLOAT>
COMMENT 'Keys are deduction names, values are percentages',
address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>
COMMENT 'Home address')
COMMENT 'Description of the table'
LOCATION '/user/hive/warehouse/financials.db/employees'
TBLPROPERTIES('creator'='li','created_at'='2014-1-23 10:00:00');
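This table has no ROW FORMAT clause. Assuming it is stored as a plain text file, Hive then uses its default delimiters: ^A (\001) between columns, ^B (\002) between ARRAY items, MAP entries and STRUCT fields, and ^C (\003) between a MAP key and its value. The short Python sketch below shows how one row would be laid out in such a file. The helper name and the sample row are hypothetical.

```python
# Hive's default text-file delimiters (used when no ROW FORMAT is given).
FIELD = "\x01"    # ^A: separates columns
ITEM = "\x02"     # ^B: separates ARRAY items, MAP entries and STRUCT fields
MAP_KEY = "\x03"  # ^C: separates a MAP key from its value


def serialize_employee(name, salary, subordinates, deductions, address):
    """Lay out one employees1 row the way a default-delimited text file stores it."""
    columns = [
        name,
        str(salary),
        ITEM.join(subordinates),
        ITEM.join(f"{k}{MAP_KEY}{v}" for k, v in deductions.items()),
        ITEM.join(str(field) for field in address),  # STRUCT fields in declared order
    ]
    return FIELD.join(columns)


# Hypothetical sample row.
line = serialize_employee(
    "John Doe", 100000.0, ["Mary Smith", "Todd Jones"],
    {"Federal Taxes": 0.2, "State Taxes": 0.05},
    ("1 Michigan Ave.", "Chicago", "IL", 60600),
)
# Make the control characters visible.
print(line.replace(FIELD, "^A").replace(ITEM, "^B").replace(MAP_KEY, "^C"))
# John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C0.2^BState Taxes^C0.05^A1 Michigan Ave.^BChicago^BIL^B60600
```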

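2. Create an external table

Dropping an external table removes only its metadata; the files under LOCATION are kept. Because order is a reserved word in HiveQL, that column name must be quoted with backticks.

CREATE EXTERNAL TABLE financials.employees2(
ts String,
uid String,
keyword String,
rank int,
`order` int,
url String)
comment 'this is the sogou search data of one day'
row format delimited
fields terminated by '\t'
stored as textfile
location '/sogou/20150923';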
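Each line under /sogou/20150923 is split on tabs and mapped onto the six declared columns. The Python sketch below imitates, as an assumption, how Hive's default text SerDe treats imperfect lines: \N becomes NULL, a value that cannot be parsed as INT becomes NULL, missing trailing fields are NULL, and extra fields are ignored. It is not Hive code, and the sample lines are hypothetical.

```python
COLUMNS = [("ts", str), ("uid", str), ("keyword", str),
           ("rank", int), ("order", int), ("url", str)]


def parse_sogou_line(line, sep="\t"):
    """Map one tab-separated line onto the employees2 columns (None stands for NULL)."""
    parts = line.rstrip("\n").split(sep)
    row = {}
    for i, (name, typ) in enumerate(COLUMNS):
        raw = parts[i] if i < len(parts) else None  # missing trailing field
        if raw is None or raw == "\\N":             # \N is Hive's text NULL marker
            row[name] = None
            continue
        try:
            row[name] = typ(raw)
        except ValueError:                          # e.g. "x" declared as INT
            row[name] = None
    return row  # fields beyond the sixth are simply ignored


good = parse_sogou_line("20111230000005\t57375476989eea12893c0c3811607bcf\tqiyi\t1\t1\thttp://www.qiyi.com/")
print(good["keyword"], good["rank"], good["url"])  # qiyi 1 http://www.qiyi.com/
bad = parse_sogou_line("20111230000005\tabc\t\\N\tx")
print(bad["keyword"], bad["rank"], bad["url"])     # None None None
```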

3. Create a partitioned table

CREATE TABLE financials.employees3(
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY(country STRING, state STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';
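With these delimiters, a data line has five comma-separated columns. Inside them, '|' separates ARRAY items, MAP entries and STRUCT fields, and ':' separates each MAP key from its value. The partition columns country and state are not stored in the data file at all; Hive takes them from the partition directory name. Here is a minimal Python sketch of the split. The function name and sample line are hypothetical, and it assumes no value contains one of the delimiter characters.

```python
def parse_employee(line, field=",", item="|", map_key=":"):
    """Split one employees3 data line using the delimiters from the DDL above."""
    name, salary, subs, deds, addr = line.split(field)
    deductions = {}
    entries = deds.split(item) if deds else []
    for entry in entries:
        key, value = entry.split(map_key, 1)
        deductions[key] = float(value)
    street, city, state, zip_code = addr.split(item)
    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subs.split(item) if subs else [],
        "deductions": deductions,
        "address": {"street": street, "city": city, "state": state, "zip": int(zip_code)},
    }


# Hypothetical line; note that it carries no country/state partition values.
row = parse_employee("John Doe,100000.0,Mary Smith|Todd Jones,"
                     "Federal Taxes:0.2|State Taxes:0.05,"
                     "1 Michigan Ave.|Chicago|IL|60600")
print(row["subordinates"], row["deductions"]["State Taxes"], row["address"]["zip"])
# ['Mary Smith', 'Todd Jones'] 0.05 60600
```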

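(1) For each partition, Hive creates a subdirectory under the table directory that is named after the partition values. The partition's data files live in that subdirectory: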

.../employees/country=CH/state=BeiJin
.../employees/country=US/state=NY
.../employees/country=US/state=AK
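The directory naming rule above is simply `column=value` segments joined in PARTITIONED BY order. The Python sketch below builds such a path and parses it back. The helper names are hypothetical, and the table directory assumes the default location of financials.employees3. It also ignores the percent-escaping Hive applies to special characters such as '/' inside partition values.

```python
def partition_path(table_dir, spec):
    """Build a partition directory from (column, value) pairs in PARTITIONED BY order."""
    return "/".join([table_dir.rstrip("/")] + [f"{col}={val}" for col, val in spec])


def parse_partition(name):
    """Turn 'country=US/state=NY' back into {'country': 'US', 'state': 'NY'}."""
    return dict(part.split("=", 1) for part in name.split("/"))


table_dir = "/user/hive/warehouse/financials.db/employees3"
print(partition_path(table_dir, [("country", "US"), ("state", "NY")]))
# /user/hive/warehouse/financials.db/employees3/country=US/state=NY
print(parse_partition("country=CH/state=BeiJin"))
# {'country': 'CH', 'state': 'BeiJin'}
```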

(2) List the table's partitions

hive>show partitions employees;
country=US/state=CL
country=US/state=NY
country=CH/state=BeiJin

(3) List only the partitions that match a partial partition specification

hive> SHOW PARTITIONS employees PARTITION(country='US');
country=US/state=AL
country=US/state=AK

(4) Load data into partitions

load data local inpath '/app/hadoop/data/employees_1'
overwrite into table employees partition(country = 'CH',state = 'BeiJin');

load data local inpath '/app/hadoop/data/employees_2'
overwrite into table employees partition(country = 'US',state = 'NY');

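Note:
A query on a very large partitioned table with no filter triggers a full table scan. That means one huge MapReduce job, which puts heavy load on the cluster's disks. Hive therefore strongly recommends "strict" mode (set hive.mapred.mode=strict;). In that mode, a query on a partitioned table is rejected unless its WHERE clause filters on a partition column. Switching to "nonstrict" mode (the default in older Hive releases) removes this restriction.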
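Partition pruning and the strict-mode check can be sketched in a few lines of Python. The sketch assumes WHERE clauses made only of equality predicates joined by AND. Filters on ordinary columns cannot skip any directory, and in strict mode a query with no partition filter is refused. The function, its error message and the partition list (taken from the directory listing above) are illustrative only; this is not Hive's planner.

```python
PARTITION_COLS = ("country", "state")


def prune(partitions, where, mode="nonstrict"):
    """Return the partition dirs a query must read; where maps column -> required value."""
    part_filters = {c: v for c, v in where.items() if c in PARTITION_COLS}
    if mode == "strict" and not part_filters:
        raise PermissionError("strict mode: query must filter on a partition column")
    selected = []
    for p in partitions:
        spec = dict(kv.split("=", 1) for kv in p.split("/"))
        if all(spec[c] == v for c, v in part_filters.items()):
            selected.append(p)
    return selected


PARTITIONS = ["country=CH/state=BeiJin", "country=US/state=NY", "country=US/state=AK"]
print(prune(PARTITIONS, {"country": "US"}))     # ['country=US/state=NY', 'country=US/state=AK']
print(len(prune(PARTITIONS, {"name": "KING"})))  # 3: a full scan in nonstrict mode
```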

4. Create a bucketed table

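Tables or partitions can be further organized into buckets. Rows are distributed among the buckets according to the value of a chosen column, and each bucket is written by one reducer as one file.

Before writing data into a bucketed table, set the hive.enforce.bucketing property to true (set hive.enforce.bucketing=true;) so that Hive honors the bucket definition. In Hive 2.x this property was removed and bucketing is always enforced.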

Create Table bemp(empno Int,ename String,mgr Int,sal Float,deptno Int)
clustered By (empno) Into 3 buckets
Row format delimited fields terminated By '\t';

List the table's directory in the warehouse. The three buckets correspond to three files:
hive> dfs -ls /user/hive/warehouse/bemp ;
Found 3 items
-rw-r--r-- 2 licz supergroup 177 2013-12-17 01:15 /user/hive/warehouse/bemp/000000_0
-rw-r--r-- 2 licz supergroup 103 2013-12-17 01:15 /user/hive/warehouse/bemp/000001_0
-rw-r--r-- 2 licz supergroup 80 2013-12-17 01:15 /user/hive/warehouse/bemp/000002_0
hive> dfs -ls /user/hive/warehouse/bemp/000000_0;
Found 1 items
-rw-r--r-- 2 licz supergroup 177 2013-12-17 01:15 /user/hive/warehouse/bemp/000000_0

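Hive assigns each row to a bucket by hashing the value of the bucketing column and taking that hash modulo the number of buckets. This spreads rows across the buckets. It guarantees neither that every bucket receives data nor that the buckets hold equal numbers of rows; the three files below hold 7, 4 and 3 rows.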

hive> dfs -cat /user/hive/warehouse/bemp/000000_0;
7788 SCOTT 7566 3000.0 20
7839 KING \N 5000.0 10
7521 WARD 7698 1250.0 30
7566 JONES 7839 2975.0 20
7902 FORD 7566 3000.0 20
7698 BLAKE 7839 2850.0 30
7782 CLARK 7839 2450.0 10
hive> dfs -cat /user/hive/warehouse/bemp/000001_0;
7369 SMITH 7902 800.0 20
7654 MARTIN 7698 1250.0 30
7876 ADAMS 7788 1100.0 20
7900 JAMES 7698 950.0 30
hive> dfs -cat /user/hive/warehouse/bemp/000002_0;
7934 MILLER 7782 1300.0 10
7844 TURNER 7698 1500.0 30
7499 ALLEN 7698 1600.0 30
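The distribution above can be reproduced exactly. This assumes the legacy bucketing scheme of Hive 1.x/2.x, in which the hash of an INT is the value itself and the bucket is `(hash & Integer.MAX_VALUE) % numBuckets`. Hive 3 defaults to a Murmur3-based hash for new tables, so the grouping there would differ. The helper name is hypothetical; the empno values come from the three files.

```python
from collections import defaultdict


def bucket_for_int(value, num_buckets):
    """Legacy Hive bucketing for an INT column: the hash of an INT is the value itself."""
    return (value & 0x7FFFFFFF) % num_buckets  # same as Java's (hash & Integer.MAX_VALUE) % n


# empno values from the three bucket files above.
empnos = [7369, 7499, 7521, 7566, 7654, 7698, 7782,
          7788, 7839, 7844, 7876, 7900, 7902, 7934]

buckets = defaultdict(list)
for empno in empnos:
    buckets[bucket_for_int(empno, 3)].append(empno)

for b in sorted(buckets):
    print(f"{b:06d}_0", buckets[b])
# 000000_0 [7521, 7566, 7698, 7782, 7788, 7839, 7902]
# 000001_0 [7369, 7654, 7876, 7900]
# 000002_0 [7499, 7844, 7934]
```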

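Bucketing can give better query efficiency than partitioning alone in some cases, and it also makes it easy to sample the whole data set, as in the following sampling query: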
hive> select * from bemp tablesample(bucket 1 out of 3 on empno);
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
……
Total MapReduce CPU Time Spent: 6 seconds 110 msec
OK
7788 SCOTT 7566 3000.0 20
7839 KING NULL 5000.0 10
7521 WARD 7698 1250.0 30
7566 JONES 7839 2975.0 20
7902 FORD 7566 3000.0 20
7698 BLAKE 7839 2850.0 30
7782 CLARK 7839 2450.0 10
Time taken: 57.213 seconds, Fetched: 7 row(s)
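TABLESAMPLE(BUCKET x OUT OF y ON empno) keeps the rows whose bucket number, computed with y buckets, equals x - 1. That is why bucket 1 out of 3 returns exactly the seven rows of file 000000_0. When y equals the table's bucket count and the ON column is the CLUSTERED BY column, Hive can read just that one file instead of scanning every bucket. The Python sketch below assumes the same legacy INT hash as the bucketing example; the function name is hypothetical and the rows come from the bemp data above.

```python
# (empno, ename) pairs from the bemp table above.
EMP = [(7369, "SMITH"), (7499, "ALLEN"), (7521, "WARD"), (7566, "JONES"),
       (7654, "MARTIN"), (7698, "BLAKE"), (7782, "CLARK"), (7788, "SCOTT"),
       (7839, "KING"), (7844, "TURNER"), (7876, "ADAMS"), (7900, "JAMES"),
       (7902, "FORD"), (7934, "MILLER")]


def tablesample_bucket(rows, x, y, key=lambda row: row[0]):
    """Rows kept by TABLESAMPLE(BUCKET x OUT OF y ON key) for an INT key (x is 1-based)."""
    return [row for row in rows if (key(row) & 0x7FFFFFFF) % y == x - 1]


sample = tablesample_bucket(EMP, 1, 3)
print(len(sample), sorted(name for _, name in sample))
# 7 ['BLAKE', 'CLARK', 'FORD', 'JONES', 'KING', 'SCOTT', 'WARD']
```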

5. List the tables in a database

Note: the default warehouse path is /user/hive/warehouse/.

show tables in financials;
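A managed table created without a LOCATION clause lives under the warehouse path. Tables of the default database sit directly under it (like /user/hive/warehouse/bemp above), while other databases get a `<db>.db` subdirectory (like financials.db above). The Python sketch below assumes the default value of hive.metastore.warehouse.dir; the helper name is hypothetical.

```python
WAREHOUSE = "/user/hive/warehouse"  # default value of hive.metastore.warehouse.dir


def default_table_location(db, table, warehouse=WAREHOUSE):
    """Directory Hive uses for a managed table created without a LOCATION clause."""
    base = warehouse.rstrip("/")
    if db.lower() == "default":
        return f"{base}/{table.lower()}"              # default database: directly under the warehouse
    return f"{base}/{db.lower()}.db/{table.lower()}"  # other databases: <db>.db subdirectory


print(default_table_location("default", "bemp"))
# /user/hive/warehouse/bemp
print(default_table_location("financials", "employees3"))
# /user/hive/warehouse/financials.db/employees3
```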

6. Drop a table

DROP TABLE IF EXISTS employees4;

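If Hadoop's trash feature (.Trash) is enabled, the data of a dropped managed table is moved to the .Trash directory instead of being deleted immediately. The fs.trash.interval parameter sets how many minutes trashed files are kept; 0 disables the trash.

This is not guaranteed to work in every version. If you accidentally drop an important managed table, you can re-create an empty table with the same name and then move the files from the trash back into the original table directory to recover the data.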
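The recovery step above needs two paths: where the trash keeps the deleted directory and where the table expects its files. HDFS trash keeps a deleted path under `/user/<user>/.Trash/Current/` followed by the full original path, until it is rolled into a timestamped checkpoint directory. The Python sketch below builds the `hdfs dfs -mv` argument list for the restore. The helper names, the user licz and the table employees4 in the default database are assumptions for illustration. If the files were already checkpointed, replace Current with the checkpoint directory. The Hadoop FS shell expands the `*` glob itself.

```python
import posixpath


def trash_location(path, user):
    """Where HDFS trash keeps a deleted path until the next checkpoint."""
    return posixpath.join(f"/user/{user}/.Trash/Current", path.lstrip("/"))


def restore_command(table_dir, user):
    """hdfs dfs -mv argv that moves the dropped files back (run after re-creating the table)."""
    return ["hdfs", "dfs", "-mv", trash_location(table_dir, user) + "/*", table_dir + "/"]


# Hypothetical: table employees4 in the default database, dropped by user licz.
print(" ".join(restore_command("/user/hive/warehouse/employees4", "licz")))
# hdfs dfs -mv /user/licz/.Trash/Current/user/hive/warehouse/employees4/* /user/hive/warehouse/employees4/
```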

III. Loading data into tables

1. Load data from the local file system

load data local inpath '/data/employees1' overwrite into table employees1;

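Note: if the file to load is already on HDFS, omit the local keyword. In that case Hive moves the file into the table directory rather than copying it. OVERWRITE replaces the table's existing data; without it, the new file is added alongside the existing files.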
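The difference between LOCAL and non-LOCAL loads, and between OVERWRITE and plain INTO, can be shown with a toy model in which file systems are Python dicts. It is only a sketch of the semantics described in the note, not Hive code. File-name clashes and partitions are ignored, and the pre-existing file 000000_0 is hypothetical.

```python
import posixpath


def load_data(src, table_files, local_fs, hdfs, local=False, overwrite=False):
    """Toy model of LOAD DATA [LOCAL] INPATH src [OVERWRITE] INTO TABLE ...

    local_fs / hdfs: dict path -> file contents; table_files: dict name -> contents.
    """
    if local:
        data = local_fs[src]   # LOCAL: the file is copied, the local original stays
    else:
        data = hdfs.pop(src)   # no LOCAL: the HDFS file is moved
    if overwrite:
        table_files.clear()    # OVERWRITE: old table files are removed first
    table_files[posixpath.basename(src)] = data


local_fs = {"/data/employees1": "local rows"}
hdfs = {"/data/employees3": "hdfs rows"}
table = {"000000_0": "old rows"}

load_data("/data/employees1", table, local_fs, hdfs, local=True, overwrite=True)
print(sorted(table), "/data/employees1" in local_fs)  # ['employees1'] True
load_data("/data/employees3", table, local_fs, hdfs)
print(sorted(table), "/data/employees3" in hdfs)      # ['employees1', 'employees3'] False
```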

2. Load data into a partition of a partitioned table

load data inpath '/data/employees3' overwrite into table employees3 partition(country = 'CH',state = 'BeiJin');

3. Use Sqoop to export Hive table data to MySQL (the target MySQL table stats must already exist)

bin/sqoop export --connect jdbc:mysql://192.168.1.161:3306/test --username root --password 123 --table stats \
--export-dir '/user/hive/warehouse/test.db/stats' --fields-terminated-by '\t'

If you spot any mistakes, corrections are welcome. Thanks!
