Note: the rest of Hive's syntax is documented on the official Hive wiki; beginners are encouraged to learn from the first-hand material there.
Official wiki: https://cwiki.apache.org/confluence/display/Hive/Home#Home-UserDocumentation
The official syntax
Hive offers three ways to create a table:
Direct creation (CREATE TABLE)
Create-table-as-select (CTAS)
CREATE TABLE LIKE
First, the syntax definition from the official documentation:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
I. The difference between Hive's STRING and VARCHAR
Brief introduction:
Hive has two types for storing variable-length text.
1. The VARCHAR type was introduced in Hive 0.12.0. A VARCHAR column is created with a length specifier (1 to 65535) that defines the maximum number of characters allowed in the string. If a string value converted to or assigned to a varchar value exceeds the length specifier, the string is truncated automatically.
2. STRING stores variable-length text with no length limit. In theory a STRING can hold up to 2 GB, but storing very large values can hurt performance; for such cases, consider the large-object support provided by Sqoop.
II. The main differences between the two:
1. VARCHAR is similar to STRING, but STRING stores variable-length text with no length limit, while VARCHAR only allows lengths between 1 and 65535.
2. There are no generic UDFs that operate directly on the VARCHAR type; String UDFs can be used instead, and VARCHAR values will be converted to String before being passed to the UDF.
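The truncation behavior can be illustrated with a short sketch (the table name varchar_demo and the literal values are hypothetical; INSERT ... VALUES requires Hive 0.14.0 or later):

```sql
-- Hypothetical table: v holds at most 5 characters, s is unbounded.
CREATE TABLE varchar_demo (v VARCHAR(5), s STRING);

-- The 11-character literal exceeds VARCHAR(5), so it is truncated on
-- assignment; the STRING column keeps the full value.
INSERT INTO TABLE varchar_demo VALUES ('hello world', 'hello world');

SELECT length(v), length(s) FROM varchar_demo;  -- expect 5 and 11
```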
As the syntax above shows, there are three ways to create a table; we will go through them one by one.
1. Direct creation:
create table table_name(col_name data_type);
A more involved example. The clauses must follow the order laid out in the syntax definition above:
CREATE EXTERNAL TABLE IF NOT EXISTS `dmp_clearlog` (
`date_log` string COMMENT 'date in file',
`hour` int COMMENT 'hour',
`device_id` string COMMENT '(android) md5 imei / (ios) origin mac',
`imei_orgin` string COMMENT 'origin value of imei',
`mac_orgin` string COMMENT 'origin value of mac',
`mac_md5` string COMMENT 'mac after md5 encrypt',
`android_id` string COMMENT 'androidid',
`os` string COMMENT 'operating system',
`ip` string COMMENT 'remote real ip',
`app` string COMMENT 'appname' )
COMMENT 'cleared log of origin log'
PARTITIONED BY (
`date` date COMMENT 'date used by partition'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ('creator'='szh', 'create_time'='2018-06-07')
;
Below we explain the parts of this statement that differ from a relational database.
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
Hive maps files on HDFS onto a table structure, telling columns apart by delimiters (such as ',', ';' or '^'). The row format clause specifies the serialization and deserialization (SerDe) rules.
For example, consider the following records:
1,xiaoming,book-TV-code,beijing:chaoyang-shagnhai:pudong
2,lilei,book-code,nanjing:jiangning-taiwan:taibei
3,lihua,music-book,heilongjiang:haerbin
The comma separates the columns (FIELDS TERMINATED BY char), which here correspond to id, name, hobby (an array, whose items are separated by the COLLECTION ITEMS TERMINATED BY delimiter) and address (a map of key-value pairs, whose keys are separated from values by the MAP KEYS TERMINATED BY delimiter). LINES TERMINATED BY char separates records from one another; the default is the newline character.
file format (the storage format of the file on HDFS)
The default is TEXTFILE, i.e. plain text, which can be opened and read directly.
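Other formats from the file_format list above can be chosen with STORED AS; for example, a minimal sketch of an ORC-backed table (the table name t_orc is hypothetical; ORC is available in Hive 0.11.0 and later, per the grammar above):

```sql
-- Columnar, compressed storage; unlike TEXTFILE, the files on HDFS
-- are not human-readable.
CREATE TABLE t_orc (
  id int,
  name string
)
STORED AS ORC;
```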
Based on the file content above, create a table t1:
create table t1(
id int
,name string
,hobby array<string>
,add map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
;
Now insert the data.
Note: a plain INSERT statement (as opposed to INSERT OVERWRITE) is rarely used in practice, because inserting even a single row launches a MapReduce job; instead, we use LOAD DATA here.
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
Then load the file:
load data local inpath '/home/hadoop/Desktop/data' overwrite into table t1;
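Once loaded, the array and map columns can be queried with index and key syntax; a short sketch against the sample records above (the index and key used here are illustrative):

```sql
-- hobby is an array: hobby[0] is its first element.
-- add is a map: add['beijing'] looks up the value stored under key 'beijing'.
SELECT name, hobby[0], add['beijing'] FROM t1;
-- for the first sample row this yields: xiaoming, book, chaoyang
```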
external
A table created without the EXTERNAL keyword is a managed (internal) table; a table created with EXTERNAL is an external table.
Differences:
Data in a managed table is managed by Hive itself, while data in an external table is managed by HDFS.
Managed table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse); the storage location of an external table is chosen by the user.
Dropping a managed table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are left untouched.
Changes to a managed table are synchronized to the metastore automatically, while after changing an external table's structure or partitions a repair is needed (MSCK REPAIR TABLE table_name;).
Create an external table t2:
create external table t2(
id int
,name string
,hobby array<string>
,add map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/t2'
;
Load the data:
load data local inpath '/home/hadoop/Desktop/data' overwrite into table t2;
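The managed-vs-external behavior described above can be checked directly; a sketch following the t2 example (the output mentioned in the comments is what one would expect, not captured output):

```sql
-- The "Table Type" field in the output should read EXTERNAL_TABLE.
DESCRIBE FORMATTED t2;

-- Dropping the external table removes only the metadata...
DROP TABLE t2;

-- ...so the data files under its LOCATION are still on HDFS.
dfs -ls /user/t2;
```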
2. Create-table-as-select (CTAS)
The table is created via an AS SELECT clause: the result of the subquery is stored in the new table, so the table is created with data.
This is typically used for intermediate tables:
CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;
3. CREATE TABLE LIKE
This creates a table with exactly the same structure, but with no data.
It is also commonly used for intermediate tables:
CREATE TABLE empty_key_value_store
LIKE key_value_store;
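Per the LIKE syntax quoted at the top, the copy can also be made external with its own location; a hypothetical sketch:

```sql
-- Same schema as key_value_store, no data; files will live outside
-- the warehouse directory, at the (hypothetical) path below.
CREATE EXTERNAL TABLE empty_copy
LIKE key_value_store
LOCATION '/user/hive/external/empty_copy';
```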