Big Data Development: Hive (Getting Started)

Foreword


This article covers commonly used Hive basics. What sets it apart from other manuals online is that it belongs to a systematic series of documents written by the author, rather than being an ad-hoc piece.

The environment used in this article:

  • CentOS 6.5, 64-bit
  • Hive 2.1.1
  • Java 1.8


Hive Architecture


[Figure 1: Hive architecture diagram]

Quoted from the official documentation; it is well worth reading carefully:

Figure 1 also shows how a typical query flows through the system. The UI calls the execute interface to the Driver (step 1 in Figure 1). The Driver creates a session handle for the query and sends the query to the compiler to generate an execution plan (step 2). The compiler gets the necessary metadata from the metastore (steps 3 and 4). This metadata is used to typecheck the expressions in the query tree as well as to prune partitions based on query predicates. The plan generated by the compiler (step 5) is a DAG of stages with each stage being either a map/reduce job, a metadata operation or an operation on HDFS. For map/reduce stages, the plan contains map operator trees (operator trees that are executed on the mappers) and a reduce operator tree (for operations that need reducers). The execution engine submits these stages to appropriate components (steps 6, 6.1, 6.2 and 6.3). In each task (mapper/reducer) the deserializer associated with the table or intermediate outputs is used to read the rows from HDFS files and these are passed through the associated operator tree. Once the output is generated, it is written to a temporary HDFS file though the serializer (this happens in the mapper in case the operation does not need a reduce). The temporary files are used to provide data to subsequent map/reduce stages of the plan. For DML operations the final temporary file is moved to the table’s location. This scheme is used to ensure that dirty data is not read (file rename being an atomic operation in HDFS). For queries, the contents of the temporary file are read by the execution engine directly from HDFS as part of the fetch call from the Driver (steps 7, 8 and 9).
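The staged plan described above can be inspected directly from the Hive CLI with `EXPLAIN`, which prints the DAG of stages (map/reduce jobs, metadata and HDFS operations) that the compiler generates. A minimal sketch; the table and columns are from the examples later in this article:

```sql
-- print the stage DAG and operator trees for a query, without running it
EXPLAIN
SELECT country, count(*)
FROM hello_world_parti
GROUP BY country;
```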


Hive QL

  • Create a database
-- create the hello_world database
create database hello_world;
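A slightly more defensive variant, useful when a script may be re-run, adds `IF NOT EXISTS` and then switches to the new database:

```sql
-- create the database only if it does not already exist
create database if not exists hello_world;
-- make it the current database for subsequent statements
use hello_world;
```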
  • List all databases
show databases;
  • List all tables
show tables;
  • Create a managed (internal) table
-- create hello_world_inner
create table hello_world_inner
(
    id bigint, 
    account string, 
    name string,
    age int
)
row format delimited fields terminated by '\t';
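For contrast, an external table only registers metadata over an existing HDFS directory, and dropping it leaves the underlying files in place. A sketch; the HDFS path is illustrative:

```sql
-- external table: Hive does not own the data files
create external table hello_world_ext
(
    id bigint,
    account string,
    name string,
    age int
)
row format delimited fields terminated by '\t'
location '/data/hello_world_ext';

-- "Table Type" in the output distinguishes MANAGED_TABLE from EXTERNAL_TABLE
describe formatted hello_world_ext;
```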
  • Create a partitioned table
create table hello_world_parti
(
    id bigint,
    name string
)
partitioned by (dt string, country string)
;
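Each partition materializes as a subdirectory under the table's HDFS location, and data is typically loaded one partition at a time. A sketch; the file path and partition values are illustrative:

```sql
-- load a local file into one specific partition
load data local inpath '/home/deploy/data.txt'
into table hello_world_parti
partition (dt='2017-06-01', country='CN');

-- partition columns behave like ordinary columns in queries,
-- and predicates on them enable partition pruning
select id, name
from hello_world_parti
where dt = '2017-06-01' and country = 'CN';
```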
  • Show table partitions
show partitions hello_world_parti;
  • Rename a table
alter table hello_world_parti rename to hello_world2_parti;
  • Load data into a table
load data local inpath '/home/deploy/user_info.txt' into table user_info;


Several ways to load data

Suppose we have the following test table:

create table hello
(
id int,
name string,
message string
)
partitioned by (
dt string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
;
  • Load data from the local file system into a Hive table

For example (since `hello` is partitioned by `dt`, a target partition must be specified; the partition value here is illustrative):

load data local inpath 'data.txt' into table hello partition (dt='2017-01-01');
  • Load data from HDFS into a Hive table
  • Insert data queried from another table into a Hive table
  • Create a new table populated with data queried from another table (CTAS)
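The remaining three methods can be sketched as follows. Paths, partition values, and the source table `hello_source` are illustrative assumptions, not part of the original setup:

```sql
-- 1. load from HDFS: no LOCAL keyword; note that the source file is
--    *moved* into the table's directory, not copied
load data inpath '/user/deploy/data.txt'
into table hello partition (dt='2017-01-01');

-- 2. insert the result of a query into an existing table
--    (use INSERT OVERWRITE instead of INSERT INTO to replace the partition)
insert into table hello partition (dt='2017-01-02')
select id, name, message from hello_source;

-- 3. create-table-as-select (CTAS): the new table's schema is
--    derived from the select list
create table hello_copy
as
select id, name, message from hello where dt = '2017-01-01';
```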


Conclusion

For further study, discussion, and technical exchange, readers are welcome to join the author's WeChat group, 谷同學的IT圈.


