5 DML Operations

5.1 Importing Data

5.1.1 Loading Data into a Table (load)

1. Syntax:

hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student

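(1) load data: loads data into a table

(2) local: load from the local file system into the Hive table (the file is copied); without it, the file is loaded from HDFS into the Hive table (the file is moved)

(3) inpath: the path of the data to load

(4) overwrite: replace the data already in the table; without it, the new data is appended

(5) into table: names the table to load into

(6) student: the specific target table

2. Examples

(1) Create a table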

hive (default)> create table student(id string, name string) row format delimited fields terminated by '\t';

(2) Load a local file into Hive

hive (default)> load data local inpath '/opt/module/datas/student.txt' into table default.student;

(3) Load an HDFS file into Hive

Upload the file to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt /user/hadoop/hive;

Load the data from HDFS (loading from HDFS moves the file rather than copying it)

hive (default)> load data inpath '/user/hadoop/hive/student.txt' into table default.student;

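(4) Load data and overwrite the data already in the table

Upload the file to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt /user/hadoop/hive;

Load the data, overwriting the existing contents of the table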

hive (default)> load data inpath '/user/hadoop/hive/student.txt' overwrite into table default.student;
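The copy-versus-move and append-versus-overwrite behaviour described in this section can be mimicked on an ordinary file system. The following is only a sketch of that analogy in plain Java: it never calls Hive or HDFS, and the class name LoadSemanticsSketch, the load helper and the file names are made up for illustration. It also assumes the loaded files have distinct names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem analogy of "load data"; it never talks to Hive or HDFS.
class LoadSemanticsSketch {

    // local = true copies the source (like "load data local inpath"),
    // local = false moves it (like "load data inpath" on HDFS).
    static void load(Path source, Path tableDir, boolean local, boolean overwrite) throws IOException {
        Files.createDirectories(tableDir);
        if (overwrite) {
            // overwrite: drop the files already stored for the table
            for (Path old : listFiles(tableDir)) {
                Files.delete(old);
            }
        }
        Path target = tableDir.resolve(source.getFileName());
        if (local) {
            Files.copy(source, target);
        } else {
            Files.move(source, target);
        }
    }

    static List<Path> listFiles(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.collect(Collectors.toList());
        }
    }

    // Writes a small hypothetical data file and returns its path.
    static Path newFile(Path dir, String name) throws IOException {
        return Files.write(dir.resolve(name), "1001\tzhangsan\n".getBytes(StandardCharsets.UTF_8));
    }

    // Runs the three cases and reports what happened to the sources and the table directory.
    static String runDemo() {
        try {
            Path tmp = Files.createTempDirectory("load-sketch");
            Path table = tmp.resolve("student");

            Path localFile = newFile(tmp, "local.txt");
            load(localFile, table, true, false);          // local load: source stays
            boolean localKept = Files.exists(localFile);

            Path hdfsFile = newFile(tmp, "hdfs.txt");
            load(hdfsFile, table, false, false);          // non-local load: source is moved away
            boolean hdfsKept = Files.exists(hdfsFile);
            int afterAppend = listFiles(table).size();

            Path replacement = newFile(tmp, "new.txt");
            load(replacement, table, false, true);        // overwrite: old files are replaced
            int afterOverwrite = listFiles(table).size();

            return String.format("localSourceKept=%b hdfsSourceKept=%b filesAfterAppend=%d filesAfterOverwrite=%d",
                    localKept, hdfsKept, afterAppend, afterOverwrite);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In the sketch the local source file survives the load, the moved file disappears from its original path, appending leaves two files in the table directory, and overwriting leaves one.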

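5.1.2 Inserting Data into a Table with a Query (Insert)

1. Create a table (if the student table from 5.1.1 still exists, drop it first)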

hive (default)> create table student(id int, name string) row format delimited fields terminated by '\t';

2. Basic insert

hive (default)> insert into table student values(1,'wangwang');

Quickly copy a table's data (quick backup); student1 is assumed to be an existing table with the same columns

insert into table student select * from student1;

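5.1.3 Creating a Table and Loading Data from a Query (As Select)

Create a table from a query result (the rows returned by the query are written into the newly created table)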

create table if not exists student3 as select id, name from student;

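5.1.4 Specifying the Data Path with location When Creating a Table

1. Create the table and specify its location on HDFS (if the path holds no data yet, the table is empty; once data files are placed in that path, queries on the table read them)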

hive (default)> create table if not exists student4( id int, name string )

row format delimited fields terminated by '\t' location '/user/hive/warehouse/student4';

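5.1.5 Importing Data into a Specified Hive Table (Import)

Note: the data must first have been exported with export (see 5.2.3).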

hive (default)> import table student8 from '/user/hive/warehouse/export/student';

5.2 Exporting Data

5.2.1 Exporting with Insert

1. Export the query result to the local file system

hive (default)> insert overwrite local directory '/opt/module/datas/export/student' select * from student;

2. Export the query result to the local file system with formatting

hive (default)> insert overwrite local directory '/opt/module/datas/export/student1' row format delimited fields terminated by '\t'

select * from student;

3. Export the query result to HDFS (simply drop local; the directory is then interpreted as an HDFS path)

hive (default)> insert overwrite directory '/opt/module/datas/export/student1' row format delimited fields terminated by '\t'

select * from student;
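Why the formatted export matters: when no row format is given, Hive separates the exported fields with its default delimiter, the non-printing Ctrl-A character (\001), so the files are awkward to read in an editor. The sketch below only illustrates the difference between the two delimiters on one row; the class name ExportDelimiterSketch and the sample row are hypothetical, and the code does not call Hive.

```java
import java.util.Arrays;
import java.util.List;

// Shows how one exported row looks with Hive's default delimiter and with '\t'.
class ExportDelimiterSketch {

    // Hive's default field delimiter: Ctrl-A, written \001 in HiveQL.
    static final String HIVE_DEFAULT_FIELD_DELIMITER = "\u0001";

    // Joins the fields of one row with the given delimiter.
    static String formatRow(List<String> fields, String delimiter) {
        return String.join(delimiter, fields);
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1001", "zhangsan"); // hypothetical student row

        String unformatted = formatRow(row, HIVE_DEFAULT_FIELD_DELIMITER);
        String formatted = formatRow(row, "\t");

        // Make the invisible delimiters visible before printing.
        System.out.println(unformatted.replace("\u0001", "^A")); // 1001^Azhangsan
        System.out.println(formatted.replace("\t", "\\t"));      // 1001\tzhangsan
    }
}
```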

5.2.2 Exporting to the Local File System: Hadoop Commands vs. the Hive Shell

1. Hadoop command

hive (default)> dfs -get /user/hive/warehouse/student/student.txt /opt/module/datas/export/student3.txt;

2. Hive shell

[hadoop@hadoop101 hive]$ bin/hive -e 'select * from default.student;' > /opt/module/datas/export/student4.txt;

5.2.3 Exporting to HDFS with Export

hive (default)> export table default.student to '/user/hive/warehouse/export/student';

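5.3 Truncating a Table

Note: truncate can only clear managed (internal) tables; it cannot delete the data of external tables.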

hive (default)> truncate table student;

6 DQL Queries

Basic query syntax (largely the same as in a relational database)

SELECT [ALL | DISTINCT] select_expr, select_expr, ...

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list]

[ORDER BY col_list]

[SORT BY col_list]

[LIMIT number]

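HQL syntax is largely the same as SQL, which is one of Hive's big advantages: people who already know SQL can pick it up easily. The following covers only the points where Hive differs:

1. Like and RLike

RLike is a Hive extension that lets you specify the match condition with Java regular expressions, a more powerful pattern language.

For example, use RLike to find the employees whose salary contains the digit 2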

hive (default)> select * from emp where sal RLIKE '[2]';
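Because RLIKE is backed by Java regular expressions, its behaviour can be approximated directly with java.util.regex: the condition is true when any substring of the value matches the pattern. The sketch below is an approximation under that assumption; the class name RlikeSketch and the salary strings are hypothetical, and no Hive code is involved.

```java
import java.util.regex.Pattern;

// Approximates "value RLIKE regex" with plain java.util.regex.
class RlikeSketch {

    // Returns null when either side is null, otherwise whether any substring matches.
    static Boolean rlike(String value, String regex) {
        if (value == null || regex == null) {
            return null;
        }
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(rlike("1250.0", "[2]")); // true: the salary contains a 2
        System.out.println(rlike("800.0", "[2]"));  // false: no 2 anywhere
        System.out.println(rlike("1250.0", "^2"));  // false: the salary does not start with 2
        System.out.println(rlike(null, "[2]"));     // null
    }
}
```

Using find() rather than matches() is the point of the sketch: the pattern '[2]' does not have to cover the whole value, which is why the query above needs no leading or trailing wildcards.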

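2. Under the hood, HQL queries are translated into MapReduce programs

sort by: sorts inside MapReduce, that is, within each reducer; it is not a sort of the whole result set (each reducer's output is ordered, but the overall result is not necessarily ordered)

Since HQL ends up as a MapReduce program, we can set the number of reducers

Set the number of reducers: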

hive (default)> set mapreduce.job.reduces=3;

View the number of reducers

hive (default)> set mapreduce.job.reduces;
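To see why sort by with several reducers gives ordered pieces but not an ordered whole, the following sketch splits some rows across three "reducers" and sorts each piece separately. It is a simulation in plain Java, not Hive code: the class name SortBySketch, the salary values and the round-robin way of dealing rows to reducers are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simulates "sort by" with several reducers: each reducer sorts only its own rows.
class SortBySketch {

    // Deals rows to reducers round-robin (an assumption), then sorts within each reducer.
    static List<List<Integer>> sortBy(List<Integer> rows, int reducers) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            parts.get(i % reducers).add(rows.get(i));
        }
        for (List<Integer> part : parts) {
            Collections.sort(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> salaries = Arrays.asList(5000, 800, 3000, 1600, 2450, 1250);
        // Same reducer count as "set mapreduce.job.reduces=3;"
        System.out.println(sortBy(salaries, 3)); // [[1600, 5000], [800, 2450], [1250, 3000]]
    }
}
```

Each inner list is sorted, yet reading the three lists one after another does not give a globally sorted sequence; a global order needs order by (or a single reducer).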

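Other Commonly Used Query Functions

1. Assigning a value to NULL fields

NVL: assigns a value to data that is NULL. Its form is NVL(string1, replace_with). If string1 is NULL, NVL returns replace_with; otherwise it returns string1. If both arguments are NULL, it returns NULL.

Example 1 (a well-known example from the Oracle database): compute each employee's annual salary (12*sal + comm)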

select ename,(sal*12 + nvl(comm,0)) yearSal from emp;

Example 2: if an employee's comm is NULL, use the manager's id (mgr) instead

hive (default)> select nvl(comm,mgr) from emp;
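The NVL rule is small enough to restate as code. The sketch below mirrors the description and the annual-salary example in plain Java; the class name NvlSketch, the helper names and the sample figures are hypothetical, and it is not how Hive implements the function.

```java
// Restates the NVL rule: return the value unless it is null, else the replacement.
class NvlSketch {

    static <T> T nvl(T value, T replaceWith) {
        return value != null ? value : replaceWith;
    }

    // Mirrors "sal*12 + nvl(comm,0)"; comm may be null for employees without commission.
    static double yearSal(double sal, Double comm) {
        return sal * 12 + nvl(comm, 0.0);
    }

    public static void main(String[] args) {
        System.out.println(yearSal(1600, 300.0)); // 19500.0
        System.out.println(yearSal(800, null));   // 9600.0 (comm treated as 0)
        Object both = nvl(null, null);
        System.out.println(both);                 // null (both arguments are null)
    }
}
```

Without nvl, a NULL comm would make the whole expression sal*12 + comm evaluate to NULL, which is why the query wraps comm before adding it.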

2. case when

1. Prepare the data

name	dept_id	sex
八戒	A	男
猴哥	A	男
鬆鬆	B	男
鳳姐	A	女
小可愛	B	女
萌萌	B	女

2. Requirement

Count the men and women in each department. The expected result:

A 2 1

B 1 2

3. Create a local file emp_sex.txt and enter the data (fields separated by tabs)

[bigdata@hadoop102 datas]$ vim emp_sex.txt

八戒	A	男

猴哥	A	男

鬆鬆	B	男

鳳姐	A	女

小可愛	B	女

萌萌	B	女

4. Create the Hive table and load the data

create table emp_sex(name string, dept_id string, sex string)

row format delimited fields terminated by "\t";

load data local inpath '/opt/module/datas/emp_sex.txt' into table emp_sex;

5. Query the data as required

select

dept_id,

sum(case sex when '男' then 1 else 0 end) male_count,

sum(case sex when '女' then 1 else 0 end) female_count

from

emp_sex

group by

dept_id;
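The sum(case ... end) pattern is conditional counting: each row contributes 1 to the counter whose condition it meets and 0 to the others. The following plain-Java sketch performs the same aggregation on the sample rows. The class name CaseWhenSketch is hypothetical, and the names and the sex values "M"/"F" are romanized stand-ins for the Chinese sample data, chosen only to keep the source ASCII.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Re-creates the "sum(case sex when ... then 1 else 0 end)" aggregation in memory.
class CaseWhenSketch {

    // Rows of emp_sex as {name, dept_id, sex}; romanized stand-ins for the sample data.
    static final List<String[]> EMP_SEX = Arrays.asList(
            new String[]{"Bajie", "A", "M"},
            new String[]{"Houge", "A", "M"},
            new String[]{"Songsong", "B", "M"},
            new String[]{"Fengjie", "A", "F"},
            new String[]{"Xiaokeai", "B", "F"},
            new String[]{"Mengmeng", "B", "F"});

    // Returns dept_id -> {male_count, female_count}.
    static Map<String, int[]> countBySex(List<String[]> rows) {
        Map<String, int[]> counts = new TreeMap<>();
        for (String[] row : rows) {
            int[] c = counts.computeIfAbsent(row[1], k -> new int[2]); // group by dept_id
            c[0] += "M".equals(row[2]) ? 1 : 0; // case sex when male then 1 else 0
            c[1] += "F".equals(row[2]) ? 1 : 0; // case sex when female then 1 else 0
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, int[]> e : countBySex(EMP_SEX).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue()[0] + " " + e.getValue()[1]);
        }
        // Prints:
        // A 2 1
        // B 1 2
    }
}
```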

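3. Rows to columns

1. Function descriptions

CONCAT(string A/col, string B/col…): returns the concatenation of the input strings; any number of input strings is supported.

CONCAT_WS(separator, str1, str2,...): a special form of CONCAT(). The first argument is the separator placed between the remaining arguments. The separator can be a string, just like the remaining arguments. If the separator is NULL, the result is also NULL. The function skips any NULL values after the separator argument. The separator is inserted between the strings being joined.

COLLECT_SET(col): accepts only primitive data types; it deduplicates and aggregates the values of a column into a field of type array.

2. Data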

name	Constellation	blood_type
八戒	白羊座	A
猴哥	射手座	A
鬆鬆	白羊座	B
豬八戒	白羊座	A
鳳姐	射手座	A

3. Requirement: group together the people who have the same constellation and blood type. The expected result:

射手座,A            猴哥|鳳姐

白羊座,A            八戒|豬八戒

白羊座,B            鬆鬆

4. Create a local file person.txt and enter the data (fields separated by tabs)

[bigdata@hadoop102 datas]$ vim person.txt

八戒	白羊座	A

猴哥	射手座	A

鬆鬆	白羊座	B

豬八戒	白羊座	A

鳳姐	射手座	A

5. Create the Hive table and load the data

create table person_info(name string, xingzuo string, blood_type string)

row format delimited fields terminated by "\t";

load data local inpath '/opt/module/datas/person.txt' into table person_info;

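Analysis: first concatenate the constellation and the blood type into a single key (base), then group by that key and join the names in each group with "|".

6. Implementation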

select

    t1.base,

    concat_ws('|', collect_set(t1.name)) name

from

    (select

        name,

        concat(xingzuo, ',' , blood_type) base

    from

        person_info) t1

group by

    t1.base;
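The three functions in this query each have a close everyday counterpart, which the sketch below uses to reproduce the result in memory: string concatenation for concat, a set for collect_set, and String.join for concat_ws. It is a plain-Java illustration rather than Hive code; the class name RowsToColumnsSketch is hypothetical, the names and constellations are romanized or translated stand-ins for the sample data, and the element order inside each group follows insertion order here, which Hive does not promise.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Re-creates concat + collect_set + concat_ws on the person_info sample in memory.
class RowsToColumnsSketch {

    // Rows of person_info as {name, xingzuo, blood_type}; stand-ins for the sample data.
    static final List<String[]> PERSON_INFO = Arrays.asList(
            new String[]{"Bajie", "Aries", "A"},
            new String[]{"Houge", "Sagittarius", "A"},
            new String[]{"Songsong", "Aries", "B"},
            new String[]{"Zhubajie", "Aries", "A"},
            new String[]{"Fengjie", "Sagittarius", "A"});

    // Returns base -> names joined with "|".
    static Map<String, String> groupNames(List<String[]> rows) {
        Map<String, Set<String>> grouped = new TreeMap<>();
        for (String[] row : rows) {
            String base = row[1] + "," + row[2];                                   // concat(xingzuo, ',', blood_type)
            grouped.computeIfAbsent(base, k -> new LinkedHashSet<>()).add(row[0]); // collect_set(name)
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : grouped.entrySet()) {
            result.put(e.getKey(), String.join("|", e.getValue()));                // concat_ws('|', ...)
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : groupNames(PERSON_INFO).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```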

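4. Columns to rows

1. Function descriptions

EXPLODE(col): splits a complex array or map in a Hive column into multiple rows.

LATERAL VIEW

Usage: LATERAL VIEW udtf(expression) tableAlias AS columnAlias

Explanation: used together with UDTFs such as explode (often combined with split); it expands one column of data into multiple rows, and the expanded rows can then be aggregated.

2. Data preparation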

Movie	category
《福爾摩斯》	懸疑,動作,科幻,劇情
《無間道》	懸疑,警匪,動作,心理,劇情
《紅海行動》	戰爭,動作,災難

3. Requirement: expand the array in the movie category column. The expected result:

《福爾摩斯》	懸疑

《福爾摩斯》	動作

《福爾摩斯》	科幻

《福爾摩斯》	劇情

《無間道》	懸疑

《無間道》	警匪

《無間道》	動作

《無間道》	心理

《無間道》	劇情

《紅海行動》	戰爭

《紅海行動》	動作

《紅海行動》	災難

4. Create a local file movie.txt and enter the data (a tab separates the movie from its categories)

[bigdata@hadoop102 datas]$ vi movie.txt

《福爾摩斯》	懸疑,動作,科幻,劇情

《無間道》	懸疑,警匪,動作,心理,劇情

《紅海行動》	戰爭,動作,災難

5. Create the Hive table and load the data (the elements of the second field are separated by ",")

create table movie_info( movie string, category array<string>)

row format delimited fields terminated by "\t"

collection items terminated by ",";

load data local inpath "/opt/module/datas/movie.txt" into table movie_info;

6. Query the data as required

select movie, category_name

from movie_info lateral view explode(category) table_tmp as category_name;
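What lateral view explode does to each row can be pictured as a nested loop: for every movie, emit one output row per element of its category array, repeating the movie column each time. The sketch below does exactly that in plain Java. The class name ExplodeSketch is hypothetical, and the English titles and category names are stand-ins for the sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Re-creates "lateral view explode(category)" on the movie_info sample in memory.
class ExplodeSketch {

    // Rows of movie.txt as {movie, comma-separated categories}; stand-ins for the sample data.
    static final List<String[]> MOVIE_INFO = Arrays.asList(
            new String[]{"Holmes", "suspense,action,sci-fi,drama"},
            new String[]{"Infernal Affairs", "suspense,crime,action,psychological,drama"},
            new String[]{"Operation Red Sea", "war,action,disaster"});

    // Returns one {movie, category_name} row per array element.
    static List<String[]> lateralViewExplode(List<String[]> movieInfo) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : movieInfo) {
            String[] categories = row[1].split(",");         // collection items terminated by ","
            for (String categoryName : categories) {         // explode(category)
                out.add(new String[]{row[0], categoryName}); // movie is repeated for each element
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] row : lateralViewExplode(MOVIE_INFO)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

The three input rows become twelve output rows (4 + 5 + 3), matching the shape of the expected result in the requirement.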

7 Functions

7.1 Built-in Functions

1. List the built-in functions

hive> show functions;

2. Show the usage of a built-in function

hive> desc function upper;

3. Show the detailed usage of a built-in function

hive> desc function extended upper;

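7.2 User-Defined Functions

1) Hive ships with some built-in functions, such as max/min, but their number is limited; you can extend them conveniently by writing your own UDFs.

2) When the built-in functions provided by Hive cannot meet your business needs, consider a user-defined function (UDF).

3) User-defined functions fall into three categories:

(1) UDF (User-Defined Function): one row in, one row out

(2) UDAF (User-Defined Aggregation Function): aggregate function, many rows in, one row out, similar to count/max/min

(3) UDTF (User-Defined Table-Generating Function): one row in, many rows out, such as lateral view explode()

4) Official documentation: https://cwiki.apache.org/confluence/display/Hive/HivePlugins

5) Programming steps:

(1) Extend org.apache.hadoop.hive.ql.exec.UDF

(2) Implement the evaluate method; evaluate can be overloaded

(3) Create the function from the Hive command line

a) Add the jar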

add jar linux_jar_path

b) Create the function

create [temporary] function [dbname.]function_name AS class_name;

(4) Drop the function from the Hive command line

drop [temporary] function [if exists] [dbname.]function_name;

6) Notes

(1) A UDF must have a return type; it may return null, but the return type cannot be void.
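Two of the points above, that evaluate can be overloaded and that it must return a value (possibly null), can be sketched as follows. To keep the example compilable without hive-exec on the classpath, the class deliberately does not extend UDF; in a real UDF it would declare "extends org.apache.hadoop.hive.ql.exec.UDF". The class name LowerSketch and the two-argument variant with maxLength are hypothetical.

```java
import java.util.Locale;

// Plain-Java sketch of a UDF body; a real Hive UDF would extend
// org.apache.hadoop.hive.ql.exec.UDF (omitted so the sketch compiles on its own).
class LowerSketch {

    // One argument: lower-case the input; NULL in, NULL out.
    public String evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase(Locale.ROOT);
    }

    // Overload with a second argument (hypothetical): lower-case, then keep at most maxLength characters.
    public String evaluate(String s, Integer maxLength) {
        String lower = evaluate(s);
        if (lower == null || maxLength == null) {
            return lower;
        }
        int end = Math.max(0, Math.min(maxLength, lower.length()));
        return lower.substring(0, end);
    }

    public static void main(String[] args) {
        LowerSketch udf = new LowerSketch();
        System.out.println(udf.evaluate("SMITH"));       // smith
        System.out.println(udf.evaluate("SMITH", 3));    // smi
        System.out.println(udf.evaluate((String) null)); // null
    }
}
```

Both methods return String rather than void, and both pass a null input through as null, in line with the note above.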

7.3 Writing a Custom UDF

1. Create a Maven project

2. Add the dependency

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1</version>
</dependency>

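3. Create a class

package com.bigdata.hive;

import org.apache.hadoop.hive.ql.exec.UDF;

// UDF that converts a string to lower case
public class Lower extends UDF {

    // Called by Hive with the argument passed to the function
    public String evaluate(final String s) {
        if (s == null) {
            return null; // NULL in, NULL out
        }
        return s.toLowerCase();
    }
}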

4. Package it as a jar and upload it to the server as /opt/module/jars/udf.jar

5. Add the jar to Hive's classpath

hive (default)> add jar /opt/module/jars/udf.jar;

6. Create a temporary function bound to the compiled Java class

hive (default)> create temporary function mylower as "com.bigdata.hive.Lower";

7. The custom function mylower can now be used in HQL

hive (default)> select ename, mylower(ename) lowername from emp;
