Hive DML Data Operations

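DML stands for Data Manipulation Language and mainly refers to insert, delete, update, and query operations on data. Although Hive is a data warehouse, it has its own DML as well. I studied it today, so I am writing this article to consolidate and record what I learned.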

1. Data Import

Loading data into a table (Load)

load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];

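(1) load data: loads data
(2) local: load from the local file system into the Hive table; without it, load from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrite the data already in the table; without it, append
(5) into table: which table to load into
(6) student: the specific table
(7) partition: load into the specified partition
Example:
First create a student table with id and name columns, using a tab (\t) as the field delimiter.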
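The original create statement is not preserved at this point. A minimal sketch of such a table, assuming both columns are strings (the column types are a guess; only the column names and the tab delimiter come from the text):

```sql
-- Assumed definition: string types are an assumption, the '\t' delimiter follows the text
create table student(id string, name string)
row format delimited fields terminated by '\t';
```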

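Because the field delimiter is \t (it separates columns, not rows), Hive splits each line of student.txt on \t. If a line yields more fields than the table has columns, Hive discards the extra fields; if it yields fewer, the missing columns are filled with NULL. student.txt is the tab-separated text file used in the examples that follow.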
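To make the rule above concrete, suppose student.txt held the three lines sketched in the comments below. The file contents and the resulting rows are hypothetical illustrations, not the author's actual data.

```sql
-- Hypothetical contents of student.txt (<TAB> stands for a tab character):
--   1001<TAB>zhangsan
--   1002<TAB>lisi<TAB>extra
--   1003
-- After loading this file into a two-column (id, name) student table:
select id, name from student;
-- 1001  zhangsan
-- 1002  lisi       (the third field "extra" is discarded)
-- 1003  NULL       (the missing name is filled with NULL)
```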

Loading local data into Hive

Load the contents of the local student.txt file into the student table:

load data local inpath '/opt/module/datas/student.txt' into table default.student;

Loading data from HDFS into Hive (the file is moved into the table's directory, not copied):

load data inpath '/student.txt' into table default.student;

Loading data and overwriting the data already in the table

load data inpath '/user/atguigu/hive/student.txt' overwrite into table default.student;
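None of the examples above uses the optional partition clause. A minimal sketch, assuming student has been created as a table partitioned by a string column named month (like the partitioned table defined later in this article); the partition value 201709 is only an illustration:

```sql
-- Load the local file into one partition, replacing whatever that partition holds
load data local inpath '/opt/module/datas/student.txt'
overwrite into table default.student partition (month='201709');

-- List the partitions that now exist
show partitions student;
```

On a non-partitioned table the partition clause must be left out, otherwise the statement is rejected.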

Inserting data into a table with a query (Insert)
Create a partitioned table:

create table student(
id string,name string)
partitioned by(month string)
row format delimited fields terminated by '\t';

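Insert some basic data
This approach is very inefficient: a single statement inserts only a small amount of data, yet every execution launches a MapReduce job that takes roughly 20 seconds, so it is rarely used.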

insert into table  student partition(month='201709') values('1','wangwu'),('2','zhaoliu');

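Inserting the query results of a table into a table
insert into: appends to the table or partition; existing data is not deleted
insert overwrite: overwrites the data already in the table or partition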

insert overwrite table student partition(month='201708')
             select id, name from student where month='201709';
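For contrast with the overwrite statement above, a sketch of the append form on the same partitions. The follow-up count query is an added illustration for checking the effect, not part of the original notes.

```sql
-- Append instead of overwrite: rows already in month='201708' are kept
insert into table student partition(month='201708')
select id, name from student where month='201709';

-- Compare row counts per partition to see the effect
select month, count(*) from student group by month;
```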

Multi-insert mode (multiple tables or partitions)
Scan the source table once and insert the query results into several tables or partitions:

from student
              insert overwrite table student partition(month='201707')
              select id, name where month='201709'
              insert overwrite table student partition(month='201706')
              select id, name where month='201709';

Creating a table and loading data in a query (As Select)
Create a table from query results (the results are written to the newly created table)

create table if not exists student3
as select id, name from student;

Specifying the data path with Location when creating a table
Create a table and specify its location on HDFS:

create external table if not exists student5(
              id int, name string
              )
              row format delimited fields terminated by '\t'
              location '/student';
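One way to see what location does, assuming the local file /opt/module/datas/student.txt from earlier and that student5 points at the HDFS directory /student: files placed in that directory become visible to queries without any load statement. The dfs commands below are meant to be run inside the Hive CLI.

```sql
-- Create the directory (if it does not exist yet) and upload the local file into it
dfs -mkdir -p /student;
dfs -put /opt/module/datas/student.txt /student;

-- The table reads whatever files are in its location
select * from student5;
```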

Importing data into a specified Hive table (Import)
/user/hive/warehouse/export/student holds data previously exported by Hive with the export command

import table student2 partition(month='201709') from
 '/user/hive/warehouse/export/student';

2. Data Export

Insert export

Export query results to the local file system

insert overwrite local directory '/opt/module/datas/export/student'
            select * from student;

Export query results to the local file system with formatting

insert overwrite local directory '/opt/module/datas/export/student1'
           ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
           select * from student;

Export query results to HDFS (without local)

insert overwrite directory '/user/atguigu/student2'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
             select * from student;

Exporting to the local file system with a Hadoop command

dfs -get /user/hive/warehouse/student/month=201709/000000_0
/opt/module/datas/export/student3.txt;

Exporting with the hive shell command (run from the operating-system shell, not inside the Hive CLI)

hive -e 'select * from default.student;' >
 /opt/module/datas/export/student4.txt;

Exporting to HDFS with Export

export table default.student to
 '/user/hive/warehouse/export/student';

3. Clearing the data in a table (Truncate)

Truncate can only clear data in managed tables; it cannot delete data in external tables

truncate table student;
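Two related cases, sketched under the assumption that student is the managed, partitioned table and student5 is the external table created earlier in this article. The exact error text for the second statement depends on the Hive version, so it is not reproduced here.

```sql
-- Clear a single partition of the managed table instead of the whole table
truncate table student partition (month='201709');

-- student5 is an external table, so this statement is expected to fail
truncate table student5;
```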

情深不僅李義山
