Common Hive Commands and Compression

Original

2020-02-24 21:37

1. Creating a database

1) Create a database. By default, a database is stored on HDFS at /user/hive/warehouse/*.db.

hive (default)> create database db_hive;

2) To avoid an error when the database already exists, add an if not exists check. (This is the standard form.)

hive (default)> create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists
hive (default)> create database if not exists db_hive;

3) Create a database and specify where it is stored on HDFS.

hive (default)> create database db_hive2 location '/db_hive2.db';
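
The `*.db` in the default path stands for the database name followed by `.db`. A minimal sketch of that rule, assuming the warehouse root is the default /user/hive/warehouse; `default_db_path` is a hypothetical helper written for illustration, not a Hive command:

```shell
#!/bin/bash
# default_db_path is a hypothetical helper, not part of Hive.
# Assumes the warehouse root is /user/hive/warehouse unless a second argument is given.
default_db_path() {
    local name warehouse="${2:-/user/hive/warehouse}"
    # Hive stores database names in lowercase.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    echo "$warehouse/$name.db"
}

default_db_path db_hive
```

A database created with an explicit location, such as db_hive2 above, lives at the path given in the statement instead, so this rule does not apply to it.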

2. Dropping a database

1. Drop an empty database

hive> drop database db_hive2;

2. If the database to be dropped might not exist, use if exists so that the statement does not fail.

hive> drop database db_hive2;
FAILED: SemanticException [Error 10072]: Database does not exist: db_hive2
hive> drop database if exists db_hive2;

3. If the database is not empty, add the cascade keyword to force the drop. This also drops the tables it contains.

hive> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
hive> drop database db_hive cascade;

Table creation statement (columnar storage + Snappy compression)

create external table dwd_order_info (
    `id` string COMMENT '',
    `total_amount` decimal(10,2) COMMENT '',
    `order_status` string COMMENT ' 1 2 3 4 5',
    `user_id` string COMMENT 'id',
    `payment_way` string COMMENT '',
    `out_trade_no` string COMMENT '',
    `create_time` string COMMENT '',
    `operate_time` string COMMENT ''
) 
PARTITIONED BY (`dt` string)
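-- Pick one storage format. For ORC, use "stored as orc" and
-- tblproperties ("orc.compress"="SNAPPY") instead.
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_info/'
tblproperties ("parquet.compression"="snappy");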
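
The compression property depends on the storage format: `parquet.compression` only affects Parquet tables, while ORC tables read `orc.compress`. A small sketch of choosing the right clause when the table DDL is generated from a script; `snappy_tblproperties` is a hypothetical helper name, not something Hive provides:

```shell
#!/bin/bash
# snappy_tblproperties is a hypothetical helper, not part of Hive.
# It prints the TBLPROPERTIES clause that enables Snappy for a storage format.
snappy_tblproperties() {
    case "$1" in
        parquet) echo 'tblproperties ("parquet.compression"="SNAPPY")' ;;
        orc)     echo 'tblproperties ("orc.compress"="SNAPPY")' ;;
        *)       echo "unsupported format: $1" >&2; return 1 ;;
    esac
}

snappy_tblproperties parquet
snappy_tblproperties orc
```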

3. Importing data

3.1 Loading data into a table (Load)

1. Syntax

hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];

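(1) load data: loads data into a table
(2) local: load from the local file system; without it, the path refers to HDFS
(3) inpath: the path of the data to load
(4) overwrite: replace the data already in the table; without it, the data is appended
(5) into table: names the target table
(6) student: the specific table to load into
(7) partition: load into the specified partition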
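
To see how the optional keywords combine, here is a rough sketch that assembles a load statement from its parts. `build_load_stmt` and its argument order are made up for illustration; only the statement it prints is Hive syntax:

```shell
#!/bin/bash
# build_load_stmt is a hypothetical helper, not a Hive command.
# Usage: build_load_stmt <path> <table> [local|hdfs] [overwrite|append] [partition spec]
build_load_stmt() {
    local path="$1" table="$2" src="${3:-hdfs}" mode="${4:-append}" part="${5:-}"
    local stmt="load data"
    if [ "$src" = "local" ]; then stmt="$stmt local"; fi
    stmt="$stmt inpath '$path'"
    if [ "$mode" = "overwrite" ]; then stmt="$stmt overwrite"; fi
    stmt="$stmt into table $table"
    if [ -n "$part" ]; then stmt="$stmt partition ($part)"; fi
    echo "$stmt;"
}

# Local file, append mode, one partition.
build_load_stmt /opt/module/datas/dept.txt default.dept_partition local append "month='201709'"
```

The printed statement can then be passed to hive -e.
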
2. Load data into a partitioned table

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');

4. Hive script

#!/bin/bash

# Define the database name as a variable so it is easy to change
APP=gmall

# Use the date passed as the first argument; if none is given, use yesterday's date
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table ${APP}.ads_sale_tm_category1_stat_mn
select   
    mn.sku_tm_id,
    mn.sku_category1_id,
    mn.sku_category1_name,
    sum(if(mn.order_count>=1,1,0)) buycount,
    sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
    sum(if(mn.order_count>=2,1,0))/sum( if(mn.order_count>=1,1,0)) buyTwiceLastRatio,
    sum(if(mn.order_count>=3,1,0)) buy3timeLast,
    sum(if(mn.order_count>=3,1,0))/sum( if(mn.order_count>=1,1,0)) buy3timeLastRatio ,
    date_format('$do_date' ,'yyyy-MM') stat_mn,
    '$do_date' stat_date
from
(
    select
        user_id,
        od.sku_tm_id,
        od.sku_category1_id,
        od.sku_category1_name,
        sum(order_count) order_count
    from ${APP}.dws_sale_detail_daycount od
    where date_format(dt,'yyyy-MM')=date_format('$do_date','yyyy-MM')
    group by user_id, od.sku_tm_id, od.sku_category1_id, od.sku_category1_name
) mn
group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
"
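hive=/opt/module/hive/bin/hive
$hive -e "$sql"
# On CDH, run the same SQL through beeline instead of the hive CLI.
# Do not run both, or the insert executes twice.
# beeline -u "jdbc:hive2://hadoop102:10000/" -n hive -e "$sql"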
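
The script trusts whatever is passed as the first argument and pastes it straight into the SQL. A hedged sketch of a stricter date check, assuming GNU date as the script already does; `resolve_do_date` is a hypothetical function name, not part of the original script:

```shell
#!/bin/bash
# resolve_do_date is a hypothetical helper. It prints the date to process:
# the argument if it looks like YYYY-MM-DD, otherwise yesterday's date.
# Assumes GNU date (the -d option).
resolve_do_date() {
    if [ -z "${1:-}" ]; then
        date -d "-1 day" +%F
        return
    fi
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "expected YYYY-MM-DD, got: $1" >&2; return 1 ;;
    esac
    # Let date reject values that match the pattern but are not real dates.
    date -d "$1" +%F
}

do_date=$(resolve_do_date "${1:-}") || exit 1
echo "$do_date"
```

Rejecting malformed input before the SQL string is built keeps a stray argument from ending up inside the quoted date literals.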
