hive常用命令和壓縮

1.創建數據庫

1)創建一個數據庫,數據庫在HDFS上的默認存儲路徑是/user/hive/warehouse/*.db。

hive (default)> create database db_hive;

2)避免要創建的數據庫已經存在錯誤,增加if not exists判斷。(標準寫法)

hive (default)> create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists
hive (default)> create database if not exists db_hive;

3)創建一個數據庫,指定數據庫在HDFS上存放的位置

hive (default)> create database db_hive2 location '/db_hive2.db';

2.刪除數據庫

1.刪除空數據庫

hive>drop database db_hive2;

2.如果刪除的數據庫不存在,最好採用 if exists判斷數據庫是否存在

hive> drop database db_hive;
FAILED: SemanticException [Error 10072]: Database does not exist: db_hive
hive> drop database if exists db_hive2;

3.如果數據庫不爲空,可以採用cascade命令,強制刪除

hive> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
hive> drop database db_hive cascade;

4.建表語義(列式存儲+snappy壓縮)

create external table dwd_order_info (
    `id` string COMMENT '',
    `total_amount` decimal(10,2) COMMENT '',
    `order_status` string COMMENT ' 1 2 3 4 5',
    `user_id` string COMMENT 'id',
    `payment_way` string COMMENT '',
    `out_trade_no` string COMMENT '',
    `create_time` string COMMENT '',
    `operate_time` string COMMENT ''
) 
PARTITIONED BY (`dt` string)
stored as (parquet|orc)
location '/warehouse/gmall/dwd/dwd_order_info/'
tblproperties ("parquet.compression"="snappy");

3. 數據導入

3.1 向表中裝載數據(Load)

1.語法

hive> load data [local] inpath '/opt/module/datas/student.txt' overwrite | into table student [partition (partcol1=val1,…)];

(1)load data:表示加載數據
(2)local:表示從本地加載數據到hive表;否則從HDFS加載數據到hive表
(3)inpath:表示加載數據的路徑
(4)overwrite:表示覆蓋表中已有數據,否則表示追加
(5)into table:表示加載到哪張表
(6)student:表示具體的表
(7)partition:表示上傳到指定分區
2.加載數據到分區表中

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');

4.hive腳本

#!/bin/bash

# 定義變量方便修改
APP=gmall

# 如果是輸入的日期按照取輸入日期;如果沒輸入日期取當前時間的前一天
if [ -n "$1" ] ;then
	do_date=$1
else 
	do_date=`date -d "-1 day" +%F`
fi 

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table "$APP".ads_sale_tm_category1_stat_mn
select   
    mn.sku_tm_id,
    mn.sku_category1_id,
    mn.sku_category1_name,
    sum(if(mn.order_count>=1,1,0)) buycount,
    sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
    sum(if(mn.order_count>=2,1,0))/sum( if(mn.order_count>=1,1,0)) buyTwiceLastRatio,
    sum(if(mn.order_count>=3,1,0)) buy3timeLast,
    sum(if(mn.order_count>=3,1,0))/sum( if(mn.order_count>=1,1,0)) buy3timeLastRatio ,
    date_format('$do_date' ,'yyyy-MM') stat_mn,
    '$do_date' stat_date
from 
(     
select 
        user_id, 
od.sku_tm_id, 
        od.sku_category1_id,
        od.sku_category1_name,  
        sum(order_count) order_count
    from "$APP".dws_sale_detail_daycount  od 
    where date_format(dt,'yyyy-MM')=date_format('$do_date' ,'yyyy-MM')
    group by user_id, od.sku_tm_id, od.sku_category1_id, od.sku_category1_name
) mn
group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
"
hive=/opt/module/hive/bin/hive
$hive -e "$sql"
#cdh
beeline -u "jdbc:hive2://hadoop102:10000/" -n hive -e "$sql"
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章