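Building the Data Warehouse: the DWD Layer
4.1 Parsing Startup-Log Data in the DWD Layer
4.1.1 Create the Startup Table
1) Table creation statement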
hive (gmall)>
drop table if exists dwd_start_log;
CREATE EXTERNAL TABLE dwd_start_log(
mid_id string,
user_id string,
version_code string,
version_name string,
lang string,
source string,
os string,
area string,
model string,
brand string,
sdk_version string,
gmail string,
height_width string,
app_time string,
network string,
lng string,
lat string,
entry string,
open_ad_type string,
action string,
loading_time string,
detail string,
extend1 string
)
PARTITIONED BY (dt string)
location '/warehouse/gmall/dwd/dwd_start_log/';
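Because the table is `PARTITIONED BY (dt string)`, Hive keeps each day's data in its own subdirectory named `dt=<value>` under the LOCATION above. Because it is EXTERNAL, `drop table` by default removes only the metadata and leaves those files in HDFS. The small sketch below computes that directory for a given date, which can help when checking a load with `hadoop fs -ls`. The helper name `partition_dir` is hypothetical and not part of the tutorial.

```shell
#!/bin/bash
# Sketch: Hive stores each partition in a subdirectory named
# <partition column>=<value> under the table LOCATION.
# partition_dir is a hypothetical helper, not part of the original tutorial.
TABLE_LOCATION=/warehouse/gmall/dwd/dwd_start_log

partition_dir() {
    echo "${TABLE_LOCATION}/dt=$1"
}

partition_dir 2019-02-10
# On the cluster (not run here): hadoop fs -ls "$(partition_dir 2019-02-10)"
```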
4.1.2 Load Data into the Startup Table
hive (gmall)>
insert overwrite table dwd_start_log
PARTITION (dt='2019-02-10')
select
get_json_object(line,'$.mid') mid_id,
get_json_object(line,'$.uid') user_id,
get_json_object(line,'$.vc') version_code,
get_json_object(line,'$.vn') version_name,
get_json_object(line,'$.l') lang,
get_json_object(line,'$.sr') source,
get_json_object(line,'$.os') os,
get_json_object(line,'$.ar') area,
get_json_object(line,'$.md') model,
get_json_object(line,'$.ba') brand,
get_json_object(line,'$.sv') sdk_version,
get_json_object(line,'$.g') gmail,
get_json_object(line,'$.hw') height_width,
get_json_object(line,'$.t') app_time,
get_json_object(line,'$.nw') network,
get_json_object(line,'$.ln') lng,
get_json_object(line,'$.la') lat,
get_json_object(line,'$.entry') entry,
get_json_object(line,'$.open_ad_type') open_ad_type,
get_json_object(line,'$.action') action,
get_json_object(line,'$.loading_time') loading_time,
get_json_object(line,'$.detail') detail,
get_json_object(line,'$.extend1') extend1
from ods_start_log
where dt='2019-02-10';
3) Test
hive (gmall)> select * from dwd_start_log limit 2;
4.1.3 DWD Layer Startup-Table Loading Script
1) Create the script in the /home/atguigu/bin directory on hadoop102
[atguigu@hadoop102 bin]$ vim dwd_start_log.sh
Write the following content into the script:
#!/bin/bash
# Define variables so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive
# If a date is passed as the first argument, use it; otherwise use the day before today
if [ -n "$1" ] ;then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi
# Bash expands ${APP} and $do_date inside this double-quoted string;
# '$.mid' etc. stay literal because "$." is not a valid expansion
sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_start_log
PARTITION (dt='$do_date')
select
get_json_object(line,'$.mid') mid_id,
get_json_object(line,'$.uid') user_id,
get_json_object(line,'$.vc') version_code,
get_json_object(line,'$.vn') version_name,
get_json_object(line,'$.l') lang,
get_json_object(line,'$.sr') source,
get_json_object(line,'$.os') os,
get_json_object(line,'$.ar') area,
get_json_object(line,'$.md') model,
get_json_object(line,'$.ba') brand,
get_json_object(line,'$.sv') sdk_version,
get_json_object(line,'$.g') gmail,
get_json_object(line,'$.hw') height_width,
get_json_object(line,'$.t') app_time,
get_json_object(line,'$.nw') network,
get_json_object(line,'$.ln') lng,
get_json_object(line,'$.la') lat,
get_json_object(line,'$.entry') entry,
get_json_object(line,'$.open_ad_type') open_ad_type,
get_json_object(line,'$.action') action,
get_json_object(line,'$.loading_time') loading_time,
get_json_object(line,'$.detail') detail,
get_json_object(line,'$.extend1') extend1
from ${APP}.ods_start_log
where dt='$do_date';
"
# Run the statement with the Hive CLI
$hive -e "$sql"
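The script builds the whole statement as one double-quoted bash string. Bash therefore substitutes `$APP` and `$do_date` before Hive sees anything, while the single quotes pass through unchanged as SQL string delimiters. The minimal dry-run sketch below shows that mechanism by printing a shortened statement instead of calling `hive -e`. It assumes GNU `date`, as the script itself does, and `build_dwd_start_sql` is a hypothetical helper name.

```shell
#!/bin/bash
# Dry-run sketch: build a shortened version of the script's SQL and print it
# instead of running "$hive -e". build_dwd_start_sql is a hypothetical name.
APP=gmall

build_dwd_start_sql() {
    local do_date
    # Same default as the script: the argument if given, else yesterday (GNU date)
    if [ -n "$1" ]; then
        do_date=$1
    else
        do_date=$(date -d "-1 day" +%F)
    fi
    # ${APP} and $do_date are expanded by bash; the single quotes are ordinary
    # characters inside double quotes, and '$.mid' is left as-is.
    echo "insert overwrite table ${APP}.dwd_start_log PARTITION (dt='$do_date')
select get_json_object(line,'$.mid') mid_id
from ${APP}.ods_start_log
where dt='$do_date';"
}

build_dwd_start_sql 2019-02-11
```

Printing the statement this way is a cheap way to inspect the exact SQL before handing it to Hive.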
2) Give the script execute permission
[atguigu@hadoop102 bin]$ chmod 777 dwd_start_log.sh
3) Run the script
[atguigu@hadoop102 module]$ dwd_start_log.sh 2019-02-11
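Each call loads a single day. When several past days need loading, one option is to loop over the dates and call the script once per day. The sketch below assumes GNU `date` and YYYY-MM-DD input, and `date_range` is a hypothetical helper that is not part of the tutorial.

```shell
#!/bin/bash
# Hypothetical backfill helper: print each date from $1 to $2 inclusive.
# Assumes GNU date (as dwd_start_log.sh does) and YYYY-MM-DD input.
# -u keeps the date arithmetic in UTC so DST changes cannot skip or repeat a day.
date_range() {
    local cur=$1
    local end=$2
    while [ "$(date -u -d "$cur" +%Y%m%d)" -le "$(date -u -d "$end" +%Y%m%d)" ]; do
        echo "$cur"
        cur=$(date -u -d "$cur + 1 day" +%F)
    done
}

date_range 2019-02-10 2019-02-12
# Backfill on hadoop102 (not run here):
# for d in $(date_range 2019-02-10 2019-02-12); do dwd_start_log.sh "$d"; done
```

Each run uses `insert overwrite` on one `dt` partition, so re-running a date replaces that day's partition instead of duplicating its rows.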
4) Check the loaded data
hive (gmall)>
select * from dwd_start_log where dt='2019-02-11' limit 2;
5) When to run the script
In production, the script is usually scheduled to run every day between 00:30 and 01:00.
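With cron, for instance, a 00:30 daily run could look like the entry below, added with `crontab -e`. Called without an argument, the script falls back to yesterday's date, which is exactly the day that has just finished. The log file path is only an example. Cron runs with a minimal environment, so variables such as `JAVA_HOME` may need to be set in the script or the crontab, depending on the cluster setup.

```shell
# Hypothetical crontab entry (fields: minute hour day-of-month month day-of-week command)
# Runs at 00:30 every day and loads yesterday's partition
30 0 * * * /home/atguigu/bin/dwd_start_log.sh >> /tmp/dwd_start_log.out 2>&1
```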