StreamSets: real-time sync of MySQL data to Kudu
Initial data load
Tables under the business database: wm.admin_user_app, wm.department
1. Create the databases in Hive;
CREATE DATABASE zxl_db;
CREATE DATABASE zxl_db_tmp;
2. Create the Kudu tables and the Hive staging tables;
# impala-shell
CREATE TABLE IF NOT EXISTS zxl_db_tmp.admin_user_app (
`id` bigint ,
`user_id` bigint ,
`app_id` bigint ,
`o_id` bigint ,
`c_id` bigint ,
`status` tinyint ,
`update_time` string ,
`create_time` string )
row format delimited fields terminated by '\t'
STORED AS TEXTFILE;
# impala-shell
CREATE TABLE IF NOT EXISTS zxl_db.admin_user_app (
`id` bigint ,
`user_id` bigint ,
`app_id` bigint ,
`o_id` bigint ,
`c_id` bigint ,
`status` tinyint ,
`update_time` string ,
`create_time` string ,
PRIMARY KEY (`id`))
STORED AS KUDU;
CREATE TABLE zxl_db_tmp.department (
`dept_id` bigint,
`unit_code` string,
`parent_id` bigint,
`name` string,
`status` tinyint,
`sort` bigint,
`ext` string,
`update_time` string,
`create_time` string )
row format delimited fields terminated by '\t'
STORED AS TEXTFILE;
CREATE TABLE zxl_db.department (
`dept_id` bigint,
`unit_code` string,
`parent_id` bigint,
`name` string,
`status` tinyint,
`sort` bigint,
`ext` string,
`update_time` string,
`create_time` string,
PRIMARY KEY (`dept_id`))
STORED AS KUDU;
Note:
(1) A Kudu table must declare a PRIMARY KEY.
(2) A Kudu table must be created with STORED AS KUDU as the storage type.
(3) When creating the Hive table, it is best to specify the field delimiter explicitly: row format delimited fields terminated by '\t'. In this test's MySQL-to-Hive import, the Hive table was left with its default delimiter (comma/space) while the import wrote '\t'-separated data, which left every column NULL.
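The mismatch in note (3) is easy to reproduce: when the table's declared delimiter does not match the one in the data file, the whole line parses as a single field and the remaining columns come back NULL. A minimal Python sketch (comma-separated data read as if it were tab-separated):

```python
# A row exported with ',' as the separator, parsed with '\t':
row = "1,1001,2,0"

fields_wrong = row.split("\t")   # the whole line stays one field
fields_right = row.split(",")    # four fields, as intended

print(len(fields_wrong))   # 1 -> every other column reads as NULL
print(len(fields_right))   # 4
```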
3. Import from MySQL into the Hive staging tables
# import
sudo -u hive sqoop import \
--connect jdbc:mysql://10.234.7.73:3306/wm?tinyInt1isBit=false \
--username work \
--password phkAmwrF \
--hive-database zxl_db_tmp \
--hive-table admin_user_app \
--query "select id,user_id,app_id,o_id,c_id,status,date_format(update_time, '%Y-%m-%d %H:%i:%s') update_time,date_format(create_time, '%Y-%m-%d %H:%i:%s') create_time from admin_user_app where 1=1 and \$CONDITIONS" \
--hive-import \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--delete-target-dir \
--target-dir /user/hive/import/admin_user_app \
--hive-drop-import-delims \
--hive-overwrite \
-m 1;
sudo -u hive sqoop import \
--connect jdbc:mysql://10.234.7.73:3306/wm?tinyInt1isBit=false \
--username work \
--password phkAmwrF \
--hive-database zxl_db_tmp \
--hive-table department \
--query "select dept_id,unit_code,parent_id,name,status,sort,ext,date_format(update_time, '%Y-%m-%d %H:%i:%s') update_time,date_format(create_time, '%Y-%m-%d %H:%i:%s') create_time from department where 1=1 and \$CONDITIONS" \
--hive-import \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--delete-target-dir \
--target-dir /user/hive/import/department \
--hive-drop-import-delims \
--hive-overwrite \
-m 1;
# Repair partitions (needed only if the table is partitioned) # beeline
msck repair table zxl_db_tmp.admin_user_app;
Note:
(a) TIME, DATE, DATETIME and TIMESTAMP columns (non-string types) arrive in Hive with a problematic format such as "2018-07-17 10:01:54.0"; convert them to strings during the import (as the date_format calls in the --query do).
(b) tinyInt1isBit=false prevents sqoop from mapping MySQL tinyint(1) columns to Boolean when importing into Hive.
(c) Column types change between MySQL and Hive; in this example MySQL's int was mapped to Hive's bigint. To control the mapping, create the Hive table manually with the desired types.
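Note (a)'s fix turns the timestamp into a plain string at import time. MySQL's date_format pattern '%Y-%m-%d %H:%i:%s' corresponds to strftime's '%Y-%m-%d %H:%M:%S' in Python (note %i vs %M for minutes); a minimal sketch, not part of the pipeline:

```python
from datetime import datetime

# The same value MySQL would format with date_format(col, '%Y-%m-%d %H:%i:%s')
ts = datetime(2018, 7, 17, 10, 1, 54)
print(ts.strftime('%Y-%m-%d %H:%M:%S'))   # 2018-07-17 10:01:54
```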
4. Load from the Hive staging tables into Kudu
# impala-shell
upsert into table zxl_db.admin_user_app select id,user_id,app_id,o_id,c_id,status,update_time,create_time from zxl_db_tmp.admin_user_app;
upsert into table zxl_db.department select dept_id,unit_code,parent_id,name,status,sort,ext,update_time,create_time from zxl_db_tmp.department;
# Refresh the metadata # impala-shell
invalidate metadata zxl_db.admin_user_app;
invalidate metadata zxl_db.department;
# Drop the Hive staging table # beeline
drop table zxl_db_tmp.admin_user_app;
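The statements above rely on Kudu's UPSERT semantics: a row whose primary key already exists is overwritten, and a new key is inserted. A minimal sketch modeling the table as a dict keyed by the primary key:

```python
def upsert(table, rows, pk):
    # Insert new keys, overwrite rows whose primary key already exists
    for row in rows:
        table[row[pk]] = row
    return table

kudu = {1: {"id": 1, "status": 0}}
upsert(kudu, [{"id": 1, "status": 1}, {"id": 2, "status": 0}], "id")
print(sorted(kudu))        # [1, 2]
print(kudu[1]["status"])   # 1  (existing row was updated, not duplicated)
```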
Real-time sync with SDC (StreamSets Data Collector)
1. Create a pipeline
2. Add and configure the binlog origin
Initial offset: obtain it from the database with SHOW MASTER STATUS;
Include Tables: the tables to sync in real time, comma-separated (e.g. wm.admin_user_app,wm.department);
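The initial offset is built from the File and Position columns that SHOW MASTER STATUS returns. A minimal sketch of composing it, assuming the common "file:position" form (the exact format is an assumption; verify it against the SDC origin's documentation):

```python
# Hypothetical values as returned by SHOW MASTER STATUS
row = {"File": "mysql-bin.000123", "Position": 4570}

# Compose the offset string the origin starts reading from
initial_offset = "%s:%d" % (row["File"], row["Position"])
print(initial_offset)   # mysql-bin.000123:4570
```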
3. Add and configure a Stream Selector; it can be used to filter by database
4. Process the data
for record in records:
    newRecord = sdcFunctions.createRecord(record.sourceId + ':newRecordId')
    try:
        if record.value['Type'] == 'DELETE':
            newRecord.attributes['sdc.operation.type'] = '2'
            newRecord.value = record.value['OldData']
        else:
            newRecord.attributes['sdc.operation.type'] = '4'
            newRecord.value = record.value['Data']
        # Write record to processor output
        newRecord.value['Table'] = record.value['Table']
        output.write(newRecord)
    except Exception as e:
        # Send record to error
        error.write(newRecord, str(e))
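The branching in the script above can be checked outside SDC with plain dicts standing in for records (sdcFunctions, output and error exist only inside SDC; this is just a local approximation of the same logic):

```python
def transform(record):
    # Mirror the SDC script: DELETE events carry OldData and operation type '2',
    # everything else carries Data and operation type '4' (upsert)
    new = {"attributes": {}, "value": None}
    if record["Type"] == "DELETE":
        new["attributes"]["sdc.operation.type"] = "2"
        new["value"] = dict(record["OldData"])
    else:
        new["attributes"]["sdc.operation.type"] = "4"
        new["value"] = dict(record["Data"])
    new["value"]["Table"] = record["Table"]
    return new

deleted = transform({"Type": "DELETE", "Table": "department",
                     "OldData": {"dept_id": 7}})
print(deleted["attributes"]["sdc.operation.type"])   # 2
print(deleted["value"]["Table"])                     # department
```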
5. Write to Kudu
Table Name: impala::zxl_db.${record:value('/Table')} writes each record into the table of the same name under zxl_db
6. Start the pipeline
7. Verify that changes sync in real time
Shylin