StreamSets: Real-Time Sync of MySQL Data to Kudu

Initial Data Load

Source tables in the business database: wm.admin_user_app and wm.department

1. Create the databases in Hive:

CREATE DATABASE zxl_db;
CREATE DATABASE zxl_db_tmp;

2. Create the Kudu tables and the Hive staging tables:

# impala-shell
CREATE TABLE IF NOT EXISTS zxl_db_tmp.admin_user_app (
  `id` bigint ,
  `user_id` bigint ,
  `app_id` bigint ,
  `o_id` bigint ,
  `c_id` bigint ,
  `status` tinyint ,
  `update_time` string ,
  `create_time` string ) 
  row format delimited fields terminated by '\t'
  STORED AS TEXTFILE;
  
# impala-shell
CREATE TABLE IF NOT EXISTS  zxl_db.admin_user_app (
  `id` bigint ,
  `user_id` bigint ,
  `app_id` bigint ,
  `o_id` bigint ,
  `c_id` bigint ,
  `status` tinyint ,
  `update_time` string ,
  `create_time` string ,
  PRIMARY KEY (`id`))
  STORED AS KUDU;

CREATE TABLE zxl_db_tmp.department (
  `dept_id` bigint,
  `unit_code` string,
  `parent_id` bigint,
  `name` string,
  `status` tinyint,
  `sort` bigint,
  `ext` string,
  `update_time` string,
  `create_time` string )
  row format delimited fields terminated by '\t'
  STORED AS TEXTFILE;
  
CREATE TABLE zxl_db.department (
  `dept_id` bigint,
  `unit_code` string,
  `parent_id` bigint,
  `name` string,
  `status` tinyint,
  `sort` bigint,
  `ext` string,
  `update_time` string,
  `create_time` string,
  PRIMARY KEY (`dept_id`))
  STORED AS KUDU;

Note:

(1) A Kudu table must declare a PRIMARY KEY.

(2) A Kudu table must be declared STORED AS KUDU.

(3) It is best to declare the field delimiter on the Hive staging table (row format delimited fields terminated by '\t'). In this test the table was left with the default delimiter, so importing from MySQL with '\t' as the field delimiter made every column read back as NULL; the table delimiter and the import delimiter must match.
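The mismatch in note (3) is easy to reproduce outside Hive. A minimal sketch in plain Python (not Hive itself; this assumes Hive's default Ctrl-A, '\x01', field delimiter):

```python
# A row written with '\t' as the field separator
row = "1\t100\t2018-07-17 10:01:54"

# Reading it back with the default Ctrl-A ('\x01') delimiter finds no
# separators, so the whole line lands in the first column and the rest are NULL
as_default = row.split("\x01")
as_tab = row.split("\t")

print(len(as_default))  # 1 -> columns 2..n resolve to NULL
print(len(as_tab))      # 3 -> all columns populated
```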

3. Import from MySQL into the Hive staging tables

# import
sudo -u hive sqoop  import  \
--connect jdbc:mysql://10.234.7.73:3306/wm?tinyInt1isBit=false \
--username work \
--password phkAmwrF \
--hive-database zxl_db_tmp \
--hive-table admin_user_app \
--query "select id,user_id,app_id,o_id,c_id,status,date_format(update_time, '%Y-%m-%d %H:%i:%s') update_time,date_format(create_time, '%Y-%m-%d %H:%i:%s') create_time from admin_user_app where 1=1 and \$CONDITIONS" \
--hive-import \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by "\t" \
--lines-terminated-by "\n"  \
--delete-target-dir \
--target-dir /user/hive/import/admin_user_app \
--hive-drop-import-delims \
--hive-overwrite  \
-m 1;


sudo -u hive sqoop  import  \
--connect jdbc:mysql://10.234.7.73:3306/wm?tinyInt1isBit=false \
--username work \
--password phkAmwrF \
--hive-database zxl_db_tmp \
--hive-table department \
--query "select dept_id,unit_code,parent_id,name,status,sort,ext,date_format(update_time, '%Y-%m-%d %H:%i:%s') update_time,date_format(create_time, '%Y-%m-%d %H:%i:%s') create_time from department where 1=1 and \$CONDITIONS" \
--hive-import \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by "\t" \
--lines-terminated-by "\n"  \
--delete-target-dir \
--target-dir /user/hive/import/department \
--hive-drop-import-delims \
--hive-overwrite  \
-m 1;


# Repair partitions (only needed if the table is partitioned)  # beeline
msck repair table zxl_db_tmp.admin_user_app;

Note:

(a) time, date, datetime, and timestamp (non-string) columns arrive in Hive with an awkward format such as "2018-07-17 10:01:54.0"; convert them during the import, as the date_format(...) calls in the queries above do.

(b) tinyInt1isBit=false stops Sqoop from turning MySQL tinyint(1) columns into Boolean when importing into Hive.

(c) Column types can change between MySQL and Hive; in this example MySQL int was mapped to bigint. To pin down the mapping, create the Hive table by hand with the desired types.
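For note (a), the same cleanup can also be done after the fact; a hedged sketch in Python of stripping the trailing fractional second from a TIMESTAMP value:

```python
from datetime import datetime

raw = "2018-07-17 10:01:54.0"  # TIMESTAMP value as it arrives without conversion

# Parse including the fractional part, then re-format without it; this matches
# what date_format(..., '%Y-%m-%d %H:%i:%s') does on the MySQL side
ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f")
clean = ts.strftime("%Y-%m-%d %H:%M:%S")
print(clean)  # 2018-07-17 10:01:54
```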

4. Load from the Hive staging tables into Kudu

# impala-shell
upsert into table zxl_db.admin_user_app select id,user_id,app_id,o_id,c_id,status,update_time,create_time from zxl_db_tmp.admin_user_app;

upsert into table zxl_db.department select dept_id,unit_code,parent_id,name,status,sort,ext,update_time,create_time from zxl_db_tmp.department;



# Refresh the metadata  # impala-shell
invalidate metadata zxl_db.admin_user_app;
invalidate metadata zxl_db.department;

# Drop the Hive staging table  # beeline
drop table zxl_db_tmp.admin_user_app;

Real-Time Sync with SDC

1. Create the pipeline


2. Add and configure the binlog origin


Initial offset: obtain it from the database with SHOW MASTER STATUS;

Include Tables: the tables to sync in real time, comma-separated.
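For the two tables in this walkthrough, the Include Tables value would presumably look like:

```
wm.admin_user_app,wm.department
```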

3. Add and configure a Stream Selector, which can be used to filter by database


4. Process the records


for record in records:
  newRecord = sdcFunctions.createRecord(record.sourceId + ':newRecordId')
  try:
    if record.value['Type'] == 'DELETE':
      newRecord.attributes['sdc.operation.type']='2'
      newRecord.value = record.value['OldData']
    else:
      newRecord.attributes['sdc.operation.type'] = '4'
      newRecord.value = record.value['Data']
    # Write record to processor output
    newRecord.value['Type'] = record.value['Type']
    newRecord.value['Table'] = record.value['Table']
    
    output.write(newRecord)
  except Exception as e:
    # Send record to error
    error.write(newRecord, str(e))
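The script above maps DELETE binlog events to SDC operation type 2 (delete, carrying the row's pre-image from OldData) and everything else to type 4 (upsert, carrying the new image from Data). A standalone sketch of that mapping in plain Python, with hypothetical record dicts (outside SDC, so sdcFunctions/output/error are not available):

```python
def map_binlog_record(record):
    """Mimic the Jython evaluator above: pick the SDC operation type and payload.

    record is a plain dict standing in for an SDC record's value.
    """
    if record['Type'] == 'DELETE':
        # '2' is DELETE in SDC's sdc.operation.type convention; deletes carry
        # the pre-image of the row in OldData
        return {'sdc.operation.type': '2', 'value': record['OldData']}
    # everything else becomes '4' (UPSERT) with the new row image in Data
    return {'sdc.operation.type': '4', 'value': record['Data']}

deleted = map_binlog_record({'Type': 'DELETE', 'OldData': {'id': 1}, 'Data': None})
upserted = map_binlog_record({'Type': 'INSERT', 'OldData': None, 'Data': {'id': 2}})
print(deleted['sdc.operation.type'])   # 2
print(upserted['value'])               # {'id': 2}
```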

5. Write to Kudu


Table Name: impala::zxl_db.${record:value('/Table')} routes each record to the table of the same name under zxl_db (Impala-managed Kudu tables are addressed with the impala:: prefix).

6. Start the pipeline

7. Verify the real-time sync



Shylin
