Using StreamSets to Sync SQL Server Data into Kudu in Real Time

  1. Environment preparation
    1. Create the test database and table in SQL Server
      CREATE DATABASE test;
      GO
      -- Switch to the new database so the table is not created in the current one
      USE test;
      GO
      CREATE TABLE [dbo].[cdc_test] (
        [id] int  IDENTITY(1,1) NOT NULL,
        [name] varchar(60) COLLATE Chinese_PRC_CI_AS  NOT NULL,
        CONSTRAINT [PK_cdc_test] PRIMARY KEY CLUSTERED ([id])
          WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
          ON [PRIMARY]
      )
      ON [PRIMARY]
      GO
      ALTER TABLE [dbo].[cdc_test] SET (LOCK_ESCALATION = TABLE)
      GO
      
    2. Enable CDC on the test database and table
      -- Enable CDC on the test database
      USE test
      GO
      EXECUTE sys.sp_cdc_enable_db;
      GO
      -- Enable CDC on the cdc_test table
      USE test
      GO
      EXEC sys.sp_cdc_enable_table
      @source_schema = N'dbo',
      @source_name   = N'cdc_test',
      @role_name     = NULL,
      @supports_net_changes = 1
      GO
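
      After running the statements above, the CDC status can be checked through the catalog views. This is a minimal sketch, assuming the default capture instance was created for dbo.cdc_test:

      ```sql
      USE test;
      GO
      -- 1 means CDC is enabled for the database
      SELECT name, is_cdc_enabled FROM sys.databases WHERE name = N'test';
      -- 1 means the table is tracked by CDC
      SELECT name, is_tracked_by_cdc FROM sys.tables WHERE name = N'cdc_test';
      -- Show the capture instance configuration for the table
      EXEC sys.sp_cdc_help_change_data_capture
        @source_schema = N'dbo',
        @source_name   = N'cdc_test';
      GO
      ```

      The CDC capture job runs under SQL Server Agent, so the Agent service needs to be running for changes to reach the change table.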
      

      Reference: https://blog.csdn.net/weixin_43215250/article/details/105813087

    3. Create the Kudu table in HUE
      CREATE DATABASE IF NOT EXISTS test;
      CREATE TABLE IF NOT EXISTS test.cdc_test (
        id INT,
        name STRING,
        PRIMARY KEY (id)
      )
      PARTITION BY HASH PARTITIONS 16
      STORED AS KUDU;
      
  2. Create the StreamSets pipeline


    SQL Server CDC Client configuration


    Stream Selector configuration

    ${record:attribute('sdc.operation.type') == 5}
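
    To see what the origin is reading, it can help to query the CDC change table directly. The sketch below assumes the default capture instance name dbo_cdc_test, which makes the change table cdc.dbo_cdc_test_CT. SQL Server stores each update as two rows, a before image and an after image. StreamSets uses operation type 5 for unsupported operations; it is an assumption here, worth checking against the StreamSets documentation, that the before image is what arrives with that code and is what the condition above separates out.

    ```sql
    USE test;
    GO
    -- __$operation: 1 = delete, 2 = insert,
    --               3 = update (before image), 4 = update (after image)
    SELECT __$start_lsn, __$seqval, __$operation, id, name
    FROM cdc.dbo_cdc_test_CT
    ORDER BY __$start_lsn, __$seqval;
    GO
    ```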
    

    

    Kudu configuration
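
    The Kudu destination needs the Kudu-side table name, which may differ from the name used in Impala. As a hedged sketch: tables created through Impala are often registered in Kudu as impala::test.cdc_test, but this depends on the Impala and Kudu versions, so it is safer to check the table properties in Impala first:

    ```sql
    -- Run in Impala (for example from HUE).
    -- Look for kudu.table_name and kudu.master_addresses in the output,
    -- if your version reports them.
    SHOW CREATE TABLE test.cdc_test;
    DESCRIBE FORMATTED test.cdc_test;
    ```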


  3. Start the pipeline
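
    Once the pipeline is running, a quick end-to-end check is to change a few rows in SQL Server and then query the Kudu table from Impala. The row values below are made up for illustration.

    ```sql
    -- Run in SQL Server
    USE test;
    GO
    INSERT INTO dbo.cdc_test (name) VALUES ('alice'), ('bob');
    UPDATE dbo.cdc_test SET name = 'carol' WHERE name = 'bob';
    DELETE FROM dbo.cdc_test WHERE name = 'alice';
    GO
    ```

    ```sql
    -- Run in Impala (for example from HUE) after the pipeline has processed the changes
    SELECT id, name FROM test.cdc_test ORDER BY id;
    ```

    If inserts, updates, and deletes are all routed to the Kudu destination, the Kudu table should end up mirroring dbo.cdc_test.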
