[Original] Talend ETL Development: Unified Email Sending with a Joblet

I. Background

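During ETL data integration you will almost always need email notifications, for example reports on ETL execution status, execution time, or updates to key data. This information has to reach the relevant operations staff, or the users of the BI data, promptly.

However, if the basic settings for email delivery are not planned from the start, sending can become chaotic and hard to manage later. For example, if everyone owns their own ETL jobs and each person builds their own email notifications, then over time it becomes hard to decide which emails should be retired or which recipients should be removed. You may have to open every ETL job to check and change it, which is time-consuming, labor-intensive, and very bad for maintainability.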

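II. Implementation

When implementing this solution, I mainly considered the following points:

1. Anyone who needs to send an email should not have to drag in and wire up a whole set of components again; dragging in one shared component should be enough. That is why I chose to implement this as a joblet.

2. The basic shared email information, such as sender and recipient lists and the send history, must be maintained in one place. I therefore designed database tables to hold it, so that updating the database is enough to make every job use the same information.

3. How messages are generated, sent and tracked should be flexible to control. I designed a table to store this information and generate the actual email content with stored procedures, which also makes the send history traceable.

4. Because Talend joblets support variables, I moved as many of the mail component's variable settings as possible into database tables, which makes them easier to maintain and change.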

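2.1 Database table design

The design uses two tables: mail_send_group and mail_send_list_rec.

mail_send_group: stores the relationship between senders and recipients. Keeping it here makes later maintenance simple, and a change in the database takes effect for every job.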

IF (OBJECT_ID(N'[chk].[mail_send_group]', N'U') IS NOT NULL)
BEGIN
    PRINT N'Dropping table: [chk].[mail_send_group]';
    DROP TABLE [chk].[mail_send_group];
END
GO

CREATE TABLE [chk].[mail_send_group]
(
    [group_id] NVARCHAR(50) NOT NULL,--primary key
    [mail_to] NVARCHAR(1000) NOT NULL,--recipient addresses; separate multiple addresses with ;
    [mail_from] NVARCHAR(100) NOT NULL,--sender address
    [sender_name] NVARCHAR(100) NOT NULL,--sender display name
    [mail_cc] NVARCHAR(1000) NULL,--CC addresses; separate multiple addresses with ;
    [mail_bcc] NVARCHAR(1000) NULL,--BCC addresses; separate multiple addresses with ;
    [create_date] DATETIME NOT NULL,--creation time
    [status] SMALLINT NULL--status (0 = disabled, 1 = enabled)
)
GO
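Because mail_to, mail_cc and mail_bcc each hold several addresses separated by ';', any code that needs the individual addresses (for validation or logging, say) has to split them first. Whether the Talend send component accepts the ';'-separated string as-is depends on the component and its settings, so check that before relying on it. Below is a minimal Java sketch (Talend jobs compile to Java); MailAddressList is a hypothetical helper, not part of Talend.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Talend: splits a ';'-separated address column.
final class MailAddressList {
    private MailAddressList() {}

    // Splits on ';', trims each part and drops empty entries (e.g. from a trailing ';').
    // A NULL column (mail_cc / mail_bcc are nullable) yields an empty list.
    static List<String> parse(String column) {
        List<String> addresses = new ArrayList<>();
        if (column == null) {
            return addresses;
        }
        for (String part : column.split(";")) {
            String address = part.trim();
            if (!address.isEmpty()) {
                addresses.add(address);
            }
        }
        return addresses;
    }

    public static void main(String[] args) {
        System.out.println(parse("ops@example.com; bi@example.com;")); // [ops@example.com, bi@example.com]
        System.out.println(parse(null));                                // []
    }
}
```

Dropping empty entries means a trailing ';' in the column never produces a blank address.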


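mail_send_list_rec: records each generated email and whether it has been sent. Each row is linked to the table above through group_id, so you can tell who sent each message, to whom, and when.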

IF (OBJECT_ID(N'[chk].[mail_send_list_rec]', N'U') IS NOT NULL)
BEGIN
    PRINT N'Dropping table: [chk].[mail_send_list_rec]';
    DROP TABLE [chk].[mail_send_list_rec];
END
GO

CREATE TABLE [chk].[mail_send_list_rec]
(
    [mail_id] NVARCHAR(50) NOT NULL,--primary key
    [group_id] NVARCHAR(50) NOT NULL,--owning group_id; determines the sender and recipients
    [scope] NVARCHAR(100) NOT NULL,--business scope; a category that distinguishes mails generated by different business processes
    [subject] NVARCHAR(100) NOT NULL,--subject
    [message] NVARCHAR(4000) NOT NULL,--body; HTML is supported and recommended
    [create_date] DATETIME NOT NULL,--creation time
    [send_date] DATETIME NULL,--send time
    [send_status] SMALLINT NULL--send status (0 = created but not sent, 1 = sent)
)
GO
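The life cycle of one mail_send_list_rec row is short: the generating procedure inserts it with a new GUID, a NULL send_date and send_status 0, and the update procedure later fills in send_date and sets send_status to 1. The Java 16+ sketch below models only that life cycle; the record MailRecord and its methods are hypothetical names, and UUID.randomUUID() stands in for NEWID().

```java
import java.time.LocalDateTime;
import java.util.UUID;

// Hypothetical in-memory mirror of one chk.mail_send_list_rec row (Java 16+ record).
record MailRecord(String mailId, String groupId, String scope, String subject, String message,
                  LocalDateTime createDate, LocalDateTime sendDate, int sendStatus) {

    static final int STATUS_CREATED = 0; // created, not yet sent
    static final int STATUS_SENT = 1;    // sent

    // What the generating procedure inserts: a new GUID, no send_date yet, status 0.
    static MailRecord create(String groupId, String scope, String subject, String message,
                             LocalDateTime now) {
        // UUID.randomUUID() stands in for NEWID(); its 36-character text fits NVARCHAR(50).
        return new MailRecord(UUID.randomUUID().toString(), groupId, scope, subject, message,
                now, null, STATUS_CREATED);
    }

    // What the update procedure does: set send_date and send_status = 1, keep everything else.
    MailRecord markSent(LocalDateTime sentAt) {
        return new MailRecord(mailId, groupId, scope, subject, message, createDate, sentAt, STATUS_SENT);
    }

    public static void main(String[] args) {
        MailRecord created = create("8D42D25D-59C7-4A5E-AE9C-4A5F24D910B0", "IDS",
                "IDS daily - job run status", "<ol></ol>", LocalDateTime.now());
        MailRecord sent = created.markSent(LocalDateTime.now());
        System.out.println(created.sendStatus() + " -> " + sent.sendStatus()); // 0 -> 1
    }
}
```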


2.2 Joblet development


1. The joblet uses an input component. Its input parameter is mail_id, the ID of the email, which the calling job must pass in.


2. tFlowToIterate turns mail_id into a global variable, which is then used by the MSSQL input component (tMSSqlInput) in step 3.
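Conceptually, tFlowToIterate turns each incoming row into one iteration and exposes the row's values through globalMap, which the downstream components read. By default the key is built from the row and column names (for example row1.mail_id); the query in step 3 reads curr_mail_id, so this sketch assumes a custom key with that name. The class below is a hypothetical simulation with a plain HashMap, not Talend's generated code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical simulation of tFlowToIterate: one downstream iteration per incoming row.
final class FlowToIterateSketch {
    private FlowToIterateSketch() {}

    // Stores each mail_id in globalMap under `key`, then runs the downstream step once.
    static void iterate(List<String> mailIds, Map<String, Object> globalMap, String key,
                        Consumer<Map<String, Object>> downstream) {
        for (String mailId : mailIds) {
            globalMap.put(key, mailId);
            downstream.accept(globalMap); // e.g. the input component that reads globalMap.get(key)
        }
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        iterate(List.of("id-1", "id-2"), globalMap, "curr_mail_id",
                gm -> System.out.println("query for " + gm.get("curr_mail_id")));
    }
}
```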


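3. This component queries the database tables by mail_id to fetch the full email details used by the send step. The query is a Java string expression into which the global variable is spliced: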

"SELECT
    [a].[mail_id]
    ,[a].[subject]
    ,[a].[message]
    ,[b].[mail_from]
    ,[b].[mail_to]
    ,[b].[sender_name]
    ,[b].[mail_cc]
    ,[b].[mail_bcc]
    ,[b].[status]
FROM [chk].[mail_send_list_rec] AS a WITH(NOLOCK)
INNER JOIN [chk].[mail_send_group] AS b WITH(NOLOCK)
        ON ([a].[group_id] = [b].[group_id])
WHERE [a].[mail_id] = '" + ((String)globalMap.get("curr_mail_id")) + "'
      AND ISNULL([b].[status], 1) = 1"
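The query above splices curr_mail_id into the SQL text by plain string concatenation. Since mail_id is always a GUID produced by NEWID(), one cheap safeguard is to parse the value as a UUID before concatenating it, so that nothing but hex digits and dashes can reach the SQL. A sketch of such a helper follows; MailSql.mailIdLiteral is a hypothetical name (it could, for example, live in a Talend user routine), and the upper-case formatting assumes mail_id was stored the way SQL Server converts a UNIQUEIDENTIFIER to text.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper: turns the value stored in globalMap into a quoted SQL literal,
// refusing anything that is not a GUID.
final class MailSql {
    private MailSql() {}

    static String mailIdLiteral(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("curr_mail_id is not set");
        }
        // UUID.fromString throws IllegalArgumentException for text that is not a GUID;
        // re-formatting the parsed UUID means only hex digits and dashes reach the SQL.
        UUID id = UUID.fromString(value.toString().trim());
        // Upper case assumes mail_id was stored via SQL Server's UNIQUEIDENTIFIER-to-text conversion.
        return "'" + id.toString().toUpperCase(Locale.ROOT) + "'";
    }

    public static void main(String[] args) {
        String where = "WHERE [a].[mail_id] = " + mailIdLiteral("8d42d25d-59c7-4a5e-ae9c-4a5f24d910b0");
        System.out.println(where); // WHERE [a].[mail_id] = '8D42D25D-59C7-4A5E-AE9C-4A5F24D910B0'
    }
}
```

In the query expression, the hand-quoted `'" + ... + "'` part would then be replaced by a call to the helper, which already returns the surrounding quotes.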


4. The send-mail component takes the values returned by the query, passed in as variables, and sends the email.


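5. Mark the record for this mail_id as sent and store the send time. A tFixedFlowInput first builds the stored procedure parameters, then the MSSQL SP component (tMSSqlSP) calls the stored procedure to perform the update.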
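Outside Talend, step 5 boils down to a single stored procedure call. The sketch below shows the equivalent plain JDBC call, assuming an already-open java.sql.Connection to the database; MarkMailSent is a hypothetical class name. Passing NULL for @send_date lets the procedure fill in its own timestamp. Keep in mind that the procedure's CATCH block writes errors to chk.log_proc_error_rec instead of re-raising them, so a failed update will generally not surface here as an SQLException.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical plain-JDBC equivalent of step 5 (tFixedFlowInput + tMSSqlSP).
final class MarkMailSent {
    private MarkMailSent() {}

    static final String CALL = "{call chk.usp_update_mail_send_list_rec_by_mail_id(?, ?, ?)}";
    static final short STATUS_SENT = 1;

    static void markSent(Connection conn, String mailId) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(CALL)) {
            cs.setString(1, mailId);          // @mail_id NVARCHAR(50)
            cs.setNull(2, Types.TIMESTAMP);   // @send_date DATETIME: NULL, defaulted inside the proc
            cs.setShort(3, STATUS_SENT);      // @send_status SMALLINT: 1 = sent
            cs.execute();
        }
    }
}
```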


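2.3 Stored procedures: generating and updating the email content

Generating an email: the procedure builds a message from whatever content you want to send and inserts it into the mail table. The example below generates the IDS daily job-status report and returns the new mail_id as curr_mail_id.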

IF (OBJECT_ID(N'[chk].[usp_insert_ids_mail_send_list_rec]', N'P') IS NOT NULL)
BEGIN
    PRINT N'Dropping procedure: [chk].[usp_insert_ids_mail_send_list_rec]';
    DROP PROC [chk].[usp_insert_ids_mail_send_list_rec];
END
GO

CREATE PROC [chk].[usp_insert_ids_mail_send_list_rec]
(
    @curr_date NVARCHAR(20)
)
AS
--====================================================================================================================================
--    ProcedureName      :          chk.usp_insert_ids_mail_send_list_rec
--    Author             :          john.xiong    
--    CreateDate         :          2019-01-02
--    Description        :          Generates the daily detail mail content

/*************************************Parameters参数说明*******************************************************************************
--    @curr_date         :          data run date, format YYYYMMDD
      
**************************************Modified List*****************************************************************************
--    Modified Date       Modified User      Version           Modified Reason
**************************************************************************************************************************************
--    2019-01-02          john.xiong         V01.00.00         Initial version
**************************************************************************************************************************************/
--====================================================================================================================================
BEGIN
    BEGIN TRY
        DECLARE
            @begin_time DATETIME
            ,@end_time DATETIME
            ,@cost_time INT;
        SET @begin_time = DATEADD(HOUR, 8, GETDATE());
        INSERT INTO [chk].[tb_proc_cost_log]
        (
            [proc_name]
            ,[Object_name]
            ,[execute_time]
            ,[action]
            ,[remark]
            ,[cost_time]
        )
        SELECT
            N'chk.usp_insert_ids_mail_send_list_rec' AS [proc_name]
            ,N'chk.mail_send_list_rec' AS [Object_name]
            ,@begin_time AS [execute_time]
            ,N'start' AS [action]
            ,'' AS [remark]
            ,0 AS [cost_time]
        
        DECLARE
            @mail_id UNIQUEIDENTIFIER,
            @scope NVARCHAR(100),
            @group_id UNIQUEIDENTIFIER,
            @subject NVARCHAR(100),
            @create_date DATETIME,
            @message NVARCHAR(4000),
            @temp_message NVARCHAR(4000),
            @count INT,
            @count1 INT,
            @count2 INT,
            @error_count INT
    
        SET @mail_id = NEWID();
        SET @scope = N'IDS';
        SET @group_id = N'8D42D25D-59C7-4A5E-AE9C-4A5F24D910B0'
        SET @subject = N'IDS daily - job run status';
        SET @create_date = DATEADD(HOUR, 8, GETDATE());
        SET @count1 = 0;
        SET @count2 = 0;
        SET @error_count = 0;
        SET @message = '<span style="color:#000; line-height:30px"><ol>';
        SET @temp_message = '';
        SET @count = 0;
        SELECT
            @count = COUNT(*)
        FROM [chk].[log_move_blob_rec] AS a
        WHERE LEFT([a].[rec_load_time], 8) = @curr_date
              AND ([a].[scope] IN ('ids_regular_data') OR [a].[blobFileName] LIKE '%LCH%')
        SET @message = @message + N'<li>Total blob files moved from landing: ' + CONVERT(NVARCHAR(20), ISNULL(@count, 0));
        SET @temp_message = '';
        SET @count = 0;
        SELECT
            @count = COUNT(*)
        FROM [chk].[log_move_blob_rec] AS a
        WHERE LEFT([a].[rec_load_time], 8) = @curr_date
              AND [a].[scope] = 'ids_regular_data'
        SET @message = @message + N'<br>Distributor regular data: ' + CONVERT(NVARCHAR(20), ISNULL(@count, 0));
        SET @temp_message = '';
        SET @count = 0;
        SELECT
            @count = COUNT(*)
        FROM [chk].[log_move_blob_rec] AS a
        WHERE LEFT([a].[rec_load_time], 8) = @curr_date
              AND [a].[blobFileName] LIKE '%LCH%'
        SET @message = @message + N'<br>local customer hierarchy daily:' + CONVERT(NVARCHAR(20), ISNULL(@count, 0)) + '</li>';
        SET @temp_message = '';
        SET @count = 0;
        SELECT
            @count = SUM([a].[file_count])
        FROM [chk].[log_blob_file_deal] AS a
        WHERE LOWER([a].[data_scope]) = 'ids'
                AND LOWER([a].[deal_level]) = 'ext'
                AND LOWER([a].[job_name]) = LOWER('IDS_Data_Blob_To_Stg_Ongoing_Loop_New_1_3')
                AND [a].[remark] LIKE '%tFileList Count%'
                AND CONVERT(NVARCHAR(8), [a].[deal_date], 112) = @curr_date
        SET @message = @message + N'<li>Distributor regular data files actually processed: ' + CONVERT(NVARCHAR(20), ISNULL(@count, 0)) + '</li>';
        SET @temp_message = '';
        SET @count = 0;
        SELECT
            @count = COUNT([a].[file_name])
        FROM [chk].[log_file_deal_error_rec] AS a
        WHERE LOWER([a].[data_scope]) = 'ids'
                AND LOWER([a].[deal_level]) = 'ext'
                AND LOWER([a].[job_name]) = LOWER('IDS_Data_Blob_To_Stg_Ongoing_Loop_New_1_3')
                AND CONVERT(NVARCHAR(8), [a].[deal_date], 112) = @curr_date
        SET @message = @message + N'<li>Distributor regular data files that could not be unzipped: ' + CONVERT(NVARCHAR(20), ISNULL(@count, 0)) + '</li>';
        SET @temp_message = '';
        SET @count = 0;
        SELECT
            @count = SUM([a].[file_count])
        FROM [chk].[log_blob_file_deal] AS a
        WHERE LOWER([a].[data_scope]) = 'ids'
                AND LOWER([a].[deal_level]) = 'ext'
                AND LOWER([a].[job_name]) = LOWER('IDS_RCS_Local_Master_Data_Daily_1_2')
                AND [a].[remark] LIKE '%tFileList Count lch%'
                AND CONVERT(NVARCHAR(8), [a].[deal_date], 112) = @curr_date
        SET @message = @message + N'<li>local customer hierarchy daily files processed: ' + CONVERT(NVARCHAR(20), ISNULL(@count, 0));
        SET @temp_message = '';
        SET @count = 0;
        SET @count1 = 0;
        SELECT TOP (1)
            @count1 = [a].[row_count]
        FROM [chk].[log_table_data_rec] AS a
        WHERE [a].[data_scope] = 'rcs dim'
              AND [a].[table_name] = 'stg.cust_ids_rcs_local_customer_hierarchy_daily'
              AND CONVERT(NVARCHAR(8), [a].[action_time], 112) = @curr_date
        ORDER BY [a].[action_time] DESC
        SET @message = @message + N'<br>Rows in file: ' + CONVERT(NVARCHAR(20), ISNULL(@count1, 0));
        SET @temp_message = '';
        SET @count = 0;
        SET @count2 = 0;
        SELECT
            @count2 = COUNT(*)
        FROM [stg].[cust_ids_rcs_local_customer_hierarchy_daily] AS a
        WHERE LEFT([a].[rec_load_time], 8) = @curr_date
        SET @message = @message + N'<br>Rows loaded: ' + CONVERT(NVARCHAR(20), ISNULL(@count2, 0)) + '</li>';

        IF (@count1 <> @count2)
        BEGIN
            SET @error_count = @error_count + 1;
        END

        IF (OBJECT_ID(N'[chk].[temp_mail_send_proc_error_list_ids_daily]', N'U') IS NOT NULL)
        BEGIN
            DROP TABLE [chk].[temp_mail_send_proc_error_list_ids_daily];
        END

        /* Build the list of procs that logged errors on @curr_date */
        CREATE TABLE [chk].[temp_mail_send_proc_error_list_ids_daily]
        WITH
        (
            DISTRIBUTION = ROUND_ROBIN,
            CLUSTERED COLUMNSTORE INDEX
        )
        AS
            SELECT
                [a].[proc_name]
                ,ROW_NUMBER() OVER(ORDER BY [a].[error_time] ASC) AS [Num]
            FROM [chk].[log_proc_error_rec] AS a
            WHERE [a].[proc_name] LIKE '%ids%'
                  AND [a].[proc_name] NOT LIKE '%mail%'
                  AND CONVERT(NVARCHAR(8), [a].[error_time], 112) = @curr_date
        SET @count = 0;
        SELECT @count = COUNT(*) FROM [chk].[temp_mail_send_proc_error_list_ids_daily];
        IF (@count > 0)
        BEGIN
            SET @message = @message + N'<li style="color:red">PROCs with errors: ' + CONVERT(NVARCHAR(20), @count);
            SET @error_count = @error_count + @count;
            /* Append each failed proc name, newest error first */
            WHILE (@count > 0)
            BEGIN
                SELECT @temp_message = [proc_name] FROM [chk].[temp_mail_send_proc_error_list_ids_daily] WHERE [Num] = @count;
                SET @message = @message + N'<br />' + @temp_message + N';&nbsp;';
                SET @count = @count - 1;
            END
            /* Close the error <li> only when one was opened */
            SET @message = @message + N'</li>';
        END

        IF (@error_count <> 0)
        BEGIN
            SET @subject = @subject + N': ' + CONVERT(NVARCHAR(20), @error_count) + N' error(s)';
        END

        SET @subject = @curr_date + N'  ' + @subject;
        
        SET @message = @message + '</ol></span>'
        PRINT @message
        INSERT INTO [chk].[mail_send_list_rec]
        (
            [mail_id]
            ,[group_id]
            ,[scope]
            ,[subject]
            ,[message]
            ,[create_date]
            ,[send_date]
            ,[send_status]
        )
        SELECT
            @mail_id,
            @group_id,
            @scope,
            @subject,
            @message,
            @create_date,
            NULL,
            0

        SET @end_time = DATEADD(HOUR, 8, GETDATE());
        SET @cost_time = DATEDIFF(SECOND, @begin_time, @end_time);
        INSERT INTO [chk].[tb_proc_cost_log]
        (
            [proc_name]
            ,[Object_name]
            ,[execute_time]
            ,[action]
            ,[remark]
            ,[cost_time]
        )
        SELECT
            N'chk.usp_insert_ids_mail_send_list_rec' AS [proc_name]
            ,N'chk.mail_send_list_rec' AS [Object_name]
            ,@end_time AS [execute_time]
            ,N'end' AS [action]
            ,CONVERT(NVARCHAR(50), @mail_id) AS [remark]
            ,@cost_time AS [cost_time]

        PRINT N'Exec success';
        SELECT @mail_id AS [curr_mail_id]
    END TRY
    BEGIN CATCH
        INSERT INTO [chk].[log_proc_error_rec]
        (
            [proc_name]
            ,[error_source]
            ,[error_time]
            ,[error_severity]
            ,[error_state]
            ,[error_msg]
            ,[log_user]
        )
        SELECT
             N'chk.usp_insert_ids_mail_send_list_rec' AS [proc_name]
            ,ERROR_PROCEDURE() AS [error_source]
            ,DATEADD(HOUR, 8, GETDATE()) AS [error_time]
            ,ERROR_SEVERITY() AS [error_severity]
            ,ERROR_STATE() AS [error_state]
            ,ERROR_MESSAGE() AS [error_msg]
            ,SUSER_SNAME() AS [log_user]
        PRINT N'Exec failed';
    END CATCH
END
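The procedure above assembles the subject and HTML body by plain string concatenation. The Java sketch below mirrors that logic in simplified form (one <li> per metric, whereas the procedure groups some related counts into a single item separated by <br>), which can be handy for unit-testing the wording. DailyMailText is a hypothetical class, and the English wording of the error suffix is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified Java mirror of how the procedure builds subject and body.
final class DailyMailText {
    private DailyMailText() {}

    // "<date>  <base>", with ": N error(s)" appended to base when errorCount is non-zero.
    static String subject(String currDate, String base, int errorCount) {
        String s = base;
        if (errorCount != 0) {
            s = s + ": " + errorCount + " error(s)";
        }
        return currDate + "  " + s;
    }

    // One <li> per metric inside the same <span><ol> shell the procedure uses.
    static String body(Map<String, Integer> metrics) {
        StringBuilder sb = new StringBuilder("<span style=\"color:#000; line-height:30px\"><ol>");
        for (Map.Entry<String, Integer> e : metrics.entrySet()) {
            sb.append("<li>").append(e.getKey()).append(": ").append(e.getValue()).append("</li>");
        }
        return sb.append("</ol></span>").toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> metrics = new LinkedHashMap<>(); // keeps insertion order
        metrics.put("Rows in file", 120);
        metrics.put("Rows loaded", 118);
        // Like "IF (@count1 <> @count2)": a row-count mismatch counts as one error.
        int errors = metrics.get("Rows in file").equals(metrics.get("Rows loaded")) ? 0 : 1;
        System.out.println(subject("20190102", "IDS daily - job run status", errors));
        System.out.println(body(metrics));
    }
}
```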

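Updating an email by mail_id: after the email has been sent, this procedure stores the send time and send status on the record.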

IF (OBJECT_ID(N'[chk].[usp_update_mail_send_list_rec_by_mail_id]', N'P') IS NOT NULL)
BEGIN
    PRINT N'Dropping procedure: [chk].[usp_update_mail_send_list_rec_by_mail_id]';
    DROP PROC [chk].[usp_update_mail_send_list_rec_by_mail_id];
END
GO

CREATE PROC [chk].[usp_update_mail_send_list_rec_by_mail_id]
(
    @mail_id NVARCHAR(50)
    ,@send_date DATETIME
    ,@send_status SMALLINT
)
AS
--====================================================================================================================================
--    ProcedureName      :          [chk].[usp_update_mail_send_list_rec_by_mail_id]
--    Author             :          john.xiong    
--    CreateDate         :          2018-12-24
--    Description        :          Updates the mail send record for the given mail_id

/*************************************Parameters参数说明*******************************************************************************
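--    @mail_id           :          mail id (generated with NEWID())
--    @send_date         :          send time; if NULL, DATEADD(HOUR, 8, GETDATE()) is used
--    @send_status       :          send status (0 = not sent, 1 = sent)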
      
**************************************Modified List*****************************************************************************
--    Modified Date       Modified User      Version           Modified Reason
**************************************************************************************************************************************
--    2018-12-24          john.xiong         V01.00.00         Initial version
**************************************************************************************************************************************/
--====================================================================================================================================
BEGIN
    BEGIN TRY
        DECLARE
            @begin_time DATETIME
            ,@end_time DATETIME
            ,@cost_time INT

        SET @begin_time = DATEADD(HOUR, 8, GETDATE());
        INSERT INTO [chk].[tb_proc_cost_log]
        (
            [proc_name]
            ,[Object_name]
            ,[execute_time]
            ,[action]
            ,[remark]
            ,[cost_time]
        )
        SELECT
            N'chk.usp_update_mail_send_list_rec_by_mail_id' AS [proc_name]
            ,N'chk.mail_send_list_rec' AS [Object_name]
            ,@begin_time AS [execute_time]
            ,N'start' AS [action]
            ,N'' AS [remark]
            ,0 AS [cost_time]
        
        IF (@mail_id IS NULL)
        BEGIN
            RAISERROR (N'Invalid mail id! Aborting.', 16, 1);
        END

        IF (@send_date IS NULL)
        BEGIN
            SET @send_date = DATEADD(HOUR, 8, GETDATE());
        END
        
        UPDATE [chk].[mail_send_list_rec]
        SET [send_date] = @send_date, [send_status] = @send_status
        WHERE [mail_id] = @mail_id;

        SET @end_time = DATEADD(HOUR, 8, GETDATE());
        SET @cost_time = DATEDIFF(SECOND, @begin_time, @end_time);
        INSERT INTO [chk].[tb_proc_cost_log]
        (
            [proc_name]
            ,[Object_name]
            ,[execute_time]
            ,[action]
            ,[remark]
            ,[cost_time]
        )
        SELECT
            N'chk.usp_update_mail_send_list_rec_by_mail_id' AS [proc_name]
            ,N'chk.mail_send_list_rec' AS [Object_name]
            ,@end_time AS [execute_time]
            ,N'end' AS [action]
            ,N'' AS [remark]
            ,@cost_time AS [cost_time]
        
        PRINT N'exec succeeded'
    END TRY
    BEGIN CATCH
        INSERT INTO [chk].[log_proc_error_rec]
        (
            [proc_name]
            ,[error_source]
            ,[error_time]
            ,[error_severity]
            ,[error_state]
            ,[error_msg]
            ,[log_user]
        )
        SELECT
             N'chk.usp_update_mail_send_list_rec_by_mail_id' AS [proc_name]
            ,ERROR_PROCEDURE() AS [error_source]
            ,DATEADD(HOUR, 8, GETDATE()) AS [error_time]
            ,ERROR_SEVERITY() AS [error_severity]
            ,ERROR_STATE() AS [error_state]
            ,ERROR_MESSAGE() AS [error_msg]
            ,SUSER_SNAME() AS [log_user]
            
        PRINT N'exec failed'
    END CATCH
END
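Both procedures compute "now" as DATEADD(HOUR, 8, GETDATE()), which yields UTC+8 wall-clock time only if the database server's clock runs on UTC (presumably the case here). If the caller would rather pass @send_date explicitly than rely on the NULL default, java.time can compute the same value without depending on the server clock. A minimal sketch; SendDateDefault is a hypothetical name, and the Clock parameter exists only so the result can be tested.

```java
import java.time.Clock;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical caller-side equivalent of DATEADD(HOUR, 8, GETDATE()) on a UTC server.
final class SendDateDefault {
    private SendDateDefault() {}

    static final ZoneOffset UTC_PLUS_8 = ZoneOffset.ofHours(8);

    // Current wall-clock time at UTC+8, without offset (like a DATETIME value).
    static LocalDateTime nowUtcPlus8(Clock clock) {
        return LocalDateTime.now(clock.withZone(UTC_PLUS_8));
    }

    // Mirrors "IF (@send_date IS NULL) SET @send_date = ..." in the procedure.
    static LocalDateTime orDefault(LocalDateTime sendDate, Clock clock) {
        return sendDate != null ? sendDate : nowUtcPlus8(clock);
    }

    public static void main(String[] args) {
        System.out.println(orDefault(null, Clock.systemUTC()));
    }
}
```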

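III. Calling the joblet from a job

In any job that needs to send an email, simply drag the joblet in, generate the mail_id of the email you want to send (for example the curr_mail_id returned by chk.usp_insert_ids_mail_send_list_rec), and pass it through an input component into the joblet's input. That is all it takes to embed the joblet in the job.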
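For reference, the "generate a mail_id" step can also be done with plain JDBC: call the generating procedure and read the curr_mail_id column it returns. This is a sketch assuming an open java.sql.Connection; GenerateDailyMail is a hypothetical class name. Because the INSERT statements that run before the final SELECT may be reported as update counts first (the procedure does not use SET NOCOUNT ON), the code walks through all results until it finds the result set.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical plain-JDBC version of "generate the mail and get its mail_id".
final class GenerateDailyMail {
    private GenerateDailyMail() {}

    static final String CALL = "{call chk.usp_insert_ids_mail_send_list_rec(?)}";

    // Returns curr_mail_id, or null if no result set came back
    // (the procedure's CATCH block logs errors instead of re-raising them).
    static String generate(Connection conn, String currDate) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(CALL)) {
            cs.setString(1, currDate); // @curr_date, format YYYYMMDD
            boolean isResultSet = cs.execute();
            while (true) {
                if (isResultSet) {
                    try (ResultSet rs = cs.getResultSet()) {
                        if (rs.next()) {
                            return rs.getString("curr_mail_id");
                        }
                    }
                } else if (cs.getUpdateCount() == -1) {
                    return null; // no more results of any kind
                }
                isResultSet = cs.getMoreResults(); // skip update counts from the INSERTs
            }
        }
    }
}
```

The returned value is what the job would pass to the joblet's mail_id input; a null result means the procedure failed and logged the error to chk.log_proc_error_rec.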

