Versions

Flask==0.10.1
celery==3.1.18

Custom data persistence for Celery

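MySQL currently covers our day-to-day task lookup needs, so tasks are persisted in MySQL.
If you need more complex searches, you can store the data in elasticsearch instead, which makes it easier to search through a UI such as elasticsearch-head, Kibana, or ElasticHD.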

The MySQL table structure is as follows:

create table mq_task_log
(
    id             int auto_increment comment 'primary key id'
        primary key,
    task_id        varchar(36)  default '' not null comment 'task_id of the mq task',
    task_name      varchar(128) default '' not null comment 'name of the task being called',
    queue          varchar(36)  default '' not null comment 'queue the task is sent to',
    payload        text                    null,
    properties     varchar(256) default '' null,
    status         tinyint      default 0  not null comment 'task execution status: 0 initial, 1 succeeded',
    is_compensated tinyint      default 0  not null comment 'whether the task was re-executed as compensation: 0 no, 1 yes',
    relation_id    varchar(64)  default '' not null comment 'id of the related record, for easier troubleshooting later',
    relation_type  varchar(64)  default '' not null comment 'kind of relation that relation_id refers to',
    created_time   int          default 0  not null comment 'creation time',
    updated_time   int          default 0  not null comment 'modification time',
    constraint task_id
        unique (task_id)
)
    comment 'mq task log table';

create index ix_mq_task_log_created_time
    on mq_task_log (created_time);

create index ix_mq_task_log_relation_id
    on mq_task_log (relation_id);

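Persisting the data

With the table in place, when should the data be written?

Here are two options:

You can wrap the asynchronous call in a new function of your own and use that single call format throughout the project from then on. The code is as follows: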

import uuid


def high_priority_task(func, args=None, kwargs=None, queue=None, producer=None,
                       link=None, link_error=None,
                       custom_relation_id='', custom_relation_type='',
                       **options):
    # Generate the task_id up front so the log row and the message share it
    task_id = str(uuid.uuid4())
    # Call the wrapped function that writes the task information to the database
    create_mq_task_log(
        func.__name__,
        args=args, kwargs=kwargs,
        task_id=task_id,
        queue=queue,
        custom_relation_id=custom_relation_id, custom_relation_type=custom_relation_type,
    )
    # Publish the asynchronous task and return its AsyncResult to the caller
    return func.apply_async(
        args=args, kwargs=kwargs,
        task_id=task_id,
        producer=producer,
        link=link,
        link_error=link_error,
        queue=queue,
        **options)
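
The create_mq_task_log helper called above is not shown in this post. A minimal sketch of what it might look like follows. It assumes an in-memory SQLite database as a stand-in for MySQL so that the example runs on its own; the reduced table, the extra conn parameter, and the JSON payload format are illustrative assumptions, not the project's actual implementation.

```python
import json
import sqlite3
import time


def make_connection():
    """Create an in-memory SQLite stand-in for the MySQL mq_task_log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE mq_task_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            task_id TEXT NOT NULL UNIQUE,
            task_name TEXT NOT NULL DEFAULT '',
            queue TEXT NOT NULL DEFAULT '',
            payload TEXT,
            status INTEGER NOT NULL DEFAULT 0,
            is_compensated INTEGER NOT NULL DEFAULT 0,
            relation_id TEXT NOT NULL DEFAULT '',
            relation_type TEXT NOT NULL DEFAULT '',
            created_time INTEGER NOT NULL DEFAULT 0,
            updated_time INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def create_mq_task_log(conn, task_name, args=None, kwargs=None, task_id='',
                       queue=None, custom_relation_id='', custom_relation_type=''):
    """Insert one row describing a task that is about to be published."""
    now = int(time.time())
    # Assumed payload format: the call arguments serialized as JSON
    payload = json.dumps({"args": list(args or ()), "kwargs": kwargs or {}})
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO mq_task_log (task_id, task_name, queue, payload,"
            " relation_id, relation_type, created_time, updated_time)"
            " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (task_id, task_name, queue or '', payload,
             custom_relation_id, custom_relation_type, now, now),
        )
```

New rows start with status 0 and is_compensated 0, and the unique task_id means that logging the same task twice raises an integrity error instead of creating a duplicate.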

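If the project already contains many delay and apply_async calls, rewriting every call site is expensive. In that case you can override celery.app.task.Task.delay and celery.app.task.Task.apply_async, or add a custom log-inserting decorator to delay/apply_async. The following code shows how to override delay: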

import uuid

from celery import Celery, Task


class ContextTask(Task):
    def __call__(self, *args, **kwargs):
        # Run the task body inside the Flask application context
        with app.app_context():
            return Task.__call__(self, *args, **kwargs)

    def delay(self, *args, queue=None, custom_relation_id='', custom_relation_type='', **kwargs):
        task_id = str(uuid.uuid4())
        # Call the wrapped function that writes the task information to the database
        create_mq_task_log(
            self.name,  # registered name of this task
            args=args, kwargs=kwargs,
            task_id=task_id,
            queue=queue,
            custom_relation_id=custom_relation_id, custom_relation_type=custom_relation_type,
        )
        # Publish the asynchronous task and return its AsyncResult
        return self.apply_async(
            args=args, kwargs=kwargs,
            task_id=task_id,
            queue=queue)


# After instantiating Celery, assign the new Task class to the instance
celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
celery.Task = ContextTask
# Or pass the new Task class directly when instantiating Celery
celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'], task_cls=ContextTask)
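
The decorator alternative mentioned above can be sketched roughly as follows. This is only an illustration: log_before_publish, FakeTask, and the in-memory TASK_LOG list are hypothetical stand-ins so that the example runs without a broker. In a real project the decorated method would be apply_async on a Celery task class, and the log would be written to mq_task_log.

```python
import functools
import uuid

TASK_LOG = []  # hypothetical stand-in for the mq_task_log table


def log_before_publish(apply_async):
    """Wrap an apply_async-style method so a log entry is written before publishing."""
    @functools.wraps(apply_async)
    def wrapper(self, args=None, kwargs=None, task_id=None, **options):
        # Reuse the caller's task_id if given, otherwise generate one
        task_id = task_id or str(uuid.uuid4())
        TASK_LOG.append({
            "task_id": task_id,
            "task_name": self.name,
            "args": args,
            "kwargs": kwargs,
            "queue": options.get("queue"),
        })
        return apply_async(self, args=args, kwargs=kwargs, task_id=task_id, **options)
    return wrapper


class FakeTask(object):
    """Hypothetical stand-in for a Celery task; it only records what was published."""
    name = "demo.add"

    def __init__(self):
        self.published = []

    @log_before_publish
    def apply_async(self, args=None, kwargs=None, task_id=None, **options):
        self.published.append((task_id, args, kwargs, options))
        return task_id

    def delay(self, *args, **kwargs):
        # delay forwards to apply_async, so both entry points are logged
        return self.apply_async(args, kwargs)
```

Since delay forwards to apply_async, decorating apply_async alone is enough to cover both call styles in this sketch.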

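Q: Why write the log before calling apply_async?

A: This is where the mq_task_log.is_compensated field matters: it is used to track tasks that never actually made it into RabbitMQ, so that tasks lost to RabbitMQ connection errors and similar problems are not dropped silently. Because the row is written before the message is published, a record exists even when publishing fails.

For a normal business queue whose messages are consumed in real time, you can write a scheduled script that checks for lost tasks, sets is_compensated=1 on them, and then re-executes them as compensation.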
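
The scheduled check described above might look like the sketch below. It assumes a SQLite connection as a stand-in for MySQL, a hypothetical republish callback that re-sends a task, and a hypothetical 300 second threshold for deciding that an unfinished task has been lost. None of these come from the original project.

```python
import sqlite3
import time

STALE_AFTER_SECONDS = 300  # hypothetical threshold for treating a task as lost


def find_and_compensate(conn, republish, now=None):
    """Re-send tasks that are still unfinished long after they were logged.

    conn is a DB-API connection to a database holding mq_task_log (for
    example one created with sqlite3.connect); republish is a callable
    taking (task_id, task_name, payload).
    """
    now = int(time.time()) if now is None else now
    cutoff = now - STALE_AFTER_SECONDS
    rows = conn.execute(
        "SELECT task_id, task_name, payload FROM mq_task_log"
        " WHERE status = 0 AND is_compensated = 0 AND created_time < ?",
        (cutoff,),
    ).fetchall()
    for task_id, task_name, payload in rows:
        # Mark the row first so a second run of the script does not re-send it
        with conn:
            conn.execute(
                "UPDATE mq_task_log SET is_compensated = 1, updated_time = ?"
                " WHERE task_id = ?",
                (now, task_id),
            )
        republish(task_id, task_name, payload)
    return [row[0] for row in rows]
```

Filtering on is_compensated = 0 keeps the script idempotent: a task that has already been compensated is not picked up again on the next run.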

Saving task execution results

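To manage this in one place, we rewrap Celery's task decorator so that updating the task result and retrying failed tasks are handled uniformly.
If you would rather not rewrap task, you can override celery.app.task.Task.on_failure and celery.app.task.Task.on_success instead.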
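
As a rough illustration of the hook-based alternative, the sketch below implements on_success and on_failure with the parameters Celery passes to those handlers. ResultLoggingMixin and the STATUS and ERRORS dictionaries are hypothetical; in a real project the mixin would be combined with celery.Task, and the dictionaries would be replaced by updates to mq_task_log.

```python
STATUS = {}  # hypothetical stand-in for mq_task_log.status, keyed by task_id
ERRORS = {}  # hypothetical place to keep the failure reason per task_id


class ResultLoggingMixin(object):
    """Hooks intended to be mixed into a celery.Task subclass (Celery is not imported here)."""

    def on_success(self, retval, task_id, args, kwargs):
        # Invoked after the task returns normally: mark it as done
        STATUS[task_id] = 1

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # Invoked when the task fails: leave status at 0 and record the reason
        STATUS[task_id] = 0
        ERRORS[task_id] = repr(exc)
```

This variant only records the outcome. The locking and retry logic would still need to live elsewhere.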

Below is how our project rewraps celery.app.base.Celery.task; see the comments for the detailed logic.

import functools

import celery as _celery
from redlock import RedLock, RedLockError  # assumption: provided by the "redlock" package


class Celery(_celery.Celery):

    # Override the task decorator to wrap task execution
    def task(self, *args_task, **opts_task):

        def decorator(func):
            sup = super(Celery, self).task

            @sup(*args_task, **opts_task)
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                from init import redis_store
                from models.mq_task_log_model import MqTaskLog
                try:
                    # Before running, check the recorded result and take a lock so the task runs only once
                    task_id = wrapper.request.id
                    with RedLock(task_id, connection_details=[redis_store], ttl=60):
                        mq_task_log = MqTaskLog.find(task_id=task_id).first()
                        # If the task has already been executed, stop here
                        if mq_task_log and mq_task_log.is_done:
                            return
                        # Run the task
                        func(*args, **kwargs)
                except RedLockError:
                    # The lock could not be acquired, e.g. another worker is already running this task
                    return
                except Exception as exc:
                    # Retry policy: at most 3 retries, 5 seconds apart
                    raise wrapper.retry(exc=exc, args=args, kwargs=kwargs, countdown=5, max_retries=3)
                else:
                    if mq_task_log:
                        # The model implements task_done, which sets status to 1 to mark the task as succeeded
                        mq_task_log.task_done()

            return wrapper

        return decorator
