Celery in Practice: Multiple Queues, Priorities, and Task Retries

       Among Python web frameworks, Flask is widely used because it is lightweight and easy to extend. This article takes a brief look at Celery, using Flask as the host application.

       Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages, and it provides the tools needed to operate such a system. In short, Celery is a task queue focused on real-time processing, with support for scheduled tasks as well.

       Celery uses a broker to mediate between clients and workers. To start a task, the client adds a message to a queue, and the broker then delivers that message to a worker.

1 Getting started with Celery

1.1 Choosing a broker

Celery needs a message middleware to send and receive task messages and to store task results; RabbitMQ and Redis are the usual choices. This article uses Redis as the example and configures it in the project.

1.2 Installing Celery

It can be installed quickly with pip or easy_install:

pip install celery
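
Since Redis is used as the broker in this article, it is convenient to install Celery together with its Redis dependencies via the celery[redis] bundle:

pip install "celery[redis]"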

1.3 Creating the Celery instance in the project

The instance can be created in the project's startup file or initialization file; in this project we create it in the initialization file init.py:

from celery import Celery
from flask import Flask

app = Flask(__name__)
app.config.from_object('config')  # load the settings from config.py

celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)
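
If your tasks use Flask extensions (a database session, the app config, and so on), they need to run inside an application context. A common pattern from the Flask documentation, shown here as a minimal sketch, is to swap in a task base class that pushes the context:

TaskBase = celery.Task

class ContextTask(TaskBase):
    abstract = True

    def __call__(self, *args, **kwargs):
        # push the Flask application context around every task execution
        with app.app_context():
            return TaskBase.__call__(self, *args, **kwargs)

celery.Task = ContextTask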

Celery settings can all be written in the project configuration file config.py. Here are a few commonly used options:

CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
CELERY_BROKER_URL = 'redis://127.0.0.1:6379'
# The result backend tracks task state and results. Results are disabled by
# default; here Redis is used as the backend so that results can be retrieved
# later (see section 1.5). Backends differ in their trade-offs, so pick one
# that suits your application. If you don't need results, it's best to leave
# them disabled, or disable them for individual tasks with
# @task(ignore_result=True).
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379'


If you need queue priorities, you can configure exchanges and queues. As an example, this article sets up a low-priority and a high-priority queue:

from kombu import Exchange, Queue

# low- and high-priority exchanges
low_exchange = Exchange('low_priority', type='direct')
high_exchange = Exchange('high_priority', type='direct')

# low- and high-priority queues
CELERY_QUEUES = (
    Queue(name='celery'),  # Celery's default queue; omit it if your project never uses it
    Queue(name='low_priority', exchange=low_exchange, routing_key='low_priority'),
    Queue(name='high_priority', exchange=high_exchange, routing_key='high_priority')
)
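
If you want certain tasks to land on a given queue without naming it at every call site, you can also set the default queue and a routing table in config.py. A minimal sketch (the task path message_queue.tasks.add is a hypothetical example):

CELERY_DEFAULT_QUEUE = 'celery'
CELERY_ROUTES = {
    # send this task to the low-priority queue unless a queue is given explicitly
    'message_queue.tasks.add': {'queue': 'low_priority', 'routing_key': 'low_priority'},
}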

1.4 Registering tasks and setting task priority

from init import celery

@celery.task
def add(x, y):
    return x + y

# To give a task a custom priority within its queue, pass the priority option.
# priority has 10 levels (0-9); with the Redis broker used here, 0 is the
# highest priority and 9 the lowest.
@celery.task(priority=3)
def add_with_priority(x, y):
    return x + y

1.5 Invoking tasks

You can use the delay method, which places the task on Celery's default queue:
>>> add.delay(2, 2)
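
delay returns an AsyncResult immediately. Because a result backend was configured above, the outcome can be fetched later, for example:

>>> result = add.delay(2, 2)
>>> result.get(timeout=10)  # blocks until the task finishes or the timeout expires
4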

delay is a simplified form of apply_async. With apply_async you can pass options that control how the task runs, such as the target queue, the exchange, or a delayed start time (countdown):

>>> add.apply_async((2, 2))

>>> add.apply_async((2, 2), queue='low_priority', countdown=10)
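
apply_async also accepts a priority option (the same 0-9 levels described in section 1.4), so a single call can override the task's default priority:

>>> add.apply_async((2, 2), queue='high_priority', priority=0)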

In real projects, to keep queue usage from becoming chaotic, you can put a unified dispatch interface in front of the queues so they are easier to manage, for example:

def high_priority_task(func, args=None, kwargs=None, task_id=None, producer=None, link=None, link_error=None, **options):
    # always dispatch to the high-priority queue; return the AsyncResult to the caller
    return func.apply_async(args=args, kwargs=kwargs, task_id=task_id, producer=producer, link=link, link_error=link_error,
                            queue='high_priority', **options)


def low_priority_task(func, args=None, kwargs=None, task_id=None, producer=None, link=None, link_error=None, **options):
    # always dispatch to the low-priority queue; return the AsyncResult to the caller
    return func.apply_async(args=args, kwargs=kwargs, task_id=task_id, producer=producer, link=link, link_error=link_error,
                            queue='low_priority', **options)

Invoking a task then goes through the custom entry point:

high_priority_task(add, args=(2, 2))

1.6 Task retries

While tasks run, occasional network jitter or other issues will cause request timeouts or other unforeseen exceptions, and you cannot guarantee that every exception is handled and retried inside the task itself. Celery provides a convenient retry mechanism that lets you configure the number of retries and the interval between them.
To retry a single task, you can retry manually inside the task:

from init import celery

@celery.task(bind=True)
def add(self, x, y):
    try:
        return x + y
    except Exception as exc:
        raise self.retry(countdown=5, max_retries=3, exc=exc)  # retry in 5 seconds, at most 3 times

A quick word on retry: self.retry() re-publishes the task message with the same arguments (bind=True is what gives the task body access to self). countdown sets the delay before the next attempt, and max_retries caps the number of attempts; once the cap is exceeded, Celery raises MaxRetriesExceededError. self.retry() itself raises a Retry exception, so re-raising its result simply makes it explicit that execution stops there.
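
Since Celery 4, the task decorator also supports automatic retries via autoretry_for, which saves the explicit try/except; a minimal sketch:

@celery.task(autoretry_for=(Exception,),
             retry_kwargs={'max_retries': 3, 'countdown': 5})
def add(x, y):
    return x + y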

If you find the manual approach inconvenient, you can instead subclass Celery so that every task retries automatically:

import functools

import celery as _celery


class Test1Exception(Exception):
    """Placeholder for exceptions that should not be retried."""


def make_celery(app):
    class Celery(_celery.Celery):

        def task(self, *args_task, **opts_task):
            def decorator(func):
                sup = super(Celery, self).task

                @sup(*args_task, **opts_task)
                @functools.wraps(func)
                def wrapper(*args, **kwargs):
                    try:
                        return func(*args, **kwargs)
                    except Test1Exception:
                        # exceptions that Celery should NOT retry can be handled here
                        raise
                    except Exception as exc:
                        # wrapper is the registered task, so it can schedule its own retry
                        wrapper.retry(exc=exc, args=args, kwargs=kwargs, countdown=5, max_retries=3)

                return wrapper

            return decorator

    celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    return celery


celery = make_celery(app)  # app is the Flask application created earlier
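
With this in place, any task defined through this celery instance retries automatically on unhandled exceptions, for example:

@celery.task
def add(x, y):
    # any unhandled exception raised here is retried up to 3 times, 5 seconds apart
    return x + y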

2 Celery startup command

celery -A message_queue worker -l info
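
To restrict a worker to particular queues, pass -Q (see the option reference below). For the two priority queues defined earlier:

celery -A message_queue worker -l info -Q high_priority,low_priority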

Option reference:

-A APP, --app=APP     app instance to use (e.g. module.attr_name); the module name
-b BROKER, --broker=BROKER
                    url to broker.  default is 'amqp://guest@localhost//'
--loader=LOADER       name of custom loader class to use.
--config=CONFIG       Name of the configuration module
--workdir=WORKING_DIRECTORY
                    Optional directory to change to after detaching.
-C, --no-color        
-q, --quiet           
-c CONCURRENCY, --concurrency=CONCURRENCY  number of processes; defaults to the CPU core count
                    Number of child processes processing the queue. The
                    default is the number of CPUs available on your
                    system.
-P POOL_CLS, --pool=POOL_CLS
                    Pool implementation: prefork (default), eventlet,
                    gevent, solo or threads.
--purge, --discard    Purges all waiting tasks before the daemon is started.
                    **WARNING**: This is unrecoverable, and the tasks will
                    be deleted from the messaging server.
-l LOGLEVEL, --loglevel=LOGLEVEL  log level; info is usually enough to get useful logs
                    Logging level, choose between DEBUG, INFO, WARNING,
                    ERROR, CRITICAL, or FATAL.
-n HOSTNAME, --hostname=HOSTNAME
                    Set custom hostname, e.g. 'w1.%h'. Expands: %h
                    (hostname), %n (name) and %d, (domain).
-B, --beat            Also run the celery beat periodic task scheduler.
                    Please note that there must only be one instance of
                    this service.
-s SCHEDULE_FILENAME, --schedule=SCHEDULE_FILENAME
                    Path to the schedule database if running with the -B
                    option. Defaults to celerybeat-schedule. The extension
                    ".db" may be appended to the filename.
--scheduler=SCHEDULER_CLS
                    Scheduler class to use. Default is
                    celery.beat.PersistentScheduler
-S STATE_DB, --statedb=STATE_DB
                    Path to the state database. The extension '.db' may be
                    appended to the filename. Default: None
-E, --events          Send events that can be captured by monitors like
                    celery events, celerymon, and others.
--time-limit=TASK_TIME_LIMIT
                    Enables a hard time limit (in seconds int/float) for
                    tasks.
--soft-time-limit=TASK_SOFT_TIME_LIMIT
                    Enables a soft time limit (in seconds int/float) for
                    tasks.
--maxtasksperchild=MAX_TASKS_PER_CHILD
                    Maximum number of tasks a pool worker can execute
                    before it's terminated and replaced by a new worker.
-Q QUEUES, --queues=QUEUES  queues this worker is allowed to consume
                    List of queues to enable for this worker, separated by
                    comma. By default all configured queues are enabled.
                    Example: -Q video,image
-X EXCLUDE_QUEUES, --exclude-queues=EXCLUDE_QUEUES
-I INCLUDE, --include=INCLUDE
                    Comma separated list of additional modules to import.
                    Example: -I foo.tasks,bar.tasks
--autoscale=AUTOSCALE
                    Enable autoscaling by providing max_concurrency,
                    min_concurrency. Example:: --autoscale=10,3 (always
                    keep 3 processes, but grow to 10 if necessary)
--autoreload          Enable autoreloading.
--no-execv            Don't do execv after multiprocessing child fork.
--without-gossip      Do not subscribe to other workers events.
--without-mingle      Do not synchronize with other workers at startup.
--without-heartbeat   Do not send event heartbeats.
--heartbeat-interval=HEARTBEAT_INTERVAL
                    Interval in seconds at which to send worker heartbeat
-O OPTIMIZATION       Apply optimization profile. Supported: default, fair
-D, --detach          
-f LOGFILE, --logfile=LOGFILE  log file path
                    Path to log file. If no logfile is specified, stderr
                    is used.
--pidfile=PIDFILE     Optional file used to store the process pid. The
                    program will not start if this file already exists and
                    the pid is still alive.
--uid=UID             User id, or user name of the user to run as after
                    detaching.
--gid=GID             Group id, or group name of the main group to change to
                    after detaching.
--umask=UMASK         Effective umask (in octal) of the process after
                    detaching.  Inherits the umask of the parent process
                    by default.
--executable=EXECUTABLE
                    Executable to use for the detached process.
--version             show program's version number and exit
-h, --help            show this help message and exit
