Celery in Practice: Multiple Queues, Priorities, and Task Retries

Among Python web frameworks, Flask is widely used for being lightweight and easy to extend. This article is a brief look at applying Celery in a Flask project.

Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages, and it provides the tools needed to operate such a system. In short, Celery is a task queue focused on real-time processing, with support for task scheduling as well.

Celery uses a broker to mediate between clients and workers. To start a task, the client adds a message to a queue, and the broker delivers that message to a worker.

1 Getting started with Celery

1.1 Choosing a broker

Celery needs a message broker to receive and send task messages and to store task results; RabbitMQ and Redis are the common choices. This article uses Redis as the example and configures it in the project.

1.2 Installing Celery

It can be installed quickly with pip (or easy_install):

pip install celery

1.3 Creating a Celery instance in the project

The instance can be created in the project's startup file or its initialization file. In this project we create it in the initialization file init.py:

from flask import Flask
from celery import Celery

app = Flask(__name__)
app.config.from_object('config')  # load the Celery settings from config.py

celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)

Celery's configuration can be written centrally in the project configuration file config.py. A few commonly used settings:

CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
CELERY_BROKER_URL = 'redis://127.0.0.1:6379'

# The result backend tracks task state and results. Results are disabled by
# default; Redis is enabled here because we retrieve results later. Other
# backends exist, each with its own trade-offs, so if you don't need results
# it is best to leave them disabled. Results can also be disabled for a
# single task with the @task(ignore_result=True) option.
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379'
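For example, a fire-and-forget task that skips the result backend entirely (a minimal sketch; log_visit is a made-up name for illustration):

@celery.task(ignore_result=True)
def log_visit(user_id):
    # nothing is written to the result backend for this task
    print('visit from user %s' % user_id)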


If you need queue priorities, configure exchanges and queues accordingly. As an example, this article sets up two queues, one low-priority and one high-priority:

from kombu import Exchange, Queue

# low- and high-priority exchanges
low_exchange = Exchange('low_priority', type='direct')
high_exchange = Exchange('high_priority', type='direct')

# low- and high-priority queues
CELERY_QUEUES = (
    Queue(name='celery'),  # Celery's default queue; omit it if the project doesn't use it
    Queue(name='low_priority', exchange=low_exchange, routing_key='low_priority'),
    Queue(name='high_priority', exchange=high_exchange, routing_key='high_priority')
)
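To route unlabelled tasks somewhere sensible, you can also point Celery's defaults at one of these queues (a sketch using the same old-style uppercase setting names as above):

CELERY_DEFAULT_QUEUE = 'low_priority'
CELERY_DEFAULT_EXCHANGE = 'low_priority'
CELERY_DEFAULT_ROUTING_KEY = 'low_priority'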

1.4 Registering tasks and setting task priority

from init import celery

@celery.task
def add(x, y):
    return x + y

# To customize a task's priority within its queue, pass the priority option.
# There are ten levels, 0-9, with priority decreasing from 0 (highest) to 9.
@celery.task(priority=3)
def add(x, y):
    return x + y

1.5 Invoking tasks

You can use the delay method, which places the task on Celery's default queue:
>>> add.delay(2, 2)
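delay returns an AsyncResult. With the result backend configured earlier, the caller can block on or inspect it:

>>> result = add.delay(2, 2)
>>> result.get(timeout=10)  # block until the worker finishes, or raise on timeout
4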

delay is a simplified version of apply_async. With apply_async we can pass arguments that control execution, such as the target queue, the exchange, or a delay before execution:

>>> add.apply_async((2, 2))

>>> add.apply_async((2, 2), queue='low_priority', countdown=10)
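apply_async can also override the registration-time priority per call, provided the broker supports message priorities (with Redis they are emulated, and 0 is the highest level):

>>> add.apply_async((2, 2), priority=0)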

In a real project, to keep queue usage consistent, we can put a unified dispatch interface in front of the queues for easier management, for example:

def high_priority_task(func, args=None, kwargs=None, task_id=None, producer=None, link=None, link_error=None, **options):
    return func.apply_async(args=args, kwargs=kwargs, task_id=task_id, producer=producer, link=link,
                            link_error=link_error, queue='high_priority', **options)


def low_priority_task(func, args=None, kwargs=None, task_id=None, producer=None, link=None, link_error=None, **options):
    return func.apply_async(args=args, kwargs=kwargs, task_id=task_id, producer=producer, link=link,
                            link_error=link_error, queue='low_priority', **options)

With these wrappers, starting a task becomes:

high_priority_task(add, args=(2, 2))

1.6 Task retries

During task execution, occasional network jitter or other causes will produce request timeouts or other unforeseen exceptions, and the task itself cannot be expected to catch and retry every one of them. Celery ships with a convenient retry mechanism whose retry count and retry interval are both configurable.
For an individual task, you can retry manually inside it:

from init import celery

@celery.task(bind=True)
def add(self, x, y):
    try:
        return x + y
    except Exception as exc:
        raise self.retry(countdown=5, max_retries=3, exc=exc)  # retry after 5 seconds, at most 3 times

A quick note on retry: calling self.retry() raises a Retry exception, which is why it appears with raise above; it re-queues the current task rather than returning. countdown is the delay before the next attempt, max_retries caps the number of attempts, and exc is the exception to re-raise once the retry limit is exceeded.
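If you are on Celery 4 or later, the same behavior can be declared on the decorator instead of hand-writing the try/except; a sketch, where fetch_remote and do_request are made-up names:

@celery.task(autoretry_for=(ConnectionError,), retry_kwargs={'max_retries': 3, 'countdown': 5})
def fetch_remote(url):
    # every ConnectionError triggers an automatic retry, 5s apart, at most 3 times
    return do_request(url)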

If this per-task approach feels inconvenient, you can subclass Celery so that retrying becomes the default:

import celery as _celery
import functools


class Test1Exception(Exception):
    """Exceptions of this type should not be retried."""


def make_celery(app):
    class Celery(_celery.Celery):

        def task(self, *args_task, **opts_task):
            def decorator(func):
                sup = super(Celery, self).task

                @sup(*args_task, **opts_task)
                @functools.wraps(func)
                def wrapper(*args, **kwargs):
                    try:
                        return func(*args, **kwargs)
                    except Test1Exception as exc:  # exceptions that should not be retried can be handled here
                        raise Test1Exception(exc)
                    except Exception as exc:
                        raise wrapper.retry(exc=exc, args=args, kwargs=kwargs, countdown=5, max_retries=3)

                return wrapper

            return decorator

    celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    return celery

celery = make_celery(app)  # app is the Flask instance created in section 1.3
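Every task registered through this subclass now retries on unhandled exceptions without any boilerplate in the task body:

@celery.task
def add(x, y):
    # a failure here is retried up to 3 times, 5 seconds apart, automatically
    return x + y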

2 Celery startup command

celery -A message_queue worker -l info
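Here message_queue is the module that holds the Celery instance. To make a worker consume the two priority queues defined earlier, list them with -Q:

celery -A message_queue worker -l info -Q high_priority,low_priority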

Options explained:

-A APP, --app=APP     app instance to use (e.g. module.attr_name), i.e. the module name
-b BROKER, --broker=BROKER
                    url to broker.  default is 'amqp://guest@localhost//'
--loader=LOADER       name of custom loader class to use.
--config=CONFIG       Name of the configuration module
--workdir=WORKING_DIRECTORY
                    Optional directory to change to after detaching.
-C, --no-color        
-q, --quiet           
-c CONCURRENCY, --concurrency=CONCURRENCY  number of worker processes; defaults to the CPU count
                    Number of child processes processing the queue. The
                    default is the number of CPUs available on your
                    system.
-P POOL_CLS, --pool=POOL_CLS
                    Pool implementation: prefork (default), eventlet,
                    gevent, solo or threads.
--purge, --discard    Purges all waiting tasks before the daemon is started.
                    **WARNING**: This is unrecoverable, and the tasks will
                    be deleted from the messaging server.
-l LOGLEVEL, --loglevel=LOGLEVEL  log level; info is usually sufficient
                    Logging level, choose between DEBUG, INFO, WARNING,
                    ERROR, CRITICAL, or FATAL.
-n HOSTNAME, --hostname=HOSTNAME
                    Set custom hostname, e.g. 'w1.%h'. Expands: %h
                    (hostname), %n (name) and %d, (domain).
-B, --beat            Also run the celery beat periodic task scheduler.
                    Please note that there must only be one instance of
                    this service.
-s SCHEDULE_FILENAME, --schedule=SCHEDULE_FILENAME
                    Path to the schedule database if running with the -B
                    option. Defaults to celerybeat-schedule. The extension
                    ".db" may be appended to the filename.
--scheduler=SCHEDULER_CLS
                    Scheduler class to use. Default is
                    celery.beat.PersistentScheduler
-S STATE_DB, --statedb=STATE_DB
                    Path to the state database. The extension '.db' may be
                    appended to the filename. Default: None
-E, --events          Send events that can be captured by monitors like
                    celery events, celerymon, and others.
--time-limit=TASK_TIME_LIMIT
                    Enables a hard time limit (in seconds int/float) for
                    tasks.
--soft-time-limit=TASK_SOFT_TIME_LIMIT
                    Enables a soft time limit (in seconds int/float) for
                    tasks.
--maxtasksperchild=MAX_TASKS_PER_CHILD
                    Maximum number of tasks a pool worker can execute
                    before it's terminated and replaced by a new worker.
-Q QUEUES, --queues=QUEUES  the queues this worker may consume
                    List of queues to enable for this worker, separated by
                    comma. By default all configured queues are enabled.
                    Example: -Q video,image
-X EXCLUDE_QUEUES, --exclude-queues=EXCLUDE_QUEUES
-I INCLUDE, --include=INCLUDE
                    Comma separated list of additional modules to import.
                    Example: -I foo.tasks,bar.tasks
--autoscale=AUTOSCALE
                    Enable autoscaling by providing max_concurrency,
                    min_concurrency. Example:: --autoscale=10,3 (always
                    keep 3 processes, but grow to 10 if necessary)
--autoreload          Enable autoreloading.
--no-execv            Don't do execv after multiprocessing child fork.
--without-gossip      Do not subscribe to other workers events.
--without-mingle      Do not synchronize with other workers at startup.
--without-heartbeat   Do not send event heartbeats.
--heartbeat-interval=HEARTBEAT_INTERVAL
                    Interval in seconds at which to send worker heartbeat
-O OPTIMIZATION       Apply optimization profile.  Supported: default, fair
-D, --detach          
-f LOGFILE, --logfile=LOGFILE  log file path
                    Path to log file. If no logfile is specified, stderr
                    is used.
--pidfile=PIDFILE     Optional file used to store the process pid. The
                    program will not start if this file already exists and
                    the pid is still alive.
--uid=UID             User id, or user name of the user to run as after
                    detaching.
--gid=GID             Group id, or group name of the main group to change to
                    after detaching.
--umask=UMASK         Effective umask (in octal) of the process after
                    detaching.  Inherits the umask of the parent process
                    by default.
--executable=EXECUTABLE
                    Executable to use for the detached process.
--version             show program's version number and exit
-h, --help            show this help message and exit
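Putting a few of these together, a more production-like invocation might look like this (queue names match the earlier configuration; the log path and numbers are illustrative):

celery -A message_queue worker -l info -Q high_priority,low_priority --autoscale=10,3 --maxtasksperchild=100 -f /var/log/celery/worker.log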
