Among Python web frameworks, Flask is widely used because it is lightweight and easy to extend. This article gives a brief introduction to using Celery with Flask.
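Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages, and it also provides the tools needed to maintain such a system. In short, Celery is a task queue focused on real-time processing that also supports task scheduling.
Celery uses a broker to mediate between clients and workers. To start a task, the client adds a message to a queue, and the broker then delivers that message to a worker.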
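The client, broker, and worker flow can be illustrated with a toy model in plain Python. This is not Celery: a queue.Queue stands in for the broker, a dict stands in for a result store, and the names client_send and worker are hypothetical.

```python
import queue
import threading

broker = queue.Queue()  # stands in for the broker (e.g. Redis)
results = {}            # stands in for a result store


def client_send(task_id, func, *args):
    """Client side: publish a task message instead of running the function."""
    broker.put((task_id, func, args))


def worker():
    """Worker side: consume messages until a None sentinel arrives."""
    while True:
        message = broker.get()
        try:
            if message is None:  # sentinel: shut the worker down
                return
            task_id, func, args = message
            results[task_id] = func(*args)
        finally:
            broker.task_done()


consumer = threading.Thread(target=worker)
consumer.start()
client_send('t1', pow, 2, 10)
client_send('t2', sum, [1, 2, 3])
broker.put(None)  # ask the worker to stop after the queued tasks
consumer.join()
print(results)  # {'t1': 1024, 't2': 6}
```

The client never calls pow or sum itself; it only enqueues a message, and the worker decides when to execute it. Celery adds persistence, acknowledgements, retries, and multiple worker processes on top of this basic shape.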
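1 Getting started with Celery
1.1 Choosing a broker
To execute tasks, Celery needs a message broker to send and receive task messages; task results are stored in a result backend. RabbitMQ or Redis is commonly used, and Redis can serve as both broker and result backend. This article uses Redis as the example and shows how to configure it in the project.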
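1.2 Installing Celery
Celery can be installed quickly with pip or easy_install. Because this article uses Redis as the broker, install the redis bundle, which also pulls in the Redis Python client:
pip install "celery[redis]"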
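1.3 Creating the Celery instance in the project
The instance can be created in the project's entry-point file or in its initialization file. In this project, it is created in the initialization file init.py: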
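from celery import Celery
from flask import Flask
app = Flask(__name__)
app.config.from_object('config')  # load the settings from config.py shown below
celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)  # pass the CELERY_* settings on to Celery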
Celery settings can be kept in the project's configuration file config.py alongside the other settings. Commonly used ones include:
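CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']  # serialization formats workers accept; pickle can execute arbitrary code, so only accept it from a trusted broker
CELERY_BROKER_URL = 'redis://127.0.0.1:6379'
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379'  # Tracks task state and results. Results are disabled by default; a result backend (Redis here) is configured because results are retrieved later. Backends differ in strengths and weaknesses, so choose one that suits your application. If you do not need results, leave them disabled; they can also be disabled for an individual task with @task(ignore_result=True).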
If you need queue priorities, configure exchanges and queues. This example sets up two queues, one low-priority and one high-priority:
from kombu import Exchange, Queue
# exchanges for the low- and high-priority queues
low_exchange = Exchange('low_priority', type='direct')
high_exchange = Exchange('high_priority', type='direct')
# the low- and high-priority queues
CELERY_QUEUES = (
    Queue(name='celery'),  # Celery's default queue; can be left out if the project does not use it
    Queue(name='low_priority', exchange=low_exchange, routing_key='low_priority'),
    Queue(name='high_priority', exchange=high_exchange, routing_key='high_priority'),
)
1.4 Registering tasks and setting their priority
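from init import celery

@celery.task
def add(x, y):
    return x + y

# To give a task a custom priority within its queue, pass the priority option.
# priority ranges from 0 to 9; with the Redis broker used here, 0 is the highest and 9 the lowest
# (RabbitMQ uses the opposite order).
@celery.task(priority=3)
def add(x, y):  # alternative definition of add with a default priority of 3
    return x + y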
1.5 Starting tasks
The delay method sends the task to Celery's default queue:
>>> add.delay(2, 2)
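delay is a shorthand for apply_async. apply_async accepts options that control how the task is executed, such as the target queue, the exchange, or a delay before execution (countdown, in seconds):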
>>> add.apply_async((2, 2))
>>> add.apply_async((2, 2), queue='low_priority', countdown=10)
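In a real project, to keep queue usage consistent, you can route all task launches through a single interface that is easier to manage, for example: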
def high_priority_task(func, args=None, kwargs=None, task_id=None, producer=None, link=None, link_error=None, **options):
    # send the task to the high-priority queue and return its AsyncResult
    return func.apply_async(args=args, kwargs=kwargs, task_id=task_id, producer=producer, link=link, link_error=link_error,
                            queue='high_priority', **options)

def low_priority_task(func, args=None, kwargs=None, task_id=None, producer=None, link=None, link_error=None, **options):
    # send the task to the low-priority queue and return its AsyncResult
    return func.apply_async(args=args, kwargs=kwargs, task_id=task_id, producer=producer, link=link, link_error=link_error,
                            queue='low_priority', **options)
Tasks are then launched through these wrappers instead:
high_priority_task(add, args=(2, 2))
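The two wrappers differ only in the queue name, so they can also be generated by a small factory. This is a hypothetical alternative (make_queue_launcher is not a Celery API), and unittest.mock.Mock stands in for a real task so the routing can be checked without a broker.

```python
from unittest import mock


def make_queue_launcher(queue):
    """Return a function that sends any task to `queue` (hypothetical helper)."""
    def launch(task, args=None, kwargs=None, **options):
        # the queue is fixed by the launcher; passing queue= again raises TypeError
        return task.apply_async(args=args, kwargs=kwargs, queue=queue, **options)
    return launch


high_priority_task = make_queue_launcher('high_priority')
low_priority_task = make_queue_launcher('low_priority')

# a Mock stands in for a Celery task such as add
fake_add = mock.Mock()
high_priority_task(fake_add, args=(2, 2), countdown=10)
fake_add.apply_async.assert_called_once_with(
    args=(2, 2), kwargs=None, queue='high_priority', countdown=10)
```

Forwarding **options keeps every other apply_async option (countdown, eta, priority, link, and so on) available without repeating the full signature.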
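1.6 Task retries
During execution, occasional network jitter or other causes can make requests time out or raise unexpected exceptions, and a task cannot reliably handle every such exception itself. Celery provides a convenient retry mechanism in which you can configure the number of retries and the interval between them.
To retry an individual task, call retry manually inside the task: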
from init import celery

@celery.task(bind=True)  # bind=True passes the task instance in as self
def add(self, x, y):
    try:
        return x + y
    except Exception as exc:
        raise self.retry(countdown=5, max_retries=3, exc=exc)  # next attempt after 5 s, at most 3 retries
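About retry: with bind=True the task receives its own instance as self, which is what makes self.retry available. retry() sends the task back to the queue and raises a Retry exception itself, so no code after it runs; writing raise in front makes that explicit. countdown is the delay in seconds before the next attempt, and max_retries caps the number of retries. Once the limit is exceeded, the exception passed as exc is re-raised and the task fails.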
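If writing this in every task feels inconvenient, you can subclass Celery so that every task is wrapped with the retry logic: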
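import functools

import celery as _celery
from celery.exceptions import Retry


class Test1Exception(Exception):
    """Placeholder for a business exception that should not be retried."""


def make_celery(app):
    class Celery(_celery.Celery):
        def task(self, *args_task, **opts_task):
            def decorator(func):
                sup = super(Celery, self).task

                @sup(**opts_task)
                @functools.wraps(func)
                def wrapper(*args, **kwargs):
                    try:
                        return func(*args, **kwargs)
                    except (Test1Exception, Retry):
                        # exceptions that must not be retried, and Celery's own Retry signal, propagate unchanged
                        raise
                    except Exception as exc:
                        # anything else: retry after 5 s, at most 3 times (meant for tasks without bind=True)
                        raise wrapper.retry(exc=exc, args=args, kwargs=kwargs, countdown=5, max_retries=3)
                return wrapper

            # support the bare @celery.task form as well as @celery.task(...)
            if len(args_task) == 1 and callable(args_task[0]):
                return decorator(args_task[0])
            return decorator

    celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    return celery


celery = make_celery(app)  # app: the Flask application created in init.py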
2 Celery startup command
celery -A message_queue worker -l info
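Priority queues only help if workers actually consume them, so a common layout is one worker per queue, each limited with -Q. The helper worker_command is hypothetical and only builds the argument list; the concurrency values are illustrative. The list can be passed to subprocess or joined and pasted into a shell.

```python
import subprocess


def worker_command(app_module, queues, concurrency=None, loglevel='info'):
    """Build a `celery worker` argv that consumes only `queues` (hypothetical helper)."""
    cmd = ['celery', '-A', app_module, 'worker', '-l', loglevel, '-Q', ','.join(queues)]
    if concurrency is not None:
        cmd += ['-c', str(concurrency)]
    return cmd


high = worker_command('message_queue', ['high_priority'], concurrency=8)
low = worker_command('message_queue', ['low_priority', 'celery'], concurrency=2)
print(' '.join(high))  # celery -A message_queue worker -l info -Q high_priority -c 8


def start_workers():
    """Launch both workers as child processes (requires celery on PATH)."""
    return [subprocess.Popen(cmd) for cmd in (high, low)]
```

Giving the high-priority worker more processes means urgent tasks are not stuck behind a backlog of low-priority ones.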
Options in detail:
-A APP, --app=APP     app instance to use (e.g. module.attr_name) [module name]
-b BROKER, --broker=BROKER
                      url to broker. default is 'amqp://guest@localhost//'
--loader=LOADER       name of custom loader class to use.
--config=CONFIG       Name of the configuration module
--workdir=WORKING_DIRECTORY
                      Optional directory to change to after detaching.
-C, --no-color
-q, --quiet
-c CONCURRENCY, --concurrency=CONCURRENCY  [number of processes; by default the number of CPU cores]
                      Number of child processes processing the queue. The
                      default is the number of CPUs available on your
                      system.
-P POOL_CLS, --pool=POOL_CLS
                      Pool implementation: prefork (default), eventlet,
                      gevent, solo or threads.
--purge, --discard    Purges all waiting tasks before the daemon is started.
                      **WARNING**: This is unrecoverable, and the tasks will
                      be deleted from the messaging server.
-l LOGLEVEL, --loglevel=LOGLEVEL  [log level; info is usually enough to get useful logs]
                      Logging level, choose between DEBUG, INFO, WARNING,
                      ERROR, CRITICAL, or FATAL.
-n HOSTNAME, --hostname=HOSTNAME
                      Set custom hostname, e.g. 'w1.%h'. Expands: %h
                      (hostname), %n (name) and %d, (domain).
-B, --beat            Also run the celery beat periodic task scheduler.
                      Please note that there must only be one instance of
                      this service.
-s SCHEDULE_FILENAME, --schedule=SCHEDULE_FILENAME
                      Path to the schedule database if running with the -B
                      option. Defaults to celerybeat-schedule. The extension
                      ".db" may be appended to the filename.
--scheduler=SCHEDULER_CLS
                      Scheduler class to use. Default is
                      celery.beat.PersistentScheduler
-S STATE_DB, --statedb=STATE_DB
                      Path to the state database. The extension '.db' may be
                      appended to the filename. Default: None
-E, --events          Send events that can be captured by monitors like
                      celery events, celerymon, and others.
--time-limit=TASK_TIME_LIMIT
                      Enables a hard time limit (in seconds int/float) for
                      tasks.
--soft-time-limit=TASK_SOFT_TIME_LIMIT
                      Enables a soft time limit (in seconds int/float) for
                      tasks.
--maxtasksperchild=MAX_TASKS_PER_CHILD
                      Maximum number of tasks a pool worker can execute
                      before it's terminated and replaced by a new worker.
-Q QUEUES, --queues=QUEUES  [the queues this worker is allowed to consume]
                      List of queues to enable for this worker, separated by
                      comma. By default all configured queues are enabled.
                      Example: -Q video,image
-X EXCLUDE_QUEUES, --exclude-queues=EXCLUDE_QUEUES
-I INCLUDE, --include=INCLUDE
                      Comma separated list of additional modules to import.
                      Example: -I foo.tasks,bar.tasks
--autoscale=AUTOSCALE
                      Enable autoscaling by providing max_concurrency,
                      min_concurrency. Example:: --autoscale=10,3 (always
                      keep 3 processes, but grow to 10 if necessary)
--autoreload          Enable autoreloading.
--no-execv            Don't do execv after multiprocessing child fork.
--without-gossip      Do not subscribe to other workers events.
--without-mingle      Do not synchronize with other workers at startup.
--without-heartbeat   Do not send event heartbeats.
--heartbeat-interval=HEARTBEAT_INTERVAL
                      Interval in seconds at which to send worker heartbeat
-O OPTIMIZATION       Apply optimization profile. Supported: default, fair
-D, --detach
-f LOGFILE, --logfile=LOGFILE  [log file path]
                      Path to log file. If no logfile is specified, stderr
                      is used.
--pidfile=PIDFILE     Optional file used to store the process pid. The
                      program will not start if this file already exists and
                      the pid is still alive.
--uid=UID             User id, or user name of the user to run as after
                      detaching.
--gid=GID             Group id, or group name of the main group to change to
                      after detaching.
--umask=UMASK         Effective umask (in octal) of the process after
                      detaching. Inherits the umask of the parent process
                      by default.
--executable=EXECUTABLE
                      Executable to use for the detached process.
--version             show program's version number and exit
-h, --help            show this help message and exit