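Among Python web frameworks, Flask is widely used because it is lightweight and easy to extend. This article takes a brief look at using Celery in a Flask project.
Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages, and it ships with the tools needed to operate such a system. In short, Celery is a task queue focused on real-time processing that also supports task scheduling.
Celery uses a broker to mediate between clients and workers. To start a task, the client adds a message to a queue, and the broker then delivers that message to a worker.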
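The client, broker, and worker roles can be illustrated with a toy model that uses only the standard library. This is not how Celery is implemented; the in-process queue, the message layout, and the names broker, results, and worker are all assumptions made for the sketch.

```python
import queue
import threading

# Toy stand-in for the broker: an in-process queue instead of Redis or RabbitMQ.
broker = queue.Queue()
results = {}


def add(x, y):
    return x + y


def worker():
    """Pull messages off the queue and run the named task until told to stop."""
    tasks = {"add": add}
    while True:
        message = broker.get()
        if message is None:  # sentinel: shut the worker down
            break
        task_id, name, args = message
        results[task_id] = tasks[name](*args)


# Client side: describe the work as a message and hand it to the broker.
broker.put(("task-1", "add", (2, 2)))
broker.put(None)

thread = threading.Thread(target=worker)
thread.start()
thread.join()
print(results["task-1"])
```

The point to take away is that the client never calls add directly: it only describes the call, and whichever worker receives the message performs it.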
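1 Getting started with Celery
1.1 Choosing a broker
Celery needs a message broker to send and receive task messages, and a result backend if task results are to be stored. RabbitMQ and Redis are the usual choices. This article uses Redis and configures it in the project.
1.2 Installing Celery
Celery can be installed with pip: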
pip install celery
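1.3 Creating the Celery instance in the project
The instance can be created in the project's startup file or in its initialization file. In this project it is created in the initialization file init.py:
from celery import Celery
# `app` is the Flask application object, created earlier in this file
celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)
Celery's settings can be kept together with the rest of the project configuration in config.py. These are some commonly used ones:
# Serializers the worker accepts. pickle can execute arbitrary code when loaded,
# so enable it only if every producer is trusted.
CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
CELERY_BROKER_URL = 'redis://127.0.0.1:6379'
# The result backend tracks task state and results. Results are disabled by default;
# Redis is used as the backend here so that task results can be retrieved later.
# Other backends are available, each with its own trade-offs. If results are not
# needed, it is better to leave them disabled. They can also be disabled for a
# single task with @celery.task(ignore_result=True).
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379'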
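If queues with different priorities are needed, exchanges and queues can be configured. The example here sets up one low-priority and one high-priority queue:
from kombu import Exchange, Queue
# Exchanges for low- and high-priority tasks
low_exchange = Exchange('low_priority', type='direct')
high_exchange = Exchange('high_priority', type='direct')
# Queues for low- and high-priority tasks
CELERY_QUEUES = (
    Queue(name='celery'),  # Celery's default queue; can be omitted if the project does not use it
    Queue(name='low_priority', exchange=low_exchange, routing_key='low_priority'),
    Queue(name='high_priority', exchange=high_exchange, routing_key='high_priority'),
)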
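Tasks can also be routed to these queues by name in the configuration. The fragment below is a sketch that uses the same old-style uppercase setting names as the rest of config.py. The task name tasks.add is a hypothetical module path chosen for illustration, not one taken from this project.

```python
# config.py (sketch): route tasks to queues by task name.
# 'tasks.add' is a hypothetical task name used only for illustration.
CELERY_DEFAULT_QUEUE = 'celery'  # queue used when no route matches
CELERY_ROUTES = {
    'tasks.add': {'queue': 'low_priority', 'routing_key': 'low_priority'},
}
```

With a route in place, a plain add.delay(2, 2) would go to low_priority without the caller having to name the queue.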
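1.4 Registering tasks and giving a task a priority
from init import celery
@celery.task
def add(x, y):
    return x + y
# To give a task a custom priority within its queue, pass the priority option.
# Priority levels run from 0 to 9. With the Redis broker used here, 0 is the
# highest priority and 9 the lowest (RabbitMQ orders them the other way round).
# This is an alternative definition of the same task, not a second task.
@celery.task(priority=3)
def add(x, y):
    return x + y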
1.5 Launching tasks
Call the task's delay method, which sends the task to Celery's default queue:
>>> add.delay(2, 2)
delay is a shortcut for apply_async. With apply_async, extra arguments control how the task is executed, such as the target queue, the exchange, or a delay before execution:
>>> add.apply_async((2, 2))
>>> add.apply_async((2, 2), queue='low_priority', countdown=10)
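countdown is relative to the moment of the call. apply_async also accepts eta, an absolute datetime before which the task will not run. The sketch below only builds such a value; eta_in is a hypothetical helper name, and the actual apply_async call is left as a comment because it needs a running broker.

```python
from datetime import datetime, timedelta, timezone


def eta_in(seconds):
    """Return a timezone-aware UTC datetime `seconds` from now."""
    return datetime.now(timezone.utc) + timedelta(seconds=seconds)


eta = eta_in(10)
# With the `add` task from above, this would be passed as:
#     add.apply_async((2, 2), queue='low_priority', eta=eta)
print(eta.isoformat())
```

Using a timezone-aware value avoids ambiguity when the client and the workers run in different time zones.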
In a real project, a single set of launch helpers, one per queue, keeps queue usage consistent and easier to manage. For example:
def high_priority_task(func, args=None, kwargs=None, task_id=None, producer=None, link=None, link_error=None, **options):
    # Return the AsyncResult so callers can track the task
    return func.apply_async(args=args, kwargs=kwargs, task_id=task_id, producer=producer, link=link, link_error=link_error,
                            queue='high_priority', **options)
def low_priority_task(func, args=None, kwargs=None, task_id=None, producer=None, link=None, link_error=None, **options):
    return func.apply_async(args=args, kwargs=kwargs, task_id=task_id, producer=producer, link=link, link_error=link_error,
                            queue='low_priority', **options)
Launching a task then goes through the helper:
high_priority_task(add, args=(2, 2))
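Since the two helpers differ only in the queue name, they could also be generated by a small factory. The following is a sketch of that idea rather than code from the project: make_launcher is a hypothetical name, and FakeTask is a stand-in with an apply_async method so the example runs without Celery or a broker.

```python
def make_launcher(queue):
    """Build a launch helper bound to one queue (hypothetical helper name)."""
    def launch(func, args=None, kwargs=None, **options):
        # Remaining apply_async options (task_id, link, countdown, ...) pass through.
        return func.apply_async(args=args, kwargs=kwargs, queue=queue, **options)
    return launch


high_priority_task = make_launcher('high_priority')
low_priority_task = make_launcher('low_priority')


class FakeTask:
    """Stand-in for a Celery task so the sketch runs without a broker."""

    def apply_async(self, args=None, kwargs=None, **options):
        return {'args': args, 'kwargs': kwargs, **options}


print(high_priority_task(FakeTask(), args=(2, 2)))
```

Adding a third queue then takes one line instead of another copy of the function.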
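1.6 Retrying tasks
While a task runs, occasional network jitter or other causes can lead to request timeouts or unforeseen exceptions, and a task cannot anticipate and handle every such failure itself. Celery provides a convenient retry mechanism with a configurable number of retries and a configurable interval between them.
A single task can retry itself manually:
from init import celery
@celery.task(bind=True)
def add(self, x, y):
    try:
        return x + y
    except Exception as exc:
        # Retry in 5 seconds, at most 3 times
        raise self.retry(countdown=5, max_retries=3, exc=exc)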
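A fixed 5-second interval is the simplest choice. When a downstream service is struggling, a delay that grows with each attempt is often gentler. The sketch below computes such a delay; backoff_countdown and its defaults are assumptions for illustration. Inside a bound task the current attempt number is available as self.request.retries.

```python
def backoff_countdown(retries, base=5, cap=300):
    """Seconds to wait before the next attempt: base * 2**retries, capped."""
    return min(cap, base * (2 ** retries))


# Inside a bound task this could be used as (not executed here):
#     raise self.retry(exc=exc, max_retries=3,
#                      countdown=backoff_countdown(self.request.retries))
print([backoff_countdown(n) for n in range(4)])  # [5, 10, 20, 40]
```

The cap keeps a long run of failures from pushing the next attempt arbitrarily far into the future.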
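A note on retry: self.retry() sends the task back to the broker with the same task id and raises a Retry exception, so the worker records the task as awaiting retry rather than failed. That is why the call is written as raise self.retry(...). Once max_retries is exceeded, the exception passed as exc is re-raised.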
If writing this in every task is inconvenient, the Celery class can be subclassed so that every task gets the retry behaviour:
import celery as _celery
import functools
def make_celery():
    class Celery(_celery.Celery):
        # With this override, tasks must be declared as @celery.task(...), with parentheses
        def task(self, *args_task, **opts_task):
            def decorator(func):
                sup = super(Celery, self).task
                @sup(*args_task, **opts_task)
                @functools.wraps(func)
                def wrapper(*args, **kwargs):
                    try:
                        return func(*args, **kwargs)
                    except Test1Exception:
                        # Test1Exception stands for a project-defined exception that Celery
                        # should not retry; such cases are handled in one place here
                        raise
                    except Exception as exc:
                        # By the time this runs, `wrapper` is the registered task object
                        raise wrapper.retry(exc=exc, args=args, kwargs=kwargs, countdown=5, max_retries=3)
                return wrapper
            return decorator
    celery = Celery(app.import_name, broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    return celery
celery = make_celery()
2 Celery startup command
celery -A message_queue worker -l info
Options:
-A APP, --app=APP app instance to use (e.g. module.attr_name); here, the module name
-b BROKER, --broker=BROKER
url to broker. default is 'amqp://guest@localhost//'
--loader=LOADER name of custom loader class to use.
--config=CONFIG Name of the configuration module
--workdir=WORKING_DIRECTORY
Optional directory to change to after detaching.
-C, --no-color
-q, --quiet
-c CONCURRENCY, --concurrency=CONCURRENCY (number of processes; defaults to the number of CPU cores)
Number of child processes processing the queue. The
default is the number of CPUs available on your
system.
-P POOL_CLS, --pool=POOL_CLS
Pool implementation: prefork (default), eventlet,
gevent, solo or threads.
--purge, --discard Purges all waiting tasks before the daemon is started.
**WARNING**: This is unrecoverable, and the tasks will
be deleted from the messaging server.
-l LOGLEVEL, --loglevel=LOGLEVEL (log level; info is usually sufficient)
Logging level, choose between DEBUG, INFO, WARNING,
ERROR, CRITICAL, or FATAL.
-n HOSTNAME, --hostname=HOSTNAME
Set custom hostname, e.g. 'w1.%h'. Expands: %h
(hostname), %n (name) and %d, (domain).
-B, --beat Also run the celery beat periodic task scheduler.
Please note that there must only be one instance of
this service.
-s SCHEDULE_FILENAME, --schedule=SCHEDULE_FILENAME
Path to the schedule database if running with the -B
option. Defaults to celerybeat-schedule. The extension
".db" may be appended to the filename.
--scheduler=SCHEDULER_CLS
Scheduler class to use. Default is
celery.beat.PersistentScheduler
-S STATE_DB, --statedb=STATE_DB
Path to the state database. The extension '.db' may be
appended to the filename. Default: None
-E, --events Send events that can be captured by monitors like
celery events, celerymon, and others.
--time-limit=TASK_TIME_LIMIT
Enables a hard time limit (in seconds int/float) for
tasks.
--soft-time-limit=TASK_SOFT_TIME_LIMIT
Enables a soft time limit (in seconds int/float) for
tasks.
--maxtasksperchild=MAX_TASKS_PER_CHILD
Maximum number of tasks a pool worker can execute
before it's terminated and replaced by a new worker.
-Q QUEUES, --queues=QUEUES (queues this worker is allowed to consume)
List of queues to enable for this worker, separated by
comma. By default all configured queues are enabled.
Example: -Q video,image
-X EXCLUDE_QUEUES, --exclude-queues=EXCLUDE_QUEUES
-I INCLUDE, --include=INCLUDE
Comma separated list of additional modules to import.
Example: -I foo.tasks,bar.tasks
--autoscale=AUTOSCALE
Enable autoscaling by providing max_concurrency,
min_concurrency. Example:: --autoscale=10,3 (always
keep 3 processes, but grow to 10 if necessary)
--autoreload Enable autoreloading.
--no-execv Don't do execv after multiprocessing child fork.
--without-gossip Do not subscribe to other workers events.
--without-mingle Do not synchronize with other workers at startup.
--without-heartbeat Do not send event heartbeats.
--heartbeat-interval=HEARTBEAT_INTERVAL
Interval in seconds at which to send worker heartbeat
-O OPTIMIZATION Apply optimization profile. Supported: default, fair
-D, --detach
-f LOGFILE, --logfile=LOGFILE (log file path)
Path to log file. If no logfile is specified, stderr
is used.
--pidfile=PIDFILE Optional file used to store the process pid. The
program will not start if this file already exists and
the pid is still alive.
--uid=UID User id, or user name of the user to run as after
detaching.
--gid=GID Group id, or group name of the main group to change to
after detaching.
--umask=UMASK Effective umask (in octal) of the process after
detaching. Inherits the umask of the parent process
by default.
--executable=EXECUTABLE
Executable to use for the detached process.
--version show program's version number and exit
-h, --help show this help message and exit