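Background
A while ago I investigated a memory-leak incident and spent a few days reading through Gunicorn + Django from end to end. Most of what I found online while troubleshooting was fragmentary analysis, so I had to stitch pieces together and cross-check several sources before I could get even a rough view of the whole picture. That gave me the idea of writing a blog post that follows the actual execution flow, to bring this investigation to a close.
Because many topics are involved (Gunicorn, WSGI, Django, metaclasses and so on, each of which could fill an article of its own), I will record this as a series.
The framework and dependency versions are as follows.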
Django 2.1.15
Gunicorn 20.0.4
Python 3.x
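Starting from the startup command
Most articles dive straight into the code, which I find hard to follow. Starting the analysis from the startup command seems more orderly to me.
According to the official documentation, our startup command is as follows:
Let's start with the gunicorn entry point.
Where the gunicorn executable comes from
Where does the gunicorn command come from? What exactly is it?
As the figure above shows, the gunicorn command is simply a Python script with the .py suffix removed.
This leads to another topic (building Python packages; look it up if you are interested, I won't go into it here): gunicorn's setup.py
The gunicorn executable is generated from this line of configuration.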
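As a rough illustration of what such a generated script does: a console_scripts entry has the form name=package.module:function (for Gunicorn it points at gunicorn.app.wsgiapp:run, the run() function quoted further down), and the wrapper imports that module and calls the function. The helper resolve_entry_point below is a hypothetical name of my own, not something setuptools or Gunicorn exposes, and it is demonstrated on standard-library targets so it runs without Gunicorn installed.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        # Walk dotted attribute paths such as "Class.method".
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj


# Stand-in target from the standard library (an assumption for the demo).
dumps = resolve_entry_point("json:dumps")
print(dumps({"worker": 1}))
```

A generated wrapper script is, in essence, this lookup followed by sys.exit(function()).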
Following the code:
class WSGIApplication(Application):
    ...

    # Import the Django app (i.e. our business code) from the path given in the configuration
    def load_wsgiapp(self):
        return util.import_app(self.app_uri)

    def load_pasteapp(self):
        from .pasterapp import get_wsgi_app
        return get_wsgi_app(self.app_uri, defaults=self.cfg.paste_global_conf)

    # Load the WSGI object (i.e. the WSGI application produced by the Django framework)
    def load(self):
        if self.cfg.paste is not None:
            return self.load_pasteapp()
        else:
            return self.load_wsgiapp()


def run():
    """\
    The ``gunicorn`` command line runner for launching Gunicorn with
    generic WSGI applications.
    """
    from gunicorn.app.wsgiapp import WSGIApplication
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
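To make "the WSGI object" concrete, here is a minimal sketch of the kind of callable that load() is expected to return, together with a tiny driver that calls it the way a server would. The names application and call_once are made up for this illustration; Django's real WSGI handler does far more, but it has the same call signature.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable, standing in for what a framework provides."""
    body = b"hello from wsgi\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_once(app):
    """Call the app with a synthetic request and collect status and body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys a WSGI server must provide
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_once(application)
print(status, body)
```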
Looking up Application takes us to app/base.py:
class Application(BaseApplication):
    ...

    def run(self):
        ...
        if self.cfg.daemon:
            util.daemonize(self.cfg.enable_stdio_inheritance)
        ...
        super().run()


class BaseApplication(object):
    ...

    def wsgi(self):
        if self.callable is None:
            self.callable = self.load()
        return self.callable

    ...

    def run(self):
        try:
            Arbiter(self).run()
        except RuntimeError as e:
            ...
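The wsgi() method above is a load-once cache: the base class decides when to load, and the subclass decides how. A minimal sketch of that pattern follows; BaseApp and CountingApp are hypothetical stand-ins, not Gunicorn classes.

```python
class BaseApp:
    """Hypothetical stand-in for BaseApplication: caches whatever load() returns."""

    def __init__(self):
        self.callable = None

    def load(self):
        raise NotImplementedError

    def wsgi(self):
        # Load lazily on first use, then reuse the cached callable.
        if self.callable is None:
            self.callable = self.load()
        return self.callable


class CountingApp(BaseApp):
    """Subclass that supplies load() and counts how often it runs."""

    def __init__(self):
        super().__init__()
        self.load_calls = 0

    def load(self):
        self.load_calls += 1
        return lambda environ, start_response: [b"ok"]


app = CountingApp()
first = app.wsgi()
second = app.wsgi()
print(first is second, app.load_calls)
```

Because the result is cached per process, where wsgi() is first called (before or after fork) determines whether the application is imported once in the master or separately in each worker.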
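At last we have reached the Arbiter that other articles keep mentioning.
Arbiter
Before discussing the Arbiter, a brief word on the pre-fork model.
Put simply, the master process uses fork to create workers that share the listening socket fd; each worker calls accept on that same fd.
The master keeps the number of workers at the configured value, monitors their working state, and restarts unresponsive processes.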
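The pre-fork idea can be sketched in a few lines. This is a toy under stated assumptions (Unix only, each worker serves exactly one connection, and the parent doubles as the client), not how Gunicorn structures its workers; demo_prefork is a name invented for this example.

```python
import os
import socket


def demo_prefork(num_workers=2):
    """Fork workers that share one listening socket; each serves one connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    address = listener.getsockname()

    child_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker: the listening fd was inherited across fork().
            status = 1
            try:
                conn, _addr = listener.accept()
                with conn:
                    conn.sendall(str(os.getpid()).encode())
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        child_pids.append(pid)

    # Parent: act as the client here, purely to exercise the workers.
    replied = []
    for _ in range(num_workers):
        with socket.create_connection(address, timeout=5) as client:
            replied.append(int(client.recv(64).decode()))

    for pid in child_pids:
        os.waitpid(pid, 0)
    listener.close()
    return child_pids, replied


if __name__ == "__main__":
    print(demo_prefork(2))
```

Each reply carries the pid of whichever worker happened to accept that connection, which shows that the kernel distributes connections among processes blocked on the same listening socket.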
class Arbiter(object):
    ...

    def start(self):
        """\
        Initialize the arbiter. Start listening and set pidfile if needed.
        """
        ...
        if not self.LISTENERS:
            fds = None
            listen_fds = systemd.listen_fds()
            if listen_fds:
                self.systemd = True
                fds = range(systemd.SD_LISTEN_FDS_START,
                            systemd.SD_LISTEN_FDS_START + listen_fds)
            elif self.master_pid:
                fds = []
                for fd in os.environ.pop('GUNICORN_FD').split(','):
                    fds.append(int(fd))

            # Create the listening fds that all child processes share
            self.LISTENERS = sock.create_sockets(self.cfg, self.log, fds)
        ...
    def run(self):
        "Main master loop."
        self.start()
        util._setproctitle("master [%s]" % self.proc_name)

        try:
            # Maintain the worker count. Right after startup there are 0 workers,
            # so this call spawns child processes until the configured number is reached.
            self.manage_workers()

            while True:
                self.maybe_promote_master()

                # Read the next event (e.g. HUP for a hot reload)
                sig = self.SIG_QUEUE.pop(0) if self.SIG_QUEUE else None
                if sig is None:
                    # No event: sleep, kill workers that have stopped responding,
                    # and bring the process count back to the configured value
                    self.sleep()
                    self.murder_workers()
                    self.manage_workers()
                    continue

                if sig not in self.SIG_NAMES:
                    self.log.info("Ignoring unknown signal: %s", sig)
                    continue

                signame = self.SIG_NAMES.get(sig)
                handler = getattr(self, "handle_%s" % signame, None)
                if not handler:
                    self.log.error("Unhandled signal: %s", signame)
                    continue
                self.log.info("Handling signal: %s", signame)
                handler()
                self.wakeup()
        except (StopIteration, KeyboardInterrupt):
            self.halt()
        except HaltServer as inst:
            self.halt(reason=inst.reason, exit_status=inst.exit_status)
        except SystemExit:
            raise
        except Exception:
            self.log.info("Unhandled exception in main loop",
                          exc_info=True)
            self.stop(False)
            if self.pidfile is not None:
                self.pidfile.unlink()
            sys.exit(-1)

    ...
    def manage_workers(self):
        """\
        Maintain the number of workers by spawning or killing
        as required.
        """
        if len(self.WORKERS) < self.num_workers:
            self.spawn_workers()

        workers = self.WORKERS.items()
        workers = sorted(workers, key=lambda w: w[1].age)
        while len(workers) > self.num_workers:
            (pid, _) = workers.pop(0)
            self.kill_worker(pid, signal.SIGTERM)

        active_worker_count = len(workers)
        if self._last_logged_active_worker_count != active_worker_count:
            self._last_logged_active_worker_count = active_worker_count
            self.log.debug("{0} workers".format(active_worker_count),
                           extra={"metric": "gunicorn.workers",
                                  "value": active_worker_count,
                                  "mtype": "gauge"})
The master process's job is actually very simple: monitor the state of the child processes and provide the shared data (the listening fds).
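The signal handling inside Arbiter.run boils down to a queue plus name-based dispatch through getattr. The sketch below isolates just that part; MiniArbiter, its two-entry SIG_NAMES table and the drain method are simplifications of my own (Unix signals assumed), not Gunicorn's actual class.

```python
import signal


class MiniArbiter:
    """Hypothetical, much-reduced dispatcher; not Gunicorn's actual class."""

    # Only two signals are mapped here to keep the example small.
    SIG_NAMES = {signal.SIGHUP: "hup", signal.SIGTERM: "term"}

    def __init__(self):
        self.sig_queue = []
        self.handled = []

    def handle_hup(self):
        self.handled.append("hup")

    def handle_term(self):
        self.handled.append("term")

    def drain(self):
        """Process queued signals in arrival order, skipping unknown ones."""
        while self.sig_queue:
            sig = self.sig_queue.pop(0)
            signame = self.SIG_NAMES.get(sig)
            if signame is None:
                continue
            # Look the handler up by name, e.g. "handle_hup".
            handler = getattr(self, "handle_%s" % signame, None)
            if handler is not None:
                handler()


arbiter = MiniArbiter()
arbiter.sig_queue.extend([signal.SIGHUP, signal.SIGUSR1, signal.SIGTERM])
arbiter.drain()
print(arbiter.handled)
```

Queueing the signal number and doing the real work later in the main loop keeps the OS-level signal handler itself trivial.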
Next, let's look at how the master spawns child processes.
    def spawn_workers(self):
        """\
        Spawn new workers as needed.

        This is where a worker process leaves the main loop
        of the master process.
        """
        for _ in range(self.num_workers - len(self.WORKERS)):
            self.spawn_worker()
            time.sleep(0.1 * random.random())
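The first thing manage_workers calls is spawn_workers. As shown above, it simply calls spawn_worker in a loop and, after each spawn, backs off for a random 0~100 ms. This prevents all child processes from starting at the same moment and putting too much load on the system; if every child competes for CPU at once, none of them may finish initialising in time and they get killed.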
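The bookkeeping in manage_workers and spawn_workers can be reduced to a pure function: given the current workers and the target count, how many to spawn and which to terminate. This is a sketch under the assumption, taken from the code above, that a lower age means the worker was created earlier; plan_workers is a hypothetical name.

```python
def plan_workers(worker_ages, target):
    """Return (number_to_spawn, pids_to_kill) for a pid -> age mapping."""
    to_spawn = max(0, target - len(worker_ages))
    # Lower age means the worker was created earlier, so it is retired first.
    oldest_first = sorted(worker_ages.items(), key=lambda item: item[1])
    excess = max(0, len(oldest_first) - target)
    to_kill = [pid for pid, _age in oldest_first[:excess]]
    return to_spawn, to_kill


print(plan_workers({}, 3))
print(plan_workers({101: 1, 102: 2, 103: 3}, 2))
```

With an empty worker table and a target of 3, the plan is to spawn three workers; with three workers and a target of 2, the plan is to terminate the one with the lowest age.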
Next, let's look at spawn_worker.
    def spawn_worker(self):
        self.worker_age += 1
        # self.app here is the WSGIApplication from earlier.
        worker = self.worker_class(self.worker_age, self.pid, self.LISTENERS,
                                   self.app, self.timeout / 2.0,
                                   self.cfg, self.log)
        self.cfg.pre_fork(self, worker)
        pid = os.fork()
        if pid != 0:
            worker.pid = pid
            self.WORKERS[pid] = worker
            return pid

        # Do not inherit the temporary files of other workers
        for sibling in self.WORKERS.values():
            sibling.tmp.close()

        # Process Child
        worker.pid = os.getpid()
        try:
            util._setproctitle("worker [%s]" % self.proc_name)
            self.log.info("Booting worker with pid: %s", worker.pid)
            self.cfg.post_fork(self, worker)
            # The implementation differs depending on the worker class you choose,
            # but every one of them blocks here. A later article uses the gevent
            # worker as the example.
            worker.init_process()
            sys.exit(0)
        except SystemExit:
            raise
        except AppImportError as e:
            self.log.debug("Exception while loading the application",
                           exc_info=True)
            print("%s" % e, file=sys.stderr)
            sys.stderr.flush()
            sys.exit(self.APP_LOAD_ERROR)
        except:
            self.log.exception("Exception in worker process")
            if not worker.booted:
                sys.exit(self.WORKER_BOOT_ERROR)
            sys.exit(-1)
        finally:
            self.log.info("Worker exiting (pid: %s)", worker.pid)
            try:
                worker.tmp.close()
                self.cfg.worker_exit(self, worker)
            except:
                self.log.warning("Exception during worker exit:\n%s",
                                 traceback.format_exc())
Once worker.init_process() is called, the child process starts doing its work. What init_process actually does depends on the worker type; the next article will use gevent as the example.
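The fork-then-exit-code structure of spawn_worker can be tried in isolation. In this sketch the child runs a boot function and reports failure through its exit status, which the parent reads back with waitpid. The helper run_child and the failing_boot function are made up for the example, and the value 3 for WORKER_BOOT_ERROR is an assumption mirroring the constant on Gunicorn's Arbiter.

```python
import os

WORKER_BOOT_ERROR = 3  # assumed value, mirroring the constant on Gunicorn's Arbiter


def run_child(boot):
    """Fork; the child runs boot() and exits, the parent returns the exit code."""
    pid = os.fork()
    if pid == 0:
        # Child branch: fork() returned 0.
        code = 0
        try:
            boot()
        except Exception:
            code = WORKER_BOOT_ERROR
        finally:
            os._exit(code)  # leave immediately, without running parent cleanup
    # Parent branch: fork() returned the child's pid.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def failing_boot():
    raise RuntimeError("could not initialise")


if __name__ == "__main__":
    print(run_child(lambda: None), run_child(failing_boot))
```

A distinct exit code for boot failures lets the parent tell "this worker never came up" apart from an ordinary crash, which is what makes it possible to stop instead of respawning forever.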
Gunicorn is actually fairly simple, and there isn't much code, so it makes good reading practice.