Python Scheduled Task Framework: APScheduler Source Code Analysis (Part 3)

Preface

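It has been a while since the previous APScheduler source analysis. Now that I have a bit of free time, let me hurry up and write the next one.

This article analyses the executor-related code of APScheduler.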

Recap

First, let's recall how APScheduler gets running by revisiting the example code.

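    from apscheduler.schedulers.background import BackgroundScheduler
    def tick():
        print('Tick!')
    scheduler = BackgroundScheduler()
    scheduler.add_job(tick, 'interval', seconds=3)  # add a job that runs every 3 seconds
    scheduler.start()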

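In short, you instantiate BackgroundScheduler, add a job with add_job, and finally call start to launch it. BackgroundScheduler runs in a daemon thread by default, so a real script also has to keep the main thread alive after start().

The previous articles already covered what add_job does. It stores the job in an in-memory dict. While the scheduler is still stopped, the job is parked in the _pending_jobs list instead, as the start code below shows. 'interval' selects the interval trigger, and the interval is 3 seconds.

Now let's look at the start method.

The start method

BackgroundScheduler's start method calls BaseScheduler's start method and then launches the scheduler's background thread. BaseScheduler.start looks like this: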

# apscheduler/schedulers/base.py/BaseScheduler
    def start(self, paused=False):
        if self.state != STATE_STOPPED:
            raise SchedulerAlreadyRunningError
        # Check: if we are running under uWSGI with threads disabled, raise an error
        self._check_uwsgi()
        with self._executors_lock:
            # Create a default executor if nothing else is configured
            if 'default' not in self._executors:
                self.add_executor(self._create_default_executor(), 'default')
            # Start all the executors
            for alias, executor in self._executors.items():
                executor.start(self, alias)
        with self._jobstores_lock:
            # Create a default job store if nothing else is configured
            if 'default' not in self._jobstores:
                self.add_jobstore(self._create_default_jobstore(), 'default')
            # Start all the job stores
            for alias, store in self._jobstores.items():
                store.start(self, alias)
            # Schedule all pending jobs
            for job, jobstore_alias, replace_existing in self._pending_jobs:
                self._real_add_job(job, jobstore_alias, replace_existing)
            del self._pending_jobs[:]
        self.state = STATE_PAUSED if paused else STATE_RUNNING
        self._logger.info('Scheduler started')
        self._dispatch_event(SchedulerEvent(EVENT_SCHEDULER_START))
        if not paused:
            self.wakeup()

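The start method is straightforward. It creates a default executor and a default job store when none are configured, and it calls the start method of every executor and every job store. It then moves the jobs queued in _pending_jobs into their job stores, sets the state, dispatches EVENT_SCHEDULER_START and calls wakeup(). Each executor's start method receives self (the scheduler itself) and its alias. So what does an executor's start method do? The default executor inherits it from the BaseExecutor class: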

# apscheduler/executors/base.py/BaseExecutor
    def start(self, scheduler, alias):
        self._scheduler = scheduler
        self._lock = scheduler._create_lock()
        self._logger = logging.getLogger('apscheduler.executors.%s' % alias)

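As you can see, start does very little. It keeps a reference to the scheduler, creates a lock through the scheduler's _create_lock(), and sets up a logger named after the alias.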
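The sketch below shows the same start sequence using hypothetical stand-in classes. MiniScheduler and MiniExecutor are illustrative names, not APScheduler API. The scheduler adds a default executor only when none is configured, then starts every executor. Starting an executor only wires up the scheduler reference, a lock and a logger.

```python
import logging
import threading


class MiniExecutor:
    """Hypothetical stand-in for BaseExecutor: start() only wires things up."""

    def __init__(self):
        self._scheduler = None
        self._lock = None
        self._logger = None

    def start(self, scheduler, alias):
        # Mirrors the three assignments in BaseExecutor.start
        self._scheduler = scheduler
        self._lock = scheduler._create_lock()
        self._logger = logging.getLogger('apscheduler.executors.%s' % alias)


class MiniScheduler:
    """Hypothetical stand-in for the executor part of BaseScheduler.start."""

    def __init__(self, executors=None):
        self._executors = dict(executors or {})
        self._executors_lock = threading.RLock()

    def _create_lock(self):
        return threading.RLock()

    def start(self):
        with self._executors_lock:
            # Fall back to a default executor only when none was configured
            if 'default' not in self._executors:
                self._executors['default'] = MiniExecutor()
            # Start every executor, passing the scheduler and the alias
            for alias, executor in self._executors.items():
                executor.start(self, alias)
```

The real work of an executor happens later, in submit_job. start is only the wiring step.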

APScheduler's default executor is the thread pool executor:

# apscheduler/schedulers/base.py/BaseScheduler
    def _create_default_executor(self):
        """Creates a default executor store, specific to the particular scheduler type."""
        return ThreadPoolExecutor()

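In essence it uses ThreadPoolExecutor. Note, however, that this is APScheduler's own ThreadPoolExecutor class. It inherits from BasePoolExecutor, which in turn inherits from BaseExecutor, and it merely wraps the standard library's concurrent.futures.ThreadPoolExecutor (10 workers by default).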

# apscheduler/executors/pool.py
class ThreadPoolExecutor(BasePoolExecutor):
    def __init__(self, max_workers=10):
        pool = concurrent.futures.ThreadPoolExecutor(int(max_workers))
        super().__init__(pool)

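How is the executor invoked?

This brings us back to the _process_jobs method, which is analysed in detail in "Python Scheduled Task Framework: APScheduler Source Code Analysis (Part 2)". Here is the relevant excerpt: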

for job in due_jobs:
    # Look up the job's executor
    try:
        executor = self._lookup_executor(job.executor)
    except BaseException:
        ...  # omitted
    # Work out the run times that are due
    run_times = job._get_run_times(now)
    # With coalesce enabled, several missed runs collapse into the latest one
    run_times = run_times[-1:] if run_times and job.coalesce else run_times
    if run_times:
        try:
            # Submit the job to the executor
            executor.submit_job(job, run_times)
        except MaxInstancesReachedError:
            ...  # omitted
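The coalesce line deserves a closer look. The following sketch assumes a plain interval trigger. get_run_times and select_run_times are hypothetical helpers; the real Job._get_run_times asks the trigger for each next fire time. The sketch collects every run time that is due by now and then applies the same coalesce expression as _process_jobs.

```python
from datetime import datetime, timedelta


def get_run_times(next_run_time, interval, now):
    """Hypothetical helper: all fire times of a fixed-interval job that are due by now."""
    run_times = []
    while next_run_time <= now:
        run_times.append(next_run_time)
        next_run_time += interval
    return run_times


def select_run_times(next_run_time, interval, now, coalesce):
    run_times = get_run_times(next_run_time, interval, now)
    # Same expression as in _process_jobs: with coalesce, keep only the latest run
    return run_times[-1:] if run_times and coalesce else run_times


# Usage: a 3-second job whose first due time was 10 seconds ago
start = datetime(2024, 1, 1, 12, 0, 0)
step = timedelta(seconds=3)
now = start + timedelta(seconds=10)
all_runs = select_run_times(start, step, now, coalesce=False)  # +0, +3, +6, +9 seconds
latest = select_run_times(start, step, now, coalesce=True)     # only +9 seconds
```

With coalesce=False, the executor receives four run times and run_job executes the job four times in a row. With coalesce=True, only the latest run is executed.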

The due job objects are fetched from the job store, and each one is handed to its executor through submit_job. submit_job is implemented in BaseExecutor:

# apscheduler/executors/base.py/BaseExecutor
    def submit_job(self, job, run_times):
        # self._lock is an RLock, created in start()
        assert self._lock is not None, 'This executor has not been started yet'
        with self._lock:
            if self._instances[job.id] >= job.max_instances:
                raise MaxInstancesReachedError(job)
            self._do_submit_job(job, run_times)
            self._instances[job.id] += 1

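submit_job first asserts that the lock exists, which means the executor has been started. Then, while holding the lock, it checks how many instances of this job are already running. If that count has reached job.max_instances, it raises MaxInstancesReachedError. Otherwise it hands the job to _do_submit_job and increments the job's instance count.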
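This instance counting is what enforces max_instances. Its default is 1, so a job that is still running is not started again. Below is a self-contained sketch of that bookkeeping. InstanceCounter is a hypothetical class, and its MaxInstancesReachedError stands in for APScheduler's. Unlike the real submit_job, it increments inside acquire() because there is no pool to submit to.

```python
import threading
from collections import defaultdict


class MaxInstancesReachedError(Exception):
    """Hypothetical stand-in for APScheduler's exception of the same name."""


class InstanceCounter:
    """Hypothetical helper mirroring BaseExecutor's _instances bookkeeping."""

    def __init__(self):
        self._lock = threading.RLock()
        self._instances = defaultdict(int)  # job id -> number of running instances

    def acquire(self, job_id, max_instances):
        with self._lock:
            if self._instances[job_id] >= max_instances:
                raise MaxInstancesReachedError(job_id)
            self._instances[job_id] += 1

    def release(self, job_id):
        with self._lock:
            self._instances[job_id] -= 1
            if self._instances[job_id] == 0:
                del self._instances[job_id]  # drop finished jobs from the dict
```

release() corresponds to the decrement that _run_job_success and _run_job_error perform once a run finishes.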

The thread pool executor is the default, and its _do_submit_job method simply submits the job to the thread pool:

# apscheduler/executors/pool.py/BasePoolExecutor
    def _do_submit_job(self, job, run_times):
        def callback(f):
            exc, tb = (f.exception_info() if hasattr(f, 'exception_info') else
                       (f.exception(), getattr(f.exception(), '__traceback__', None)))
            if exc:
                self._run_job_error(job.id, exc, tb)
            else:
                self._run_job_success(job.id, f.result())
        f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name)
        f.add_done_callback(callback)

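_do_submit_job first defines a callback that receives the result of the task run by the pool. On success the callback calls _run_job_success, and on failure it calls _run_job_error; both methods live in BaseExecutor. _do_submit_job then submits run_job to the pool and registers the callback with add_done_callback.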
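The callback pattern itself is plain concurrent.futures. Here is a minimal standard-library sketch. run_on_pool and its on_success/on_error parameters are hypothetical names. It uses f.exception() and the exception's __traceback__, which is the fallback branch of APScheduler's callback.

```python
import concurrent.futures


def run_on_pool(pool, func, on_success, on_error):
    """Hypothetical helper: submit func and route its outcome like _do_submit_job."""

    def callback(f):
        # Runs once the task has finished
        exc = f.exception()  # None if func returned normally
        if exc is not None:
            on_error(exc, exc.__traceback__)
        else:
            on_success(f.result())

    f = pool.submit(func)
    f.add_done_callback(callback)
    return f
```

The done callback runs in the worker thread that finished the task. If the future is already done, it runs immediately in the calling thread. Because callbacks can run on different threads, _run_job_success and _run_job_error take the executor's lock before touching the instance counts.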

Here is the code of _run_job_success:

# apscheduler/executors/base.py/BaseExecutor
    def _run_job_success(self, job_id, events):
        """
        Called by the executor with the list of generated events when :func:`run_job` has been
        successfully called.
        """
        with self._lock:
            self._instances[job_id] -= 1
            if self._instances[job_id] == 0:
                del self._instances[job_id]
        for event in events:
            self._scheduler._dispatch_event(event)

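The method first decrements the job's running-instance count and deletes the entry once it reaches zero. It then passes each event produced by the job run to the scheduler's _dispatch_event, which broadcasts the result through APScheduler's event mechanism.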

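The event mechanism will be covered next time. Look back at f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name). The job object is passed as an argument to run_job, so run_job is what actually executes the job. run_job's return value, a list of events, is what the callback receives as f.result() and forwards to _run_job_success.

The run_job method

Here is the code of run_job: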

# apscheduler/executors/base.py
def run_job(job, jobstore_alias, run_times, logger_name):
    events = []
    logger = logging.getLogger(logger_name)
    for run_time in run_times:
        # misfire_grace_time: how many seconds after the scheduled run time the job may still run
        if job.misfire_grace_time is not None:
            difference = datetime.now(utc) - run_time
            grace_time = timedelta(seconds=job.misfire_grace_time)
            # Has the grace time been exceeded?
            if difference > grace_time:
                # Too late: record an EVENT_JOB_MISSED event in the events list
                events.append(JobExecutionEvent(EVENT_JOB_MISSED, job.id, jobstore_alias,
                                                run_time))
                logger.warning('Run time of job "%s" was missed by %s', job, difference)
                continue
        logger.info('Running job "%s" (scheduled at %s)', job, run_time)
        try:
            # Run the job
            retval = job.func(*job.args, **job.kwargs)
        except BaseException:
            exc, tb = sys.exc_info()[1:]
            formatted_tb = ''.join(format_tb(tb))
            # The job raised: add an EVENT_JOB_ERROR event to events
            events.append(JobExecutionEvent(EVENT_JOB_ERROR, job.id, jobstore_alias, run_time,
                                            exception=exc, traceback=formatted_tb))
            logger.exception('Job "%s" raised an exception', job)
            # Prevent reference cycles that would cause memory leaks
            traceback.clear_frames(tb)
            del tb
        else:
            events.append(JobExecutionEvent(EVENT_JOB_EXECUTED, job.id, jobstore_alias, run_time,
                                            retval=retval))
            logger.info('Job "%s" executed successfully', job)
    return events

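run_job loops over the run times. For each one, it first checks whether the job is later than misfire_grace_time allows (the number of seconds after the scheduled time during which the job may still run). If it is, run_job appends an EVENT_JOB_MISSED event to the events list, logs a warning and skips that run.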
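The misfire check can be isolated into a small pure function. In this sketch, split_missed is a hypothetical helper. It takes now as a parameter instead of calling datetime.now(utc) so that it can be tested. As in run_job, both sides of the subtraction must be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def split_missed(run_times, misfire_grace_time, now):
    """Hypothetical helper: separate run times still within the grace period."""
    due, missed = [], []
    for run_time in run_times:
        if misfire_grace_time is not None:
            difference = now - run_time
            if difference > timedelta(seconds=misfire_grace_time):
                missed.append(run_time)  # run_job would record EVENT_JOB_MISSED here
                continue
        due.append(run_time)
    return due, missed


# Usage: run times 9, 5 and 1 seconds ago, with a 3-second grace time
now = datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc)
times = [now - timedelta(seconds=s) for s in (9, 5, 1)]
due, missed = split_missed(times, misfire_grace_time=3, now=now)
```

With misfire_grace_time=None, nothing is ever treated as missed. This matches the `if job.misfire_grace_time is not None` guard.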

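Otherwise, retval = job.func(*job.args, **job.kwargs) actually runs the job. On success, run_job appends an EVENT_JOB_EXECUTED event carrying the return value. If the job raises, run_job also records the failure in events, as an EVENT_JOB_ERROR event with the exception and the formatted traceback.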
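For a single run time, the execute-and-record part of run_job follows the pattern below. This is a sketch. run_once is a hypothetical name, and it returns a plain (event_name, payload) tuple instead of a JobExecutionEvent.

```python
import sys
from traceback import format_tb


def run_once(func, args=(), kwargs=None):
    """Hypothetical, simplified run_job body for a single run time.

    Returns ('EVENT_JOB_EXECUTED', retval) or
    ('EVENT_JOB_ERROR', (exception, formatted_traceback)).
    """
    try:
        retval = func(*args, **(kwargs or {}))
    except BaseException:
        exc, tb = sys.exc_info()[1:]  # keep the exception value and traceback
        formatted_tb = ''.join(format_tb(tb))
        del tb  # do not keep the traceback in this frame's locals
        return 'EVENT_JOB_ERROR', (exc, formatted_tb)
    else:
        return 'EVENT_JOB_EXECUTED', retval
```

run_job catches BaseException, and so does this sketch. As a result, even SystemExit or KeyboardInterrupt raised inside a job becomes an error event instead of propagating.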

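There is an interesting little trick here.

When the job raises, run_job retrieves the error with exc, tb = sys.exc_info()[1:], instead of the more common approach of simply printing the caught exception.

sys.exc_info() returns three values. The first is type (the exception class), the second is value (the exception instance, which may carry arguments), and the third is traceback (a traceback object with much richer information). Only value and traceback are kept here. traceback.format_tb formats the traceback, the formatted text is stored in the event, and logger.exception logs the error. Then traceback.clear_frames(tb) clears the local variables of the frames the traceback refers to, skipping frames that are still executing, such as run_job's own. Finally, del tb drops run_job's reference to the traceback. APScheduler's comment says this prevents "reference cycles that would cause memory leaks". tb is a local variable of run_job, and the traceback in turn refers to run_job's frame, so frame → tb → frame would form a cycle. Quite interesting.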
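You can observe the effect of clear_frames directly. In the sketch below, Payload, failing_job and run_and_capture are hypothetical names. The failing function holds an object in a local variable. The exception object keeps its traceback alive, and the traceback keeps the failing frame alive, so that local is released only when clear_frames is called. The check relies on CPython's reference counting, which frees the object immediately.

```python
import sys
import traceback
import weakref
from traceback import format_tb


class Payload:
    """Hypothetical stand-in for a large object used inside a job."""


def failing_job(refs):
    payload = Payload()                 # local variable of the failing frame
    refs.append(weakref.ref(payload))   # observe its lifetime without keeping it alive
    raise ValueError('boom')


def run_and_capture(clear):
    refs = []
    try:
        failing_job(refs)
    except BaseException:
        exc, tb = sys.exc_info()[1:]    # keep value and traceback, drop the type
        formatted_tb = ''.join(format_tb(tb))
        if clear:
            # Clears the locals of every finished frame in the traceback;
            # run_and_capture's own frame is still running and is skipped
            traceback.clear_frames(tb)
        del tb                          # drop this frame's reference to the traceback
    return exc, formatted_tb, refs[0]
```

With clear=False, the weak reference is still alive after run_and_capture returns. With clear=True, it is dead. APScheduler stores the exception in the JobExecutionEvent, so without clear_frames the job function's local variables would stay alive as long as anything still references that exception.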

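Conclusion

This article dissected the source of APScheduler's thread pool executor. Its code is simple, and it is APScheduler's default executor. APScheduler also ships several other executors, such as ProcessPoolExecutor and AsyncIOExecutor. If you are interested, explore them on your own, and feel free to contact me to discuss them.

Across APScheduler's source, the executors, schedulers and triggers share a similar design, so I won't analyse each of them. One thing, however, has kept showing up without being analysed: APScheduler's event dispatch mechanism. The next article will look at how APScheduler implements event dispatching and listening.

If this article helped you, tap "在看" to support 二兩 (Erliang). See you in the next article.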
