Setting the Airflow run cycle with schedule_interval

The Airflow run cycle problem

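I recently started using Airflow in earnest, and I was never quite clear on how schedule_interval relates to the "last run" shown in the web UI. That confusion finally turned into a real problem when I set up a weekly task: it did not run when I expected it to.

After a round of Googling, I found that although Airflow's schedule_interval accepts cron expressions, it does not behave exactly like crontab.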

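About backfill

The backfill command is used to backfill data, that is, to run a task for dates in the past.

When the task runs daily, passing a start date is enough, for example: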

airflow backfill CKD_ALL_REPORT -s 2018-09-04

But when the task runs only once every several days, this no longer works, and the command reports:

No run dates were found for the given dates and dag interval.
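
The message means that no schedule point falls inside the requested date range. The sketch below illustrates that check in plain Python; it is not Airflow's own code. It assumes a hypothetical weekly schedule that fires every Monday at midnight, and it assumes that a backfill given only a start date covers just that single day. The names weekly_run_dates and first_tick are made up for the illustration.

```python
from datetime import datetime, timedelta


def weekly_run_dates(first_tick, range_start, range_end):
    """Return the weekly schedule points that fall inside [range_start, range_end]."""
    dates = []
    tick = first_tick
    while tick <= range_end:
        if tick >= range_start:
            dates.append(tick)
        tick += timedelta(weeks=1)
    return dates


# Hypothetical weekly schedule: every Monday at 00:00, first tick on 2018-09-03.
first_tick = datetime(2018, 9, 3)

# Only a start date given: assumed here to behave like a one-day range.
single_day = weekly_run_dates(first_tick, datetime(2018, 9, 4), datetime(2018, 9, 4))

# A range wide enough to contain weekly schedule points.
two_weeks = weekly_run_dates(first_tick, datetime(2018, 9, 4), datetime(2018, 9, 17))

print(single_day)
print([d.date().isoformat() for d in two_weeks])
```

Under these assumptions 2018-09-04, a Tuesday, matches no weekly schedule point, while a range that also has an end date (the backfill command accepts one with -e) can contain some.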

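The error comes from how Airflow treats each schedule period as a window:
Airflow sets execution_date based on the left bound of the schedule period it is covering, not based on when it fires (which would be the right bound of the period).
This is the most reasonable explanation I found on Stack Overflow. In other words, once start_date has passed, Airflow records the first point in time that matches schedule_interval as the execution_date, but it only starts that run when the next such point arrives. Because of this window, "last run" lags one period behind.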
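
The left-bound rule can be illustrated with plain datetime arithmetic. This is only a sketch of the idea, not Airflow's scheduler: it assumes a hypothetical weekly interval and a hypothetical start_date of 2018-09-03 that already lines up with the schedule, and describe_runs is a made-up helper.

```python
from datetime import datetime, timedelta


def describe_runs(start, interval, count):
    """List (execution_date, fires_at) pairs for the first `count` schedule periods."""
    runs = []
    for i in range(count):
        execution_date = start + i * interval   # left bound of the period
        fires_at = execution_date + interval    # right bound: when the run actually starts
        runs.append((execution_date, fires_at))
    return runs


# Hypothetical weekly schedule starting on Monday 2018-09-03.
for execution_date, fires_at in describe_runs(datetime(2018, 9, 3), timedelta(weeks=1), 3):
    print("execution_date:", execution_date.date(), "| run starts at:", fires_at.date())
```

Each run carries the date of the period's start but only begins one interval later, which is why the latest execution_date shown for a weekly task is always a week behind the wall clock.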
So if you inspect execution_date through Jinja, the problem becomes visible.

Jinja templates

Jinja expression    Meaning
{{ ds }}    the execution date as YYYY-MM-DD
{{ ds_nodash }}    the execution date as YYYYMMDD
{{ yesterday_ds }}    the day before the execution date as YYYY-MM-DD
{{ yesterday_ds_nodash }}    the day before the execution date as YYYYMMDD
{{ tomorrow_ds }}    the day after the execution date as YYYY-MM-DD
{{ tomorrow_ds_nodash }}    the day after the execution date as YYYYMMDD
{{ ts }}    same as execution_date.isoformat()
{{ ts_nodash }}    same as ts without - and :
{{ execution_date }}    the execution_date (datetime.datetime)
{{ prev_execution_date }}    the previous execution date, if available (datetime.datetime)
{{ next_execution_date }}    the next execution date (datetime.datetime)
{{ dag }}    the DAG object
{{ task }}    the Task object
{{ macros }}    a reference to the macros package
{{ task_instance }}    the task_instance object
{{ end_date }}    same as {{ ds }}
{{ latest_date }}    same as {{ ds }}
{{ ti }}    same as {{ task_instance }}
{{ params }}    a reference to the user-defined params dictionary
{{ var.value.my_var }}    global defined variables represented as a dictionary
{{ var.json.my_var.path }}    global defined variables represented as a dictionary with deserialized JSON object; append the path to the key within the JSON object
{{ task_instance_key_str }}    a unique, human-readable key to the task instance, formatted {dag_id}_{task_id}_{ds}
{{ conf }}    the full configuration object located at airflow.configuration.conf
{{ run_id }}    the run_id of the current DAG run
{{ dag_run }}    a reference to the DagRun object
{{ test_mode }}    whether the task instance was called using the CLI's test subcommand
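
To make the date-related entries concrete, the sketch below derives a few of them from a given execution_date using only the standard library. It is not Airflow's template engine; date_macros is a hypothetical helper, and the sample execution_date of 2018-09-03 is chosen only for illustration.

```python
from datetime import datetime, timedelta


def date_macros(execution_date):
    """Compute a few template values the way the table describes them."""
    ds = execution_date.strftime("%Y-%m-%d")
    one_day = timedelta(days=1)
    return {
        "ds": ds,
        "ds_nodash": ds.replace("-", ""),
        # Both are relative to execution_date, not to the wall clock.
        "yesterday_ds": (execution_date - one_day).strftime("%Y-%m-%d"),
        "tomorrow_ds": (execution_date + one_day).strftime("%Y-%m-%d"),
        "ts": execution_date.isoformat(),
    }


values = date_macros(datetime(2018, 9, 3))
for name, value in values.items():
    print(name, "=", value)
```

Since every value is computed from execution_date, a weekly task that starts running on 2018-09-10 would still see the dates of the period that began a week earlier.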

References:
https://stackoverflow.com/questions/39612488/airflow-trigger-dag-execution-date-is-the-next-day-why/39620901#39620901
