Airflow: The Task-Scheduling Powerhouse You Need to Know

Airflow is a platform for orchestrating, scheduling, and monitoring workflows. Open-sourced by Airbnb, it is now a top-level project at the Apache Software Foundation. Airflow models each workflow as a DAG of tasks, and its scheduler executes those tasks on a set of workers while respecting the declared dependencies. It also ships with a rich command-line toolkit, an easy-to-use web UI for inspecting and operating workflows, and a monitoring and alerting system.

Because Airflow uses DAGs (directed acyclic graphs) to define workflows, configuring job dependencies is very convenient. In terms of manageability and ease of use, Airflow is well ahead of other task-scheduling tools.

#### Airflow's Natural Advantages

- Flexible and easy to use. Airflow itself is written in Python, and workflows are defined in Python too. With Python's glue-language nature there is hardly a job it cannot schedule, and since the code is open source, few problems are unsolvable: you can patch the source to meet your own needs. Best of all, the code is human-readable.
- Powerful. It ships with 15+ built-in Operators, meaning out-of-the-box support for 15+ job types: shell scripts, Python, MySQL, Oracle, Hive, and so on. Whether the target is a traditional database or a big-data platform, Airflow handles it, and where the bundled Operators fall short you can write your own.
- Elegant. Job definitions are simple and clear, the Jinja template engine makes it easy to parameterize script commands, and the web UI is equally human-readable. Once you use it, you'll see.
- Highly extensible. It provides base classes for extension and several executors to choose from. The CeleryExecutor uses a message queue to coordinate multiple workers, which can be deployed in a distributed fashion, so Airflow can scale out almost without limit.
- A rich set of CLI tools. Without even opening a browser, you can test, deploy, run, clear, re-run, and backfill jobs straight from the terminal. Compared with tools where deploying even a tiny job takes countless clicks in a UI, Airflow feels remarkably friendly.

Airflow is free. You can centralize routine inspection jobs, scheduled scripts (think crontab), ETL processing, monitoring, and more in Airflow. You may not even need to write monitoring scripts: when a job fails, its logs are automatically emailed to the designated people, solving production issues at low cost and high efficiency. That said, Chinese-language documentation is scarce and mostly incomplete, so getting started quickly is not trivial. You need some Python knowledge, repeated readings of the official docs, and an understanding of how the scheduler works. This series goes from shallow to deep, step by step, to lift the veil on Airflow.
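The "parameterization via Jinja" advantage means templated fields (such as a BashOperator's `bash_command`) can contain placeholders like `{{ ds }}`, the execution date, that Airflow renders at runtime. As a rough stdlib-only stand-in for what that rendering does (this toy `render` function is illustrative, not Airflow's or Jinja's API):

```python
import re

def render(template: str, context: dict) -> str:
    """Tiny stand-in for Jinja rendering of {{ var }} placeholders."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(context[m.group(1)]), template)

# A templated command, as you might put in an operator's templated field:
command = "hive -f etl.sql -d dt={{ ds }} -d env={{ env }}"
print(render(command, {"ds": "2021-01-01", "env": "prod"}))
# hive -f etl.sql -d dt=2021-01-01 -d env=prod
```

In real Airflow the context (execution date, task parameters, variables) is supplied by the scheduler for every task instance, so the same DAG file can drive any date's run.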
#### Airflow's Architecture and Components

![file](https://static001.geekbang.org/infoq/c1/c138a8c4bca4b1c2ac98349251a38981.png)

As the diagram above shows, Airflow's architecture contains the following core components:

- Metadata database: stores information about task state.
- Scheduler: a process that combines the DAG definitions with task state from the metadata database to decide which tasks need to run and at what priority. The scheduler usually runs as a service.
- Executor: a message-queue process bound to the scheduler that determines which worker processes actually run each scheduled task. There are several executor types, each using a particular class of worker process. For example, the LocalExecutor runs tasks in parallel processes on the same machine as the scheduler, while executors like the CeleryExecutor run tasks in worker processes on a separate cluster of worker machines.
- Workers: the processes that actually execute the task logic, as determined by the executor in use.

The main building blocks are introduced below:

**Scheduler**

The scheduler is the central hub of Airflow. It discovers user-defined DAG files and, according to their timers, turns each directed acyclic graph into a number of concrete dag runs, monitoring task state along the way.

**DAG**

A directed acyclic graph, used to define the dependency relationships between tasks. Individual tasks are defined by operators, with BaseOperator as the parent class of all operators.

**DagRun**

An instance of a DAG. Under the scheduler's control, each DAG is turned into run instances, distinguished from one another by dag id plus execution date.

**TaskInstance**

A task instance within a DagRun. Concretely, for each DagRun, every operator is turned into a corresponding TaskInstance. Since tasks may fail, the scheduler decides whether to retry according to the task definition. Task instances are distinguished by dag id, execution date, operator, start time, and retry number.

**Executor**

The task executor. Every task is ultimately run by an executor; BaseExecutor is the parent class of all executors.

**LocalTaskJob**

Monitors the execution of a task; its key attribute is a taskrunner.

**TaskRunner**

Spawns a child process to execute the task.
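The glossary above can be made concrete with a toy model (pure Python, no Airflow required; every name here is illustrative): the scheduler walks a DAG's schedule interval forward from its start date and mints one dag run per execution date, keyed the way Airflow distinguishes them, as `dag_id / execution_date`.

```python
from datetime import date, timedelta

def dag_runs(dag_id: str, start: date, end: date, interval: timedelta):
    """Toy model of the scheduler turning a DAG definition into DagRuns,
    one per execution_date, keyed by dag_id/execution_date."""
    runs, execution_date = [], start
    while execution_date <= end:
        runs.append(f"{dag_id}/{execution_date.isoformat()}")
        execution_date += interval
    return runs

print(dag_runs("tutorial_etl_dag", date(2021, 1, 1), date(2021, 1, 3),
               timedelta(days=1)))
# ['tutorial_etl_dag/2021-01-01', 'tutorial_etl_dag/2021-01-02', 'tutorial_etl_dag/2021-01-03']
```

Within each of those runs, every operator then becomes one TaskInstance, which is why a TaskInstance needs the extra keys (operator, start time, retry number) on top of the DagRun key.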
#### Installing Airflow and First Impressions

Installing Airflow requires a Python environment; setting one up is out of scope here, so please consult other resources. We install Airflow directly with pip:

```shell
# airflow needs a home directory; ~/airflow is the default,
# but you can put it elsewhere if you prefer
# (optional)
export AIRFLOW_HOME=~/airflow

# install from pypi using pip
pip install apache-airflow

# initialize the database
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

# start the scheduler
airflow scheduler

# visit localhost:8080 in the browser and enable the example dags on the home page
```

By default Airflow uses SQLite as its database: running the init command creates a database file, airflow.db, under AIRFLOW_HOME. You can of course point Airflow at MySQL instead; just edit airflow.cfg:

```
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor
executor = LocalExecutor

# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engines; more information
# on their website
sql_alchemy_conn = mysql://root:xxxxxx@localhost:3306/airflow
```

With installation complete, start Airflow and open the UI:

![file](https://static001.geekbang.org/infoq/80/80dd447b6b2d6ad11451edae9844d0bb.png)

We can also switch to the tree view:

![file](https://static001.geekbang.org/infoq/06/06cf2d9107437e10832375c10f295ad6.png)

Beyond that, graph view, Gantt chart, and other modes are supported. Pretty slick, isn't it?
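The `sql_alchemy_conn` value follows the standard URL shape `dialect://user:password@host:port/database`. Before pasting one into airflow.cfg, you can sanity-check it with the standard library (a quick aid for this post, not part of Airflow itself):

```python
from urllib.parse import urlsplit

conn = "mysql://root:xxxxxx@localhost:3306/airflow"
url = urlsplit(conn)

# dialect, host, port, and database name, pulled apart:
print(url.scheme, url.hostname, url.port, url.path.lstrip("/"))
# mysql localhost 3306 airflow
```

A malformed port or a stray space here is a common reason `airflow initdb` fails to reach MySQL, so checking the pieces first saves a round trip.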
#### Hello Airflow!

At this point we have a single-node Airflow installed locally, so we can follow the official docs and build a small demo to experience its power. Before that, a few concepts and principles. When we write an Airflow job, what does Airflow actually do?

- First, the user writes a DAG file.
- Next, the SchedulerJob discovers the new DAG file and, based on start_date, end_date, and schedule_interval, turns the DAG into DagRuns. Since a DAG file merely declares dependency relationships, the scheduler must turn it into concrete work: at a finer granularity, one DAG becomes a number of DagRuns, each DagRun consists of a number of task instances, and specifically each operator becomes one TaskInstance. Whether a TaskInstance runs is decided by its dependency relationships and dependency context.
- Then, task execution is handed to an executor. Concretely, a task may run locally, run on a cluster, or be sent to a remote celery worker.
- Finally, during execution the task is wrapped in a LocalTaskJob, which invokes a taskrunner to spawn a child process that runs it.

So we need to add a DAG file of our own. We use the example from the official docs, a typical ETL job:

```python
"""
### ETL DAG Tutorial Documentation
This ETL DAG is compatible with Airflow 1.10.x (specifically tested with 1.10.12) and is referenced
as part of the documentation that goes along with the Airflow Functional DAG tutorial located
[here](https://airflow.apache.org/tutorial_decorated_flows.html)
"""
# [START tutorial]
# [START import_module]
import json

# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG

# Operators; we need this to operate!
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

# [END import_module]

# [START default_args]
# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'airflow',
}
# [END default_args]

# [START instantiate_dag]
with DAG(
    'tutorial_etl_dag',
    default_args=default_args,
    description='ETL DAG tutorial',
    schedule_interval=None,
    start_date=days_ago(2),
    tags=['example'],
) as dag:
    # [END instantiate_dag]
    # [START documentation]
    dag.doc_md = __doc__
    # [END documentation]

    # [START extract_function]
    def extract(**kwargs):
        ti = kwargs['ti']
        data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'
        ti.xcom_push('order_data', data_string)

    # [END extract_function]

    # [START transform_function]
    def transform(**kwargs):
        ti = kwargs['ti']
        extract_data_string = ti.xcom_pull(task_ids='extract', key='order_data')
        order_data = json.loads(extract_data_string)

        total_order_value = 0
        for value in order_data.values():
            total_order_value += value

        total_value = {"total_order_value": total_order_value}
        total_value_json_string = json.dumps(total_value)
        ti.xcom_push('total_order_value', total_value_json_string)

    # [END transform_function]

    # [START load_function]
    def load(**kwargs):
        ti = kwargs['ti']
        total_value_string = ti.xcom_pull(task_ids='transform', key='total_order_value')
        total_order_value = json.loads(total_value_string)

        print(total_order_value)

    # [END load_function]

    # [START main_flow]
    extract_task = PythonOperator(
        task_id='extract',
        python_callable=extract,
    )
    extract_task.doc_md = """\
#### Extract task
A simple Extract task to get data ready for the rest of the data pipeline.
In this case, getting data is simulated by reading from a hardcoded JSON string.
This data is then put into xcom, so that it can be processed by the next task.
"""

    transform_task = PythonOperator(
        task_id='transform',
        python_callable=transform,
    )
    transform_task.doc_md = """\
#### Transform task
A simple Transform task which takes in the collection of order data from xcom
and computes the total order value.
This computed value is then put into xcom, so that it can be processed by the next task.
"""

    load_task = PythonOperator(
        task_id='load',
        python_callable=load,
    )
    load_task.doc_md = """\
#### Load task
A simple Load task which takes in the result of the Transform task, by reading it
from xcom and instead of saving it to end user review, just prints it out.
"""

    extract_task >> transform_task >> load_task

# [END main_flow]

# [END tutorial]
```

This file, tutorial.py, must be placed in the DAGs folder configured in airflow.cfg. The default DAGs location is ~/airflow/dags. Then run:

```shell
python ~/airflow/dags/tutorial.py
```

If the script runs without errors, then your code and your Airflow environment have no major problems. We can inspect the newly added DAG with a few simple commands (note the DAG id defined above is `tutorial_etl_dag`):

```shell
# print the list of active DAGs
airflow list_dags

# print the list of tasks in the 'tutorial_etl_dag' DAG
airflow list_tasks tutorial_etl_dag

# print the hierarchy of tasks in the 'tutorial_etl_dag' DAG
airflow list_tasks tutorial_etl_dag --tree
```

Then we can see the running job in the UI described above!

Airflow also has a number of commonly used commands:

```
backfill            Run subsections of a DAG for a specified date range
list_tasks          List the tasks within a DAG
clear               Clear a set of task instances, as if they never ran
pause               Pause a DAG
unpause             Resume a paused DAG
trigger_dag         Trigger a DAG run
pool                CRUD operations on pools
variables           CRUD operations on variables
kerberos            Start a kerberos ticket renewer
render              Render a task instance's template(s)
run                 Run a single task instance
initdb              Initialize the metadata database
list_dags           List all the DAGs
dag_state           Get the status of a dag run
task_failed_deps    Returns the unmet dependencies for a task instance
                    from the perspective of the scheduler. In other words,
                    why a task instance doesn't get scheduled and then
                    queued by the scheduler, and then run by an executor.
task_state          Get the status of a task instance
serve_logs          Serve logs generated by worker
test                Test a task instance. This will run a task without
                    checking for dependencies or recording its state in
                    the database.
webserver           Start an Airflow webserver instance
resetdb             Burn down and rebuild the metadata database
upgradedb           Upgrade the metadata database to latest version
scheduler           Start a scheduler instance
worker              Start a Celery worker node
flower              Start a Celery Flower
version             Show the version
connections         List/Add/Delete connections
```
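The `extract >> transform >> load` pipeline above hands data between tasks through XCom. To see just that hand-off without an Airflow installation, here is a pure-Python simulation in which a dict plays the role of the XCom table (the `FakeTI` class is invented for illustration; it is not Airflow's API):

```python
import json

class FakeTI:
    """Stand-in for a TaskInstance: xcom_push/xcom_pull backed by a dict."""
    def __init__(self, store):
        self.store = store

    def xcom_push(self, key, value):
        self.store[key] = value

    def xcom_pull(self, task_ids, key):
        return self.store[key]

ti = FakeTI({})

# extract: push the raw order data, as a JSON string
ti.xcom_push('order_data', '{"1001": 301.27, "1002": 433.21, "1003": 502.22}')

# transform: pull, total up the orders, push the result back
orders = json.loads(ti.xcom_pull(task_ids='extract', key='order_data'))
ti.xcom_push('total_order_value',
             json.dumps({"total_order_value": sum(orders.values())}))

# load: pull the final result and print it (301.27 + 433.21 + 502.22 = 1236.70)
print(json.loads(ti.xcom_pull(task_ids='transform', key='total_order_value')))
```

The real XCom table lives in the metadata database and is additionally keyed by dag id, task id, and execution date, which is why `xcom_pull` takes `task_ids` at all; this sketch collapses all of that into a single dict.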
Overall, Airflow scores well on both learning curve and feature coverage, and it is nicely extensible on top of that. If you know Python and can do some custom development, it is a joy to use.

Moreover, Airflow is already widely used inside companies such as Adobe, Airbnb, Google, and Lyft; in China, Alibaba also runs a derivative (Maat), so the industry has plenty of large-scale production experience with it.

Give it a try!

Original article (in Chinese): [你不可不知的任務調度神器-AirFlow](https://mp.weixin.qq.com/s?__biz=MzU3MzgwNTU2Mg==&mid=2247496734&idx=1&sn=bd8ee5233c880eae0a8d91b8ea0a4c0c&chksm=fd3eb28bca493b9d88d329004ada1912a8b6247acfcace9876ee3ec58ffa7db98c110c5ef241&token=808170609&lang=zh_CN#rd)

> Follow the [《大數據成神之路》](https://shimo.im/docs/jdPhrtFwVCAMkoWv) series for more articles.