From Ray to Chronos: Building End-to-End AI Use Cases with BigDL on Top of Ray

Authors: Wesley Du, Junwei Deng, Kai Huang, Shan Yu and Shane Huang

The authors are solution architects on Intel's AI and Analytics team, which has been driving the development of BigDL. Data scientists and data engineers can use BigDL to build end-to-end distributed AI applications with ease.

# Introduction

[Ray](https://www.ray.io/) is a framework for building distributed applications quickly and simply. [BigDL](https://github.com/intel-analytics/BigDL) is an open-source framework for building scalable end-to-end AI on distributed big data. It uses Ray and its native libraries to support advanced AI use cases such as AutoML and automated time series analysis.

This post introduces some of BigDL's core components. It shows how BigDL uses Ray and its native libraries to build its underlying infrastructure, such as RayOnSpark and AutoML. It also shows how these components help users build AI applications, for example automated time series analysis with [Chronos](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/chronos.html).

# RayOnSpark: Running Ray Programs Seamlessly on Apache Spark

Ray is an open-source distributed framework that lets users run many emerging AI applications easily and efficiently, such as deep reinforcement learning and automated machine learning. Through RayOnSpark, BigDL integrates Ray seamlessly into big data preprocessing pipelines. BigDL has already used it to build several advanced end-to-end AI applications for specific domains, including AutoML and Chronos.

RayOnSpark runs Ray programs on top of Apache Spark-based big data clusters, such as Apache Hadoop* or Kubernetes* clusters. Ray programs can therefore be embedded in Spark data processing pipelines and consume in-memory Spark DataFrames directly for advanced AI workloads. As a result, users can try emerging AI applications on the big data clusters they already run in production.

![RayOnSpark architecture](https://static001.geekbang.org/infoq/5f/5f86f91bfe18759591504c02213779b4.png)

Figure 1: RayOnSpark architecture

Figure 1 shows the RayOnSpark architecture. In a standard Spark application, the program creates a SparkSession object on the driver node. Its SparkContext launches multiple Spark executors on the cluster to run Spark tasks.

RayOnSpark adds a RayContext object on the Spark driver. This object automatically launches Ray processes alongside each Spark executor in the same cluster. RayContext also creates a RayManager inside each Spark executor to manage the Ray processes, for example shutting them down automatically when the program exits.

The code below shows how users can write Ray code directly inside a standard Spark application after initializing RayOnSpark.

```python
import ray
from bigdl.orca import init_orca_context
from bigdl.orca.ray import RayContext

# Initialize SparkContext on the underlying cluster (e.g. the Hadoop/Yarn cluster)
sc = init_orca_context(cluster_mode="yarn", cores=..., memory=..., num_nodes=...)
# Initialize RayContext and launch Ray under the same cluster.
ray_ctx = RayContext(sc, object_store_memory=..., ...)
ray_ctx.init()

@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n

# The Ray actors are created across the big data cluster
counters = [Counter.remote() for _ in range(5)]
ray.get([c.increment.remote() for c in counters])  # returns [1, 1, 1, 1, 1]

# Shut down Ray first, then Spark
ray_ctx.stop()
sc.stop()
```

Figure 2: RayOnSpark sample code

# AutoML (orca.automl): Easy Hyperparameter Tuning for AI Applications with Ray Tune

Hyperparameter optimization (HPO) helps data scientists reach their accuracy and performance goals for machine learning and deep learning models. Tuning hyperparameters by hand, however, is time-consuming and often gives unsatisfactory results. Writing distributed HPO code is also challenging in its own right.

Ray Tune is a scalable hyperparameter optimization library that is widely used for deep learning. BigDL provides AutoML functionality built on top of Ray Tune, called orca.automl, to make data scientists' work easier.

## Introduction to orca.automl

Data scientists often prefer to prototype, debug and tune their AI applications on a laptop. If the same code could then move to a cluster and run unchanged, end-to-end productivity would improve greatly.

BigDL's Orca project helps users scale their code seamlessly from a laptop to a big data cluster. On top of that, orca.automl builds on RayOnSpark and Ray Tune to provide a distributed hyperparameter tuning API called [AutoEstimator](https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-autoestimator-pytorch-quickstart.html). Because Ray Tune is framework-agnostic, AutoEstimator works with both PyTorch and TensorFlow models. Users can tune their models the same way on laptops, local servers, K8s clusters, Hadoop/YARN clusters and other environments.

With these features, orca.automl can automate the tuning of many AI applications, covering both model selection and hyperparameters. For example, we used orca.automl to implement AutoXGBoost (XGBoost with HPO), which automatically fits and optimizes XGBoost models. Compared with a similar solution on an Nvidia A100, AutoXGBoost trained about 1.7x faster and produced a more accurate final model.

For more details, see https://medium.com/intel-analytics-software/scalable-autoxgboost-using-analytics-zoo-automl-30d576cb138a

For design details, see the [orca.automl User Guide](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/distributed-tuning.html)

For hands-on examples, see the [AutoXGBoost Quick Start](https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-autoxgboost-quickstart.html) or [Auto Tuning for arbitrary models](https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-autoestimator-pytorch-quickstart.html)

# Chronos: Automated Time Series Analysis with AutoTS on Ray

We have also developed an application framework for automated time series analysis called [Chronos](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/chronos.html). Chronos uses orca.automl for hyperparameter optimization during automated analysis.

## Why Do We Need Chronos?

Time series (TS) analysis is now used across many domains and is becoming increasingly important. Examples include network quality analysis in telecommunications, log analysis in data center operations and predictive maintenance of high-value equipment.

Forecasting and anomaly detection are the most common TS tasks. In both, traditional statistical methods struggle with accuracy and flexibility. Deep learning methods, which treat time series tasks as sequence modeling problems, have succeeded in many domains.

On the other hand, building machine learning applications for time series forecasting or detection can be laborious and requires considerable expertise. Hyperparameter settings, preprocessing and feature engineering can each become a bottleneck for deep learning model performance. To provide an efficient, powerful and easy-to-use time series analysis toolkit, we built Chronos, a framework for building large-scale time series analysis applications. Chronos is built on top of Ray Tune, Ray Train and RayOnSpark, so it supports both AutoML and distributed training.

## Chronos Architecture

Chronos ships more than 10 built-in deep learning and machine learning models for time series forecasting, detection and simulation, and more than 70 data processing and feature engineering tools. Users who want maximum flexibility can call the standalone algorithms and models (Forecasters, Detectors and Simulators) directly. Alternatively, they can use our highly integrated, scalable and automated time series workflow (AutoTS). Inference is also optimized in several ways, including integration with [ONNX Runtime](https://onnxruntime.ai/).

Figure 3 shows the Chronos architecture on top of BigDL and Ray. This section focuses on the AutoTS component. AutoTS uses Ray Tune, running on top of RayOnSpark, as its hyperparameter search engine. The search engine works at three stages:

- In automated data processing, it selects the best lookback value for the forecasting task.
- In automated feature engineering, it selects the best subset from the features generated automatically by various feature generation tools (e.g. tsfresh).
- In automated modeling, it searches hyperparameters such as the hidden layer dimension and the learning rate.

![Chronos architecture](https://static001.geekbang.org/infoq/be/be1ea67e065c39bde1104266a84e916d.png)

Figure 3: Chronos architecture

## Hands-on Example of the Chronos AutoTS Workflow

The code below shows training and inference for a time series forecasting pipeline built with the AutoTS workflow. The workflow uses simple TSDataset APIs for typical time series processing, such as imputation and scaling, and for feature generation.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from bigdl.chronos.data import TSDataset

# data initialization and split
df = pd.read_csv("table.csv")
tsdata_train, tsdata_val, tsdata_test = TSDataset.from_pandas(df,
                                                              dt_col="StartTime",
                                                              target_col="AvgRate",
                                                              with_split=True,
                                                              val_ratio=0.1)

# data processing and feature engineering
standard_scaler = StandardScaler()
for tsdata in [tsdata_train, tsdata_val, tsdata_test]:
    # fit the scaler on the training split only; reuse it for val/test
    tsdata.gen_dt_feature()\
          .impute(mode="last")\
          .scale(standard_scaler, fit=(tsdata is tsdata_train))
```

Next, users initialize an AutoTSEstimator with three arguments:

- the model name (`model`), which is either a built-in model name or a model creator function for a third-party model;
- the lookback value (`past_seq_len`);
- the number of steps to forecast (`future_seq_len`).

The AutoTSEstimator runs its search on Ray Tune. Each search generates multiple trials, each with a different combination of hyperparameters and feature subset, and distributes them across the Ray cluster. After all trials finish, it picks the best trial by the target metric. The best hyperparameters, the optimized model and the data processing steps are then assembled into the final TSPipeline.

```python
from bigdl.chronos.autots import AutoTSEstimator
import bigdl.orca.automl.hp as hp

# create an AutoTSEstimator
auto_estimator = AutoTSEstimator(model='tcn',
                                 past_seq_len=hp.randint(50, 100),
                                 future_seq_len=1)

# fit with HPO, automatic feature selection and past_seq_len selection;
# returns the best TSPipeline
ts_pipeline = auto_estimator.fit(data=tsdata_train,
                                 validation_data=tsdata_val)
```

The resulting TSPipeline can be used for prediction, evaluation and incremental fitting.

```python
# predict/evaluate with TSPipeline
y_pred = ts_pipeline.predict(tsdata_test)
test_mse = ts_pipeline.evaluate(tsdata_test, metrics=['mse'])
```

For more details, see the [Chronos User Guide](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/chronos.html).

## 5G Network Time Series Analysis with Chronos AutoTS

Chronos is widely used in many domains, such as telecommunications and AIOps. Capgemini Engineering uses the Chronos AutoML workflow and inference optimizations in its 5G Medium Access Control (MAC) layer. There it provides cognitive capabilities as part of an intelligent RAN controller node.

In this project, Chronos predicts UE (user equipment) mobility, which helps the MAC scheduler perform effective link adaptation on two key KPIs. With Chronos AutoTS, Capgemini Engineering switched to our built-in TCN model and chose a better-suited lookback value. These changes improved AI accuracy by 55%.

For details, see the [white paper](https://networkbuilders.intel.com/solutionslibrary/intelligent-5g-l2-mac-scheduler-powered-by-capgemini-netanticipate-5g-on-intel-architecture).

# Conclusion

This post showed how BigDL uses Ray and its libraries in three ways:

- building scalable AI applications on big data, with RayOnSpark;
- improving end-to-end AI development productivity, with AutoML on top of Ray Tune;
- building domain-specific AI use cases, such as automated time series analysis with Chronos.

BigDL also uses Ray elsewhere. For example, the BigDL Orca project uses Ray Train to scale single-node Python notebooks seamlessly across big data clusters. We are also exploring other use cases, such as recommender systems and reinforcement learning, that will build on the AutoML capabilities on Ray.

Original article: https://www.anyscale.com/blog/from-ray-to-chronos-build-end-to-end-ai-use-cases-using-bigdl-on-top-of-ray