From Ray to Chronos: Build End-to-End AI Use Cases Using BigDL on Top of Ray

Original

2021-11-18 18:23

Authors: Wesley Du, Junwei Deng, Kai Huang, Shan Yu and Shane Huang

The authors are solution architects on Intel's AI and analytics team, which develops BigDL. Data scientists and data engineers can use BigDL to build end-to-end distributed AI applications with little effort.

# Introduction

[Ray](https://www.ray.io/) is a framework for building distributed applications quickly and simply. [BigDL](https://github.com/intel-analytics/BigDL) is an open-source framework for building scalable end-to-end AI on distributed big data. It uses Ray and its native libraries to support advanced AI use cases such as AutoML and automated time series analysis.

In this blog post we introduce some of the core components of BigDL, show how BigDL uses Ray and its native libraries to build its underlying infrastructure (for example RayOnSpark and AutoML), and explain how these help users build AI applications (for example automated time series analysis with [Chronos](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/chronos.html)).

# RayOnSpark: Running Ray Programs Seamlessly on Apache Spark

Ray is an open-source distributed framework that lets users run many emerging AI applications, such as deep reinforcement learning and automated machine learning, easily and efficiently. Through RayOnSpark, BigDL integrates Ray seamlessly into big data preprocessing pipelines, and several advanced end-to-end AI applications for specific domains (such as AutoML and Chronos) have already been built on it. RayOnSpark runs Ray programs on top of Apache Spark on big data clusters (for example Apache Hadoop or Kubernetes clusters), so in-memory Spark DataFrames can be passed directly to Ray programs for advanced AI applications. Users can therefore try out emerging AI applications directly on the big data clusters they already run in production, with the Ray code operating on in-memory DataFrames as one stage of the Spark data processing pipeline.

![Figure 1: RayOnSpark architecture](https://static001.geekbang.org/infoq/5f/5f86f91bfe18759591504c02213779b4.png)

Figure 1: RayOnSpark architecture

Figure 1 shows the architecture of RayOnSpark. In a standard Spark application, the program creates a SparkSession object on the driver node, and its SparkContext launches multiple Spark executors on the cluster to run Spark tasks. RayOnSpark additionally creates a RayContext object on the Spark driver node, which automatically launches Ray processes alongside each Spark executor on the same cluster. RayContext also creates a RayManager inside each Spark executor to manage the Ray processes (for example, shutting them down automatically when the program exits). The code below shows how, once RayOnSpark is initialized, users can write Ray code directly in a standard Spark application.

```python
import ray
from bigdl.orca import init_orca_context
from bigdl.orca.ray import RayContext

# Initialize SparkContext on the underlying cluster (e.g. the Hadoop/Yarn cluster).
# The `...` placeholders are elided in the original; fill them in for your cluster.
sc = init_orca_context(cluster_mode="yarn", cores=..., memory=..., num_nodes=...)

# Initialize RayContext and launch Ray under the same cluster.
ray_ctx = RayContext(sc, object_store_memory=...,...)
ray_ctx.init()


@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n


# The Ray actors are created across the big data cluster.
counters = [Counter.remote() for i in range(5)]
ray.get([c.increment.remote() for c in counters])

# Shut down Ray first, then Spark.
ray_ctx.stop()
sc.stop()
```

Figure 2: Sample code for RayOnSpark

# AutoML (orca.automl): Easy Tuning of AI Applications with Ray Tune

Hyperparameter optimization (HPO) matters a great deal to data scientists who need a machine learning or deep learning model to reach its accuracy and performance targets. Tuning hyperparameters by hand, however, can be time-consuming and often gives unsatisfying results, and programming distributed hyperparameter optimization is a challenge in its own right. Ray Tune is a scalable hyperparameter optimization framework for deep learning. BigDL introduces an AutoML capability built on top of Ray Tune (orca.automl) to make data scientists' work easier.

## Introduction to orca.automl

In many cases data scientists prefer to prototype, debug and tune their AI applications on a laptop. If the same code could be moved to a cluster unchanged and run there directly, end-to-end productivity would improve considerably.

The Orca project in BigDL helps users scale their code seamlessly from a laptop to a big data cluster. On top of that, orca.automl builds on RayOnSpark and Ray Tune to provide a distributed hyperparameter tuning API called [AutoEstimator](https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-autoestimator-pytorch-quickstart.html). Because Ray Tune is framework-agnostic, AutoEstimator works with both PyTorch and TensorFlow models. Users can tune their models in a consistent way on a laptop, a local server, a K8s cluster, a Hadoop/YARN cluster, and so on.

With these features, orca.automl can be used for automated tuning (of models, hyperparameters, and so on) in many AI applications. For example, we used orca.automl to implement AutoXGBoost (XGBoost with HPO), which automatically fits and optimizes XGBoost models. Compared with a similar solution on an Nvidia A100, training with AutoXGBoost was about 1.7x faster and the final model was more accurate.

For more details, see [https://medium.com/intel-analytics-software/scalable-autoxgboost-using-analytics-zoo-automl-30d576cb138a](https://medium.com/intel-analytics-software/scalable-autoxgboost-using-analytics-zoo-automl-30d576cb138a)

For design details, see the [orca.automl User Guide](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/distributed-tuning.html)

For hands-on practice, see the [AutoXGBoost Quick Start](https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-autoxgboost-quickstart.html) or [Auto Tuning for arbitrary models](https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-autoestimator-pytorch-quickstart.html)

# Chronos: Building Automated Time Series Analysis with AutoTS on Ray

We also developed an application framework for automated time series analysis, called [Chronos](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/chronos.html). It relies on orca.automl for hyperparameter optimization during automated analysis.

## Why do we need Chronos?

Time series (TS) analysis is now widely used across many fields (for example network quality analysis in telecom, log analysis for data center operations, and predictive maintenance of high-value equipment) and is becoming increasingly important. In forecasting and detection, the most common tasks, traditional statistical methods face major challenges in both accuracy and flexibility, while deep learning methods, which treat time series tasks as sequence modeling problems, have succeeded in a number of domains.

On the other hand, building machine learning applications for time series forecasting or detection can be laborious and demands a lot of expertise. Hyperparameter settings, preprocessing and feature engineering can each become a bottleneck for the performance of a deep learning model. To provide an efficient, powerful and easy-to-use time series analysis toolkit, we launched Chronos, a framework for building large-scale time series analysis applications. Because it is built on Ray Tune, Ray Train and RayOnSpark, it can use AutoML and train in a distributed fashion.

## Chronos architecture

Chronos ships with more than 10 built-in deep learning and machine learning models for time series forecasting, detection and simulation, and more than 70 data processing and feature engineering tools. Users can call the standalone algorithms and models (Forecasters, Detectors, Simulators) themselves for maximum flexibility, or use our highly integrated, scalable and automated time series workflow (AutoTS). Inference is also optimized in several ways, including integration with [ONNX runtime](https://onnxruntime.ai/).

The figure below shows the Chronos architecture on top of BigDL and Ray. This section focuses on the AutoTS component. The AutoTS framework uses Ray Tune as its hyperparameter search engine (running on RayOnSpark). In automatic data processing, the search engine selects the best lookback value for the forecasting task. In automatic feature engineering, it selects the best subset from a set of features generated automatically by feature generation tools (for example tsfresh). In automatic modeling, it searches over hyperparameters such as the hidden layer dimensions and the learning rate.

![Figure 3: Chronos architecture](https://static001.geekbang.org/infoq/be/be1ea67e065c39bde1104266a84e916d.png)

Figure 3: Chronos architecture

## A hands-on example of the Chronos AutoTS workflow

The code below shows training and inference for a time series forecasting pipeline using the easy-to-use, highly integrated AutoTS workflow in Chronos. The workflow uses the simple API on TSDataset to perform typical time series processing (such as imputation and scaling) and feature generation.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from bigdl.chronos.data import TSDataset

# data initialization and split
df = pd.read_csv("table.csv")
tsdata_train, tsdata_val, tsdata_test = TSDataset.from_pandas(df,
                                                              dt_col="StartTime",
                                                              target_col="AvgRate",
                                                              with_split=True,
                                                              val_ratio=0.1)

# data processing and feature engineering
# (the scaler is fitted on the training split only, then reused for val/test)
standard_scaler = StandardScaler()
for tsdata in [tsdata_train, tsdata_val, tsdata_test]:
    tsdata.gen_dt_feature()\
          .impute(mode="last")\
          .scale(standard_scaler, fit=(tsdata is tsdata_train))
```

Users can then initialize an AutoTSEstimator with a model (`model`: the name of a built-in model, or a model creator function for a third-party model), a lookback value (`past_seq_len`) and a forecast horizon (`future_seq_len`). The AutoTSEstimator runs the search on Ray Tune: each run generates multiple trials, each with a different combination of hyperparameters and feature subset, and distributes them across the Ray cluster. After all trials finish, the best hyperparameter set, the optimized model and the data processing steps are retrieved according to the target metric and combined into the final TSPipeline.

```python
from bigdl.chronos.autots import AutoTSEstimator
import bigdl.orca.automl.hp as hp

# create an AutoTSEstimator; past_seq_len is itself a search space
auto_estimator = AutoTSEstimator(model='tcn',
                                 past_seq_len=hp.randint(50, 100),
                                 future_seq_len=1)

# fit on the AutoTSEstimator with HPO, auto feature, past_seq_len selector
ts_pipeline = auto_estimator.fit(data=tsdata_train,
                                 validation_data=tsdata_val)
```

The TSPipeline can be used for prediction, evaluation and incremental fitting.

```python
# predict/evaluate with TSPipeline
y_pred = ts_pipeline.predict(tsdata_test)
test_mse = ts_pipeline.evaluate(tsdata_test, metrics=['mse'])
```

For more details, see the [Chronos User Guide](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/chronos.html).

## 5G network time series analysis with Chronos AutoTS

Chronos has been adopted in many areas, such as telecom and AIOps. Capgemini Engineering uses the Chronos AutoML workflow and inference optimization in its 5G Medium Access Control (MAC) to implement cognitive capabilities as part of an intelligent RAN controller node. In this project, Chronos forecasts UE mobility to help the MAC scheduler perform efficient link adaptation on two key KPIs. With Chronos AutoTS, Capgemini Engineering switched to our built-in TCN model and chose a better-suited lookback value, improving AI accuracy by 55%.

For details, see the [white paper](https://networkbuilders.intel.com/solutionslibrary/intelligent-5g-l2-mac-scheduler-powered-by-capgemini-netanticipate-5g-on-intel-architecture).

# Conclusion

In this article we showed how BigDL uses Ray and its libraries to build scalable AI applications for big data (with RayOnSpark), to improve end-to-end AI development productivity (with AutoML on top of Ray Tune), and to build domain-specific AI use cases (such as automated time series analysis with Chronos). BigDL also uses Ray in other ways; for example, the BigDL Orca project uses Ray Train to scale single-node Python notebooks seamlessly across big data clusters. We are also exploring other use cases, such as recommendation systems and reinforcement learning, that will build on the AutoML capability built on Ray.

Original article:

[https://www.anyscale.com/blog/from-ray-to-chronos-build-end-to-end-ai-use-cases-using-bigdl-on-top-of-ray](https://www.anyscale.com/blog/from-ray-to-chronos-build-end-to-end-ai-use-cases-using-bigdl-on-top-of-ray)
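
As a footnote to the AutoTS discussion, the idea of searching for the best lookback value (`past_seq_len`) can be illustrated without a cluster. The sketch below is not Chronos or Ray Tune code. It is a toy, standard-library-only illustration in which every name (`roll`, `naive_forecast`, `mse`, `search_lookback`) is hypothetical, and a moving-average "model" stands in for a real forecaster. It cuts a series into (past, future) windows, runs one trial per sampled lookback, and keeps the lookback with the lowest mean squared error.

```python
import random


def roll(series, lookback, horizon):
    """Cut a series into (past window, future window) sample pairs."""
    samples = []
    for start in range(len(series) - lookback - horizon + 1):
        past = series[start:start + lookback]
        future = series[start + lookback:start + lookback + horizon]
        samples.append((past, future))
    return samples


def naive_forecast(past, horizon):
    """Toy stand-in for a model: repeat the mean of the lookback window."""
    mean = sum(past) / len(past)
    return [mean] * horizon


def mse(series, lookback, horizon):
    """Mean squared error of the toy forecaster for one lookback value."""
    errors = [(pred - true) ** 2
              for past, future in roll(series, lookback, horizon)
              for pred, true in zip(naive_forecast(past, horizon), future)]
    return sum(errors) / len(errors)


def search_lookback(series, candidates, horizon=1, n_trials=4, seed=0):
    """Random search: one trial per sampled lookback, keep the lowest MSE."""
    rng = random.Random(seed)
    trials = rng.sample(list(candidates), n_trials)
    scores = {lookback: mse(series, lookback, horizon) for lookback in trials}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # A signal with period 4: a window covering one full period averages out.
    series = [0, 1, 0, -1] * 10
    best, scores = search_lookback(series, candidates=[1, 2, 3, 4])
    print("best lookback:", best)
    print("scores:", scores)
```

In this toy setting the search prefers a lookback equal to the signal's period, because a window covering one full period averages to the signal mean. In the real workflow the trials run in parallel on the Ray cluster, and the search space also covers model hyperparameters and feature subsets.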
