Elastic Deep Learning with Horovod on Ray

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"user"}},{"type":"strong"}],"text":"本文最初發佈於優步工程網站,經授權由InfoQ中文站翻譯並分享。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"引言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"2017年我們推出了"},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/horovod\/","title":null,"type":null},"content":[{"type":"text","text":"Horovod"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",這是一個開源框架,可將深度學習訓練任務並行擴展到數百個GPU上。當時,優步的大多數深度學習用例都和自動駕駛汽車的研發有關,而在"},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/michelangelo-machine-learning-platform\/","title":null,"type":null},"content":[{"type":"text","text":"米開朗基羅"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"平臺上,絕大多數生產級機器學習模型都是基於"},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/productionizing-distributed-xgboost\/","title":null,"type":null},"content":[{"type":"text","text":"XGBoost"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"的樹模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"如今已經是2021年,深度學習領域出現了很多變化。對於表格數據問題,深度學習模型相對於基於樹的模型的性能優勢越來越大,並且越來越多的深度學習模型在從研究轉移到生產級別。因此,我們不得不重新審視現有的深度學習平臺,以適應不斷增長的需求和新的要求:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"自動縮放和容錯"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"超參數搜索"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"統一計算基礎架構"}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"自動縮放和容錯"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們之前的深度學習平臺分配了一個固定大小的Horovod羣集來訓練單個模型。在本地運行時,用戶經常會發現自己由於資源不足而無法開始大型訓練作業。我們提供了在雲數據中心進行訓練的選項,但由於平臺缺乏自動縮放能力,意味着只能在專用實例上運行作業,其成本通常是在可搶佔或“競價(spot)”實例上運行成本的3-5倍。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"就算資源容量不成問題,容錯能力也常常存在瓶頸。隨着作業訓練的epoch增多,動用的機器越來越多,失敗的機率也會增加。密集的檢查點可以幫助緩解這種情況,但是需要大量額外的定製平臺工具鏈來提供支持。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Elastic Horovod"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在Horovod v0.20中我們引入了"},{"type":"link","attrs":{"href":"https:\/\/github.com\/horovod\/horovod\/releases\/tag\/v0.20.0","title":null,"type":null},"content":[{"type":"text","text":"Elastic Horovod"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。它提供了分佈式訓練的能力,可以在整個訓練過程中動態地擴展worker的數量。只需在現有的Horovod訓練腳本中添加幾行代碼,當運行作業的機器加入或退出時作業都可以繼續訓練,幾乎不會中斷。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"從API的抽象級別來看,Elastic Horovod解決了自動縮放訓練過程的問題,但並沒有對其操作化。我們將Elastic Horovod編寫爲適用於任何特定編排系統或雲提供商的通用平臺,這意味着以下組件要留作待實現的接口:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"發現"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"可以添加到訓練作業(或應從訓練作業中刪除)的主機"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"當額外worker可用且作業可以利用它們時"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"請求資源"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們一開始的假設是,我們要在每個主流雲提供商(例如AWS、Azure、GCP)或編排系統(例如Kubernetes、Peloton)各自的組件中實現這些接口,以支持內部和開源用戶。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"結果我們發現,已經有一個開源解決方案可以解決多雲分佈式計算的問題了:它就是"},{"type":"link","attrs":{"href":"https:\/\/ray.io\/","title":null,"type":null},"content":[{"type":"text","text":"Ray"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Elastic Horovod on Ray"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Ray是一個用於並行和分佈式編程的分佈式執行引擎。Ray由加州大學伯克利分校開發,最初的目的是使用一個簡單的基於類\/函數的Python API來擴展機器學習負載和實驗。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"自誕生以來,Ray生態系統已發展爲可用於在雲上訓練ML模型的一系列功能和工具組合,包括用於分佈式超參數調優的"},{"type":"link","attrs":{"href":"https:\/\/docs.ray.io\/en\/master\/tune\/index.html","title":null,"type":null},"content":[{"type":"text","text":"Ray Tune"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"、用於羣集配置的"},{"type":"link","attrs":{"href":"https:\/\/docs.ray.io\/en\/master\/cluster\/index.html","title":null,"type":null},"content":[{"type":"text","text":"Ray Cluster 
Launcher"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",和基於負載的自動縮放("},{"type":"link","attrs":{"href":"https:\/\/docs.ray.io\/en\/master\/cluster\/quickstart.html","title":null,"type":null},"content":[{"type":"text","text":"load-based autoscaling"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")。現在,Ray還集成了多種機器學習庫,如RLLib、XGBoost和PyTorch。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"藉助新的"},{"type":"link","attrs":{"href":"https:\/\/horovod.readthedocs.io\/en\/stable\/ray_include.html#elastic-ray-executor","title":null,"type":null},"content":[{"type":"text","text":"ElasticRayExecutor"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" API,Horovod能夠利用Ray來簡化底層主機的發現和編排工作。要使用Ray搭配Horovod來實現彈性訓練,你首先需要啓動一個包含多個節點(Ray羣集)的Ray運行時。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Ray運行時\/羣集"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":Ray程序能夠利用一個底層的Ray運行時完成並行化和分發工作。可以在一個或多個節點上啓動這個"},{"type":"link","attrs":{"href":"https:\/\/docs.ray.io\/en\/master\/starting-ray.html","title":null,"type":null},"content":[{"type":"text","text":"Ray運行時"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",從而形成一個Ray羣集。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Ray打包了一個"},{"type":"link","attrs":{"href":"https:\/\/docs.ray.io\/en\/master\/cluster\/cloud.html#launching-cloud-clusters","title":null,"type":null},"content":[{"type":"text","text":"輕量級羣集啓動器"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",該啓動器可簡化任何雲(AWS、Azure、GCP甚至是Kubernetes和YARN等羣集管理器)上的羣集配置。這個羣集啓動器根據給定的羣集配置來配置羣集,如以下示例所示:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ab\/abde8643e8768bedff63d68b709301d8.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"要在AWS上啓動一個Ray羣集,你只需將以上配置另存爲`cfg.yaml`,然後調用`rayupcfg.yaml`即可。在上面的示例中,請注意頭節點是一個純CPU節點,而worker是GPU可搶佔實例。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"此外,可以將Ray羣集配置爲“自動縮放”,這意味着Ray可以在不同的負載要求下透明地配置新節點。例如,如果一個Ray程序在應用程序運行中需要一個GPU,則Ray可以配置一個便宜的GPU實例,在該實例上運行Ray任務或actor,然後在完成後終止該實例。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"你可以查看Ray文檔中關於在AWS\/GCP上創建一個自動縮放Ray程序的"},{"type":"link","attrs":{"href":"https:\/\/docs.ray.io\/en\/master\/cluster\/quickstart.html","title":null,"type":null},"content":[{"type":"text","text":"示例"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"ElasticRayExecutor API"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":ElasticRayExecutor可以利用自動縮放的Ray羣集來簡化主機的發現過程,並在適當情況下請求更多資源。要使用這個API,需要定義一個訓練函數,其中包括一個使用"},{"type":"link","attrs":{"href":"https:\/\/horovod.readthedocs.io\/en\/stable\/elastic_include.html#modifying-the-training-script-with-state-synchronization","title":null,"type":null},"content":[{"type":"text","text":"Horovod 
**ElasticRayExecutor API**: The ElasticRayExecutor can leverage an autoscaling Ray cluster to simplify the discovery of hosts and to request more resources when appropriate. To use this API, define a training function that wraps a higher-level function with the [Horovod Elastic state synchronization decorator](https://horovod.readthedocs.io/en/stable/elastic_include.html#modifying-the-training-script-with-state-synchronization):

![Elastic training function (screenshot)](https://static001.geekbang.org/infoq/61/61704f4bdfd65b3a53491459c5318de7.png)

You can then attach to the underlying Ray cluster and execute the training function:

![Creating and running the ElasticRayExecutor (screenshot)](https://static001.geekbang.org/infoq/3d/3d9602c8ec79bc3b238e277ca53226be.jpeg)

In this example, the ElasticRayExecutor creates a number of GPU workers, each participating in a data-parallel training routine. The number of workers is determined automatically by the number of GPU worker nodes in the cluster, which means training does not stop even when nodes are removed arbitrarily by preemption.

As you can see, the integration of Ray and Horovod yields a solution that simplifies running elastic Horovod training jobs across the major cloud providers.
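In code, the pattern shown in the two screenshots looks roughly like the following, per the Horovod on Ray documentation (`training_fn` is an elastic training function like the earlier sketch; the slot settings are illustrative):

```python
import ray
from horovod.ray import ElasticRayExecutor

# Attach to the autoscaling Ray cluster started by `ray up`.
ray.init(address="auto")

# Settings control elastic behavior such as worker discovery and timeouts.
settings = ElasticRayExecutor.create_settings(verbose=True)

executor = ElasticRayExecutor(settings, use_gpu=True, cpus_per_slot=2)
executor.start()

# Runs training_fn on every available GPU worker; workers can join
# and leave while the job keeps training.
results = executor.run(training_fn)
```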
worker("},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"8gpu_fixed"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"最初8個GPU worker,每60秒增加或減少1個("},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"8gpu_60_1"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"最初8個GPU worker,每180秒增加或減少3個("},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"8gpu_180_3"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"對於每次調整事件,如果worker數量已經達到最大值(8),我們都會減少數量;如果worker數量達到最小值(2),則我們都會增加數量。否則,我們會隨機以相等的概率增加或減少worker的數量。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d4\/d41e0e45d90e790a8834613656d73de0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"圖1:驗證精度對epoch數的函數(左)和實驗中每個epoch末尾對應的worker數(右)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"如以上結果所示,我們觀察到隨着我們增加調整事件的幅度(通過一次添加\/刪除的主機數量來衡量)並按比例降低調整頻率,各個epoch之間模型性能的總體差異增加了,相對於基線的整體模型泛化性能得到了實際改進。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"彈性訓練的另一個好處是,當調整事件的時機可以通過訓練過程控制時,它甚至可以用來減少訓練過程中的總體差異。如Smith等人在《"},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/1711.00489","title":null,"type":null},"content":[{"type":"text","text":"不要降低學習速度,而是要增加batch大小"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"》中所述,我們可以利用以下事實:增加batch大小會導致模擬退火效果,從而使模型訓練從較早的探索階段過渡到最終的利用階段。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在實踐中,與保持訓練中worker數量不變的方法相比,通過增加worker數量來按比例擴大batch大小的這一過程讓模型能夠更快收斂,往往還有更高的泛化精度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"爲了證明這種效果,我們進行了另一組實驗:重複上面的訓練過程,但有兩個新配置:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"固定2個GPU worker("},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"2gpu_fixed"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"動態worker數量,每30個epoch加倍,從2開始,以8個worker結束("},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"8gpu_exp"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")。"}]}]}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fd\/fde47d9ce32150d3b02c0be721e531ec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"圖2:驗證精度對epoch數的函數(左)和相對掛鐘時間(從實驗開始算起,以秒爲單位)的函數(右)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"正如預期的那樣,將worker數從8減少到2改善了整體模型的收斂。這引出了數據並行分佈式訓練中,當模型針對較小的batch大小做優化時存在的一個的常見陷阱。實踐中,建議使用學習率預熱\/衰減、超參數搜索(見下文)和"},{"type":"link","attrs":{"href":"https:\/\/horovod.readthedocs.io\/en\/latest\/adasum_user_guide_include.html","title":null,"type":null},"content":[{"type":"text","text":"Adasum"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"等技術來抵消這些影響。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"上面展示的第三個實驗說明了通過大量並行來實現良好收斂的另一種解決方案:隨着時間的推移逐步擴展並行度。這種方法不僅獲得了比2 GPU基線更少的總掛鐘時間,而且還以更高的驗證精度完成了這項工作!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"正如Smith等人在上述論文中所解釋的那樣,其背後的原理是,訓練過程將從最初的“探索”階段過渡到最後的“利用”階段,進而受益。增加並行度會增加batch大小,這有助於平滑訓練示例之間的差異,從而減少訓練後期的探索。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"可以使用Horovod on Ray的Callback API將此功能添加到Elastic Horovod on Ray中:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/81\/8184cd7514fdae175e8680d778148616.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"這些回調還可以用來簡化訓練的其他一些部分,比如將消息從worker轉發到驅動程序來簡化日誌記錄和指標跟蹤。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"與超參數搜索結合使用時,這種方法可以提供最多的改善。與模擬退火對比(用於縮放batch大小)相似,許多現代超參數搜索算法也遵循“探索\/利用”範式,該範式可與彈性訓練結合使用,以實現最佳模型性能和最佳資源利用率。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"超參數搜索"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在開發深度學習模型的過程中,用戶在進行大規模訓練時經常需要重新調整超參數,因爲許多超參數在較大規模上會表現出不同的行爲。在自動縮放羣集中進行訓練時,這一點更加重要,因爲要考慮其他一些可能會影響訓練吞吐量、收斂性和成本的超參數,其中包括:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"多長時間增加一次最大worker數\/有效batch大小"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"多久提交一次共享worker狀態,以在最短的時間內獲得最多的epoch"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們允許作業一次輸入\/刪除多少worker"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"使用Horovod大規模調整超參數通常會有些棘手。執行並行超參數搜索需要一個單獨的更高級別的系統,以在可用資源上協調和調度多節點Horovod訓練作業。米開朗基羅內部支持的"},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/scaling-michelangelo\/","title":null,"type":null},"content":[{"type":"text","text":"AutoTune"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"服務解決了編排長期運行的超參數搜索作業的一些基本挑戰,但不支持基於羣體的訓練和提早停止策略,這些方法本應能讓我們重新分配GPU資源以加速性能最佳的試驗。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Ray是基於對嵌套並行性的支持而構建的——這意味着它能夠輕鬆處理“啓動天然的分佈式任務”的分佈式程序。利用這一優勢,我們開發了一個Horovod+RayTune"},{"type":"link","attrs":{"href":"https:\/\/horovod.readthedocs.io\/en\/stable\/hyperparameter_search_include.html","title":null,"type":null},"content":[{"type":"text","text":"集成"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",以實現分佈式訓練的並行超參數調整。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2c\/2cb4ff84b0aca430740ee7013eec555c.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"使用Horovod+RayTune進行帶有嵌套超參數並行調整的分佈式訓練的示意圖。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"RayTune是打包進Ray的流行超參數調整庫。RayTune包含一些最新的超參數搜索算法(例如基於羣體的訓練、貝葉斯優化和超頻帶),還支持故障處理,因此用戶可以更好地利用模型性能與雲成本之間的取捨來做超參數調整。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ec\/ec43c48840f52cb7b649412e7755562a.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在時間限制下使用動態資源分配來改善模型訓練性能的圖形示意。像HyperSched這樣的算法通過動態分配更多並行資源來逐漸將探索減少到零,從而更深入地利用更少的試驗。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"正如Liaw等人在《"},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/2001.02338","title":null,"type":null},"content":[{"type":"text","text":"HyperSched:在最後期限前完成模型開發的動態資源重新分配"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"》中所展示的那樣,通過這種方式將超參數搜索與分佈式訓練相結合,可以讓我們做到充分優化,在固定的時間內和計算資源內找到最佳模型。這對於像優步這樣的組織來說特別有用,在該優步大多數訓練都是在固定大小的本地GPU羣集上進行的。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"下一步:統一用於機器學習和深度學習的計算基礎架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們將Elastic Horovod with Ray和RayTune集成的早期結果表明,Ray這種將複雜的分佈式計算系統腳本化爲統一工作流程的方式,具有很好的靈活性和易用性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"除了我們先前討論的挑戰之外,機器學習平臺通常還需要集成幾種不同的技術,例如SparkML、XGBoost、SKLearn和RLlib等。這些技術運行在不同的計算平臺上。例如在優步,ML負載可以在通用的容器化計算基礎設施(Peloton、Kubernetes)、Spark(YARN、Peloton)和Ray(Peloton、Kubernetes)等平臺上運行。結果,生產級ML管道經常被分解成許多不同的任務,並由諸如Airflow或Kubeflow Pipelines等管道編排器進行編排。這增加了平臺的軟件和運維複雜性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"過去,我們投資創建了很多定製系統來配置和運行深度學習工作流,但與Horovod on Ray相比它們存在許多缺點:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"可擴展性"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":因爲內部系統是爲特定應用而非通用計算而設計的,所以要增加對Elastic Horovod之類框架的支持,需要對代碼庫進行重新配置,並自行實現Ray中提供的那種自動縮放器。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"靈活性"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":爲運行分佈式深度學習而構建的系統無法輕鬆適應其他負載,例如數據處理(Dask、Modin)、超參數搜索(RayTune)或強化學習(RLlib)。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"維護"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":與Ray這樣的開源系統不同,我們的內部深度學習基礎架構必須由專門的工程師團隊維護,而他們的時間和精力可以用來解決更高級別的問題。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"通過在Ray上整合更多的深度學習棧,我們可以進一步優化深度學習工作流程中的更多端到端過程。例如,當前我們在特徵工程(ApacheSpark)和分佈式訓練(Horovod)之間存在清晰的界限。對於工作流的每個階段,我們必須提供需要獨立運行的單獨計算基礎架構,並將Spark流程的輸出要素化到磁盤上,以便Horovod流程使用。當我們希望爲工作流的不同階段尋找替代框架時發現,這種架構不僅很難維護,而且替換起來同樣困難。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"能夠切換不同的分佈式框架是Ray的核心優勢之一。由於Ray是通用的分佈式計算平臺,因此Ray的用戶可以"},{"type":"link","attrs":{"href":"https:\/\/www.anyscale.com\/blog\/data-processing-support-in-ray","title":null,"type":null},"content":[{"type":"text","text":"自由選擇"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"越來越多的分佈式數據處理框架(包括Spark),這些框架在Ray爲深度學習工作流提供的相同資源上運行。這簡化了我們的計算基礎架構,因爲Spark和Horovod負載之間不再存在上述的嚴格界限。實際上,根據數據量、計算強度和可用羣集資源,針對不同的負載有不同的最優框架。此外,某些框架比其他框架更容易集成到現有項目中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們利用這些能力增強的一個項目案例是"},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/introducing-ludwig\/","title":null,"type":null},"content":[{"type":"text","text":"Ludwig"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",這是優步開發的開源深度學習AutoML框架。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"過去,由於Ludwig依賴於Pandas框架進行數據處理,因此僅限於處理適合單臺計算機內存的數據集。現在,在即將到來的Ludwig 0.4版本中,我們將在"},{"type":"link","attrs":{"href":"https:\/\/docs.ray.io\/en\/master\/dask-on-ray.html","title":null,"type":null},"content":[{"type":"text","text":"Dask on Ray"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"上進行分佈式大內存數據預處理,在Horovod on Ray上進行分佈式訓練,並用RayTune進行超參數優化。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/01\/011a832c11b3ac4680368f7e119c325f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Ludwig在本地模式下運行(v0.4之前的版本):所有數據都需要容納在一臺計算機上的內存中。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a4\/a43f56ca9763f295c98aa7aebaca123d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Ludwig在Ray羣集上運行(版本v0.4):Ray擴展了預處理和分佈式訓練任務,可以處理大型數據集,而無需在Ludwig中編寫任何基礎架構代碼。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"利用Dask可以擴展Ludwig現有的Pandas預處理,從而以最小的代碼更改量實現大型數據集的處理;利用Ray,我們可以將預處理、分佈式訓練和超參數搜索結合在一起,放進單個作業中運行的單個訓練腳本。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"對於Ludwig用戶來說,這些功能不需要代碼更改或其他基礎設施配置更改,只需在命令行中添加“raySubmit”和“-backendray”即可:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/76\/7696a9bb127446c557e623347387e374.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們相信,在優步和整個行業內,Ray在爲生產級機器學習生態系統帶來行業亟需的通用基礎架構和標準化方面將繼續發揮越來越重要的作用。在接下來的幾個月中,我們將分享更多關於將Ray的功能引入優步深度學習平臺的相關工作。請查看"},{"type":"link","attrs":{"href":"https:\/\/horovod.readthedocs.io\/en\/stable\/ray_include.html#elastic-ray-executor","title":null,"type":null},"content":[{"type":"text","text":"Elastic Horovod on Ray"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",並隨時提出任何問題、意見或建議。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"致謝"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們想感謝以下人員的工作:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Richard Liaw和Anyscale團隊將Horovod與Ray集成在一起的努力,包括Horovod in RayTune,以及他們對在優步集成Ray的持續支持。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Fardin Abdi和Qiyan Zhang將Elastic Horovod與Peloton集成所做的工作。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"來自G-Research的Enrico Minack,他在Elastic Horovod上的工作及其與Horovod on Spark的集成工作。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Yi Wang,他在ElasticDL集成和Elastic 
- The authors of [https://github.com/kuangliu/pytorch-cifar](https://github.com/kuangliu/pytorch-cifar) for the PyTorch model definitions used to train the models.

We would also like to thank all the Horovod contributors, without whom this work would not have been possible; the Linux Foundation for its continued support of the Horovod project; and AWS for supporting our continuous integration system.

# About the Authors

Travis Addair is a software engineer at Uber AI, where he leads the deep learning training team for the Michelangelo AI platform. He leads the Horovod open source project and chairs its Technical Steering Committee within the Linux Foundation.

Xu Ning is an engineering manager in Uber's Seattle engineering office, currently leading several development teams on Uber's Michelangelo machine learning platform. He previously led Uber's Cherami distributed task queue, Hadoop observability, and data security teams.

Richard Liaw is a software engineer at Anyscale, where he currently leads development of distributed machine learning libraries on Ray. He is one of the maintainers of the open source Ray project and is pursuing a PhD at UC Berkeley.

**Original link:** [https://eng.uber.com/horovod-ray/](https://eng.uber.com/horovod-ray/)