“One-click” deployment of distributed training: Microsoft's MARO (羣策) adds a cluster management assistant

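{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Source | Microsoft Research Asia"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2020, "},{"type":"link","attrs":{"href":"http:\/\/#wechat_redirect?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","text":"Microsoft Research Asia released and open-sourced MARO (羣策), a multi-agent resource optimization platform"}]},{"type":"text","text":". To make cluster management easier and more efficient for users with different needs, and to let them deploy distributed training jobs quickly, researchers and engineers at Microsoft Research Asia built a cluster management interface on top of the MARO platform: MARO CLI. This article describes what MARO CLI does and how to use it."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As reinforcement learning keeps advancing, new algorithms and frameworks appear constantly, and their demand for compute keeps growing. Larger-scale training and higher training efficiency both push up the need for distributed clusters. For this reason, researchers and engineers at Microsoft Research Asia took 羣策 (Multi-Agent Resource Optimization Platform, "},{"type":"link","attrs":{"href":"https:\/\/github.com\/microsoft\/maro?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","text":"MARO"}]},{"type":"text","text":"), the general-purpose resource optimization platform they had built earlier, and added a lightweight cluster management interface on top of it: the MARO Command Line Interface (MARO CLI)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond efficient, flexible environment components and mainstream as well as cutting-edge reinforcement learning algorithms, the MARO platform uses MARO CLI to make cluster management easier and more efficient for users with different needs, and to let them deploy distributed training jobs quickly. As the platform's command-line interface for building and managing training clusters, MARO 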
CLI offers the following main features:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multiple ways to build a cluster: create a remote cluster on Azure virtual machines or the AKS service, or combine existing compute resources into an on-premises cluster, which improves resource utilization."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Deploy arbitrary training jobs to a chosen cluster, with jobs allocated according to each job's resource requirements and the cluster's currently idle resources, so that cluster resources are used more sensibly."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All jobs run in containers, which keeps them isolated from one another and makes it easier to support new reinforcement learning frameworks and algorithms, improving extensibility."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A companion dashboard for monitoring hardware, jobs, and logs."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The sections below describe the architecture and features of MARO CLI in detail, to help you use it for training on distributed clusters."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"MARO Process"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To give developers a smooth path from single-machine mode to distributed cluster mode, and to lower debugging and development costs, MARO CLI provides Process mode (Figure 1), a fairly simple local, single-machine management mode. In this mode, MARO 
CLI does not create a real distributed cluster; instead, it launches training jobs as multiple processes on the local machine to simulate how a real distributed cluster operates."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/4d\/23\/4d874038dce7bd0bf04cc4477801b723.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 1. MARO Process mode diagram"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Simulating a distributed setup on a single machine has two main advantages: easier debugging and lower development cost. Unlike single-machine code, code that is to run on a distributed cluster needs a series of modifications. Testing the modified code in MARO Process mode surfaces errors more directly, and because no real distributed cluster is needed, it saves considerable development cost. Small as it is, Process mode is fully equipped: with Redis and the MARO services, it also supports job management and monitoring."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"MARO Grass"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Grass mode is the most important part of MARO CLI (Figure 2). In this mode, MARO CLI supports three ways of creating a cluster: local single machine (grass\/local), on-premises cluster (grass\/on-premises), and Azure cloud cluster (grass\/azure). Every Grass mode except Grass Local creates and manages a real distributed cluster. As Figure 2 shows, in Grass mode MARO CLI 
manages the distributed cluster through a set of components."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/25\/71\/2534ba98b487ff03eb3dd4eb63f9c571.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 2. MARO Grass mode diagram"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unlike the single-machine modes, a MARO Grass cluster is divided into a master node and node (worker) nodes. On the master node, Redis serves as a centralized database for data produced at runtime, samba-server provides file sharing across the cluster, and fluentd collects logs from the whole cluster."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MARO CLI also starts the master-agent service, which handles job allocation and cluster status monitoring, and a RESTful server, master-api-server, which executes external commands such as creating a job or querying cluster status. Each node runs the node-agent service, which continuously records the state of the node and of its job containers and uploads it to Redis on the master; each node also runs samba-client and a RESTful server, node-api-server, to interact with the master node."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In MARO Grass mode, all interaction with the cluster goes through the master node: files and data can be transferred over ssh, and cluster jobs can be managed and monitored through the Web Client. For security, every Web 
Client request is protected with hybrid RSA+AES encryption, while communication inside the cluster is unencrypted. After the master node receives an encrypted command, it carries out the operation, interacts with the nodes, and deploys the job into specific containers (which may be placed on different nodes)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The three cluster creation modes have the following characteristics:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Local single machine (grass\/local)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Like MARO Process, MARO Grass Local simulates a cluster on a single local machine. The difference is that MARO Grass Local deploys jobs in containers and lets users customize the resources of the simulated cluster or of each job, which more closely matches how a real distributed cluster operates."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On-premises cluster (grass\/on-premises)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MARO Grass On-Premises quickly builds a cluster from the compute resources you already have and manages it efficiently. Users can freely add machines on the same local network to the Grass cluster, then allocate jobs and manage the cluster through MARO 
CLI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Azure cloud cluster (grass\/azure)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MARO Grass Azure manages distributed clusters and is intended mainly for remote clusters on the Azure cloud. Building on parts of the Azure CLI, MARO CLI can create customized Azure clusters, add or remove nodes, and monitor cluster status."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"MARO K8S"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MARO CLI also supports creating clusters with Kubernetes (K8S), as shown in Figure 3. Kubernetes is an open-source system for managing containerized applications across multiple hosts in a cloud platform, and it is a well-known, widely used piece of cluster management software."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/43\/f0\/431244293ef616d51d3cc68c30ae05f0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 3. 
MARO K8S mode diagram"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes support serves users who need Kubernetes clusters and makes it easier for existing Kubernetes users to get started with MARO CLI. Relying on the Kubernetes architecture, we can easily create large clusters with hundreds of nodes, which gives MARO CLI better scalability and higher stability."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this mode, Azure File Service provides file sharing across all Kubernetes Pods, and all jobs are deployed in Kubernetes Pods and maintained by Kubernetes. When images are needed, Azure Container Registry is used to manage them."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"An example: from a single machine to distributed"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The MARO platform ships examples for many scenarios and algorithms, each in a single-machine version and a distributed version. With the platform's RL toolkit and Communication toolkit, a single-machine training job can be converted into a distributed one."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here we use the example DQN algorithm for the Container Inventory Management (CIM) problem to show, step by step, how to deploy a distributed training job through MARO 
CLI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The complete example code is available at: "},{"type":"link","attrs":{"href":"https:\/\/github.com\/microsoft\/maro\/tree\/master\/examples\/cim\/dqn?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","text":"https:\/\/github.com\/microsoft\/maro\/tree\/master\/examples\/cim\/dqn"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Detailed usage instructions for each mode are linked at the end of this article."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Using MARO Process mode"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In MARO Process mode, we first start the mode locally with the maro process create command, then generate a MARO job template with the maro process template command, as shown in Figure 4."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/c9\/5e\/c9bba8a28229a1cb95147a676ea01f5e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 4. 
MARO Process mode: creating the cluster template"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the example, we split the DQN algorithm into actor and learner components, write the required number of each and their launch commands into the corresponding fields of the template, and start the job with maro process job start. Jobs can be managed with the maro process job stop\/list\/log commands, and their status can also be viewed in the dashboard."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Using MARO Grass\/Azure mode"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MARO Grass\/Azure mode requires some experience with Azure, because creating a cluster needs certain Azure permissions. As in MARO Process mode, the maro grass template command generates the cluster template and the job template, as shown in Figures 5 and 6."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/ac\/f4\/ac981690049e83e2f3c5dfe07b3b6cf4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 5. 
MARO Grass mode: creating the cluster template"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/00\/59\/005de210e910f920422c8034b96ca659.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 6. MARO Grass mode: creating the job template"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First fill in the cluster template with the details of your Azure account, then start the cluster with the maro grass create command and use maro grass node scale to control the cluster's node resources."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The job template in MARO Grass differs considerably from the one in MARO Process because jobs are containerized. Before starting a job, push the required image to the newly created cluster with the maro grass image push command, then transfer the files the job needs to the cluster with maro grass data push."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, each type of component can be allocated different resources so that the cluster is used as efficiently as possible. Once the image and files are on the cluster, maro grass job start deploys the training job to it."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"A dashboard you can read at a glance"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MARO 
CLI provides a clean dashboard with a built-in command-line terminal, which makes it easy to manage the cluster and check job status. The dashboard first shows the cluster's current resources and utilization, along with an overview of the jobs in the cluster grouped by training job status."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/a5\/b9\/a5a544e2ba75def03cb9438105730ab9.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 7. Cluster dashboard"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/ac\/8e\/ac72cf29965d62ddaebc78445ea57e8e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Figure 8. 
MARO CLI structure overview"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unlike other cluster management platforms, MARO CLI is not limited to one kind of cluster; it offers several modes to meet different needs. For users who are new to distributed training, we recommend first becoming familiar with the MARO RL toolkit and Communication toolkit, then using the Process and Grass Local modes of MARO CLI to simulate cluster operations on a single machine."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Users who have idle compute resources at hand and some knowledge of distributed clusters can use the Grass On-Premises mode to set up a cluster quickly and deploy training jobs to it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Users with some Azure experience can use Grass Azure to build a remote cluster on the Azure cloud. For users who already work with Kubernetes, MARO CLI also supports setting up Kubernetes clusters."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MARO CLI is still a fast-growing project and will keep improving to become simpler, faster, and more powerful. We welcome you to follow and use the MARO platform, and to discuss the technology with us!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"MARO 
CLI documentation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/maro.readthedocs.io\/en\/latest\/key_components\/orchestration.html?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/maro.readthedocs.io\/en\/latest\/key_components\/orchestration.html"}],"marks":[{"type":"italic"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"MARO CLI usage instructions by mode:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/maro.readthedocs.io\/en\/latest\/installation\/multi_processes_localhost_provisioning.html?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/maro.readthedocs.io\/en\/latest\/installation\/multi_processes_localhost_provisioning.html"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/maro.readthedocs.io\/en\/latest\/installation\/grass_azure_cluster_provisioning.html?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/maro.readthedocs.io\/en\/latest\/installation\/grass_azure_cluster_provisioning.html"}],"marks":[{"type":"italic"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/maro.readthedocs.io\/en\/latest\/installation\/grass_on_premises_cluster_provisioning.html?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/maro.readthedocs.io\/en\/latest\/installation\/grass_on_premises_cluster_provisioning.html"}],"marks":[{"type":"italic"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/maro.readthedocs.io\/en\/latest\/installation\/k8s_cluster_provisioning_on_azure.html?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/maro.readthedocs.io\/en\/latest\/installation\/k8s_cluster_provisioning_on_azure.html"}],"marks":[{"type":"italic"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"MARO GitHub page"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/github.com\/microsoft\/maro?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/github.com\/microsoft\/maro"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"MARO 
0.2 release: detailed change history"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/github.com\/microsoft\/maro\/pull\/239?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/github.com\/microsoft\/maro\/pull\/239"}],"marks":[{"type":"italic"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/github.com\/microsoft\/maro\/pull\/297?fileGuid=pyJwxgjJDPXgyVdY","title":"","type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/github.com\/microsoft\/maro\/pull\/297"}],"marks":[{"type":"italic"}]}]}]}
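The article says that MARO CLI allocates jobs according to each job's resource requirements and the cluster's currently idle resources. As a rough illustration of that idea only, here is a minimal first-fit placement sketch in Python. It is not MARO's actual scheduler: the `Node` and `Component` classes, the `allocate` function, and the CPU and memory figures are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node and its currently idle resources (hypothetical model)."""
    name: str
    free_cpu: int
    free_memory_gb: int
    containers: list = field(default_factory=list)


@dataclass
class Component:
    """One container of a job, e.g. an actor or a learner (hypothetical model)."""
    name: str
    cpu: int
    memory_gb: int


def allocate(components, nodes):
    """First-fit placement: return {component name: node name}.

    Raises RuntimeError, leaving the nodes untouched, if a component cannot be placed.
    """
    # Track free resources in a scratch table so a failed allocation changes nothing.
    free = {n.name: [n.free_cpu, n.free_memory_gb] for n in nodes}
    placement = {}
    # Place the largest requests first so small ones do not crowd them out.
    for comp in sorted(components, key=lambda c: (c.cpu, c.memory_gb), reverse=True):
        for node in nodes:
            cpu, mem = free[node.name]
            if comp.cpu <= cpu and comp.memory_gb <= mem:
                free[node.name] = [cpu - comp.cpu, mem - comp.memory_gb]
                placement[comp.name] = node.name
                break
        else:
            raise RuntimeError(f"not enough idle resources for {comp.name}")
    # Commit only after every component has found a node.
    for node in nodes:
        node.free_cpu, node.free_memory_gb = free[node.name]
        node.containers += [c for c, n in placement.items() if n == node.name]
    return placement


# Illustrative cluster and job; all sizes are made up for the example.
nodes = [
    Node("node-0", free_cpu=4, free_memory_gb=8),
    Node("node-1", free_cpu=2, free_memory_gb=4),
]
job = [
    Component("learner", cpu=4, memory_gb=8),
    Component("actor-0", cpu=1, memory_gb=2),
    Component("actor-1", cpu=1, memory_gb=2),
]
placement = allocate(job, nodes)
print(placement)
```

In this sketch the learner fills node-0 and both actors land on node-1, which mirrors the article's point that the components of one job may end up in containers on different nodes.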