Up to 25x Performance Gain! A Technical Walkthrough of RAT, a New Scheme for Accelerating Distributed Machine Learning Training

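{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article introduces a scheme for accelerating distributed machine learning training in data center settings. The scheme mainly targets the parameter exchange process of distributed training. We first give an accessible description of the distributed training workflow and define what a parameter exchange scheme is. We then compare the current mainstream parameter exchange schemes and describe their respective limitations. To address these limitations, we propose a new topology-aware parameter exchange scheme and introduce its main roles, algorithm, and properties. We have validated the new scheme in simulation experiments: the results show performance improvements of 25x in the oversubscription scenario and 5.7x in the network failure scenario, which indicates that the scheme adapts with strong elasticity in these two typical scenarios."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"This article is reprinted, with the original author's permission, from 《中興通訊技術》, 2020, Issue 5."},{"type":"text","text":" "}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In recent years, deep neural networks (DNNs) have been widely applied in many fields, including computer vision and natural language processing."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DNN"},{"type":"text","text":" training jobs can take days or even weeks to complete. To shorten training time, distributed machine learning systems have been brought into the "},{"type":"text","text":"DNN"},{"type":"text","text":" training process. As a result, a large body of research and methods on accelerating training in distributed machine learning ("},{"type":"text","text":"DML"},{"type":"text","text":") systems has emerged in both academia and industry."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because "},{"type":"text","text":"DML"},{"type":"text","text":" is a compute-intensive task, most earlier research focused on designing efficient scheduling policies for cluster compute resources. However, as graphics processing unit ("},{"type":"text","text":"GPU"},{"type":"text","text":") compute power steadily increases and model sizes grow, we find that the overall training bottleneck is gradually shifting from computation to communication. For example, when a large model such as "},{"type":"text","text":"VGG16"},{"type":"text","text":" is trained on a "},{"type":"text","text":"32-GPU"},{"type":"text","text":" cluster, communication accounts for "},{"type":"text","text":"90% [1]"},{"type":"text","text":" of the total completion time of the training job. There is already a large body of work that exploits the robustness of "},{"type":"text","text":"DML"},{"type":"text","text":" training to ease the communication bottleneck, for example through parameter synchronization mechanisms "},{"type":"text","text":"[2]"},{"type":"text","text":" and reducing the volume of network traffic "},{"type":"text","text":"[3]"},{"type":"text","text":". Other work optimizes "},{"type":"text","text":"DML"},{"type":"text","text":" communication with techniques from traditional data center networks, namely flow scheduling "},{"type":"text","text":"[4-7]"},{"type":"text","text":" and coflow scheduling "},{"type":"text","text":"[8-10]"},{"type":"text","text":". In this article, we focus on the parameter exchange process in "},{"type":"text","text":"DML"},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}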