The Latest Developments in Machine Learning at Microsoft

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is part of the VB Lab Microsoft\/NVIDIA GTC Insights series."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With artificial intelligence and machine learning advancing rapidly, it is no surprise that Microsoft had its usual strong presence at this year's "},{"type":"link","attrs":{"href":"https:\/\/www.nvidia.com\/en-us\/gtc\/?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":"NVIDIA GTC event"}]},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Microsoft representatives presented their latest machine learning work across several sessions, covering inference at scale, a new capability for training machine learning models in hybrid environments, and the debut of a tool that helps data scientists analyze and troubleshoot ML performance issues more efficiently: the new PyTorch 
Profiler."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All three of these innovations combine Microsoft's own technology (such as Azure) with open source tools and NVIDIA GPU hardware."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Inference at scale"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The cost of collecting data and training machine learning models gets plenty of discussion. Those compute costs are real: for large projects, bills running into the millions of dollars are not unusual. Yet inference, which is essentially the application of a trained ML model, is rarely mentioned in these conversations about AI spending. As deep learning models grow more complex, even inference involves a large volume of mathematical expressions and floating-point operations."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Inference is one of the more interesting parts of AI because it is the step through which teams such as Microsoft Azure deliver a tangible experience to users. One example: the Azure team worked with NVIDIA to improve Microsoft Word's "},{"type":"link","attrs":{"href":"https:\/\/blogs.nvidia.com\/blog\/2020\/10\/05\/microsoft-triton-ai-grammar-word\/?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":"AI-powered"}]},{"type":"link","attrs":{"href":"https:\/\/blogs.nvidia.com\/blog\/2020\/10\/05\/microsoft-triton-ai-grammar-word\/?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":" grammar checker"}]},{"type":"text","text":". The goal was not to train a model that checks grammar better, but to strengthen the inference engine that actually performs the checking."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given Word's enormous user base, the grammar checker has to perform billions of inferences, which makes it a compute-intensive workload. That creates two intertwined problems, one technical and one financial: bringing costs down requires technology that is both more powerful and more efficient."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NVIDIA's "},{"type":
"link","attrs":{"href":"https:\/\/developer.nvidia.com\/nvidia-triton-inference-server?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":"Triton Inference Server"}]},{"type":"text","text":" harnesses the compute power of the company's GPUs and puts it to work running inference for "},{"type":"link","attrs":{"href":"https:\/\/azure.microsoft.com\/en-us\/services\/machine-learning\/?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":"Azure"}]},{"type":"link","attrs":{"href":"https:\/\/azure.microsoft.com\/en-us\/services\/machine-learning\/?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":" Machine"}]},{"type":"link","attrs":{"href":"https:\/\/azure.microsoft.com\/en-us\/services\/machine-learning\/?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":" Learning"}]},{"type":"text","text":" models. Together, the two optimize the workload and make it run more smoothly. The inference server supports all the popular frameworks, including PyTorch, TensorFlow, MXNet, and "},{"type":"link","attrs":{"href":"https:\/\/onnx.ai\/?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":"ONNX"}]},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/github.com\/microsoft\/onnxruntime?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":"ONNX Runtime"}]},{"type":"text","text":" is a high-performance inference engine that uses a range of hardware accelerators to get the best performance on different hardware configurations. Through close collaboration between Microsoft and NVIDIA, ONNX Runtime integrates the TensorRT accelerator for speeding up models on NVIDIA GPUs. ONNX 
Runtime also serves as one of the backends for the Triton server."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Azure Machine Learning is a managed platform-as-a-service offering that handles most of the management work for users. This is where scale comes in: scale is the point at which many AI projects stall or fail outright, and where technical concerns can collide with financial ones. Triton and Azure Machine Learning were built to address that pain point."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Kubernetes: making it easier to train ML models across on-premises\/hybrid and multi-cloud environments"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Setting up a hybrid training environment is not easy, and the need to scale resource-intensive ML model training makes it harder still. Flexibility, agility, and governance are all critical requirements."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/azure.microsoft.com\/en-us\/services\/azure-arc\/?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":"Azure Arc"}]},{"type":"text","text":" infrastructure lets customers with Kubernetes resources apply policies, run security monitoring, and perform other operations from a “single pane of glass.” The Azure Machine Learning integration with Kubernetes is built on top of Azure Arc infrastructure by extending the Kubernetes API. In addition, customers can train ML models with Azure Machine Learning through native Kubernetes concepts (such as operators and CI\/CD) and an “agent” that runs on the cluster."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Whatever mix of clusters a customer uses, Azure Machine Learning lets them switch targets easily. The frameworks supported by the Azure Machine Learning native Kubernetes agent include SciKit, TensorFlow, PyTorch, and MPI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The native agent also makes the whole setup run more smoothly. Data scientists do not need to learn Kubernetes, and IT operators who know Kubernetes do not need to learn machine learning."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"PyTorch 
Profiler"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The new PyTorch Profiler is an open source tool, developed jointly by Microsoft and Facebook, that brings GPU performance debugging to the popular ML framework PyTorch. The troubleshooting tool is expected to help data scientists and developers analyze and diagnose performance problems in large-scale deep learning models more efficiently, and so get the most out of expensive compute resources."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In machine learning, profiling examines a model's performance. This is distinct from the accuracy of its predictions: performance here means how efficiently and how fully the model uses hardware resources."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The new Profiler builds on PyTorch's existing autograd profiler and adds a high-fidelity GPU profiling engine, so users can capture and correlate information about PyTorch operations with detailed GPU hardware-level information."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PyTorch Profiler takes little effort to set up and use. It is fully integrated, combining the new Profiler's profile module, the new libkineto library, and the PyTorch Profiler TensorBoard plugin. You can visualize all of it in Visual Studio Code. It suits beginners and experienced practitioners alike, across use cases from research to production, and it complements NVIDIA's more advanced "},{"type":"link","attrs":{"href":"https:\/\/developer.nvidia.com\/nsight-visual-studio-edition?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":"Nsight"}]},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PyTorch 
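Profiler counts timeline tracing among its main features. Put simply, it shows CPU and GPU activity and lets users zoom in to see what is happening in each. Here you can see all the typical PyTorch operators, as well as higher-level Python models and the GPU timeline."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When looking at GPU utilization in the PyTorch Profiler visualization, users may notice small “gaps.” Each gap represents a stretch, perhaps 40 milliseconds, in which the GPU sits idle, and users will want to optimize that idle time away and give the GPU something to do. PyTorch Profiler lets users dig deeper into what the GPU is doing, see what the dependencies are, and see which events preceded the idle gap. Tracing the problem back to the CPU, they may find that it is the real bottleneck, with the GPU simply waiting for another part of the system to finish reading the data it needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Examining GPU efficiency at such a fine-grained level may seem trivial, but if a step takes only 150 milliseconds, 40 milliseconds of GPU idle time is a sizable share of it (roughly 27 percent). Consider, too, that a single run of a project may last hours or even weeks. At that scale it pays to account for the loss in every step, because it means the money spent on compute cycles is being used inefficiently."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PyTorch Profiler also offers a recommendations feature that guides model builders through common problems and situations they may encounter. In the GPU utilization example above, the fix may be as simple as adjusting the number of DataLoader workers so that the GPU stays busy."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/venturebeat.com\/2021\/04\/22\/microsoft-details-the-latest-developments-in-machine-learning-at-gtc-21\/?fileGuid=VcGCRKtKhTDdrRpp","title":"","type":null},"content":[{"type":"text","text":"https:\/\/venturebeat.com\/2021\/04\/22\/microsoft-details-the-latest-developments-in-machine-learning-at-gtc-21\/"}]}]}]}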