Google's deep learning finds a critical path in AI chips

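{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A year ago, ZDNet "},{"type":"link","attrs":{"href":"https:\/\/www.zdnet.com\/article\/google-experiments-with-ai-to-design-its-in-house-computer-chips\/?fileGuid=3xgr169o12oUrbxS","title":"","type":null},"content":[{"type":"text","text":"spoke"}]},{"type":"text","text":" with Google Brain director Jeff Dean about how the company is using artificial intelligence to advance the in-house development of custom chips that accelerate its software. Dean noted that deep learning forms of artificial intelligence can, in some cases, make better decisions than humans about how to lay out circuitry in a chip."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e3\/8d\/e32d03fb6670370d6d375ebf1c5dd98d.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"The so-called search space of an accelerator chip for artificial intelligence, meaning the functional blocks that the chip's architecture must optimize. Many AI chips are characterized by parallel, identical processor elements for masses of simple math operations, here called “PEs,” which perform the many vector-matrix multiplications that are the workhorse of neural network processing."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Last month, Google posted a "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/2102.01723?fileGuid=3xgr169o12oUrbxS","title":"","type":null},"content":[{"type":"text","text":"paper"}]},{"type":"text","text":" on the arXiv preprint server, "},{"type":"text","marks":[{"type":"italic"}],"text":"Apollo: Transferable Architecture Exploration"},{"type":"text","text":", together with a "},{"type":"link","attrs":{"href":"https:\/\/ai.googleblog.com\/2021\/02\/machine-learning-for-computer.html?fileGuid=3xgr169o12oUrbxS","title":"","type":null},"content":[{"type":"text","text":"blog post"}]},{"type":"text","text":" by lead author Amir Yazdanbakhsh, publicly unveiling one of these research projects, called Apollo. Apollo represents an intriguing development that goes beyond what Dean hinted at a year ago in his formal address at the International Solid-State Circuits Conference and in his remarks to ZDNet."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the example Dean gave at the time, machine learning could be used for certain low-level design decisions, known as “place and route.” In place and route, chip designers use software to determine the layout of the circuits that carry out the chip's operations, much like designing the floor plan of a building. In Apollo, by contrast, the program performs what Yazdanbakhsh and his colleagues call “architecture exploration” rather than floor planning."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The architecture of a chip is the design of its functional elements, how they interact, and how software programmers should gain access to them. For example, a classic Intel x86 processor has a certain amount of on-chip memory, a dedicated arithmetic logic unit, a number of registers, and so on. The way those parts are put together gives the so-called Intel architecture its meaning."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Asked about Dean's description, Yazdanbakhsh told ZDNet by email, “I would see our work and the place-and-route project as orthogonal and complementary.” Referring to a presentation by Cornell University's Christopher Batten, he explained, “Architecture exploration is much higher-level than place-and-route in the "},{"type":"link","attrs":{"href":"https:\/\/www.csl.cornell.edu\/courses\/ece5745\/handouts\/ece5745-overview.pdf?fileGuid=3xgr169o12oUrbxS","title":"","type":null},"content":[{"type":"text","text":"computing stack"}]},{"type":"text","text":".”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“I believe architecture exploration is where a higher margin for performance improvement exists,” Yazdanbakhsh said. He and his colleagues call Apollo the “first transferable architecture exploration infrastructure,” the first program that gets better at exploring possible chip architectures the more it works on different chips, transferring what it has learned to each new task."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The chips that Yazdanbakhsh and the team are developing are themselves chips for AI, known as AI accelerators. They belong to the same class as the Nvidia A100 “Ampere” GPU, the Cerebras Systems WSE chip, and many other startup parts now coming to market. Hence a nice symmetry: using AI to design chips that run AI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given that the task is to design an AI chip, the architectures Apollo explores are those suited to running neural networks. That means lots of linear algebra: large numbers of simple math units that perform matrix multiplications and sum the results."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The team defines the challenge as finding the right mix of those math blocks to suit a given AI task. They chose a fairly simple AI task, a convolutional neural network called MobileNet, a resource-efficient network "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/1704.04861?fileGuid=3xgr169o12oUrbxS","title":"","type":null},"content":[{"type":"text","text":"introduced in 2017"}]},{"type":"text","text":" by Andrew G. Howard and colleagues at Google. They also tested workloads using several internally designed networks for tasks such as object detection and semantic segmentation. The goal thus becomes: what are the right parameters for a chip's architecture so that, for a given neural network task, the chip meets certain criteria, such as speed?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The search involved sorting through more than 452 million parameters, including how many math units (called processor elements) to use, and how much parameter memory and activation memory would be optimal for a given model."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/0f\/34\/0f2a5a645b27b71bda639cd5cf7a0634.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The virtue of Apollo is that it can pit a variety of existing optimization methods against one another to see how they stack up in optimizing the architecture of a novel chip design. These violin plots show the relative results."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Translator's note"},{"type":"text","text":": A violin plot shows the distribution and probability density of several groups of data. It combines features of a box plot and a density plot and is mainly used to show the shape of a distribution. It resembles a box plot but conveys density better, and it is especially useful when a data set is too large to show point by point."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apollo is a framework: it can take a variety of methods developed in the literature for so-called black-box optimization, adapt them to particular workloads, and compare how well each method does at meeting the goal."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In another nice symmetry, Yazdanbakhsh and his colleagues use some optimization methods that were actually designed for developing neural network architectures. They include so-called "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/1802.01548?fileGuid=3xgr169o12oUrbxS","title":"","type":null},"content":[{"type":"text","text":"evolutionary approaches"}]},{"type":"text","text":" developed by Quoc V. Le and colleagues at Google in 2019; "},{"type":"link","attrs":{"href":"https:\/\/openreview.net\/forum?id=HklxbgBKvr&fileGuid=3xgr169o12oUrbxS","title":"","type":null},"content":[{"type":"text","text":"model-based reinforcement learning"}]},{"type":"text","text":" and so-called population-based ensembles of approaches, developed by Christof Angermueller and others at Google for the purpose of “designing” DNA sequences; and a Bayesian optimization approach."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apollo thus contains multiple levels of pleasing symmetry: it brings together approaches created for neural network design and biological synthesis in order to design circuits that might, in turn, be used for neural network design and biological synthesis."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All of these optimizations are compared, which is where the Apollo framework shines. Its entire reason for being is to run different approaches in a methodical fashion and determine which work best. The Apollo trial results detail how the evolutionary and model-based approaches can outperform random selection and other approaches."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"But Apollo's most striking finding is that running these optimization methods can make the process far more efficient than brute-force search. The team compared, for example, the population-based ensemble approach against what they call a semi-exhaustive search of the set of architecture solutions. Yazdanbakhsh and his colleagues found that the population-based approach can discover solutions that exploit trade-offs in the circuits, such as compute versus memory, that would ordinarily require domain-specific knowledge. Because the population-based approach is a learned approach, it finds solutions beyond the reach of the semi-exhaustive search:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P3BO (population-based black-box optimization) actually finds a design slightly better than semi-exhaustive search with a 3K-sample search space. We observe that the design uses a very small memory size (3MB) in favor of more compute units. This leverages the compute-intensive nature of vision workloads, which was not included in the original semi-exhaustive search space. This demonstrates the need for manual search space engineering in semi-exhaustive approaches, whereas learning-based optimization methods leverage large search spaces, reducing the manual effort."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So Apollo can work out how well different optimization approaches fare in chip design. But it also does something more: it can run what is called transfer learning to show how those optimization approaches can, in turn, be improved."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By running the optimization strategies to improve a chip design at one design point, such as the maximum chip size in millimeters, the outcomes of those experiments can be fed as inputs to a subsequent optimization method. The Apollo team found that the various optimization methods improve their performance on a task such as area-constrained circuit design by leveraging the best results of the initial, or seed, optimization method."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All of this has to be qualified by the fact that designing chips for MobileNet, or for any other network or workload, is bounded by how applicable the design process is to a given workload. In fact, one of the authors, Berkin Akin, who helped develop a version of MobileNet called MobileNet Edge, has pointed out that optimization is the product of both chip optimization and neural network optimization."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Neural network architectures must be aware of the target hardware architecture in order to optimize overall system performance and energy efficiency,” Akin wrote last year in a "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/2003.02838?fileGuid=3xgr169o12oUrbxS","title":"","type":null},"content":[{"type":"text","text":"paper"}]},{"type":"text","text":" with colleague Suyog Gupta."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Great question,” Akin replied by email when ZDNet asked how valuable hardware design is when it is isolated from the design of the neural network architecture. “It depends.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Akin said Apollo may well be sufficient for given workloads, but co-optimization between chips and neural networks will yield further benefits down the road."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is Akin's reply in full:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are certainly use cases where we design hardware for a given suite of fixed neural network models. These models may be part of already highly optimized representative workloads from the hardware's target application domain, or they may be required by the user of a custom-built accelerator. In this work we tackle problems of this nature, using machine learning to find the best hardware architecture for a given suite of workloads. However, there are certainly cases where there is flexibility to jointly co-optimize the hardware design and the neural network architecture. In fact, we have some ongoing work on such joint co-optimization, and we hope it can yield even better trade-offs..."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The final takeaway is that even as chip design is being shaped by the new workloads of AI, the new process of chip design may in turn have a measurable impact on the design of neural networks, and that dialectic may evolve in interesting ways in the years to come."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tiernan Ray, a graduate of Princeton University, has covered technology and business for more than 24 years. He is a technology editor at Barron's, where he writes daily market coverage for the Tech Trader blog. He previously worked at Bloomberg, SmartMoney, and ComputerLetter, covering venture capital in the technology sector."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https:\/\/www.zdnet.com\/article\/googles-deep-learning-finds-a-critical-path-in-ai-chips\/"}]}]}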