DAMO Academy Develops Compute-in-Memory AI Chip, Delivering Over 10x Performance and Breaking Through the von Neumann Bottleneck

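{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On December 3, InfoQ learned that DAMO Academy has successfully developed a chip based on a new architecture."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The chip is the world's first DRAM-based compute-in-memory AI chip built with 3D bonded stacking. It can break through the performance bottleneck of the von Neumann architecture and meet the demand of AI and similar workloads for high-bandwidth, high-capacity memory and extreme compute power. In specific AI scenarios, the chip improves performance by more than 10x and energy efficiency by up to 300x."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Why develop a compute-in-memory chip?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since the birth of the computer, computer systems have run on the von Neumann architecture. In this architecture, compute and memory are separate: the compute unit reads data from memory and writes the results back to memory once the computation is done. However, with the explosion of AI and other workloads that demand extremely high performance, the shortcomings of this architecture have become increasingly apparent, for example the power wall, the performance wall, and the memory wall."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/6c\/02\/6c20917efb4fd885byye595e77996b02.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are two main causes of these problems:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, data movement consumes an enormous amount of energy."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data show that in a traditional architecture, moving data from the memory unit to the compute unit takes roughly 200 times the power of the computation itself, so only a small share of energy and time is actually spent on computation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, memory has lagged far behind processors."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today, processor compute power grows 3.1x every two years, while memory performance improves only 1.4x every two years. Memory development lags seriously behind processor development. The situation resembles a funnel: the wide end is the processor and the narrow end is memory, whose performance severely limits the data transfer rate. This is also regarded as the Achilles' heel of traditional computers."}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/9f\/3e\/9f2e5be2b180fec079a1aba376838f3e.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compute-in-memory chips are currently the best way to solve this problem. Like the human brain, they fuse the data storage unit and the compute unit into one, drastically reducing data movement and thereby greatly improving compute parallelism and energy efficiency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The technique was proposed as early as the 1990s, but because of its technical complexity, high design costs, and a lack of application scenarios, industry research on compute-in-memory chips progressed slowly over the past few decades."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the explosion of AI workloads, the industry urgently needs this technology to relieve the compute bottleneck, and DAMO Academy hopes to solve this industry-wide problem with its own in-house innovations."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Three routes to compute-in-memory"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are three technical routes to compute-in-memory:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1.  Processing Near Memory:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Computation is performed by a separate compute chip located outside the memory chip."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2.  Processing In Memory:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Computation is performed by a dedicated compute unit located inside the memory chip; the storage cells and the compute unit remain separate from each other."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3.  Processing With Memory:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The storage cells inside the memory chip perform the computation themselves; storage and compute are fully fused, and there is no separate compute unit."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of these, near-memory computing greatly improves energy efficiency and performance by bringing compute and memory resources closer together, and it is considered the best way to address the memory wall at the current stage. DAMO Academy's breakthrough this time follows this direction."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"What are DAMO Academy's technical innovations?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBM is one of the main approaches to bringing memory and compute together, but it is limited by insufficient bandwidth per unit of capacity and by high power consumption, so it cannot effectively solve the memory wall problem. By contrast, 3D stacking based on hybrid bonding offers high bandwidth at low cost and is regarded as one of the ideal vehicles for low-power near-memory computing."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This time, DAMO Academy's chip is the first to adopt hybrid-bonding 3D stacking, in which the compute die and the memory die are interconnected face-to-face using specific metal materials and processes. The final test chip shows that this compute-in-memory technique and architecture have clear advantages: by shortening the distance between the storage cells and the compute unit, it increases bandwidth, lowers the cost of data movement, and eases the bottleneck that data movement causes. It also closely matches the bandwidth\/memory requirements of recommendation systems in data centers."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d8\/10\/d87728dc38db4d94ff0016ee0413a410.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the design side, the chip's memory uses heterogeneously integrated embedded DRAM, which provides very large memory capacity and very high bandwidth. For the compute die, DAMO Academy designed a streaming, customized accelerator architecture that accelerates recommendation systems “end to end”, covering tasks such as matching, coarse ranking, neural network computation, and fine ranking. This near-memory architecture also effectively relieves the bandwidth constraint. In the end, the memory, the algorithms, and the compute modules are tightly integrated, greatly increasing bandwidth while achieving ultra-low power consumption, and demonstrating the potential of near-memory computing in data center scenarios."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The research has been accepted by ISSCC 2022, a top conference in the chip field. In the future, the technology could be applied to scenarios such as VR\/AR, autonomous driving, astronomical data computation, and remote-sensing image analysis."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zheng Hongzhong (鄭宏忠), a scientist at DAMO Academy's Computing Technology Lab, said: “Compute-in-memory is a disruptive chip technology. It inherently offers high performance, high bandwidth, and high energy efficiency, and it can address chip performance and energy consumption in the post-Moore's-Law era at the level of the underlying architecture. The chip developed by DAMO Academy combines this technology closely with application scenarios, achieving a perfect fusion of memory, compute, and algorithmic applications.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DAMO Academy's Computing Technology Lab reportedly focuses on chip design methodology and new computer architecture techniques. It has produced a number of leading results and has published multiple papers at top conferences including ISSCC, ISCA, MICRO, and HPCA."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Compute-in-memory chip research is still at an early stage"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At present, industry-wide research on compute-in-memory chips is still exploratory, and the field remains immature in process maturity, typical applications, and ecosystem. DAMO Academy hopes to tackle the technical challenges step by step. It is starting with near-memory chips based on 3D stacking, which shorten the distance between the storage cells and the compute unit and increase bandwidth in order to lower the cost of data movement and ease the resulting bottleneck. Going forward, DAMO Academy will further pursue in-memory computing. On the application side, it will work closely with Alibaba's internal businesses and gradually adapt and optimize the chip for internal AI workloads."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Compute-in-memory will become a key technology for brain-inspired computing"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compute-in-memory chips have a natural advantage in workloads that process massive amounts of data, with broad application prospects on end devices, at the edge, and in the cloud. In scenarios such as VR\/AR, autonomous driving, astronomical data computation, and remote-sensing image analysis, they can bring their high bandwidth and low power consumption to bear. In the long run, compute-in-memory will also become a key technology for brain-inspired computing."}]}]}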
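The two figures the article gives for the memory wall (data movement costing roughly 200 times the computation, and processors growing 3.1x every two years against 1.4x for memory) can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only: the function names, the simplification that every operation pays for exactly one data transfer, and the steady compounding of both growth rates are assumptions, not something the article or DAMO Academy states.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the article.
# Only the 200x, 3.1x and 1.4x values come from the text; the function names,
# the one-transfer-per-operation model and the chosen horizons are assumptions.


def compute_energy_share(movement_to_compute_ratio: float) -> float:
    """Fraction of total energy spent on computation when each operation
    also pays for one data transfer costing `ratio` times as much."""
    return 1.0 / (1.0 + movement_to_compute_ratio)


def processor_memory_gap(years: float,
                         cpu_growth_per_2y: float = 3.1,
                         mem_growth_per_2y: float = 1.4) -> float:
    """Factor by which processor compute outgrows memory performance
    after `years`, assuming both rates compound every two years."""
    periods = years / 2.0
    return (cpu_growth_per_2y / mem_growth_per_2y) ** periods


if __name__ == "__main__":
    share = compute_energy_share(200)
    print(f"compute share of energy: {share:.2%}")
    for years in (2, 6, 10):
        gap = processor_memory_gap(years)
        print(f"processor/memory gap after {years} years: {gap:.1f}x")
```

Under these assumptions computation accounts for about 0.5% of the energy, and the processor-to-memory gap widens by about 2.2x every two years, which is the arithmetic behind the article's claim that little energy goes to computation itself and behind its funnel image.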