爲多樣性算力軟件開發鋪就平坦大道:華爲北冥融合架構全面解析

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在2019年2月的ACM通訊上,計算機架構宗師、架構領域“聖經”著作《計算機架構與量化研究方法》的作者David Patterson與John Hennessey共同發表了一篇重磅論文:《計算機架構的新黃金十年》,闡述了他們對未來十年計算體系結構發展的分析和展望。論文指出,以硬件爲中心、針對特定領域和用途設計的架構(DSA)將提供顯著的性能和能效收益,在摩爾定律走向終結的同時爲計算機帶來新的活力與發展機遇。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DSA對今天的IT從業者來說並不陌生。GPGPU和AI專用加速器就是最常見的DSA硬件實現,並且已經在工業界廣泛部署,取得了突出的成就並創造了可觀的經濟價值。如今,越來越多的數據中心開始在服務器集羣中安裝GPU和AI加速硬件,爲視頻壓縮和後期處理、數據分析、個性化推薦、智能決策等大規模任務提供驚人的加速比。DSA已經逐漸成爲與傳統CPU地位並列的計算機關鍵核心組成部分,爲產業和社會帶來愈加深遠的影響。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然而,正如兩位宗師在他們的論文中所述,DSA的崛起也爲軟件開發行業帶來了嚴峻的挑戰。傳統CPU上流行的通用編程語言和工具鏈難以爲DSA硬件有效編程,後者需要專門開發的領域特定語言(DSL)才能充分發揮硬件潛能。但今天的DSL語言生態纔剛剛起步,遠未成熟,給軟件開發者帶來了諸多門檻與障礙。與蓬勃發展的DSA硬件相比,DSL開發社區的前進步伐已經明顯落後,嚴重拖累了創新硬件在行業中的應用和推廣進程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以看到,爲DSA硬件打造門檻更低、能力更強的語言生態是產業當下的迫切需求。開發人員更希望能有一種具備充分包容性的生態環境,能夠一站式覆蓋傳統CPU與新興DSA硬件的開發需求,徹底打通不同硬件架構之間的軟件壁壘,降低開發成本,提升開發效率,並充分發揮不同架構的硬件能力。面對這樣的需求,華爲於HC2021大會上正式發佈了“北冥多樣性計算融合架構”,爲數據中心的計算集羣跨架構、跨算力類型軟件開發提供了一站式全套解決方案。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"北冥多樣性計算融合架構解析"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"據華爲介紹,北冥多樣性計算融合架構是主要面向多樣性算力集羣的軟件開發需求,融合開發語言、編譯器、調度器、開發框架、計算套件、開發工具鏈在內的整套多樣性算力開發解決方案與生態。方案分爲基礎使能、應用使能和開發使能三大組成部分,完整覆蓋軟件開發端到端流程,實現極簡開發、高效部署、極致性能的多樣性算力開發目標。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"基礎使能"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一如《計算機架構的新黃金十年》論文所述,編程語言和編譯器在多樣性算力軟件開發流程中扮演着核心角色。爲了撫平開發人員學習曲線、提升跨算力架構開發效率,華爲在通用的C++語言基礎上研發了畢昇C++語言。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"畢昇C++支持SYCL異構編程標準,一種語言即可對ARMv8、x86處理器和昇騰AI加速器、Nvidia GPU編程。畢昇C++還將不同算力的共性特徵抽象出來,爲多種硬件上的不同計算單元提供統一的矩陣編程接口,並支持多種並行模式,一套源碼即可在多樣算力上重用。單語言、單源碼的開發方式爲開發者避免了多樣性計算系統的編程複雜性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與畢昇C++搭配,北冥架構還提供了跨多樣算力統一編譯及融合優化能力的畢昇編譯器。畢昇編譯器實現了多樣算力的統一編譯,配合畢昇C++可直接生成融合多種算力代碼的fatbin可執行程序。編譯器還融合了多種多樣算力優化技術,打破了編譯器只能對單算力編譯優化的限制,在SPEC ACCEL等基準測試中性能提升30%以上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"統一API接口、具備多算力協同能力的高性能北冥融合加速庫是北冥基礎使能軟件的第三大利器。北冥融合加速庫兼容主流應用框架,爲多算力提供統一API接口,並構建了業界獨有的多算力算子層,可將計算過程拆解到最適合的算力架構上,將計算機視覺、自然語言處理等主流模型訓練效率提升1倍以上。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"應用使能"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在多樣性計算集羣上開發融合應用,開發者不僅要面對大規模並行應用開發的複雜性,還需要解決融合應用跨算力部署的難題。對於大規模計算集羣而言,應用的全棧性能優化是極其複雜和困難的挑戰。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對這一開發痛點。北冥多樣性計算融合架構打造了包括多瑙統一調度器、元戎分佈式並行開發框架、昇思科學計算套件在內的應用使能軟件組合,幫助開發者實現分佈式應用的極簡開發、融合應用的高效部署以及多樣性計算系統的全棧最優效能。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"多瑙統一調度器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"多瑙統一調度器可爲多樣性算力集羣提供應用與資源的最佳匹配。相比傳統集羣算力調度器,多瑙統一調度器具備諸多顯著優勢:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"目前,多瑙統一調度器支持ARMv8與x86的CPU計算資源、昇騰AI加速器與Nvidia GPU計算資源的統一調度,未來還將進一步支持其他流行算力,解決了多樣性算力集羣的跨架構應用開發部署與調度難題。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"多瑙統一調度器抽象出統一的資源模型,支持多樣算力資源納管,通過與系統軟件和硬件聯動,結合靜態、實時的資源與負載信息做出調度決策。多瑙的調度算法結合了多種技術的優點,除了對算力、網絡、存儲、能效的深度感知,還納入了基於應用的專家系統以及對負載的歷史數據分析。基於上述特性,多瑙統一調度器能夠自動適應集羣縮放規模,爲大規模複雜集羣提供實時精準調度,並打破了集羣資源牆,實現更加靈活的資源共享和更高的資源利用率。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"多瑙統一調度器在研發之初就遵循最嚴苛的企業級安全、可靠性標準,且已在國內多家知名企業、廣泛的業務領域實踐應用,其可靠性、可維護性得到市場充分驗證。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"多瑙統一調度器爲運維人員提供了統一的web一站式運維管理平臺,包含了作業管理、用戶管理、集羣監控運維、統計報表、自定義作業提交模板、遠程可視化等豐富功能,創造極簡的運維體驗。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"在傳統單一算力集羣資源調度方面,多瑙統一調度器也具備集羣規模、調度性能、資源利用率等核心調度指標的領先優勢,可充分發揮集羣有效算力,提升投資回報。未來多瑙還將通過元調度器的分級調度能力支持跨數據中心的調度,可以基於開放的接口對接第三方調度器,爲多樣性計算算力網絡的構建提供關鍵技術支撐。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"元戎開發框架與昇思科學計算套件"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"元戎分佈式並行開發框架是面向多樣性計算集羣打造的分佈式並行開發框架,幫助開發人員在多樣性計算集羣上享受單機編程體驗,還能夠成倍提升視頻特效等場景的開發速度,並數量級提升金融資管風控等算法的計算效率。昇思科學計算套件則爲高性能科學應用提供了計算新範式,通過多尺度混合計算和高階混合微分兩大關鍵創新實現了融合應用的統一加速,爲多種科學計算場景帶來了可觀的效率提升。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"開發使能"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲解決開發人員在多樣性計算系統下的開發效率問題,在北冥多樣性計算融合架構開發使能層,華爲通過MindStudio統一工具鏈提供了開發全流程支持:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MindStudio通過多算力硬件抽象、統一基礎平臺、統一工具鏈前端的三層解耦架構,構建全流程一站式開發基礎能力;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MindStudio爲開發人員提供統一集成開發環境,實現開發全流程連貫無斷點,並通過插件化技術支持功能靈活拼裝和開發流程自定義;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MindStudio還專門提供跨算力聯合調試、全系統協同調優、仿真環境按需集成和開發資源一鍵獲取等功能,幫助開發人員大幅提升面向多樣性算力的開發效率。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在MindStudio開發社區中,開發人員能夠獲得資源下載、指導手冊、在線課程及實驗、開發者沙龍等一系列支持和幫助。華爲還爲開發人員建立了交流論壇,超過20位華爲工程師在線爲開發人員解答技術問題。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"北冥多樣性計算融合架構:爲多樣性算力集羣的軟件開發鋪平道路"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在震動業界的《計算機架構的新黃金十年》論文發表後不到三年的時間裏,華爲就憑藉自身深厚的技術積累與積極創新的研發與實踐,開發出了直面多樣性算力架構軟件開發挑戰的北冥多樣性計算融合架構。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"北冥計算架構不僅解決了跨傳統CPU與新興DSA硬件、跨多種指令集與計算類型的軟件開發難題,還一步到位突破了大規模計算集羣的多樣性算力部署與調度障礙,一舉打通了多樣性算力的軟件開發端到端流程,實現開發效率提升、成本下降、軟件性能深度優化等一系列關鍵目標。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着北冥計算架構的誕生,行業多樣性算力生態也迎來了一個全新的發展階段。可以預計,越來越多的企業將在北冥計算架構的幫助下加快多樣性算力集羣的部署,爲研發和業務部門提供強大的基礎設施支持,從而在激烈的市場競爭中取得領先優勢。與此同時,華爲與合作伙伴、開發社區也將通力合作,不斷提升北冥計算架構的各方面能力,使北冥成爲業內領先、影響廣泛的多樣性算力一流開發生態。"}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章