Intel Unveils Major Architectural Changes and Innovations for CPUs, GPUs, and IPUs

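{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"August 20: At "},{"type":"link","attrs":{"href":"https:\/\/edc.intel.com\/content\/www\/us\/en\/products\/performance\/benchmarks\/architecture-day-2021\/","title":"","type":null},"content":[{"type":"text","text":"Intel Architecture Day 2021"}]},{"type":"text","text":", Raja Koduri, Intel senior vice president and general manager of the Accelerated Computing Systems and Graphics Group, together with several Intel architects, detailed major architectural changes and innovations for CPUs, GPUs, and IPUs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c2\/c2dcf16acacc2db6423534a85931ca8a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Among the announcements, Intel gave its first in-depth look at Alder Lake, its first performance hybrid architecture CPU. Rather than simply offering a more powerful next-generation CPU core, Alder Lake rethinks the multi-core architecture: it integrates two different x86 cores (an Efficient-core and a Performance-core) along with a hardware thread scheduler that assigns the right thread to the right core at the right time, and it is built on the Intel 7 process. Products based on Alder Lake are reportedly set to begin shipping this year."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/27\/27731379fce88b051945d2b5861f07b2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The new x86 Performance-core (formerly code-named “Golden Cove”) is the highest-performing CPU core Intel has built to date, with built-in AI acceleration for deep learning inference and training. Compared with the 11th Gen Core architecture (Cypress Cove core), the Performance-core delivers an average improvement of about 19% across a range of workloads at the same frequency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The new x86 Efficient-core (formerly code-named “Gracemont”) is designed for processing at scale and aims to push multi-core performance per watt to its limits. Compared with Skylake, Intel's most prolific CPU microarchitecture to date, the Efficient-core delivers the same single-thread performance at less than 40% of the power. Compared with two Skylake cores running four threads, four Efficient-cores deliver 80% more performance while consuming less power."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Notably, Intel worked with Microsoft to optimize the performance of the new Alder Lake CPUs and the hardware thread scheduler specifically for Windows 11."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Throughout the Windows 11 development cycle, my team has worked with our colleagues at Intel to refine and optimize our upcoming operating system so that it takes full advantage of the ‘performance hybrid’ architecture, and the hardware thread scheduler in particular. Most of the work centers on the operating system's thread scheduler, the kernel component that decides which threads run where,” said Mehmet Iyigun, development manager on Microsoft's Windows kernel team. Beyond thread scheduling, he added, Windows 11 also uses hints from the hardware thread scheduler to decide which cores to park or wake in order to save power."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the GPU side, Intel began working with developers and game engine vendors early on to design new discrete GPUs for gaming enthusiasts. At this Architecture Day, Intel unveiled Xe HPG, an all-new discrete graphics microarchitecture built to deliver enthusiast-class performance for gaming and content creation workloads. The Xe HPG-based Alchemist SoC (formerly code-named DG2) will reach the market in the first quarter of next year under the new brand name Intel® Arc™."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ponte Vecchio, another GPU aimed at exascale computing, is Intel's most complex SoC to date. It is based on the Xe HPC microarchitecture and combines multiple advanced semiconductor process technologies with Intel's EMIB and Foveros 3D packaging technologies. It contains 100 billion transistors and delivers industry-leading floating-point performance and compute density. Intel has even described it as “a product of innovation on par with a moonshot in difficulty.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f5\/f5aaa4e796a42dc14a37011c31830334.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At Architecture Day, Intel said that early Ponte Vecchio silicon is already demonstrating leading performance, setting industry records for inference and training throughput on a popular AI benchmark. For example, A0 silicon delivers more than 45 TFLOPS of FP32 throughput, more than 5 TBps of memory fabric bandwidth, and more than 2 TBps of connectivity bandwidth. Intel also shared a demo video showing ResNet inference at more than 43,000 images per second and ResNet training at more than 3,400 images per second, both of which are on track to be industry-leading."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ponte Vecchio has come off the production line for power-on validation, and Intel has begun providing limited samples to customers. It is expected to launch in 2022 for the HPC and AI markets."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, Intel unveiled Sapphire Rapids, its next-generation processor designed for the data center. At its heart is a modular, tiled SoC architecture that provides the architectural foundation for heterogeneous computing infrastructure, paired with the highest compute density and memory bandwidth. Sapphire Rapids is also built on the Intel 7 process and uses the new Performance-core microarchitecture."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the IPU front, Intel architected Mount Evans in collaboration with a cloud service provider to offload infrastructure workloads. Mount Evans is also Intel's first dedicated ASIC IPU and draws on experience from multiple generations of FPGA SmartNICs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, Intel also discussed "},{"type":"link","attrs":{"href":"https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/tools\/oneapi.html#gs.9d5un2","title":"","type":null},"content":[{"type":"text","text":"oneAPI"}]},{"type":"text","text":", an open-source software solution that Intel introduced in 2019. It offers a single, open, and unified programming model that simplifies development across different architectures. Intel also provides a complete oneAPI stack for commercial deployment, including the oneAPI Base Toolkit, which adds compilers, analyzers, debuggers, and porting tools on top of the specification's language and libraries. According to Raja, since the first version was released in December 2020, more than 200,000 developers have installed Intel's oneAPI products, even before getting access to Xe HPC, and more than 300 applications on the market use the oneAPI unified programming model. The provisional 1.1 specification released in May this year adds new graph interfaces for deep learning workloads and advanced ray tracing libraries, and the final 1.1 version is expected to be completed by the end of the year."}]}]}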