Intel Unveils Major Architectural Changes and Innovations for CPUs, GPUs, and IPUs

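{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"News, August 20: At "},{"type":"link","attrs":{"href":"https:\/\/edc.intel.com\/content\/www\/us\/en\/products\/performance\/benchmarks\/architecture-day-2021\/","title":"","type":null},"content":[{"type":"text","text":"Intel Architecture Day 2021"}]},{"type":"text","text":", Raja Koduri, Intel senior vice president and general manager of the Accelerated Computing Systems and Graphics Group, joined several Intel architects in detailing major architectural changes and innovations across CPUs, GPUs, and IPUs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c2\/c2dcf16acacc2db6423534a85931ca8a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Among the highlights, Intel gave its first deep dive into Alder Lake, its first performance hybrid architecture CPU. Rather than simply offering a more powerful next-generation CPU core, Alder Lake re-architects the multi-core design: built on the Intel 7 process, it integrates two different x86 cores (Efficient-cores and Performance-cores) together with a hardware thread scheduler, Intel Thread Director, which assigns the right thread to the right core at the right time. Alder Lake-based products are expected to begin shipping this year."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/27\/27731379fce88b051945d2b5861f07b2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The new x86 Performance-core (formerly code-named “Golden Cove”) is Intel's highest-performing CPU core to date, with built-in AI acceleration for deep learning inference and training. Compared with the 11th Gen Core architecture (Cypress Cove core) at the same frequency, the Performance-core delivers an average improvement of about 19% across a range of workloads."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The new x86 Efficient-core (formerly code-named “Gracemont”) is designed for throughput at scale and aims to push the limits of multi-core performance per watt. Compared with Skylake, Intel's most prolific CPU microarchitecture to date, the Efficient-core delivers the same single-threaded performance at less than 40% of Skylake's power. Compared with two Skylake cores running four threads, four Efficient-cores deliver 80% more performance while still consuming less power."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Notably, Intel worked with Microsoft to optimize the new Alder Lake CPUs and the hardware thread scheduler specifically for Windows 11."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Throughout the Windows 11 development cycle, my team has worked with our Intel colleagues to improve and optimize our upcoming operating system to take full advantage of the performance hybrid architecture, especially the hardware thread scheduler. Most of this work revolves around the OS thread scheduler, the kernel component that decides which threads run where,” said Mehmet Iyigun, development manager on Microsoft's Windows kernel team. Beyond thread scheduling, he explained, Windows 11 also uses hints from the hardware thread scheduler to decide which cores to park or unpark in order to save power."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the GPU side, Intel began working early with developers and game engine vendors to design new discrete GPUs for gaming enthusiasts. At this Architecture Day, Intel unveiled Xe HPG, a new discrete graphics microarchitecture built to deliver enthusiast-class performance for gaming and content creation workloads. The Xe HPG-based Alchemist SoC (formerly code-named DG2) will launch in the first quarter of next year under the new Intel® Arc™ brand."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ponte Vecchio, another GPU aimed at exascale computing, is Intel's most complex SoC to date. Based on the Xe HPC microarchitecture, it combines multiple advanced semiconductor process technologies with Intel's EMIB technology and Foveros 3D packaging, contains 100 billion transistors, and offers industry-leading floating-point throughput and compute density. Intel even likened it to a product born of “moonshot-level” innovation."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f5\/f5aaa4e796a42dc14a37011c31830334.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At Architecture Day, Intel said early Ponte Vecchio silicon has demonstrated leading performance, setting industry records for inference and training throughput on popular AI benchmarks. For example, the A0 silicon delivers more than 45 TFLOPS of FP32 throughput, more than 5 TBps of memory fabric bandwidth, and more than 2 TBps of connectivity bandwidth. Intel also shared a demo video showing ResNet inference at more than 43,000 images\/s and ResNet training at more than 3,400 images\/s, both of which Intel expects to be industry-leading."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ponte Vecchio has come off the production line for power-on validation, and limited sampling to customers has begun. Ponte Vecchio is expected to launch in 2022 for the HPC and AI markets."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Intel also unveiled Sapphire Rapids, its next-generation processor designed for the data center. At its core is a modular, tiled SoC architecture that lays the architectural foundation for heterogeneous compute infrastructure, paired with what Intel describes as the highest compute density and memory bandwidth. Sapphire Rapids is also built on the Intel 7 process and uses the new Performance-core microarchitecture."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the IPU side, Intel architected Mount Evans together with a cloud service provider to offload infrastructure workloads. Mount Evans is Intel's first dedicated ASIC IPU and draws on experience from multiple generations of FPGA SmartNICs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, Intel also discussed "},{"type":"link","attrs":{"href":"https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/tools\/oneapi.html#gs.9d5un2","title":"","type":null},"content":[{"type":"text","text":"oneAPI"}]},{"type":"text","text":", an open-source software solution Intel introduced in 2019 that provides a single, open, unified programming model to simplify development across different architectures. Intel also offers a complete oneAPI stack for commercial deployment, including the foundational oneAPI Base Toolkit, which adds compilers, analyzers, debuggers, and porting tools on top of the specification's languages and libraries. According to Raja, since the first release in December 2020, more than 200,000 developers have installed Intel oneAPI products ahead of Xe HPC availability, and more than 300 applications on the market use the oneAPI unified programming model. The provisional 1.1 specification released in May adds new graph interfaces for deep learning workloads and advanced ray tracing libraries, and the final 1.1 specification is expected to be completed by the end of the year."}]}]}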
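As a rough worked example (not an Intel methodology), the two Gracemont Efficient-core claims in the article can be restated as performance-per-watt ratios. The Python sketch below assumes Skylake is normalized to 1.0 and treats the stated bounds ("less than 40%" of the power, "lower power") as the boundary values 0.40 and 1.0; the helper name `perf_per_watt_gain` is hypothetical.

```python
# Worked example: convert the article's relative Gracemont claims into
# performance-per-watt (perf/W) ratios, with Skylake normalized to 1.0.
# Intel gave bounds ("less than 40%", "lower power"), so plugging in the
# boundary values yields lower bounds: the real gains are strictly larger.


def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Return (new perf / new power) divided by (old perf / old power).

    Both arguments are new-to-old ratios, e.g. power_ratio=0.4 means the
    new design draws 40% of the old design's power.
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio / power_ratio


# Claim 1: one Efficient-core matches Skylake single-thread performance
# (perf_ratio = 1.0) at under 40% of the power (power_ratio < 0.40).
single_thread_bound = perf_per_watt_gain(perf_ratio=1.0, power_ratio=0.40)

# Claim 2: four Efficient-cores deliver 80% more performance
# (perf_ratio = 1.8) than two Skylake cores running four threads,
# at lower power (power_ratio < 1.0).
multi_thread_bound = perf_per_watt_gain(perf_ratio=1.8, power_ratio=1.0)

print(f"single-thread perf/W gain: more than {single_thread_bound:.1f}x")
print(f"multi-thread perf/W gain:  more than {multi_thread_bound:.1f}x")
```

On these figures alone, the single-thread claim implies more than 2.5 times Skylake's performance per watt and the multi-thread claim more than 1.8 times; actual results would depend on workload, operating frequency, and how power is measured.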