Performance Analysis of Arm and x86 CPUs in Cloud Computing

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"As high-performance Arm-based CPUs are increasingly used beyond mobile devices, it is important for developers to understand how Arm performs in common server-side software stacks. This article uses AWS Arm (Graviton2) and x86_64 (Intel) EC2 instances to evaluate the compute performance of different software runtimes, including Docker, Node.js, and WebAssembly. The conclusion is that Arm is more cost-effective in the cloud, especially for lightweight runtimes that sit close to the underlying operating system."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"MIT professors Leiserson, Thompson, and their co-authors recently published a "},{"type":"link","attrs":{"href":"https:\/\/science.sciencemag.org\/content\/368\/6495\/eaam9744","title":null,"type":null},"content":[{"type":"text","text":"research paper"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" in Science, “There’s plenty of room at the Top: What will drive computer performance after Moore’s law?”, which discusses one of the biggest challenges in computer engineering today: the end of Moore's law. Computer hardware such as CPUs and GPUs is reaching physical (quantum) limits, and it is becoming ever harder to make chips faster or smaller. This threatens the technological innovation that has driven productivity and economic growth for the past 40 years. Is the technology revolution as we know it coming to an end?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The paper is optimistic about the future of computing. The authors argue that software improvements can take over from Moore's law and drive productivity growth in the coming years. To illustrate the point, they showed that rewriting a machine learning algorithm from Python into C\/native code can improve performance by 60,000 times!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"However, we cannot give up modern software runtimes and the developer productivity gains they bring. In the pre-Java era of the 1990s, every application ran as compiled native code. Today's developers rely on high-level programming languages, tools, and especially memory-safe and portable runtimes to deliver high-quality software products. According to the authors of "},{"type":"link","attrs":{"href":"https:\/\/science.sciencemag.org\/content\/368\/6495\/eaam9744","title":null,"type":null},"content":[{"type":"text","text":"the paper"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", software performance engineering works by removing software bloat and tailoring software to more efficient hardware."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In this article, we evaluate the performance gains from adopting a lightweight, efficient software and hardware infrastructure in a cloud computing setting. Specifically, we run several lightweight software runtimes on an energy-efficient Arm-based CPU (AWS Graviton2) and on an Intel x86 CPU."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"For the purposes of this study, we focus on single-threaded performance. Most web application frameworks default to a “one thread per request” model, so from the user's point of view the performance of a web service is likely to be limited by the execution speed of a single CPU. This is a deliberately simplified test case designed to demonstrate raw performance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The benchmarks we chose are as follows:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The following two benchmarks evaluate "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"cold start"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" performance."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The nop test starts the application environment and exits."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The cat-sync test opens a local file, writes 128KB of text into it, and then exits. It evaluates performance when making operating system calls."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The following four benchmarks come from the "},{"type":"text","marks":[{"type":"underline"},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Computer Language Benchmarks Game"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", which provides a variety of benchmark programs for more than 25 programming languages. After the program starts, these four benchmarks 
evaluate "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"runtime"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" performance."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/benchmarksgame-team.pages.debian.net\/benchmarksgame\/performance\/nbody.html","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"nbody"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" is repeated 50 million times and is an n-body simulation."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/benchmarksgame-team.pages.debian.net\/benchmarksgame\/performance\/fannkuchredux.html","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"fannkuch-redux"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" is repeated 12 times and measures indexed access to an integer sequence."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/benchmarksgame-team.pages.debian.net\/benchmarksgame\/performance\/mandelbrot.html","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"mandelbrot"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" is repeated 15,000 times and generates an image file (a Mandelbrot set portable 
bitmap)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/benchmarksgame-team.pages.debian.net\/benchmarksgame\/performance\/binarytrees.html","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"binary-trees"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" is repeated 21 times and allocates and deallocates a large number of binary trees."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Next, let's look at the exact test setup and some performance numbers! The source code and scripts for all test cases are available on "},{"type":"link","attrs":{"href":"https:\/\/github.com\/second-state\/wasm32-wasi-benchmark","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"GitHub"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Source code and scripts: "},{"type":"link","attrs":{"href":"https:\/\/github.com\/second-state\/wasm32-wasi-benchmark","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/github.com\/second-state\/wasm32-wasi-benchmark"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Reducing software bloat"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"To provide software safety, security, and cross-platform portability, we run the benchmarks inside containers and virtual machines. One of the most popular container runtimes is Docker, which is already optimized for performance. To evaluate the performance of the software stack, we ran the following test cases on an AWS"},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/ec2\/instance-types\/t3\/","title":null,"type":null},"content":[{"type":"text","text":" "},{"type":"text","marks":[{"type":"underline"}],"text":"t3.small"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" instance. An AWS"},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/ec2\/instance-types\/t3\/","title":null,"type":null},"content":[{"type":"text","text":" "},{"type":"text","marks":[{"type":"underline"}],"text":"t3.small"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" instance has one physical CPU core exposed as 2 vCPUs. We left the instance idle long enough to accumulate enough CPU "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#ff3633","name":"user"}}],"text":"credits"},{"type":"text","text":" to sustain 100% CPU burst for the duration of the performance tests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Test case #1: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"To simulate web application performance, we run the benchmarks inside Docker as a Node.js JavaScript 
application."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Test case #2: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We also run the benchmarks as native C\/C++ applications inside Docker with Ubuntu Server 20.04 LTS. This scenario is somewhat unrealistic, because few developers can compile their application into a single binary executable, and it gives up the ecosystem of tools and libraries that runtimes such as Node.js provide. Still, this case serves as a reference point for the performance achievable under Docker."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Test case #3: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We run the benchmarks in the"},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/ssvm\/","title":null,"type":null},"content":[{"type":"text","text":" WebAssembly virtual machine "},{"type":"text","marks":[{"type":"underline"}],"text":"SSVM"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":". The programs are written in Rust and compiled to WebAssembly bytecode. SSVM provides runtime safety, capability-based security, portability, and integration with Node.js."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Capability-based security requires an application to hold and present an authorization token in order to access a protected resource. With SSVM, an application must explicitly declare at startup the resources it needs to access, such as file system folders. This design is called the"},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/articles\/wasi-access-system-resources\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":" "}]},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/articles\/wasi-access-system-resources\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"underline"}],"text":"WebAssembly System Interface (WASI)"}]},{"type":"text","text":"."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The software stack for our test cases is as follows:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Amazon Linux 2 running on the EC2 instance"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Docker 19.03.6-ce"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Inside Docker: Ubuntu Server 20.04 
LTS"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Inside Docker: Node.js v14.7.0"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Native executables are compiled with LLVM 10.0.0 and the Clang toolchain. On the Intel architecture, we use Clang's -O3 flag (for the results in this section). On Graviton2, we use the AWS "},{"type":"link","attrs":{"href":"https:\/\/github.com\/aws\/aws-graviton-getting-started\/blob\/master\/c-c++.md","title":null,"type":null},"content":[{"type":"text","text":"recommended LLVM optimization settings "}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"-march=armv8.2-a+fp16+rcpc+dotprod+crypto (for the results in the next section)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"SSVM 0.6.4 with AOT (ahead-of-time compiler) optimization"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The results on the Intel architecture are shown in the chart below. All numbers are execution times in seconds. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Lower numbers mean better performance. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Note: SSVM cold starts are orders of magnitude faster than Docker's, so we multiplied the SSVM cold start times by 50 to make them visible in the chart. Cold start performance matters especially for microservices that are invoked only occasionally. Examples include many event-driven serverless 
functions, where every function call may involve a cold start of the runtime."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e8\/ee\/e8a06b625bb866d89yy058925d98d9ee.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Key takeaways:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/ssvm\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"SSVM"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" cold start time is under 20 milliseconds, while Docker needs 700 milliseconds or more."},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/ssvm\/","title":null,"type":null},"content":[{"type":"text","text":" "},{"type":"text","marks":[{"type":"underline"}],"text":"SSVM"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
is at least 30 times faster."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"For compute-intensive runtime tasks, Docker + native and "},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/ssvm\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"SSVM"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" are about twice as fast as Docker + Node.js."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Docker + native is a poor choice because it does not perform as well as "},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/ssvm\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"SSVM"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"underline"},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"and it also gives up the ecosystem advantages of Node.js and JavaScript."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Programs in SSVM run even faster than native code. How is that possible? SSVM performs ahead-of-time (AOT) compilation on the machine that runs the program. This lets the compiler optimize specifically for the machine it is running on, rather than generically for an entire CPU 
architecture type."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Switching from operating-system-level containers "},{"type":"text","marks":[{"type":"italic"},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"("},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"such as Docker) to"},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/articles\/why-webassembly-server\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":" application-level virtual machines"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" (for example,"},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/ssvm\/","title":null,"type":null},"content":[{"type":"text","text":" "},{"type":"text","marks":[{"type":"underline"}],"text":"SSVM"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":") can significantly improve performance. Although SSVM does require some changes to Node.js applications, it still gives developers runtime safety, portability, and the full Node.js ecosystem."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Efficient hardware"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Leiserson, Thompson, 
and their co-authors propose a software performance engineering approach that goes beyond removing existing software bloat: it also calls for software that makes better use of the hardware. After years of iterative design for highly constrained computing environments such as mobile phones, the Arm architecture offers a unique opportunity to run general-purpose software efficiently. AWS has done pioneering work on optimizing server-side virtualization for Arm, and Amazon EC2 instances based on AWS Graviton2 have great potential to further improve web application performance. In this section, we compare AWS"},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/ec2\/instance-types\/t3\/","title":null,"type":null},"content":[{"type":"text","text":" "},{"type":"text","marks":[{"type":"underline"}],"text":"t3.small"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" (x86) and "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/ec2\/instance-types\/t4\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"t4g.small"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" (Arm-based Graviton2) instances by repeating the "},{"type":"link","attrs":{"href":"https:\/\/github.com\/second-state\/wasm32-wasi-benchmark","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"benchmarks"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" on both. They are similarly configured, but "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/ec2\/instance-types\/t4\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"t4g.small"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" (Arm) costs about 19% less per hour at the prices listed below (equivalently, t3.small costs about 24% more)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/ec2\/instance-types\/t3\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"t3.small"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" instances use Intel Xeon Platinum 8000 series processors (x86_64) with a sustained Turbo CPU clock speed of up to 3.1 GHz. The t3 instance provides 2 vCPUs running on 1 physical core and 2 GB of memory. It costs $0.0208 per hour."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/ec2\/instance-types\/t4\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"t4g.small"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" instances use the AWS"},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/ec2\/graviton\/","title":null,"type":null},"content":[{"type":"text","text":" "},{"type":"text","marks":[{"type":"underline"}],"text":"Graviton2"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" CPU (Arm64, also known as aarch64) clocked at 2.5 GHz. The t4g instance provides 2 vCPUs running on 2 physical cores and 2 GB of memory. It costs $0.0168 per hour."}]}]}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"AWS Graviton2 processors offer additional performance benefits for multi-threaded applications. However, as discussed earlier, this article tests only the benchmark 
algorithms in their single-threaded implementations."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The benchmark results for Intel and Graviton2 are shown in the chart below. All numbers are execution times in seconds. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Lower numbers mean better performance"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/56\/38\/567de62717e25caab4be07bc5b787038.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Taking into account the CPU time and the hourly rate of each CPU type in EC2, the cost-efficiency results are shown in the chart below. All numbers are the cost of running the benchmark operation, in units of 0.001 cents. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Lower numbers mean better performance per unit cost"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c0\/f9\/c02e4fd4186ce45c86bb6f0a88ab4ef9.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Key takeaways:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"On both CPU platforms, "},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/ssvm\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"SSVM"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" still delivers the best performance, with cold starts up to 100 times faster than Docker and runtime speed up to 5 times faster than Docker + Node.js (on the mandelbrot benchmark)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Graviton2 is more cost-effective than the Intel x86 CPU."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"When running native binaries, Graviton2 significantly outperforms 
Intel."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"For Node.js and SSVM, Graviton2 and Intel perform roughly on par. But given that the Graviton2 instance costs about 19% less per hour ($0.0168 versus $0.0208), Graviton2 comes out clearly ahead on cost-performance."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We hypothesize that the underlying Linux operating system is already optimized for Arm CPUs, which lets native binaries benefit fully from Graviton2's performance characteristics. However, because Arm is still relatively new in the server and cloud space, frameworks and runtimes higher up the stack, such as Node.js and "},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/ssvm\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"SSVM"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", are not yet specifically optimized for Arm CPUs. Server-side software on Arm still has a lot of room for improvement."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"What we learned"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In this article, we compared common algorithms and web application tasks on different computer architectures. We also compared the traditional Docker and Node.js stack with the new stack, "},{"type":"link","attrs":{"href":"https:\/\/www.secondstate.io\/ssvm\/","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"SSVM"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" (WebAssembly), and observed performance gains of up to 100 times (cold start) and up to 5 times (runtime). There is still plenty of room for improvement, especially in software optimization for 
Arm-based CPUs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In the post-Moore's-law era, technology can continue to drive society's productivity growth through improvements to the software stack. There is plenty of room at the top!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.infoq.com\/articles\/arm-vs-x86-cloud-performance\/","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/www.infoq.com\/articles\/arm-vs-x86-cloud-performance\/"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]}]}
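The cost-efficiency figures in the hardware section combine execution time with the hourly instance price. As a minimal sketch (not the authors' script), the conversion could look like the following. The two hourly prices are the ones quoted in the article; the function names and the 10-second run time are hypothetical and serve only as an illustration.

```python
# Hourly on-demand prices quoted in the article (USD per hour).
HOURLY_PRICE_USD = {
    "t3.small (Intel x86_64)": 0.0208,
    "t4g.small (Graviton2 Arm64)": 0.0168,
}


def cost_per_run_millicents(seconds: float, hourly_price_usd: float) -> float:
    """Cost of one benchmark run, in units of 0.001 cents.

    This is the unit the article uses for its cost-efficiency chart.
    """
    usd = seconds * hourly_price_usd / 3600.0  # share of one billed hour
    return usd * 100.0 * 1000.0  # USD -> cents -> 0.001 cents


def relative_saving(cheaper: float, pricier: float) -> float:
    """Fraction by which `cheaper` undercuts `pricier`."""
    return 1.0 - cheaper / pricier


if __name__ == "__main__":
    hypothetical_seconds = 10.0  # assumption: not a measured result
    for name, price in HOURLY_PRICE_USD.items():
        cost = cost_per_run_millicents(hypothetical_seconds, price)
        print(f"{name}: {cost:.3f} x 0.001 cents per run")
    saving = relative_saving(0.0168, 0.0208)
    print(f"t4g.small hourly saving vs t3.small: {saving:.1%}")
```

When both instances take the same time, the cost ratio equals the price ratio, which works out to roughly 19% in favor of t4g.small at the quoted prices. Any difference in measured execution time scales the per-run cost proportionally.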