Heterogeneous Memory and Its Application and Optimization in Machine Learning Systems

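{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4Paradigm has long specialized in artificial intelligence and has both broad and deep expertise in AI algorithms, applications, systems, and underlying architecture design.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Advanced storage technology has developed rapidly in recent years, producing disruptive technologies such as non-volatile memory and SSDs. Heterogeneous memory architectures built on these technologies are changing how traditional applications are designed and optimized.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4Paradigm was an early mover on heterogeneous memory architectures and has carried out several pieces of exploratory R&D and production deployment, for example a parameter server [4Paradigm launches the industry's first trillion-dimensional online prediction system based on persistent memory with millisecond-level recovery: ","attrs":{}},{"type":"link","attrs":{"href":"https://www.163.com/tech/article/FGCFSO4N00099A7M.html","title":null,"type":null},"content":[{"type":"text","text":"https://www.163.com/tech/article/FGCFSO4N00099A7M.html","attrs":{}}],"marks":[{"type":"underline"}]},{"type":"text","text":" ] and an in-memory database [Joint research by Intel and 4Paradigm accepted at the top international conference VLDB: Optane™ persistent memory helps optimize a trillion-dimensional-feature online prediction system: ","attrs":{}},{"type":"link","attrs":{"href":"https://newsroom.intel.cn/news-releases/the-joint-research-results-of-intel-and-4paradigm-were-selected-into-the-vldb-international-conference/","title":null,"type":null},"content":[{"type":"text","text":"https://newsroom.intel.cn/news-releases/the-joint-research-results-of-intel-and-4paradigm-were-selected-into-the-vldb-international-conference/","attrs":{}}],"marks":[{"type":"underline"}]},{"type":"text","text":"].","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article introduces the technical background of heterogeneous memory architectures and our engineering practice with them in automated machine learning systems.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Heterogeneous Memory Architecture","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Traditionally, the term memory refers to dynamic random-access memory, or DRAM. The CPU also contains small, fast storage components that we usually call CPU caches (the L1/L2 caches). Slow, persistent storage devices such as disks make up external storage. External storage, memory, and CPU caches therefore form the storage hierarchy pyramid. With the commercial arrival of non-volatile memory, however, the memory layer of this pyramid is no longer DRAM alone: DRAM and non-volatile memory together form a heterogeneous memory architecture.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Non-volatile memory also blurs the functional boundary between memory and external storage, making it possible to persist in-memory data. Non-volatile memory technology is now mature; Intel® Optane™ Persistent Memory (persistent memory, or PMem, for short), released by Intel in 2019, is a representative product.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fa/fa62f8b155c446a03f5f2c3d011f864d.png","alt":"在這裏插入圖片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 1. Storage hierarchy pyramid based on heterogeneous memory","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 1 shows the storage hierarchy pyramid with heterogeneous memory. Persistent memory sits between DRAM and external storage in the pyramid, and its capacity, performance, and cost all fall between those of the two. Functionally it is also a hybrid of DRAM and external storage: it can be used directly as memory (Memory Mode) or as a persistent device (App Direct Mode, AD mode for short).","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Memory Mode, persistent memory is transparent to the operating system, and its capacity simply shows up as part of the total available memory. AD mode exposes the storage tier and leaves it entirely under the developer's control. Because of persistent memory, modern memory architectures are therefore not only more complex in their hierarchy but also fundamentally different in function. To make good use of a heterogeneous memory architecture, developers have more questions to think through, for example:","attrs":{}}]},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multi-tier storage optimization. Persistent memory offers memory with performance close to DRAM at a lower cost, which is a great fit for memory-hungry applications. However, a multi-tier storage architecture also makes performance tuning harder. High-performance caching matters a great deal in performance tuning. On the one hand, real-world data often has hot spots, and a cache can effectively speed up access to hot data. On the other hand, cache-conscious data structures are often intricately designed to squeeze the most out of the hardware. Persistent memory makes the storage hierarchy more complex and therefore places higher demands on the design of multi-level caching mechanisms, data structures, and algorithms.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Making use of persistence. With persistent memory, external storage is no longer the only place to store data. Persistent memory offers far higher persistence performance than traditional external storage devices, but its capacity is comparatively small. How to exploit high-performance persistence effectively in a given scenario is a new question for production deployments. For example, for online services that must guarantee quality of service around the clock, persisting in-memory data enables fast recovery after the service goes offline. In scenarios where disk I/O used to be the bottleneck, persistent memory can also serve as the storage medium to improve overall system performance.","attrs":{}}]}]}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To help readers better understand how heterogeneous memory architectures deliver value in real scenarios, we share below some of 4Paradigm's practical experience with them as a starting point for discussion.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Optimizing an Automated Machine Learning System on Heterogeneous Memory","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7d22b58ef11465d1475478b86d59e7f2.png","alt":"在這裏插入圖片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 2 shows a typical end-to-end automated machine learning (AutoML) pipeline in 4Paradigm's products. It consists mainly of offline exploration and online inference. Offline exploration uses automatic feature engineering and model training to produce feature engineering scripts and models that are ready to deploy. When the online inference service receives a user request, it performs real-time feature extraction and model inference to obtain a prediction. A message queue plays the key role of collecting and distributing data throughout the system.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Table 1 shows, in a heterogeneous memory architecture persistent memory is used differently in different components to reach different optimization goals. In general, Memory Mode enables fast, low-cost expansion of memory capacity, while AD mode brings additional benefits, including fast recovery and better data storage performance.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/10e39e8bc36008ab391ece2ccb5dd07a.png","alt":"在這裏插入圖片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4Paradigm has decoupled the key components optimized for heterogeneous memory and contributed them to the open-source community. There are currently two main projects: Pafka, a high-performance message queue system (","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/4paradigm/pafka","title":null,"type":null},"content":[{"type":"text","text":"https://github.com/4paradigm/pafka","attrs":{}}],"marks":[{"type":"underline"}]},{"type":"text","text":"), and PmemStore, a high-performance KV storage engine optimized for AI workloads (","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/4paradigm/pmemstore","title":null,"type":null},"content":[{"type":"text","text":"https://github.com/4paradigm/pmemstore","attrs":{}}],"marks":[{"type":"underline"}]},{"type":"text","text":"). The rest of this article focuses on Pafka.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Pafka: A High-Performance Message Queue System Optimized for Heterogeneous Memory","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka is an open-source distributed event streaming and message queue system for processing real-time data streams efficiently and reliably, and it is very widely deployed in industry. However, because of its persistence logic, its performance (throughput and latency) is often limited by external storage devices (HDD/SSD). In practice, to increase the overall throughput of a Kafka cluster, companies have to scale the cluster out, which raises their total cost.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Persistent memory offers high-speed persistence, with persistence performance several times to tens of times that of traditional hard disks and SSDs. Pafka, a version of Kafka optimized for heterogeneous memory architectures, takes advantage of this high-speed persistence to greatly increase single-node throughput and thereby reduce the total investment in the cluster. Compared with a traditional Kafka deployment, Pafka offers the following advantages:","attrs":{}}]},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with the SATA SSD configuration common in today's data centers, Pafka on heterogeneous memory improves per-node throughput and latency by up to 20x each.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because per-node throughput is so much higher, Pafka can cut total cluster hardware cost by more than 10x compared with Kafka.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pafka is built directly on Kafka, so existing Kafka-based business code needs no modification and can migrate to Pafka at zero code-change cost.","attrs":{}}]}]}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our optimization of Kafka focuses on writing data to persistent storage, which is the performance bottleneck. In the original Kafka architecture, data is persisted only at the external storage tier (disk/SSD). The optimized Pafka, built on a heterogeneous memory architecture, uses both persistent memory and external storage for data persistence.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Persistent memory, with its high-performance persistence, serves as the first persistence tier, while external storage, with larger capacity but lower performance, serves as the second tier. The two are managed through a caching mechanism.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because of the producer/consumer usage pattern of a message queue, in most scenarios data is written and read in the high-performance persistent memory.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fa/fac4bd8c5fcaaa7470fb232691af7ba2.png","alt":"在這裏插入圖片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 3. Pafka cluster architecture","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As shown in Figure 3, a Kafka server cluster consists of anywhere from a few to hundreds or thousands of brokers. Each broker is divided into partitions, which are further divided into segments that store the messages. Our changes to Kafka focus on the data structure used to store segments. Originally, a segment could only be stored on external storage such as HDD/SSD. We use PMDK for persistence on heterogeneous memory and introduce the concept of a MixChannel, so that a segment can be stored either on HDD/SSD external storage or in persistent memory.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Specifically, MixChannel manages the ordinary file interface and the persistent memory interface in a unified way, so the underlying storage medium is transparent to upper-layer components. To support persistent-memory-backed storage, we introduced the PMemChannel data structure for MixChannel. Its main job is to wrap a persistent memory MemoryBlock object in an interface that conforms to the FileChannel API, so that MixChannel can easily choose between the traditional file-based FileChannel and the persistent-memory-based PMemChannel. Here we use PersistentMemoryBlock from PMDK's llpl, which automatically persists the data on every write. To support zero-copy, we also implemented a zero-copy ByteBuffer interface for llpl's MemoryBlock by mapping the persistent memory address directly to a ByteBuffer, which avoids repeated memory copies and improves performance.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To maintain the correspondence between segments and the data in persistent memory, we allocate one persistent memory MemoryBlock for each segment and maintain the mapping with ObjectDirectory from PMDK's pcj.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, to avoid the overhead of allocating MemoryBlocks dynamically while Pafka is running, we pre-allocate a fixed proportion of the memory pool at initialization so that MemoryBlocks can be allocated quickly when data is written.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Performance Comparison","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f6/f615e653365c3f06691dde3f381a13e9.png","alt":"在這裏插入圖片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 4 shows that, compared with Kafka persisting to the SATA SSDs commonly used in data centers, Pafka optimized for heterogeneous memory achieves up to a 20x improvement in both throughput and latency.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Cost Comparison","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Suppose the goal is to provide an overall throughput of 20 GB/s. We compared Pafka on heterogeneous persistent memory with Kafka on SATA SSDs. Figure 5 shows that reaching a total throughput of 20 GB/s takes 45 SATA SSD servers but only 3 heterogeneous memory servers. In terms of hardware cost, traditional Kafka (SATA SSD) costs US$450,000, while our Pafka solution costs only US$40,500. Pafka thus reduces the hardware cost to 9% of that of the traditional Kafka solution.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/94/94f353ba533dc35aea9a6548d34089c7.png","alt":"在這裏插入圖片描述","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 5. Cost comparison of the Pafka and Kafka solutions at 20 GB/s throughput","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"More Information","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pafka is an open-source project from 4Paradigm. Usage instructions, technical support, and the full performance report are available through the following channels:","attrs":{}}]},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Code (GitHub repo): ","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/4paradigm/pafka","title":null,"type":null},"content":[{"type":"text","text":"https://github.com/4paradigm/pafka","attrs":{}}],"marks":[{"type":"underline"}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Slack channel: ","attrs":{}},{"type":"link","attrs":{"href":"https://join.slack.com/t/memarkworkspace/shared_invite/zt-o1wa5wqt-euKxFgyrUUrQCqJ4rE0oPw","title":null,"type":null},"content":[{"type":"text","text":"https://join.slack.com/t/memarkworkspace/shared_invite/zt-o1wa5wqt-euKxFgyrUUrQCqJ4rE0oPw","attrs":{}}],"marks":[{"type":"underline"}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MemArk heterogeneous storage technology forum: ","attrs":{}},{"type":"link","attrs":{"href":"https://discuss.memark.io/","title":null,"type":null},"content":[{"type":"text","text":"https://discuss.memark.io/","attrs":{}}],"marks":[{"type":"underline"}]}]}]}],"attrs":{}}
]}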
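The two-tier persistence idea described in the Pafka section (a small, fast persistent-memory tier in front of a larger, slower HDD/SSD tier, with segments placed behind one uniform interface) can be illustrated with a toy model. The sketch below is not Pafka's code and does not use PMDK: the class name `TieredSegmentStore`, its methods, and the oldest-first migration policy are all assumptions made for illustration, since the article only says the two tiers are managed through a caching mechanism. Plain Python dictionaries stand in for both storage media.

```python
from collections import OrderedDict
from typing import Tuple


class TieredSegmentStore:
    """Toy two-tier segment store (hypothetical; not Pafka's implementation)."""

    def __init__(self, fast_capacity: int) -> None:
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        # Stand-in for persistent memory; insertion order = oldest segment first.
        self.fast = OrderedDict()
        # Stand-in for HDD/SSD external storage.
        self.slow = {}

    def append(self, segment_id: int, record: bytes) -> None:
        """Append a record; a new segment always starts in the fast tier."""
        if segment_id in self.slow:
            raise ValueError("segment was migrated; treated as read-only here")
        if segment_id not in self.fast:
            # Assumed policy: make room by moving the oldest segments down a tier.
            while len(self.fast) >= self.fast_capacity:
                self._migrate_oldest()
            self.fast[segment_id] = bytearray()
        self.fast[segment_id] += record

    def _migrate_oldest(self) -> None:
        """Move the oldest fast-tier segment to the slow tier."""
        old_id, data = self.fast.popitem(last=False)
        self.slow[old_id] = bytes(data)

    def read(self, segment_id: int) -> Tuple[str, bytes]:
        """Return (tier, data); callers need not know where the segment lives."""
        if segment_id in self.fast:
            return "fast", bytes(self.fast[segment_id])
        return "slow", self.slow[segment_id]


store = TieredSegmentStore(fast_capacity=2)
for seg in range(3):
    store.append(seg, b"msg-%d" % seg)

print(store.read(2))  # newest segment is still in the fast tier
print(store.read(0))  # oldest segment was migrated to the slow tier
```

In this model, producers always write to the fast tier and consumers reading recent segments are served from it, which mirrors the article's observation that most accesses in a producer/consumer workload land in persistent memory. A real system would also need durability guarantees and concurrency control, which this sketch leaves out.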