Apache Kudu在網易的實踐

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文主要介紹Apache Kudu及在網易實時數據採集、維表數據關聯、實時數倉ETL、ABtest等場景的實踐應用。主要內容包括:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統概述:認識kudu,理解Kudu的系統設計與定位"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"生產實踐:分享網易內部的典型使用場景"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"遇到的問題:實際使用過程中遇到的問題和問題的排障過程"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"功能展望:對Kudu功能特性的展望"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Kudu定位與架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu是一個存儲引擎,可以接入Impala、Presto、Spark等Olap計算引擎進行數據分析,容易融入Hadoop社區。Kudu整合了隨機讀寫和大數據分析能力,具有低延遲的隨機讀寫能力和高吞吐量的批量查詢能力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與HBase、Casandra不同,Kudu要求聲明Schema。Schema可以爲上層計算引擎提供更多元數據,進行計算優化。Kudu的每個字段有主鍵、列名和列類型。拿到列類型信息後能夠對不同列進行編碼和壓縮,優化存儲空間,減少磁盤開銷。Kudu支持bitshuffle、運行長度編碼、字典編碼等列編碼方式,這些編碼會根據列的類型不同做不同設計。比如對於重複值多、重複值變化不大的數據的壓縮率很好。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu使用列式存儲給Kudu帶來了如下特性:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. 存儲上可以節約空間"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. 可以對查詢做更多優化,如將過濾條件下推到kudu執行,節約計算資源"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. 支持向量化操作"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Kudu的Schema和列存"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f8\/f8ccc966df5663eda813dc5635a26e44.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu數據存儲在Table中,Tablet是Kudu的讀寫單元,Table內的數據會劃分到各個Tablet進行管理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"創建Table時,需要指定Table的分區方式。Kudu 提供了兩種類型的分區方式range partitioning ( 範圍分區 ) 、 hash partitioning ( 哈希分區 ),這兩種分區方式可以組合使用。分區的目的是把Table內的數據預先定義好分散到指定的片數量內,方便Kudu集羣均勻寫入數據和查詢數據。範圍分區支持查詢時快速定位數據,哈希分區可以在寫入時避免數據熱點,可以適應各個場景下的數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0a\/0ab16baabab778ab4bcb41480c8a09a9.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu有管理節點(Master)和數據節點(Tablet Server)。管理節點管理元數據,管理表到分片映射關係、分片在數據節點內的位置的映射關係,Kudu客戶端最終會直接鏈接數據節點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/54\/54be453b9c677971269934bc00de0e9f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu作爲分佈式系統,爲了保障數據可用性和高可用,支持多副本。Kudu 使用 Raft 協議來實現分佈式環境下副本之間的數據一致性。Raft算法數據不依賴其他存儲和文件系統,優勢在於可以保證服務高可用、服務可用性、一致性的均衡。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Kudu的update設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Olap中對update的設計會影響到Olap性能。update操作可能引發數據多版本問題和update引發的數據merge問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/96\/9602a35d3b80494aac667634738c82be.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8a\/8a52d89f9e118ec3d5899cf0c7cfc560.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tablet是Kudu數據讀寫單元,Tablet下更細分的數據存儲單元是 RowSet。RowSet有兩種, 分別是MemRowSet 和 DiskRowSet,不同RowSet維護了不同組件範圍內的數據。內存中的 MemRowSet 在到達一定大小後會刷盤成爲DiskRowSet。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/44\/44d4a0c6f6af9cb15a9fe9e43d671f29.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8f\/8f9637e652fa079eba657f7e6fb35666.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Kudu把更新操作當作一條新操作,而不是寫一條新日誌。更新操作是Undo\/Redo記錄,這些內存中的更新操作會被整合爲DeltaMemstore持久化。Base數據、Undo數據、Redu數據寫在同一個RowSet中。這樣的存儲設計優點是可以在更新時候快速找到數據,缺點是查詢時需要確認查詢的主鍵在哪個RowSet位置中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu也使用了LSM的結構。Kudu的comopaction有多種:MinorDeltaCompaction、MajorDeltaCompaction、MergingCompaction。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/02\/02b2b0eb4fa447e88d4c98f07a90195b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Kudu的update是一個多版本操作,目的是寫入和讀取時互相不干擾、不需要讀時額外加鎖。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"小結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu Update設計特點:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 更新已經flush的數據和寫入新數據走不通的處理邏輯,原始數據和更新位於同一個Rowset,不用跨Rowset進行merge"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 通過base數據的RowID和更新時間戳作爲REDO\/UNDO數據的key,讀取更新高效"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• Key大小固定,存儲和比較效率高"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 不需要查詢出主鍵數據也能獲取更新數據"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 在大多數使用場景下能夠實現更高效的讀取"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 如果返回的結果不要求順序,直接從RowSet中讀出數據,不用merge"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 如果更新較少,REDO會快速merge到base數據,這時在讀取最新數據時,可以不進行apply REDO的操作"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"生產實踐"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"實時數據採集場景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時數據分析中,一些用戶行爲數據有更新的需求。沒有引入Kudu前,用戶行爲數據會首先通過流式計算引擎寫入HBase,但HBase不能支撐聚合分析。爲了支撐分析和查詢需求,還需要把HBase上的數據通過Spark讀取後寫入其他OLAP引擎。使用Kudu後,用戶行爲數據會通過流式計算引擎寫入Kudu,由Kudu完成數據更新操作。Kudu可以支持單點查詢,也可以配合計算引擎做數據分析。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/85\/8573d80c80b780e4c48fc9f8a4d8c86b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"維表數據關聯應用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些場景中,日誌的事件表還需要和MySQL內維度表做關聯後進行查詢。使用Kudu,可以利用NDC同步工具,將MySQL中數據實時同步導入Kudu,使Kudu內數據表和MySQL中的表保持數據一致。這時Kudu配合計算引擎就可以直接對外提供結果數據,如產生報表和做在線分析等。省去了MySQL中維度表和數據合併的一步,大大提升了效率。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f1\/f1ca94a022a372f8bec98de6939ea111.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"實時數倉ETL"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu作爲分佈式數據存儲引擎,可以和Hadoop生態更好結合,因此在生產中我們採用了使用Kudu替換Oracle的做法,提升了擴展性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1e\/1e98c2695467d478dd9f275f192a658f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"ABTEST"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在我們的ABTest業務中有兩種日誌,行爲日誌和用戶分流日誌。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/52\/52f5d29f860b087f4750bc6093c65169.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"架構升級前,我們採用了比較傳統的模式,將用戶行爲日誌和用戶分流日誌分別寫入HDFS作爲存儲的ODS層,通過Spark做清洗、轉換後導入HDFS作爲存儲的DWD層,再通過Spark進行一步清洗、按照時間或其他緯度做簡單聚合後寫入DWS層。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個架構的問題是數據產出時間比較長,數據延遲在天級別。業務方需要更及時地拿到ABTest結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/bb\/bbad50d504b15eb7376ee98e13bd3a0d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 架構升級後,使用Kafka作爲ODS、DWD層存儲。Flink在ODS層數據的基礎上繼續做一層整理和過濾,寫入DWD形成明細表數據;DWD層在Flink中做簡單聚合後寫入DWS層,Kudu在DWS層作爲數據存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flink開窗口實時修正實驗數據,這一操作在Kudu完成;超出了Flink時間窗口的數據更新則由離線補數據的操作在Kudu中完成修正。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"架構升級後,數據延遲大大降低,能夠讓ABTest業務方更實時地拿到結果。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"我們遇到的問題 "}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"問題1: 節點負載不均衡"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一些大表場景下會有負載不均衡問題。Kudu不會把range下的哈希分片當作一張表,而是把整個表的分片當成了平等的表進行處理。而在真實使用場景中,range基本是時間字段;需要讓range的hash分片更均勻地分佈在各節點上,防止數據傾斜。下圖是數據傾斜的情況展示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e8\/e84403b0e6b63f6b9e930a953107b3ec.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的解決方案是實現了一套優化版本的負載均衡算法,這個算法能夠把range表當作單獨的表做負載均衡,解決了數據傾斜。下圖是優化後效果:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3b\/3bb9a9acd084019a3f2ad44ddc370dc5.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"問題2: 表結構設計複雜"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"問題3: 沒有二級索引,只能通過控制主鍵順序和分區鍵來優化某幾種查詢模式"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"問題4: 創建表時需要根據業務場景專門設計表結構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"問題2-4,對業務方要求比較高,經常需要專人介入引導業務方導入數據。爲了解決問題,我們內部設計了二級索引來解決上述問題。二級索引可以滿足查詢性能的要求,同時減少用戶設計表時候的複雜度:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過支持二級索引來優化包含非主鍵列過濾的查詢"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"支持二級索引能夠降低業務設計表結構的複雜度"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"社區對二級索引的支持進度KUDU-2038:Add b-tree or inverted index on value field"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Kudu功能展望"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"BloomFilter"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BloomFilter成本較低、效率較高。Join場景下,小表動態生成BloomFilter下推到存儲層,防止大表在Join層做數據過濾。最近的Kudu中已經支持了BloomFilter作爲過濾條件。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"靈活分區哈希"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu每個range的hash bucket數量是固定的。考慮到時間和業務增長,在項目實施前期階段要給Kudu哈希桶數量設置略大,但是數據量較小的場景下過大的分片個數對資源是一種浪費,社區也不推薦hash bucket設置得比較大。期望後續Kudu可以更靈活地適配hash bucket數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" > KUDU-2671:Change hash number for range partitioning"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"多行事務"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kudu暫時不能支持多行事務。目前更新主鍵需要業務自己實現邏輯檢測。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"> KUDU-2612:Implement multi-row transactions"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Flexible Schema"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一些業務場景下業務沒有唯一主鍵,但只希望利用Kudu的大批量寫入、聚合分析查詢的特性。接入業務時Kudu對Schema的要求比較高,一些業務場景無法支持。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"> KUDU-1879:Support table without a primary key"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:Datafuntalk(ID:datafuntalk)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/mH2bRoMhBYnNHr6gtbxGPA","title":"xxx","type":null},"content":[{"type":"text","text":"Apache Kudu在網易的實踐"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章