# Experience with Flink + the Iceberg Data Lake

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"本文作者:餘東,2021年加入Qunar,主要負責數據平臺Flink的運維與平臺開發。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/52/527d8ef75bb65347aa9378b5d0c9c122.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"本文導讀","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/62/6293d2b03e41380278fec2504e82b0d0.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一. 背景及特點","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 背景","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在使用 Flink 做實時數倉以及數據傳輸過程中,遇到了一些問題:比如 Kafka 數據丟失,Flink 結合 Hive 的近實時數倉性能等。Iceberg 0.11 的新特性解決了這些業務場景碰到的問題。對比 Kafka 來說,Iceberg 在某些特定場景有自己的優勢。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 原架構方案","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原先的架構採用 Kafka 存儲實時數據。然後用 Flink SQL 或者 Flink datastream 消費數據進行流轉。內部自研了提交 SQL 和 Datastream 的平臺,通過該平臺提交實時作業。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 痛點","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 存儲成本高且數據量大。Kafka 由於壓力大將數據過期時間設置的比較短,當數據產生反壓,積壓等情況時,如果在一定的時間內沒消費數據導致數據過期,會造成數據丟失。Flink 在 Hive 上做了近實時的讀寫支持。爲了分擔 Kafka 壓力,將一些實時性不太高的數據放入 Hive,讓 Hive 做分鐘級的分區。但是隨着元數據不斷增加,Hive metadata 的壓力日益顯著,查詢也變得更慢,且存儲 Hive 元數據的數據庫壓力也變大。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二. 背景及特點","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ae/ae35d4d52301ec45f9ee60bdcaa57cb7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 
術語解析","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據文件 ( data files )","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Iceberg 表真實存儲數據的文件,一般存儲在data目錄下,以\".parquet\"結尾。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"清單文件 ( Manifest file )","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每行都是每個數據文件的詳細描述,包括數據文件的狀態、文件路徑、分區信息、列級別的統計信息(比如每列的最大最小值、空值數等)、通過該文件、可過濾掉無關數據、提高檢索速度。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"快照( Snapshot )","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"快照代表一張表在某個時刻的狀態。每個快照版本包含某個時刻的所有數據文件列表。Data files 是存儲在不同的 manifest files 裏面, manifest files 是存儲在一個 Manifest list 文件裏面,而一個 Manifest list 文件代表一個快照。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Iceberg 查詢計劃","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查詢計劃是在表中查找查詢所需文件的過程。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"元數據過濾","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"清單文件包括分區數據元組和每個數據文件的列級統計信息。在計劃期間,查詢謂詞會自動轉換爲分區數據上的謂詞,並首先應用於過濾數據文件。接下來,使用列級值計數,空計數,下限和上限來消除與查詢謂詞不匹配的文件。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Snapshot ID","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個Snapshot ID會關聯到一組manifest files、而每一組manifest files包含很多manifest file。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"manifest files文件列表","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個manifest files又記錄了當前data數據塊的元數據信息,其中就包含了文件列的最大值和最小值,然後根據這個元數據信息,索引到具體的文件塊,從而更快的查詢到數據。","attrs":{}}]},{"type":"horizontalrule","attrs":{}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三. 痛點一:Kafka 數據丟失","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 
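To make the snapshot → manifest list → manifest file → data file hierarchy concrete, here is a minimal inspection sketch using the Iceberg Java API of that era. It assumes a Hadoop-backed table; the `tablePath` below is a hypothetical location, not one from this article.

```java
// Walk Iceberg's metadata hierarchy: snapshot -> manifest files -> data files.
// A minimal sketch, assuming a HadoopTables-managed table and the 0.11-era API;
// `tablePath` is a hypothetical placeholder.
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.ManifestFiles;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.io.CloseableIterable;

public class InspectMetadata {
  public static void main(String[] args) throws Exception {
    String tablePath = "hdfs:///warehouse/iceberg_db/tbl1";    // hypothetical
    Table table = new HadoopTables(new Configuration()).load(tablePath);

    // One manifest list file == one snapshot.
    Snapshot snapshot = table.currentSnapshot();
    System.out.println("snapshot-id = " + snapshot.snapshotId());

    // allManifests() is the 0.11-era accessor (newer versions take a FileIO argument).
    for (ManifestFile manifest : snapshot.allManifests()) {
      System.out.println("manifest: " + manifest.path());
      try (CloseableIterable<DataFile> files = ManifestFiles.read(manifest, table.io())) {
        for (DataFile f : files) {   // one manifest row per data file
          // Partition tuple + column-level stats: exactly what planning prunes on.
          System.out.printf("  data file=%s partition=%s records=%d%n",
              f.path(), f.partition(), f.recordCount());
          System.out.println("  lower bounds: " + f.lowerBounds());  // Map<fieldId, ByteBuffer>
          System.out.println("  upper bounds: " + f.upperBounds());
        }
      }
    }
  }
}
```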
---

## III. Pain Point 1: Kafka Data Loss

### 1. The problem

We typically choose Kafka for the real-time warehouse and for log transport. Kafka's storage cost is high and its retention is time-bounded: once consumption falls behind and data passes its expiration time, that data is lost without ever having been consumed.

### 2. Solution

Put business data with relaxed real-time requirements into the lake, say, data that can tolerate 1-10 minutes of latency. Iceberg 0.11 also supports real-time reads through SQL, and it retains historical data. This relieves the load on the online Kafka clusters while guaranteeing that data is not lost and can still be read in (near) real time.

### 3. Why can Iceberg only do near-real-time ingestion?

![](https://static001.geekbang.org/infoq/76/76e965976011aed11935df2a63b39ebe.png)

① Iceberg commits transactions at file granularity. Committing a transaction every second is therefore not feasible; it would make the number of files balloon.

② There is no online service node. For high-throughput, low-latency real-time writes, a truly real-time response is not possible.

③ Flink writes in units of checkpoints. After physical data is written to Iceberg it cannot be queried directly; the metadata file is only written when a checkpoint is triggered, at which point the data turns from invisible to visible. And each checkpoint execution takes a certain amount of time.

### 4. How Flink writes into the lake

![](https://static001.geekbang.org/infoq/11/111c47445828da78b6f74bdba801478d.png)

#### Components

- IcebergStreamWriter

Writes incoming records to the corresponding Avro, Parquet, or ORC files, generates an Iceberg DataFile for each, and sends it to the downstream operator.

- IcebergFilesCommitter

When a checkpoint arrives, collects all the DataFiles and commits a transaction to Apache Iceberg, completing that checkpoint's data write. It maintains a DataFile list per checkpointId, i.e. a map, so even if the transaction for some checkpoint fails to commit, its DataFiles are still held in state and can be committed to the Iceberg table by a later checkpoint.
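On the DataStream side, these two operators are wired up internally by the `FlinkSink` builder in the iceberg-flink module. A minimal sketch, assuming the 0.11-era API; the Kafka source helper and the table location below are placeholders, not code from this article.

```java
// DataStream sketch of the write path described above. FlinkSink internally
// chains IcebergStreamWriter -> IcebergFilesCommitter. Assumes iceberg-flink 0.11-era API.
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class IcebergIngestJob {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Commits (and thus data visibility) happen per checkpoint -- see the pitfall in section 6.
    env.enableCheckpointing(60_000);

    DataStream<RowData> source = buildKafkaSource(env);  // hypothetical helper

    TableLoader tableLoader =
        TableLoader.fromHadoopTable("hdfs:///warehouse/iceberg_db/tbl1");  // placeholder

    FlinkSink.forRowData(source)
        .tableLoader(tableLoader)
        .writeParallelism(2)   // parallel writers; the committer runs as a single operator
        .build();

    env.execute("kafka-to-iceberg");
  }

  private static DataStream<RowData> buildKafkaSource(StreamExecutionEnvironment env) {
    throw new UnsupportedOperationException("wire up your Kafka source here");
  }
}
```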
### 5. Flink SQL demo

![](https://static001.geekbang.org/infoq/db/db9c5d50749676f9d3e3590381971778.png)

#### 5.1 Setup

- Enable streaming execution: `set execution.type = streaming`
- Enable table SQL hints so the OPTIONS clause can be used: `set table.dynamic-table-options.enabled=true`
- Register an Iceberg catalog for operating on Iceberg tables:

```sql
CREATE CATALOG Iceberg_catalog WITH (
  'type'='iceberg',
  'catalog-type'='hive',
  'uri'='thrift://localhost:9083'
);
```

Streaming Kafka data into the lake:

```sql
insert into Iceberg_catalog.Iceberg_db.tbl1
  select * from Kafka_tbl;
```

Streaming data between lake tables, tbl1 -> tbl2:

```sql
insert into Iceberg_catalog.Iceberg_db.tbl2
  select * from Iceberg_catalog.Iceberg_db.tbl1
  /*+ OPTIONS('streaming'='true',
      'monitor-interval'='10s',
      'start-snapshot-id'='3821550127947089987') */;
```

#### 5.2 Parameters

- monitor-interval: the interval at which to continuously monitor for newly committed data files (default: 1s).
- start-snapshot-id: read data starting from the given snapshot ID. Each snapshot ID is associated with a group of manifest files, and each manifest file maps to its own real data files; through the snapshot ID, a specific version of the data is read.

### 6. Pitfall

I once wrote data to Iceberg from the SQL Client: the data directory kept being updated, but there was nothing in the metadata directory, so queries returned no data, because Iceberg queries need the metadata to index the real data. The SQL Client does not enable checkpointing by default; it has to be enabled through the configuration file. That is why data files were written to the data directory while no metadata was written to the metadata directory. (A configuration sketch follows after the data samples below.)

PS: Checkpointing must be enabled whether you ingest via SQL or via DataStream.

### 7. Data sample

The two screenshots below show the effect of querying Iceberg in real time: the data one second before, and one second after.

##### One second before

![](https://static001.geekbang.org/infoq/5a/5ae3025dfff9d24c01f76c46ec76f029.png)

##### Refreshed one second later

![](https://static001.geekbang.org/infoq/04/048fd0deb6492fa501c400f8cddbcac2.png)
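Following up on the pitfall in section 6: for SQL Client jobs of that era, checkpointing comes from the cluster configuration rather than a session `SET`. A minimal `flink-conf.yaml` sketch, assuming Flink 1.11+ configuration keys; the values are illustrative, not the article's actual settings.

```yaml
# flink-conf.yaml -- assumed keys for Flink 1.11+; values are illustrative.
# Each completed checkpoint commits an Iceberg transaction and makes data visible,
# so without this the metadata directory is never written.
execution.checkpointing.interval: 60s
execution.checkpointing.mode: EXACTLY_ONCE
```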
---

## IV. Pain Point 2: Flink + Hive Near-Real-Time Keeps Getting Slower

### 1. The problem

The near-real-time Flink + Hive architecture does support real-time reads and writes, but as tables and partitions multiply it faces the following problems:

- Too much metadata. Switching Hive partitions to hour/minute granularity improves data freshness, but the pressure on the metastore is obvious: too much metadata slows query-plan generation and can also destabilize other online services.
- Growing database pressure. As metadata grows, so does the load on the database that stores the Hive metadata; after a while that database needs to be scaled up, for example its storage.

![](https://static001.geekbang.org/infoq/e0/e0e011f5726071ba85419224f9e34043.jpeg)

![](https://static001.geekbang.org/infoq/f8/f8eddeeb1fbbc5de1dcf0e244cb3f930.png)

### 2. Solution

Migrate the near-real-time Hive pipeline to Iceberg. Why can Iceberg handle large volumes of metadata while Hive easily bottlenecks when metadata grows?

Iceberg keeps its metadata on a scalable distributed file system; there is no centralized metadata system. Hive, by contrast, keeps the partition-level metadata in the metastore (too many partitions puts enormous pressure on MySQL), while the metadata inside a partition actually lives in the files themselves (starting a job requires listing large numbers of files just to decide which files need to be scanned, and the whole process is very time-consuming).

![](https://static001.geekbang.org/infoq/f3/f39678658d98224490f1dd32ae258cf1.png)

---

## V. Optimization in Practice
### 1. Small-file handling

Before Iceberg 0.11, small-file compaction was done by periodically triggering the batch API. That does merge files, but it means maintaining a set of Actions code, and the merging is not real-time. (An operational scheduling sketch follows at the end of this section.)

```java
// Batch compaction via the Iceberg Actions API (imports added for completeness;
// the Actions class here is from the Flink module -- adjust the import to the engine in use).
import org.apache.iceberg.Table;
import org.apache.iceberg.flink.actions.Actions;

Table table = findTable(options, conf);
Actions.forTable(table)
    .rewriteDataFiles()
    .targetSizeInBytes(10 * 1024) // 10 KB (demo value; real targets are usually far larger)
    .execute();
```

Iceberg 0.11 adds support for streaming small-file compaction: data is written with a hash shuffle on the partition/bucket key, merging files at the source. The benefit is that one task handles one partition's data and commits its own DataFiles, i.e. a task only commits for the partition it owns. This avoids many tasks committing many small files, and requires no extra maintenance code: just set the write.distribution-mode property when creating the table. The parameter is shared with other engines, such as Spark.

```sql
CREATE TABLE city_table (
  province BIGINT,
  city STRING
) PARTITIONED BY (province, city) WITH (
  'write.distribution-mode'='hash'
);
```
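As a sketch of what the pre-0.11 approach looked like operationally, the batch compaction above would be driven by a timer. The table-loading helper and the interval below are assumptions for illustration, not the article's actual maintenance code.

```java
// Periodic batch compaction -- the kind of "maintain your own Actions code + schedule"
// setup that 0.11's write.distribution-mode makes unnecessary.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.iceberg.Table;
import org.apache.iceberg.flink.actions.Actions;

public class CompactionScheduler {
  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> {
      Table table = loadTable();  // hypothetical helper
      Actions.forTable(table)
          .rewriteDataFiles()
          .targetSizeInBytes(128L * 1024 * 1024)  // e.g. 128 MB target files
          .execute();
    }, 0, 30, TimeUnit.MINUTES);  // assumed interval
  }

  private static Table loadTable() {
    throw new UnsupportedOperationException("load the Iceberg table here");
  }
}
```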
### 2. Sorting in Iceberg 0.11

#### 2.1 Introduction

Before Iceberg 0.11, Flink did not support Iceberg's sort feature, so sorting could only be done with Spark in batch mode. 0.11 adds sort support, which means we can now get this benefit in real-time pipelines as well.

The essence of sorting is faster scans: once data is clustered by the sort key, everything is laid out in ascending order, and max-min pruning can filter out large amounts of irrelevant data.

![](https://static001.geekbang.org/infoq/40/40dd0331c7c80871ec093a7f64148523.png)

#### 2.2 Sort demo

```sql
insert into Iceberg_table select days, province_id from Kafka_tbl order by days, province_id;
```

### 3. The manifest after sorting

![](https://static001.geekbang.org/infoq/8d/8d7793fb4ba6900d4517e352eef03afd.png)

#### Fields:

- file_path: the physical file location.
- partition: the partition the file belongs to.
- lower_bounds: the minimum values of the sort fields in this file; in the figure, the minimums of my days and province_id.
- upper_bounds: the maximum values of the sort fields in this file; in the figure, the maximums of my days and province_id. The partition and the columns' upper/lower bounds determine whether a file_path's file needs to be read at all. After sorting, the column information is also recorded in the metadata; query planning locates files from the manifest, with no need to record this information in the Hive metastore, which relieves the metastore and improves query efficiency.

Using the 0.11 sort feature, take day as the partition and sort by day, hour, and minute; the manifest files then record this sort order, which speeds up retrieval. We keep the retrieval benefits of Hive partitioning while avoiding the pressure of excessive Hive metastore metadata.
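To show how a reader benefits from these sorted bounds, here is a minimal scan-planning sketch: Iceberg converts the predicate to partition predicates first, then checks each data file's column bounds and skips files whose [lower, upper] range cannot match. It assumes the column names from the demo above; `loadTable()` is a hypothetical helper.

```java
// Scan planning with predicate pushdown against the manifest bounds.
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.io.CloseableIterable;

public class PruningDemo {
  public static void main(String[] args) throws Exception {
    Table table = loadTable();  // hypothetical helper

    TableScan scan = table.newScan()
        .filter(Expressions.equal("days", "2021-02-01"))
        .filter(Expressions.greaterThanOrEqual("province_id", 20));

    // Only files whose partition and column bounds can satisfy the predicates
    // show up here; everything else is pruned without being opened.
    try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
      for (FileScanTask task : tasks) {
        System.out.println("will read: " + task.file().path());
      }
    }
  }

  private static Table loadTable() {
    throw new UnsupportedOperationException("load the Iceberg table here");
  }
}
```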
---

## Summary

Compared with earlier versions, Iceberg 0.11 adds many practical features. Against the older version we had been using, the summary is:

### 1. Flink + Iceberg sorting

Before Iceberg 0.11, sorting was integrated with Spark but not with Flink; at the time we migrated a batch of Hive tables with Spark + Iceberg 0.10. The benefit on the BI side: BI had originally built multi-level partitions to speed up Hive queries, which led to too many small files and too much metadata. During ingestion into the lake we used Spark to sort on the conditions BI queries most often, combined with hidden partitioning. That improved BI retrieval speed while eliminating the small-file problem, and since Iceberg carries its own metadata, it also reduced the pressure on the Hive metastore. Iceberg 0.11's support for Flink sorting is a very practical feature: we can move the old Flink + Hive partitioning into Iceberg sorting, achieving the effect of Hive partitions while reducing small files and improving query efficiency.

### 2. Real-time reads

Data can be read in real time simply by writing SQL. The benefit: data with relaxed latency requirements, say, business that can accept a 1-10 minute delay, can be put into Iceberg, which relieves Kafka while enabling near-real-time reads and also retaining the historical data.

### 3. Real-time small-file compaction

Before Iceberg 0.11, small-file compaction had to be maintained through Iceberg's compaction API, which takes the table information and scheduling information, and the merging ran batch by batch, not in real time. In code terms, that added maintenance and development cost; in freshness terms, it was not real-time. 0.11 merges data in real time at the source via hashing: just set the ('write.distribution-mode'='hash') property when creating the table in SQL, with no manual maintenance needed.