Understanding MergeTree, the Core Engine of ClickHouse

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ClickHouse"},{"type":"text","text":" is a database management system (DBMS) open-sourced in 2016 by Yandex, Russia's largest search engine, and is mainly used for online analytical processing (OLAP). It stores data by column, which lets it far outperform traditional row-oriented DBMSs on analytical queries, and it has drawn wide attention in recent years."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article introduces the "},{"type":"text","marks":[{"type":"strong"}],"text":"ClickHouse MergeTree family of table engines"},{"type":"text","text":" and uses examples to analyze how the MergeTree storage engine lays out data on disk."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1 Introduction to the MergeTree table engines"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The MergeTree family of table engines is the most distinctive storage engine family in ClickHouse. MergeTree engines support primary-key indexing, data partitioning, data replication, and data sampling. ClickHouse officially provides seven MergeTree engine types: MergeTree, ReplacingMergeTree, SummingMergeTree, AggregatingMergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, and GraphiteMergeTree, each with a corresponding replicated variant (Replicated*)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e6\/15\/e6940e6b994e7df1fc72be8328d18a15.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, here is an overview of the "},{"type":"text","marks":[{"type":"strong"}],"text":"core MergeTree engines"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ReplacingMergeTree"},{"type":"text","text":": during background merges, removes duplicate rows that share the same sorting key."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"SummingMergeTree"},{"type":"text","text":": when parts are merged, rows with the same primary key (more precisely, the same sorting key) are collapsed into one row. For the columns configured for summation, the value becomes the sum over the merged rows; other columns keep the value from the first row. Summed columns must be of a numeric type."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"AggregatingMergeTree"},{"type":"text","text":": within a single partition, aggregates rows that have the same primary key."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"CollapsingMergeTree"},{"type":"text","text":": within a single partition, collapses rows that have the same primary key."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"VersionedCollapsingMergeTree"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Built on the CollapsingMergeTree engine, with an added option for a version column. Rows are sorted according to ORDER BY; if the version column is not part of the sorting key, it is implicitly appended as the last ORDER BY column and therefore affects the sort order."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"GraphiteMergeTree"},{"type":"text","text":": stores data for the Graphite time-series database."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MergeTree is the core engine of the family. All the other engines are built on it and add their own behavior during data merges, and together they make up the MergeTree table engine family. The rest of this article uses MergeTree itself to examine the family in detail."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2 The MergeTree engine"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 Creating a table"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The DDL for creating a MergeTree table is as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] \n(\n name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1], \n name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2], \n ... \n) ENGINE = MergeTree()\n ORDER BY expr \n [PARTITION BY expr] \n [PRIMARY KEY expr] \n [SAMPLE BY expr] \n [TTL expr [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'], ...] 
\n [SETTINGS name=value, ...]\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The main parameters of the MergeTree engine are described below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Required clauses"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ENGINE"},{"type":"text","text":": the engine name. The MergeTree engine takes no parameters."}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ORDER BY"},{"type":"text","text":": the sorting key, made up of one or more columns, which determines how the data is sorted, for example ORDER BY (CounterID, EventDate). If PRIMARY KEY is not specified explicitly, the ORDER BY expression is used as the primary key, so specifying only ORDER BY is usually enough."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Optional clauses and settings"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"PARTITION BY"},{"type":"text","text":": the partition key, which specifies the rule used to partition the table's data. A partition is a logical set of rows within a table, defined by that rule. Any criterion can be used, such as by month, by day, or by event type. Each partition is stored separately to reduce the amount of data an operation has to touch."}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"PRIMARY KEY"},{"type":"text","text":" 
: the primary key. When set, a primary (level-one) index, primary.idx, is built from it, and because the data is sorted accordingly, queries are faster. By default PRIMARY KEY is the same as ORDER BY, so in practice ORDER BY is usually specified instead of a separate primary key."}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"SAMPLE BY"},{"type":"text","text":": the sampling expression. If it is set explicitly, the primary key must contain the same expression, for example ORDER BY (CounterID, EventDate, intHash32(UserID)) together with SAMPLE BY intHash32(UserID)."}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"TTL"},{"type":"text","text":": data time to live. A TTL can be set on a single column or on the whole table, and the expression must be based on a Date or DateTime column. A column-level TTL clears the expired values in that column; a table-level TTL deletes the expired rows from the table. If both are set, whichever expires first takes effect. For example, TTL createtime + INTERVAL 1 DAY expires data after one day. Typical uses are periodic deletion and periodic archiving of data."}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"index_granularity"},{"type":"text","text":": the index granularity. The MergeTree index is sparse: one index entry is generated for every index_granularity rows. The default value is 8192."}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"enable_mixed_granularity_parts"},{"type":"text","text":": whether to enable index_granularity_bytes for controlling the granule size."}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"index_granularity_bytes"},{"type":"text","text":" 
: the maximum size of a granule in bytes; the default is 10 MB."}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"merge_max_block_size"},{"type":"text","text":": the maximum number of rows per block when parts are merged; the default is 8192."}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"merge_with_ttl_timeout"},{"type":"text","text":": the minimum interval between successive TTL merges; the default is 1 day."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 Data storage structure"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First create a table named test with the following DDL:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"CREATE TABLE test.test \n( \n id UInt64, \n type UInt8, \n create_time DateTime \n) ENGINE = MergeTree() \n PARTITION BY toYYYYMMDD(create_time) \n ORDER BY (id) \n SETTINGS index_granularity = 4;\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The test table has three columns: id, type, and create_time. The create_time column is the partition key, converted to the YYYYMMDD format, and rows are sorted by id. Because no primary key is set explicitly, the engine uses the id column from ORDER BY as the index column and generates the index file. Setting index_granularity to 4 means one index entry is produced for every 4 rows."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Insert one row of test data:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"insert into test.test(id, type, create_time) VALUES (1, 1, toDateTime('2021-03-01 
00:00:00'));\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Use the following query to inspect the partition information of the test table:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":" SELECT \n database, \n table, \n partition, \n partition_id, \n name, \n active, \n path \n FROM system.parts \n WHERE table = 'test' \n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The result is shown in the figure below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/43\/f1\/43f14146a075b8a09417b34ec1abyyf1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure shows that the query returned one record for the test table, a data partition whose partition_id is 20210301. The name column tells us that the directory of this partition is named 20210301_8_8_0. What does the name 20210301_8_8_0 mean? The partitioning rules and the naming rules for partition directories are described next."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.1 How partition IDs are generated"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How data is partitioned is determined by the partition ID, which in turn is determined by the PARTITION BY key. Depending on the type of the partition key, the ID is generated as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"No partition key"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If PARTITION BY is not defined, a single partition named all is created by default and all data is stored under it."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Integer partition key"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the partition key is an integer, the string form of that integer is used directly as the partition ID."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Date partition key"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the partition key is a date type, or can be converted to one, the date formatted as YYYYMMDD is used as the partition ID."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Partition keys of other types"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For String, Float, and other types, a 128-bit hash of the value is used as the partition ID."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"or
igin":null},"content":[{"type":"text","text":"The row inserted above has the timestamp 2021-03-01 00:00:00; formatting that column yields the partition ID 20210301."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.2 Naming rules for partition directories"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Directory names follow this pattern:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"PartitionId_MinBlockNum_MaxBlockNum_Level\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"PartitionId"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The partition ID, for example 20210301."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"MinBlockNum"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The minimum block number, an auto-incrementing counter that starts at 1 and is incremented by one each time a new partition directory is created."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"MaxBlockNum"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The maximum block number. For a newly created partition directory, MinBlockNum equals MaxBlockNum."}]}]},{"type":"listitem","attrs":{"listStyle":null},"c
ontent":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Level"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The merge level, that is, how many times the directory has been merged. The more merges, the larger the level."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/da\/b5\/da1e10c251c73085ddffe598db5e07b5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure shows, the partition ID is 20210301, MinBlockNum and MaxBlockNum are both 8, and the level is 0, which means this partition directory has never been merged."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.3 File organization of a partition directory"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the naming rules for partition directories covered, let us look at how files are organized inside a partition directory, taking 20210301_8_8_0 as an example:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/6c\/f3\/6cc0b607ce0a9c35bc2aecc0e2e495f3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure shows, the directory mainly contains bin files, mrk files, the primary.idx file, and a few other related files."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"bin files"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data files, each storing the data of one column. Every column in the table has a bin file with the same name as the column; for example, id.bin stores the id column of the test table."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"mrk files"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mark files. Every column has a mark file with the same name as the column. Mark files act as a bridge between the idx index file and the bin data files. A file ending in mrk2 indicates that adaptive index granularity is enabled for the table."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","mark
s":[{"type":"strong"}],"text":"primary.idx file"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The primary key index file, used to speed up queries."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"count.txt"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The total number of rows in the partition directory. For the 20210301_8_8_0 directory above, the file contains 1."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"columns.txt"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Information about all columns in the table, including column names and types."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"partition.dat"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Stores the value of the partition expression. For the 20210301_8_8_0 directory above, the value is 20210301."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"minmax_create_time.idx"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The minimum and maximum values of the partition key column."}]}]},{"type":"listitem","attrs":{"listStyl
e":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"checksums.txt"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The checksum file, used to verify the integrity of the other files. It stores the size and hash of each file."}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.1 Data files"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In MergeTree, every column has its own bin file that stores only that column's data; for example, id.bin holds the id column. All data is sorted and compressed, and is then written to the bin file in units of compressed blocks. Each block consists of a header and the compressed data. The header contains a checksum, the compression algorithm, and the sizes of the data before and after compression. The compressed data is made up of granules, whose size depends on index_granularity."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.2 Index file"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The MergeTree index is a sparse index: it does not index individual rows but ranges of rows. From the fully sorted data, the primary key values of rows taken at regular intervals are written to the primary.idx index file, which speeds up queries on the table. The interval is set by index_granularity."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/bb\/4c\/bbc302f044278bbfcc3c16d63f15a34c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We insert 9 rows into the test table:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"insert into test.test(id, type, create_time) VALUES (1, 1, toDateTime('2021-03-01 00:00:00')); \ninsert into 
test.test(id, type, create_time) VALUES (1, 2, toDateTime('2021-03-01 00:00:00')); \ninsert into test.test(id, type, create_time) VALUES (1, 3, toDateTime('2021-03-01 00:00:00')); \ninsert into test.test(id, type, create_time) VALUES (2, 1, toDateTime('2021-03-01 00:00:00')); \ninsert into test.test(id, type, create_time) VALUES (2, 1, toDateTime('2021-03-01 00:00:00')); \ninsert into test.test(id, type, create_time) VALUES (3, 1, toDateTime('2021-03-01 00:00:00')); \ninsert into test.test(id, type, create_time) VALUES (3, 1, toDateTime('2021-03-01 00:00:00')); \ninsert into test.test(id, type, create_time) VALUES (4, 1, toDateTime('2021-03-01 00:00:00')); \ninsert into test.test(id, type, create_time) VALUES (5, 1, toDateTime('2021-03-01 00:00:00'));\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because index_granularity is 4, one index entry is generated for every 4 rows. Each INSERT first creates its own part, so this applies once the nine rows sit in a single part after merging: the id values of the 1st, 5th, and 9th rows then form the entries of the index file."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d6\/9c\/d6ef6c6d12d669a97b0940e852caf29c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.3 Mark files"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The mrk mark files act as a bridge between the primary.idx index file and the bin data files. Every index entry in primary.idx has a corresponding record in the mrk file. A record consists of:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"offset-compressed bin file"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The offset of the referenced compressed block within the bin file."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"offset-decompressed data block"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The offset of the referenced data within the decompressed block."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"row 
counts"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The number of rows covered by the record, which is less than or equal to index_granularity."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/57\/d6\/57c2c50d02c4bf7637007fb3b337d6d6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The relationship between the index, mark, and data files is shown in the figure below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/b2\/85\/b2dabc9da1d48abca91627a46c08dd85.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Author: TalkingData 
張凱"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"References:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. https:\/\/clickhouse.tech\/docs"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. http:\/\/www.clickhouse.com.cn\/topic\/5ffec51eba8f16b55dd0ffe4"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. 《ClickHouse原理解析與應用實踐》 by 朱凱, China Machine Press"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"This article is reposted from the TalkingData WeChat official account (ID: Talkingdata)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original link"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/YWSmRqOOC3F5KeDBPg2g1A","title":"","type":null},"content":[{"type":"text","text":"Understanding MergeTree, the Core Engine of ClickHouse"}]}]}]}
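As a rough illustration only (this is not ClickHouse's actual implementation), the directory naming rule from section 2.2.2 and the sparse index selection from section 2.3.2 can be mimicked in a few lines of Python. The names `PartName`, `parse_part_name`, and `sparse_index` are hypothetical helpers introduced for this sketch, and it assumes the nine example rows already sit in one sorted part.

```python
from typing import List, NamedTuple


class PartName(NamedTuple):
    """Fields of a directory name PartitionId_MinBlockNum_MaxBlockNum_Level."""
    partition_id: str
    min_block: int
    max_block: int
    level: int


def parse_part_name(name: str) -> PartName:
    # Split from the right so the three numeric suffixes are always taken
    # from the end, whatever the partition ID looks like.
    partition_id, min_block, max_block, level = name.rsplit("_", 3)
    return PartName(partition_id, int(min_block), int(max_block), int(level))


def sparse_index(sorted_keys: List[int], index_granularity: int) -> List[int]:
    # Hypothetical model of primary.idx: keep the key of the first row
    # of every granule of index_granularity rows.
    return sorted_keys[::index_granularity]


part = parse_part_name("20210301_8_8_0")
ids = [1, 1, 1, 2, 2, 3, 3, 4, 5]  # id column of the nine example rows, sorted
index_entries = sparse_index(ids, index_granularity=4)

print(part)           # partition 20210301, blocks 8..8, level 0 (never merged)
print(index_entries)  # ids of the 1st, 5th and 9th rows
```

With `index_granularity = 4`, the sketch picks the rows at positions 1, 5, and 9, which matches the description in section 2.3.2.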