Kafka Log Storage and Cleanup Mechanisms

{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Recently, while preparing service capacity alerts ahead of the National Day holiday, I happened to run into questions about Kafka storage capacity and log cleanup, so I spent some time studying the relevant concepts.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article focuses on Kafka's log storage and log cleanup.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Log Storage Structure","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let's start with a diagram of Kafka's storage structure.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/93/932656acffd7b96b98362527873632f3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure above shows, Kafka groups messages by topic. A topic is a logical concept; on disk, data is actually stored per partition. Each topic can be divided into multiple partitions, and the partition count can be specified when the topic is created. For example, the following command creates a topic named test with 4 partitions, each with 2 replicas for high availability (it uses the --zookeeper option of older Kafka releases; since Kafka 2.2, kafka-topics.sh also accepts --bootstrap-server, and --zookeeper was removed in Kafka 3.0).","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"./bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 2 --partitions 4 --topic test\n","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides being set at creation time, the number of partitions can also be changed later, although it can only be increased, never decreased. For example, the following command raises the partition count of the test topic to 12:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"./bin/kafka-topics.sh --alter --zookeeper 127.0.0.1:2181 --topic test --partitions 12\n","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Each message within a partition is assigned a unique sequence number, commonly called the offset. Because offsets are assigned per partition, Kafka guarantees ordering only within a single partition, not across an entire topic.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If partitions are set up sensibly, messages are distributed evenly across them, which allows horizontal scaling. Ignoring replicas, each partition corresponds to one Log, as shown in the figure above. To keep a Log from growing too large, Kafka introduces log segments (LogSegment) and splits each Log into multiple LogSegments, much like breaking one huge file into several smaller ones; this also makes messages easier to maintain and clean up. In fact, neither Log nor LogSegment is a purely physical concept: a Log is physically just a directory, and each LogSegment corresponds to one log file and two index files on disk, possibly along with other files (such as a transaction index file with the \".txnindex\" suffix).","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Kafka, a Log corresponds to a directory named <topic>-<partition>. For example, if the topic test has 3 partitions, they are physically stored in three directories: \"test-0\", \"test-1\", and \"test-2\".","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Messages are written to a Log sequentially. Only the last LogSegment accepts writes; all earlier LogSegments are read-only. For convenience, we call the last LogSegment the \"activeSegment\", i.e., the currently active log segment. As messages keep arriving, once the activeSegment meets certain conditions (for example, its size reaches the limit set by log.segment.bytes), a new activeSegment is created and subsequent messages are appended to it.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/37/379fba05ed9e66735e6dff450c1ec81d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To make messages easier to look up, each LogSegment's log file (suffix \".log\") has two index files: an offset index file (suffix \".index\") and a timestamp index file (suffix \".timeindex\"). Each LogSegment also has a base offset, baseOffset, which is the offset of the first message in that LogSegment. Offsets are 64-bit long integers. The log file and both index files are named after the baseOffset, using a fixed width of 20 digits padded with leading zeros. For example, if the first LogSegment's baseOffset is 0, its log file is named 00000000000000000000.log.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4f/4ffab9de0f6b8ec84b0abd688517ab7e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the example, the second LogSegment's baseOffset is 256, meaning the first message in that LogSegment has offset 256. It also follows that the first LogSegment holds 256 messages (offsets 0 through 255).","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note that a LogSegment is not limited to the three files \".log\", \".index\", and \".timeindex\". It may also include temporary files such as \".deleted\", \".cleaned\", and \".swap\", as well as files such as \".snapshot\", \".txnindex\", and \"leader-epoch-checkpoint\".","attrs":{}}]}],"attrs":{}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Log Cleanup Mechanisms","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because Kafka stores messages on disk, they must be cleaned up or compacted to keep disk usage under control as messages keep accumulating. Each partition replica in Kafka corresponds to one Log, and each Log is split into multiple LogSegments, which also makes cleanup easier. Kafka provides two log cleanup policies: log deletion and log compaction; this article covers log deletion.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The cleanup policy is set with the broker parameter log.cleanup.policy, whose default value is \"delete\", i.e., log deletion. To use log compaction, set log.cleanup.policy to \"compact\" and make sure log.cleaner.enable (default true) is set to true. Setting log.cleanup.policy to \"delete,compact\" enables both policies at the same time. Cleanup can also be configured per topic; for example, the topic-level counterpart of log.cleanup.policy is cleanup.policy. For simplicity, this article refers only to the broker-level parameters.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Log Deletion","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Log deletion directly removes the log segments that no longer satisfy the configured retention policy.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Time-Based Retention","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The log deletion task looks for segments in the current Log whose retention time exceeds the configured threshold (retentionMs) and collects them into the set of deletable segments (deletableSegments), as shown in the figure below. retentionMs can be configured with the broker parameters log.retention.hours, log.retention.minutes, and log.retention.ms. log.retention.ms has the highest priority, followed by log.retention.minutes, with log.retention.hours lowest. By default only log.retention.hours is configured, with a value of 168, so log segments are retained for 7 days by default.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1e/1e97c8284f7b087fcb741202888d3c09.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Expired segments are not identified simply from a segment's last modified time (lastModifiedTime) but from the largest timestamp in the segment (largestTimeStamp). A segment's lastModifiedTime can be changed intentionally or unintentionally, for example by running touch on it or by reassigning partition replicas, so it does not reliably reflect how long the segment has been retained on disk. To obtain largestTimeStamp, Kafka looks up the segment's timestamp index file and reads its last index entry: if that entry's timestamp is greater than 0, its value is used; otherwise, lastModifiedTime is used instead.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the number of segments to delete equals the total number of segments in the Log, every segment has expired. However, the Log must always keep one segment to receive new writes, i.e., there must always be an activeSegment. In this case, Kafka first rolls a new segment to serve as the activeSegment and then performs the deletion.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When deleting segments, Kafka first removes them from the skip list of log segments maintained by the Log object, so that no thread can read them any more. It then appends the \".deleted\" suffix to every file belonging to those segments, including their index files. Finally, a delayed task named \"delete-file\" removes the files ending in \".deleted\". The delay is set by file.delete.delay.ms (default 60000, i.e., 1 minute); this is the topic-level name, and the corresponding broker-level parameter is log.segment.delete.delay.ms.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Size-Based Retention","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The log deletion task checks whether the size of the current Log exceeds the configured threshold (retentionSize) in order to find the set of deletable segments (deletableSegments), as shown in the figure below. retentionSize is configured with the broker parameter log.retention.bytes, whose default value of -1 means unlimited. Note that log.retention.bytes limits the total size of all log files in a Log, not the size of a single log segment (more precisely, a single .log file). The size of a single log segment is limited by the broker parameter log.segment.bytes, which defaults to 1073741824, i.e., 1 GB.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/86/86ad0d2b1bb020e6dcc923d0941eae22.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Size-based retention works much like time-based retention. Kafka first computes diff, the difference between the Log's total size and retentionSize, i.e., how many bytes need to be deleted. It then scans from the first (oldest) segment to build deletableSegments: a segment is added while diff minus that segment's size is still at least 0, and diff is reduced by the segment's size each time. Once deletableSegments has been found, deletion proceeds exactly as for time-based retention, so it is not repeated here.","attrs":{}}]}
]}
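To make the segment file-naming rule from the log storage section concrete, here is a minimal shell sketch that derives a LogSegment's file names from its baseOffset by zero-padding it to 20 digits. `segment_files` and `segment_offset_span` are hypothetical helpers for illustration, not part of Kafka's tooling, and the offset span equals the message count only under the assumption that the segment's offsets are contiguous.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating LogSegment file naming; not Kafka tooling.

# Print the .log/.index/.timeindex names for a segment with the given baseOffset.
# The baseOffset is zero-padded to a fixed width of 20 digits.
segment_files() {
  local prefix
  prefix=$(printf '%020d' "$1")
  echo "${prefix}.log ${prefix}.index ${prefix}.timeindex"
}

# Number of offsets covered by a segment: next baseOffset minus this baseOffset.
# Assumption: offsets are contiguous (e.g. no compaction has removed records).
segment_offset_span() {
  echo $(( $2 - $1 ))
}

segment_files 0            # 00000000000000000000.log 00000000000000000000.index 00000000000000000000.timeindex
segment_files 256          # 00000000000000000256.log 00000000000000000256.index 00000000000000000256.timeindex
segment_offset_span 0 256  # 256, i.e. offsets 0 through 255
```

A 64-bit signed offset has at most 19 decimal digits, so a 20-digit field is always wide enough, and fixed-width names sort in offset order.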
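The priority order among log.retention.ms, log.retention.minutes, and log.retention.hours can be expressed as a small resolution function. This is a sketch that assumes an empty argument means "not configured"; the function name `resolve_retention_ms` is hypothetical, and special values such as log.retention.ms=-1 (which Kafka treats as no time limit) are not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: effective retentionMs from the three broker parameters.
# Arguments: <log.retention.ms> <log.retention.minutes> <log.retention.hours>;
# pass "" for a parameter that is not configured.
resolve_retention_ms() {
  local ms=$1 minutes=$2 hours=$3
  if [ -n "$ms" ]; then
    echo "$ms"                                   # highest priority
  elif [ -n "$minutes" ]; then
    echo $(( minutes * 60 * 1000 ))              # second priority
  else
    echo $(( ${hours:-168} * 60 * 60 * 1000 ))   # lowest; default 168 hours
  fi
}

resolve_retention_ms "" "" ""     # 604800000 (7 days)
resolve_retention_ms "" 30 24     # 1800000 (minutes beat hours)
resolve_retention_ms 5000 30 24   # 5000 (ms beats both)
```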
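The time-based check, including the fallback from the last time-index timestamp to lastModifiedTime, can be sketched as a filter over segment metadata. The input format (one line per segment, oldest first: baseOffset, timestamp of the last time-index entry, lastModifiedTime, all in milliseconds) is invented for this example. Stopping at the first segment still within retention is an assumption about the scan order, and rolling a new activeSegment when every segment has expired is not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<baseOffset> <lastTimeIndexTs> <lastModifiedMs>"
# lines on stdin (oldest segment first) and prints the baseOffsets of segments
# whose age exceeds retentionMs.
time_deletable_segments() {
  local now_ms=$1 retention_ms=$2 base ts mtime largest
  while read -r base ts mtime; do
    # largestTimeStamp: the last time-index entry if it is > 0,
    # otherwise fall back to lastModifiedTime.
    if [ "$ts" -gt 0 ]; then largest=$ts; else largest=$mtime; fi
    if [ $(( now_ms - largest )) -gt "$retention_ms" ]; then
      echo "$base"
    else
      break   # assumption: the scan stops at the first non-expired segment
    fi
  done
}

# Segment 0 was touched recently (lastModifiedMs=9999) but its data is old,
# so it still counts as expired; segment 256 has no time-index entry and
# falls back to lastModifiedMs; segment 512 is within retention.
printf '%s\n' "0 1000 9999" "256 0 2000" "512 8000 9000" |
  time_deletable_segments 10000 5000   # prints 0 and 256
```

Segment 0 shows why the timestamp index is consulted first: judged by lastModifiedTime alone, a touched file would look fresh and never expire.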
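The rename-then-delay deletion of segment files can be mimicked on ordinary files to see the two phases separately. This is only an illustration in a scratch directory: `mark_segment_deleted` and `schedule_delete` are hypothetical names, the delay is given in seconds rather than the milliseconds of file.delete.delay.ms, and Kafka's own skip-list removal and "delete-file" task are not reproduced.

```shell
#!/usr/bin/env bash
# Phase 1: rename every file of one segment to "<name>.deleted".
mark_segment_deleted() {
  local dir=$1 base=$2 f
  for f in "$dir/$base".*; do
    [ -e "$f" ] || continue                 # glob matched nothing
    case $f in *.deleted) continue ;; esac  # already marked
    mv -- "$f" "$f.deleted"
  done
}

# Phase 2: remove all "*.deleted" files after a delay, in the background.
schedule_delete() {
  local dir=$1 delay_s=$2
  ( sleep "$delay_s"; rm -f -- "$dir"/*.deleted ) &
}

# Usage in a scratch directory:
demo_dir=$(mktemp -d)
for ext in log index timeindex; do touch "$demo_dir/00000000000000000000.$ext"; done
touch "$demo_dir/00000000000000000256.log"
mark_segment_deleted "$demo_dir" 00000000000000000000
ls "$demo_dir"    # the three base-0 files now end in .deleted; 256.log is untouched
schedule_delete "$demo_dir" 1
wait              # wait for the background delete to finish
ls "$demo_dir"    # only 00000000000000000256.log remains
rm -rf "$demo_dir"
```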
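The size-based scan, which computes diff = total size - retentionSize and then walks from the first segment, can be sketched the same way. The helper name and the stdin format (one segment size in bytes per line, oldest first) are hypothetical. The stopping rule, collecting a segment only while diff minus its size stays non-negative, reflects my understanding of Kafka's implementation and should be read as an assumption; the activeSegment special case is not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one segment size (bytes) per line on stdin,
# oldest segment first, and prints the 0-based indexes of deletable segments.
size_deletable_segments() {
  local retention_bytes=$1 sizes s total=0 diff idx=0
  sizes=$(cat)
  # log.retention.bytes = -1 means "no size limit": nothing to delete.
  if [ "$retention_bytes" -lt 0 ]; then return 0; fi
  for s in $sizes; do total=$(( total + s )); done
  diff=$(( total - retention_bytes ))   # bytes that need to go
  for s in $sizes; do
    # Collect this segment only if removing it does not overshoot diff.
    [ $(( diff - s )) -ge 0 ] || break
    diff=$(( diff - s ))
    echo "$idx"
    idx=$(( idx + 1 ))
  done
}

printf '%s\n' 400 400 400 300 | size_deletable_segments 1000   # prints 0
printf '%s\n' 400 400 400 300 | size_deletable_segments 600    # prints 0 and 1
```

With 1500 bytes in total and retentionSize=1000, diff is 500, so only the first 400-byte segment is removed; deleting the second one as well would take the Log below retentionSize.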