Kafka怎麼不丟數據？

原創

纯洁的眼神

2020-11-27 21:23

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 在Kafka中broker一般有多個，它們組成一個高容錯的集羣。Broker的主要職責是接受producer和consumer的請求，並把消息持久化到本地磁盤。Broker以topic爲單位將消息分佈到不同的分區，每個分區可以有多個副本，通過數據冗餘的方式實現容錯。當partition存在多個副本時，只有一個是leader對外提供讀寫請求，其它均是follower不對外提供讀寫服務，只是同步leader中的數據，並在leader出現問題時，通過選舉算法將其中的某一個提升爲leader。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Broker以追加方式將消息寫到磁盤文件，且每個分區中的消息被賦予了唯一整數標識偏移量offset，broker僅提供基於offset的讀取方式，不維護各consumer當前已消費消息的offset值，而是由consumer各自維護當前offset。Consumer向broker讀取數據時發送起始offset值，broker將之後的消息流式發送過去，Broker中保存的數據是有有效期的，一旦超過了有效期，對應的數據將被移除以釋放磁盤空間，只要數據在有效期內，consumer可以重複消費。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Kafka Broker能夠保證同一topic下同一partition內部的消息是有序的，但無法保證partition之間的消息全局有序，這意味着一個consumer讀取某個topic下多個分區的消息時，和寫入的順序不一致。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"一、存儲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Broker中以/tmp/kafka_logs來設置消息文件存儲目錄，如果有多個分區將會在/tmp/kafka_logs下生成多個partition目錄，partition目錄的命名規則：topic名_分區序號。每個partition物理上是由多個segment組成，segment文件由index和log文件組成，每個log文件的大小由log.segment.bytes控制，寫滿後兩個文件的命名規則：當前文件最小的offset值爲名，20位數字字符長度，不足用0補充，例如：00000000000000000123.index。索引文件中的元數據指向數據文件中message的物理偏移位置，例如：該index文件中元數據[6,856]表示在分區中第123+6=129個消息，物理偏移offset是865。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 每個分區內部的數據都是有序的，用一個offset來標識每一條數據的位置，但只是僅限於分區內的有序，offset不是message在分區中的實際存儲位置，而是邏輯上的一個值，它唯一確定分區中的一條Message：","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"相對offset：是該segment中的相對offset","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Position：表示該條Message在數據文件中的絕對位置","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 每次有消息進入就往log文件中寫入，那麼就會造成大量的磁盤隨機寫入，所以引入數據緩存，將要寫入的數據先緩存起來再統一寫入，從而提升寫入效率。kafka採用OS級別緩存pageCache，OS將閒置的memory用作disk caching，當數據寫入時OS將數據寫入pageCache，同時標記該page爲dirty，當讀取數據時，先從pageCache中查找，如果沒有查到(發生缺頁)則去磁盤中讀取，由於數據在內存，存在系統down機內存數據丟失的風險。而對於broker來說數據只需要存儲到內存，如果OS緩存失效就會導致kafka客戶端commit的數據丟失，可以通過log.flush.interval.messages(一定的條數)和log.flush.interval.ms(一定的時間)設置強制寫入到磁盤,所以kafka保證它存在於多個replica內存中，不保證被持久化到磁盤。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"二、切分文件","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" index和log文件會存在多個文件，切分規則如下：","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當前日誌分段文件的大小超過了 log.segment.bytes 配置的值(默認1GB)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當前日誌分段中消息的最大時間戳與系統的時間戳的差值大於log.roll.ms或log.roll.hours 的值(7天)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"索引文件的大小達到 log.index.size.max.bytes 配置的值默認值(10MB)","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"三、清理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 清理日誌實際上是清理過期的segment或者日誌文件太大了需要刪除最舊的數據，使得整體日誌文件大小不超過指定的值，以維持日誌文件的固定大小。log.cleanup.policy=delete表示採用刪除日誌的清理機制，默認5分鐘執行一次日誌刪除，清理日誌有三種策略：","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於時間","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 配置log.retention.hours/minutes/ms(默認7天)參數來設定清除的時間，日誌任務會清理超過指定閾值時間的segment文件，但是並不是根據最近修改時間(lastModifiedTime)來計算，而是根據Segment日誌中最大的時間戳(largestTimeStamp)來計算,因爲segment日誌的lastModifiedTime可以被修改(分區副本進行了rebalance)，lastModifiedTime並不能真實地反映出日誌分段在磁盤的保留時間。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於日誌大小","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 配置log.retention.bytes參數來設定清除的大小，日誌任務會清理超過指定閾值大小的segment文件，該參數是日誌文件的總大小，並不是單個Segment文件的大小。首先計算日誌文件的總大小size和log.retention.size的差值diff，即需要刪除的日誌總大小，然後從日誌文件中查找可刪除的文件集合deletableSegments，之後就執行刪除操作。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於起始偏移量","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 基於日誌起始偏移量(logStartOffset)的刪除策略依據是某segment日誌的下一個segment的offset是否>=logStartOffset，若是則加入deletablesSegments。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b0/b0ece2bd64947986b8194f65374616fe.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 上圖中第1個和第2個Segment會加入deletablesSegments集合，然後被刪除。刪除策略是以topic爲級別的，所以不同的topic可以設置不同的刪除策略。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"三、壓縮","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 日誌壓縮確保Kafka會爲一個topic分區數據日誌中保留message key的最後一個值，它解決了應用crash或應用在操作期間重啓來重新加載緩存的場景。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e0/e00cc79fc411b0e3bf1a43480e114607.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Kafka日誌壓縮機制是細粒度key級別的保留機制，而不是基於時間的粗粒度。壓縮後的offset可能是不連續的，比如上圖中沒有5和7，因爲這些offset的消息被merge了，當需要消費這些offset消息時，將會拿到比這個offset大的offset對應的消息。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 日誌壓縮提供瞭如下保證：","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所有跟上消費的consumer能消費到所有寫入的消息，這些消息有連續的序列號。Topic的min.compaction.lag.ms用於保證消息寫入多久後纔會被壓縮","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"消息的順序會被保留。壓縮不會重排序消息，只是移除其中一部分","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"消息的offset不會變更。這是消息在日誌中的永久標誌","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"任何從頭開始處理日誌的consumer至少會拿到每個key的最終狀態","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Kafka支持GZIP、Snappy、LZ4 三種壓縮算法，可通過compression.type 設定具體的壓縮算法。通過設置log.cleaner.enable=true啓用cleaner(默認false)，log.cleanup.policy=compact啓用壓縮策略(默認delete)。壓縮算法是要佔用很大一部分cpu資源的並且耗時也是不小的，而壓縮的目的很大程度上是爲了提升網絡傳輸的性能。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8b/8b4b665750724dfed2fc4ede1199d853.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"四、分區","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Kafka中可以將Topic從物理上劃分成一個或多個分區(Partition)，被分佈在集羣中的多個broker上。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/40/40cc8bfa5c28e3d5185445f55d6a94c2.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"分區機制","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Producer在生產數據時可以爲每條消息指定Key，消息被髮送到broker時會根據分區規則選擇被存儲到哪一個分區，如果分區規則設置的合理，那麼所有的消息將會被均勻的分佈到不同的分區中，這樣就實現了負載均衡和水平擴展。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"分區規則","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"默認情況下，Kafka根據消息的key來進行分區的分配，即hash(key) % numPartitions，如下圖所示：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ed/ed9be9cd9c04a24c7f7be36e2d0f5e03.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中numPartitions就是Tpoic的分區總數。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在創建Topic時候可以配置num.partitions來指定默認的分區數。爲topic創建分區時，分區數最好是broker數量的整數倍，這樣才能使一個topic的分區均勻的分佈在Kafka集羣中。如果消息key爲null時，那麼producer將會把這條消息發送給隨機的一個partition，然後把這個分區加入到緩存中以備後續的null key直接使用，但是會每隔10分鐘清除緩存。當往Broker發送消息時修改了topic的分區數，producer可以在最多topic.metadata.refresh.interval.ms的時間之後動態感知到分區數的變化，並且可以將數據發送到新添加的分區中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"確定分區","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過將topic的消息分散到多個broker的多個分區，理論上一個topic的分區越多，整個集羣的吞吐量就越大，但是每個分區都是自身的消耗，所以並不是分區越多越好，但是可以遵循一定的步驟來確定分區數：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"創建一個只有1個分區的topic，然後測試這個topic的producer吞吐量和consumer吞吐量，假設它們的值分別是Tp和Tc，單位可以是MB/s，總的目標吞吐量是Tt，那麼分區數 = Tt / max(Tp, Tc)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"消費均衡","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consumer要確定從哪一個分區去取數據來消費，選擇規則如下：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c3/c36e431b1ec541d2de6ceb2a44fdd2ea.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"五、副本","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 分區的副本被稱爲replica，每個分區可以有多個副本，並且在副本集中會存在一個leader和多個follower，均勻的分佈在多個broker。當leader節點掛掉之後，會從副本集中重新選出一個副本作爲leader繼續提供服務實現故障轉移。副本的個數可以通過broker配置文件來設定，leader處理所有的read-write請求，follower只需要與leader同步數據即可。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/94/94654ffd0f29a711f5b5d2f81d0318eb.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 副本同步中有ISR副本的概念，ISR副本是leader和所有能夠與leader保持基本同步的follower副本集合，由leader維護。由於消息複製延遲受到最慢同步副本的限制，因此快速檢測出慢副本並將其從 ISR 中刪除非常重要。如果follow副本和leader數據同步速度過慢導致【落後太多】，該follower將會被剔除出ISR副本，由min.insync.replicas設置ISR中follower的最小個數。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"同步機制","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Producer將消息發送到partition的leader上並寫入其本地log後，其他follower將從leader pull數據，producer需要等待request.required.acks個副本同步完成纔算成功提交消息，該參數有如下配置：","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks=0 生產者無需等待leader返回確認,但是無法保證消息是否被leader收到","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks=1 生產者需要等待leader副本成功寫入日誌。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks=-1 leader副本和所有follower都寫入日誌纔會向producer發送確認信息","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"副本同步的主要參數：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"num.replica.fetchers：從一個broker同步數據的拉取線程數，可增加該broker的IO並行度。默認值：1","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"replica.fetch.wait.max.ms：對於follower replica而言，每個fetch請求的最大間隔時間，這個值應該比replica.lag.time.max.ms要小，否則對於吞吐量特別低的topic可能會導致ISR頻繁抖動。默認值：500","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"replica.lag.time.max.ms：超時時間，即當follower在該時間內沒有發送fetch請求或者在這個時間內沒有追上leader當前的log end offset，則將這個follower從ISR中移除。默認值：10000(10S)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"replica.fetch.min.bytes：每次fetch請求最少拉取的數據量，如果不滿足這個條件，那麼要等待 replica.fetch.wait.max.ms。默認值：1","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"不同步的原因：","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"慢副本","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 在一定週期時間內follower不能追趕上leader。最常見的原因之一是I/O瓶頸導致follower追加複製消息速度慢於從leader拉取速度。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"卡住副本","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 在一定週期時間內follower停止從leader拉取請求。follower replica卡住了是由於GC暫停或follower失效或死亡。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"新啓動副本","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當用戶給topic增加副本因子時，新的follower不在同步副本列表中，直到他們完全趕上了leader日誌。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"宕機恢復","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"少部分副本宕機","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當leader宕機了，會從follower選擇一個作爲leader,當宕機的舊leader重新恢復時，會把之前commit的數據清空，重新從新leader中pull數據。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"全部副本宕機","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當全部副本宕機了有兩種恢復方式：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 等待ISR集合中的副本恢復後選舉leader。等待時間較長，降低可用性","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 選擇第一個恢復的副本作爲新的leader，無論是否在ISR中，但是並未包含之前leader commit的數據，因此造成數據丟失","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"offset存儲","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Kafka新版本已默認將消費的offset遷入到了 Kafka 一個名爲 __consumer_offsets 的Topic中。以消費的Group、Topic、Partition做爲組合 Key。所有的消費offset都提交寫入到上述的Topic中。因爲這部分消息是非常重要，以至於是不能容忍丟數據的，所以消息的 acking 級別設置爲了 -1，生產者等到所有的 ISR 都收到消息後纔會得到ACK。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}}],"text":"更多文章請加入公衆號","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e9/e907e7f7f6deb51b782e7173adc49b56.jpeg","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

相關文章

嘉爲藍鯨WeOps與DeepFlow強強聯合，共同打造拓展性運維平臺

直達原文：嘉爲藍鯨WeOps x DeepFlow | 強強聯合，共同打造拓展性運維平臺運維管理在企業信息化建設中扮演着至關重要的角色，嘉爲藍鯨WeOps一體化運維平臺致力於爲客戶提供更全面、智能的運維能力。在探索創新的過程中，我們深刻

2024-04-26 23:23:22

如何從0到1設計診斷系統

引言在整車電子電氣體系中，診斷系統的設計扮演着至關重要的角色，負責支持整車的刷寫、故障排查和EOL(End of Line)等關鍵操作。這一重要性在於這些操作的實現都依賴於診斷系統的全面支持。因此，在設計診斷系統時，必須確保

2024-04-26 22:43:26

Sealos 雲主機正式上線，便宜，便宜，便宜！

我們基於 Sealos 雲開發的能力，僅用三天時間就上線 Sealos 的雲主機能力，現在不太懂容器的同學也可以在 Sealos 上開心的使用虛擬機了，本文先說 Sealos 雲主機的優勢，再聊聊我們是怎麼這麼快實現上線的，以及爲什麼我們要

2024-04-26 21:14:40

從零開始學架構V2-架構設計流程-2

一、架構設計流程架構的設計的是爲了降低整體的複雜性，那麼架構設計的第一步就是熟悉業務，識別其中的核心訴求，僅考慮技術的話就是識別複雜度。 1.1 識別複雜度架構的複雜度主要來源於第一節中介紹的“高性能”“高可用”“可擴展”等幾個方面，實

2024-04-25 23:56:26

從零開始學架構V2-初識架構設計-1

一、架構設計的主要目的爲了解決軟件系統複雜度帶來的問題二、複雜性來源軟件的架構設計是一個非常複雜的過程；基於業務&技術現狀、公司成本、團隊規模、團隊技術能力、近三年業務發展規模預測、技術發展趨勢等條件篩選出合適的技術、編寫多種架構設計

2024-04-25 23:56:25

京東廣告研發——效率爲王：廣告統一檢索平臺實踐

1、系統概述實踐證明，將互聯網流量變現的在線廣告是互聯網最成功的商業模式，而電商場景是在線廣告的核心場景。京東服務中國數億的用戶和大量的商家，商品池海量。平臺在兼顧用戶體驗、平臺、廣告主收益的前提推送商品具有挑戰性。京東廣告檢索平臺

2024-04-25 23:17:47

三十分鐘入門基礎Go（Java小子版）

前言 Go語言定義 Go（又稱 Golang）是 Google 的 Robert Griesemer，Rob Pike 及 Ken Thompson 開發的一種靜態、強類型、編譯型語言。Go 語言語法與 C 相近，但功能上有：內存安

2024-04-25 23:17:43

Stable Diffusion中的embedding

Stable Diffusion中的embedding 嵌入，也稱爲文本反轉，是在 Stable Diffusion 中控制圖像樣式的另一種方法。在這篇文章中，我們將學習什麼是嵌入，在哪裏可以找到它們，以及如何使用它們。什麼是嵌入embe

2024-04-25 21:31:13

「實戰應用」如何用圖表控件LightningChart創建2D氣泡圖

LightningChartJS是Web上性能特高的圖表庫，具有出色的執行性能 - 使用高數據速率同時監控數十個數據源。 GPU加速和WebGL渲染確保您的設備的圖形處理器得到有效利用，從而實現高刷新率和流暢的動畫，常用於貿易，工程，航空航

2024-04-25 11:36:06

詳解數倉的向量化執行引擎

本文分享自華爲雲社區《GaussDB(DWS)向量化執行引擎詳解》，作者： yd_212508532。前言適用版本：【基線功能】傳統的行執行引擎大多采用一次一元組的執行模式，這樣在執行過程中CPU大部分時間並沒有用來處理數據，更

2024-04-25 10:33:17

百度安全多篇議題入選Blackhat Asia以硬技術發現“芯”問題

Blackhat Asia 2024於4月中旬在新加坡隆重舉行。此次大會聚集了業界最傑出的信息安全專業人士和研究者，爲參會人員提供了安全領域最新的研究成果和發展趨勢。在本次大會上，百度安全共有三篇技術議題被大會收錄，主要圍繞自動駕駛控制器安

2024-04-25 09:33:19

前端面試題 - 說一下原型和原型鏈？

前端面試題 - 說一下原型和原型鏈？ JavaScript 中，萬物皆對象，對象分爲普通對象和函數對象。所有的函數都是函數對象（typeof f === 'function'），其他都是普通對象（typeof o === 'object'

2024-04-24 23:51:10

前端面試題 - JS的垃圾回收機制？

前端面試題 - JS的垃圾回收機制？有兩種垃圾回收策略：標記清除：標記階段即爲所有活動對象做上標記，清除階段則把沒有標記（也就是非活動對象）銷燬。引用計數：它把對象是否不再需要簡化定義爲對象有沒有其他對象引用到它。如果沒有引用指向該

2024-04-24 23:51:03

數據結構筆記淺記（十三）哈希表

「哈希表 hash table」，又稱「散列表」，它通過建立鍵 key 與值 value 之間的映射，實現高效的元素查詢。具體而言，我們向哈希表中輸入一個鍵 key ，則可以在 𝑂(1) 時間內獲取對應的值 value 。從本質上看，哈

2024-04-24 23:39:16

一則 TCP 緩存超負荷導致的 MySQL 連接中斷的案例分析

除了 MySQL 本身之外，如何分析定位其他因素的可能性？作者：龔唐傑，愛可生 DBA 團隊成員，主要負責 MySQL 技術支持，擅長 MySQL、PG、國產數據庫。愛可生開源社區出品，原創內容未經授權不得隨意使用，轉載請聯繫小編並註

2024-04-24 23:20:53

24小時熱門文章

最新文章

最新評論文章