Implementation
Log
- Suppose a topic "my_topic" has two partitions. Under the directory specified by the `log.dirs` config option there are then two folders: my_topic_0 and my_topic_1.
- Each folder contains two kinds of files, .index and .log; a .log file is named after the offset of the first message it contains, and no log file may exceed the configured maximum size.
- A log file consists of a sequence of "log entries".
- Log entry layout: a 4-byte int giving the message length + a 1-byte magic value + a 4-byte CRC checksum + an n-byte message payload.
- Each message is uniquely identified by a 64-bit (8-byte) offset, which also gives its position within the partition.
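The entry layout above can be sketched with Python's `struct` module. This is an illustration of the length/magic/CRC/payload framing, not Kafka's actual serialization code; the function names are mine:

```python
import struct
import zlib

def pack_log_entry(payload: bytes, magic: int = 0) -> bytes:
    """Pack one log entry: 4-byte length, 1-byte magic, 4-byte CRC, n-byte payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    body = struct.pack(">BI", magic, crc) + payload  # magic + crc + payload
    return struct.pack(">I", len(body)) + body       # length prefix covers the body

def unpack_log_entry(buf: bytes):
    """Unpack a log entry and verify its CRC; returns (magic, payload)."""
    (length,) = struct.unpack_from(">I", buf, 0)
    magic, crc = struct.unpack_from(">BI", buf, 4)
    payload = buf[9:4 + length]
    assert crc == (zlib.crc32(payload) & 0xFFFFFFFF), "CRC mismatch"
    return magic, payload
```

For a 5-byte payload the full entry is 4 + 1 + 4 + 5 = 14 bytes, matching the layout in the bullet above.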
Writes
- The producer appends messages to the log; when a log file reaches the configured size (`log.segment.bytes`), a new file is rolled.
- `log.flush.interval.messages`: the number of accumulated messages after which the OS is forced to flush them to disk in a batch.
- `log.flush.interval.ms`: the time interval at which messages are flushed to disk. A flush happens as soon as either the message-count or the time condition is met.
- If both values are set too low, flushes become frequent and hurt performance; the upside is that little data is lost when the system crashes, which improves durability.
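The either-or flush trigger described above can be sketched as a small policy object. This is a toy model of the two config options, not broker code; the class name is mine:

```python
import time

class FlushPolicy:
    """Flush when either the message count or the elapsed-time threshold is hit."""

    def __init__(self, interval_messages: int, interval_ms: int):
        self.interval_messages = interval_messages  # models log.flush.interval.messages
        self.interval_ms = interval_ms              # models log.flush.interval.ms
        self.unflushed = 0
        self.last_flush = time.monotonic()

    def record_append(self) -> None:
        self.unflushed += 1

    def should_flush(self) -> bool:
        elapsed_ms = (time.monotonic() - self.last_flush) * 1000
        return (self.unflushed >= self.interval_messages
                or elapsed_ms >= self.interval_ms)

    def mark_flushed(self) -> None:
        self.unflushed = 0
        self.last_flush = time.monotonic()
```

Lowering either threshold makes `should_flush` fire more often, which is exactly the performance/durability trade-off the bullet describes.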
Reads
- A read specifies an offset and a max-chunk-size, i.e. the maximum number of bytes consumed in one fetch.
- In theory max-chunk-size should be larger than any single message, but if an abnormally large message appears, the read is retried with the buffer size doubled each time until the message is read successfully.
- A maximum allowed message size and a maximum buffer size should be configurable so that the broker can reject messages that are too large.
- The read path in detail: first locate the log segment file from the offset, then fetch the message from that file. The search is done as a simple binary search variation against an in-memory range maintained for each file.
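Since each .log file is named after the base offset of its first message, the segment lookup above reduces to a binary search over the sorted base offsets. A sketch using Python's `bisect` (the in-file scan to the exact message is omitted):

```python
import bisect

def find_segment(base_offsets: list[int], target: int) -> int:
    """Return the base offset of the segment containing `target`.

    base_offsets must be sorted ascending (one entry per .log file).
    """
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} precedes the earliest segment")
    return base_offsets[i]
```

For example, with segments starting at offsets 0, 368769, and 737337, a fetch at offset 400000 lands in the segment whose file is named after 368769.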
- The format of messages sent to the consumer:
- MessageSetSend (fetch result):
total length : 4 bytes
error code : 2 bytes
message 1 : x bytes
...
message n : x bytes
- MultiMessageSetSend (multiFetch result):
total length : 4 bytes
error code : 2 bytes
messageSetSend 1
...
messageSetSend n
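The MessageSetSend framing above (4-byte total length, 2-byte error code, then the messages) can be sketched with `struct`. The field sizes follow the listing above; the function names are illustrative:

```python
import struct

def pack_message_set_send(error_code: int, messages: list[bytes]) -> bytes:
    """Frame a fetch result: 4-byte total length + 2-byte error code + messages."""
    body = struct.pack(">h", error_code) + b"".join(messages)
    return struct.pack(">i", len(body)) + body

def unpack_message_set_send(buf: bytes):
    """Return (error_code, raw message bytes) from a framed fetch result."""
    (total_len,) = struct.unpack_from(">i", buf, 0)
    (error_code,) = struct.unpack_from(">h", buf, 4)
    return error_code, buf[6:4 + total_len]
```

A MultiMessageSetSend simply applies the same length-plus-error-code framing around a sequence of these frames.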
Deletes
To keep production (i.e. writes to the log) working normally while log files are being deleted, a copy-on-write technique is used for the segment list.
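A minimal sketch of the copy-on-write idea, assuming segments are tracked in an immutable snapshot that deletion replaces wholesale (the class and method names are mine):

```python
import threading

class SegmentList:
    """Copy-on-write segment list: readers take a snapshot; deletes swap in a copy."""

    def __init__(self, segments):
        self._segments = tuple(segments)  # immutable snapshot
        self._lock = threading.Lock()     # serializes writers only

    def view(self):
        # Readers grab the current tuple without locking; it never mutates.
        return self._segments

    def delete_before(self, base_offset: int) -> None:
        with self._lock:
            # Build a new tuple instead of mutating in place, so in-flight
            # reads keep working against the old snapshot.
            self._segments = tuple(s for s in self._segments if s >= base_offset)
```

A read that started before a delete still sees a consistent (if stale) list, which is what lets deletion run concurrently with production.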
Distribution
- Consumer Offset Tracking: Kafka lets all consumers in a group commit their offsets to a designated broker (called the offset manager). When the offset manager receives an OffsetCommitRequest, it appends the commit to a compacted topic named "__consumer_offsets", and acknowledges the commit as successful only after all brokers hosting that topic have received the message. If the commit fails, the consumer retries. The offset manager keeps an in-memory table mapping consumers to their offsets, so it can answer consumer requests quickly.
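The offset manager's in-memory table can be sketched as a map keyed by (group, topic, partition). This is a toy model of the cache, not broker code; the class name and the -1 sentinel are mine:

```python
class OffsetCache:
    """In-memory view of the latest committed offsets, as kept by the offset manager."""

    def __init__(self):
        self._offsets: dict[tuple[str, str, int], int] = {}

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        # In the real broker this only happens after the commit has been
        # durably appended to the __consumer_offsets topic.
        self._offsets[(group, topic, partition)] = offset

    def fetch(self, group: str, topic: str, partition: int) -> int:
        # -1 signals "no committed offset" in this sketch.
        return self._offsets.get((group, topic, partition), -1)
```

Serving fetches from this map is what lets the offset manager respond without rereading the compacted topic.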
- In earlier Kafka versions offsets were stored in ZooKeeper; existing offsets can be migrated from ZooKeeper to Kafka.
- After a broker starts, it registers an ephemeral node under /brokers/ids in ZooKeeper (the node disappears once the broker shuts down or crashes). Each broker maps to one id; the broker can be moved to another physical host, but its externally visible id stays the same.
- Consumer registration algorithm: when a consumer starts, it performs the following steps:
- Register a new node under its consumer group;
- Register a watcher on changes (similar to a database trigger) under the consumer id registry, monitoring the addition and removal of consumer nodes to trigger a rebalance within the group (when a consumer dies, a rebalance lets the remaining consumers take over its partitions; adding a consumer also triggers a rebalance);
- Register a watcher on changes under the broker id registry, monitoring the addition and removal of broker nodes to trigger a rebalance of all consumers in all consumer groups;
- If the consumer creates a message stream using a topic filter, it also registers a watch on changes (new topics being added) under the broker topic registry. (Each change will trigger re-evaluation of the available topics to
determine which topics are allowed by the topic filter. A new allowed topic will trigger rebalancing among all consumers within the consumer group.)
- Force itself to rebalance within its consumer group.
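The topic-filter re-evaluation described above can be sketched as a whitelist regex applied to the broker's current topic list. A toy illustration, not Kafka's actual TopicFilter implementation; the function name is mine:

```python
import re

def allowed_topics(topic_filter: str, available: list[str]) -> set[str]:
    """Re-evaluate which of the currently available topics match the filter.

    A change in the result set (e.g. a new allowed topic appearing) is what
    triggers a rebalance across the whole consumer group.
    """
    pattern = re.compile(topic_filter)
    return {t for t in available if pattern.fullmatch(t)}
```

Comparing the result before and after a topic is created shows exactly which new allowed topic caused the rebalance.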
- Consumer rebalancing algorithm: decides which consumer consumes which partition of a topic, aiming to have each consumer contact as few brokers as possible.
- 1.For each topic T that Ci subscribes to
2. let PT be all partitions producing topic T
3. let CG be all consumers in the same group as Ci that consume topic T
4. sort PT (so partitions on the same broker are clustered together)
5. sort CG
6. let i be the index position of Ci in CG and let N = size(PT)/size(CG)
7. assign partitions from i*N to (i+1)*N - 1 to consumer Ci
8. remove current entries owned by Ci from the partition owner registry
9. add newly assigned partitions to the partition owner registry
(we may need to re-try this until the original partition owner releases its ownership)
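The nine steps above amount to a range assignment; a direct sketch (the partition-owner registry and the retry in steps 8–9 are omitted):

```python
def assign_partitions(consumer_id: str, consumers: list[str],
                      partitions: list[str]) -> list[str]:
    """Range-assign one topic's partitions to `consumer_id` per the steps above."""
    cg = sorted(consumers)        # step 5: sort CG
    pt = sorted(partitions)       # step 4: clusters same-broker partitions together
    i = cg.index(consumer_id)     # step 6: index of Ci in CG
    n = len(pt) // len(cg)        # step 6: N = size(PT)/size(CG)
    return pt[i * n:(i + 1) * n]  # step 7: partitions i*N .. (i+1)*N - 1
```

Because partitions sort by broker first, each consumer's contiguous range tends to land on few brokers, which is the stated goal. Note that with plain integer division some trailing partitions go unassigned when the sizes don't divide evenly; a production implementation has to distribute that remainder as well.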