Kafka currently provides two consumer APIs for Java:
- High-level consumer API
This consumer API wraps many of the advanced features a consumer needs, such as:
- Auto/Hidden Offset Management
- Auto(Simple) Partition Assignment
- Broker Failover => Auto Rebalance
- Consumer Failover => Auto Rebalance
- If users do not want any of these, the simple consumer is sufficient.
- If users want control over offset management with everything else unchanged, one option is to expose the current ZK implementation of the high-level consumer and allow users to override it; another option is to change the high-level consumer API to return the offset vector associated with the messages.
- If users want to control partition assignment, one option is to change the high-level consumer API so that such config info can be passed in when creating the stream; another option is ad hoc: just make a single-partition topic and assign it to the consumer.
- If users just want the automatic partition assignment to be "smarter", for example by taking co-location into account, one option is to store the host/rack info in ZK and let the rebalance algorithm read it while doing the computation.
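To make the "automatic partition assignment" idea concrete, here is a minimal Java sketch of a range-style assignment: sort the consumers, then give each one a contiguous slice of the partitions. It is an illustration under stated assumptions, not Kafka's actual rebalance code. The class name `RangeAssignmentSketch` and the method `assign` are hypothetical, and a real rebalance would read the consumer and partition lists from ZK rather than take them as arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative range-style partition assignment (hypothetical helper, not Kafka code). */
class RangeAssignmentSketch {

    /**
     * Splits partitions 0..numPartitions-1 into contiguous ranges, one per consumer.
     * Consumers are sorted first so that every group member computes the same result.
     */
    static Map<String, List<Integer>> assign(List<String> consumerIds, int numPartitions) {
        List<String> sorted = new ArrayList<>(consumerIds);
        Collections.sort(sorted);

        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (sorted.isEmpty()) {
            return result; // nobody to assign partitions to
        }

        int perConsumer = numPartitions / sorted.size(); // base share for everyone
        int withExtra = numPartitions % sorted.size();   // first N consumers get one more

        for (int i = 0; i < sorted.size(); i++) {
            int start = perConsumer * i + Math.min(i, withExtra);
            int count = perConsumer + (i < withExtra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = start; p < start + count; p++) {
                owned.add(p);
            }
            result.put(sorted.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two consumers sharing five partitions.
        System.out.println(assign(Arrays.asList("consumer-b", "consumer-a"), 5));
        // prints {consumer-a=[0, 1, 2], consumer-b=[3, 4]}
    }
}
```

Because every consumer sorts the same input and runs the same arithmetic, each one can work out its own slice without talking to the others. A co-location-aware variant could group consumers by host/rack info before slicing.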
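By default, this consumer writes its own information under the ZK path /consumers/<groupId>, which contains:
- offsets: the offset value for each <partition_num> of the topic
- owners: for each partition of the current <topic>, the unique ID of the consumer in this <groupId> that is allowed to receive its data
- ids: the list of all consumers currently in this <groupId>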
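The layout above can be sketched as a small path builder. This assumes the 0.8-era layout in which offsets and owners are keyed by topic and then partition under the group node. The class `ConsumerZkPaths` and its method names are hypothetical helpers for illustration and are not part of any Kafka API.

```java
/** Builds the ZK paths used by a consumer group (hypothetical helper, assumed 0.8-era layout). */
class ConsumerZkPaths {
    private final String groupId;

    ConsumerZkPaths(String groupId) {
        this.groupId = groupId;
    }

    /** Root node for the whole consumer group. */
    String groupRoot() {
        return "/consumers/" + groupId;
    }

    /** Node under which each live consumer registers itself. */
    String consumerId(String consumerId) {
        return groupRoot() + "/ids/" + consumerId;
    }

    /** Node holding the ID of the consumer that owns a given partition. */
    String owner(String topic, int partition) {
        return groupRoot() + "/owners/" + topic + "/" + partition;
    }

    /** Node holding the last committed offset for a given partition. */
    String offset(String topic, int partition) {
        return groupRoot() + "/offsets/" + topic + "/" + partition;
    }

    public static void main(String[] args) {
        ConsumerZkPaths paths = new ConsumerZkPaths("my-group");
        System.out.println(paths.offset("my-topic", 0)); // /consumers/my-group/offsets/my-topic/0
        System.out.println(paths.owner("my-topic", 0));  // /consumers/my-group/owners/my-topic/0
    }
}
```

Paths like these can be passed to any ZooKeeper client to inspect a group by hand, for example to see which consumer currently owns a partition.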
Under normal circumstances, the high-level consumer covers most everyday use cases.
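- Simple consumer API
It offers only the most basic connect and read functionality. You read the offsets yourself and decide how reads are positioned by offset, which makes it suitable for all kinds of customization.
Kafka monitoring currently takes two approaches: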
1. JMX
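Kafka ships with a built-in Mx4jLoader. If it finds mx4j-tools.jar on the classpath, it loads that jar, and the web pages provided by MX4J can then be viewed on port 8082.
Besides this built-in interface, you can also modify the Java startup command yourself to enable JMX, and then integrate with monitoring systems such as Zabbix or Ganglia on top of JMX. For Ganglia, there is a ready-made project on GitHub.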
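As a rough illustration of the JMX route, the sketch below reads a single MBean attribute using only the JDK's `javax.management` classes. It assumes the broker JVM was started with remote JMX enabled (for example via the standard `-Dcom.sun.management.jmxremote.port=...` JVM flag). The class `JmxProbeSketch` and its methods are hypothetical, and the MBean object name is left to the caller because Kafka's MBean names differ between versions; browse them with jconsole first.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Minimal JMX attribute reader (hypothetical helper for illustration). */
class JmxProbeSketch {

    /** Builds the conventional RMI-based JMX service URL for a host and port. */
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Reads one attribute from an MBean over an existing connection. */
    static Object readAttribute(MBeanServerConnection conn, String objectName, String attribute) {
        try {
            return conn.getAttribute(new ObjectName(objectName), attribute);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot read " + objectName + "#" + attribute, e);
        }
    }

    /** Connects to a remote JVM (e.g. a broker), reads one attribute, and disconnects. */
    static Object readRemote(String host, int port, String objectName, String attribute) {
        try (JMXConnector connector =
                 JMXConnectorFactory.connect(new JMXServiceURL(serviceUrl(host, port)))) {
            return readAttribute(connector.getMBeanServerConnection(), objectName, attribute);
        } catch (IOException e) {
            throw new IllegalStateException("Cannot connect to " + host + ":" + port, e);
        }
    }

    public static void main(String[] args) {
        // Demo against this JVM's own MBean server so the sketch runs without a broker.
        Object uptime = readAttribute(
            ManagementFactory.getPlatformMBeanServer(), "java.lang:type=Runtime", "Uptime");
        System.out.println("JVM uptime (ms): " + uptime);
    }
}
```

A collector for Zabbix or Ganglia would call something like `readRemote` on a schedule and forward the values. Polling through a short-lived connection keeps the sketch simple, though a long-running agent would probably keep the connection open.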
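2. ZooKeeper
Typical monitoring tools of this kind are kafkamonitor and kafka-web-console.
Both are fairly easy to install, so I will not go into more detail here; refer to the projects directly.
According to the official wiki, the consumer API appears to be getting a major overhaul starting with 0.9, which I personally support. The current consumer APIs feel either too bare-bones or too deeply wrapped.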
wiki:https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design