Kafka Documentation (9) -- 0.10.1 Document (1): Getting Started

1. GETTING STARTED

1.1 Introduction

http://blog.csdn.net/beitiandijun/article/details/53671269


1.2 Use Cases

http://blog.csdn.net/beitiandijun/article/details/53693409


1.3 Quick Start

http://blog.csdn.net/beitiandijun/article/details/53690132



1.4 Ecosystem

There are a plethora of tools that integrate with Kafka outside the main distribution. The ecosystem page lists many of these, including stream processing systems, Hadoop integration, monitoring, and deployment tools.


1.5 Upgrading From Previous Versions

Upgrading from 0.8.x, 0.9.x or 0.10.0.X to 0.10.1.0

0.10.1.0 has wire protocol changes. By following the recommended rolling upgrade plan below, you guarantee no downtime during the upgrade. However, please notice the potential breaking changes in 0.10.1.0 before upgrading.

Note: Because new protocols are introduced, it is important to upgrade your Kafka clusters before upgrading your clients (i.e. 0.10.1.x clients only support 0.10.1.x or later brokers, while 0.10.1.x brokers also support older clients).


For a rolling upgrade:

  1. Update the server.properties file on all brokers and add the following properties (see the sketch after this list):
  2. Upgrade the brokers one at a time: shut down the broker, update the code, and restart it.
  3. Once the entire cluster is upgraded, bump the protocol version by editing inter.broker.protocol.version and setting it to 0.10.1.0.
  4. If your previous message format is 0.10.0, change log.message.format.version to 0.10.1 (this is a no-op as the message format is the same for both 0.10.0 and 0.10.1). If your previous message format version is lower than 0.10.0, do not change log.message.format.version yet - this parameter should only change once all consumers have been upgraded to 0.10.0.0 or later.
  5. Restart the brokers one by one for the new protocol version to take effect.
  6. If log.message.format.version is still lower than 0.10.0 at this point, wait until all consumers have been upgraded to 0.10.0 or later, then change log.message.format.version to 0.10.1 on each broker and restart them one by one.
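The concrete property list for step 1 is not reproduced in this copy. As a hedged sketch only, the 0.10.0.0 upgrade section further below pins the same two settings to the version currently running; for this upgrade the server.properties entries would look roughly like the following (the 0.10.0.0 values are an assumption for illustration):

    # server.properties on each broker before swapping in the 0.10.1.0 binaries
    # (assuming, for illustration, the cluster currently runs 0.10.0.0)
    inter.broker.protocol.version=0.10.0.0
    log.message.format.version=0.10.0

    # step 3, applied only after every broker runs the new code:
    # inter.broker.protocol.version=0.10.1.0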


Note: If you are willing to accept downtime, you can simply take all the brokers down, update the code and start all of them. They will start with the new protocol by default.

Note: Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after.


Potential breaking changes in 0.10.1.0
  • The log retention time is no longer based on the last modified time of the log segments. Instead it is based on the largest timestamp of the messages in a log segment.
  • The log rolling time no longer depends on the log segment create time. Instead it is now based on the timestamp in the messages. More specifically, if the timestamp of the first message in the segment is T, the log will be rolled when a new message has a timestamp greater than or equal to T + log.roll.ms.
  • The number of open file handlers will increase by ~33% compared to 0.10.0 because of the addition of time index files for each segment.
  • The time index and offset index share the same index size configuration. Since each time index entry is 1.5x the size of an offset index entry, users may need to increase log.index.size.max.bytes to avoid potentially frequent log rolling (see the sketch after this list).
  • Due to the increased number of index files, on some brokers with a large number of log segments (e.g. >15K), the log loading process during broker startup could be longer. Based on our experiment, setting num.recovery.threads.per.data.dir to one may reduce the log loading time.
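As an illustrative sketch of the two mitigations mentioned above (the values are placeholders, not recommendations from the original text):

    # a larger index cap postpones index-driven log rolling
    # (the default is 10485760 bytes, i.e. 10 MB)
    log.index.size.max.bytes=20971520
    # one recovery thread per data directory shortened startup log loading in the
    # experiment cited above
    num.recovery.threads.per.data.dir=1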


Notable changes in 0.10.1.0
  • The new Java consumer is no longer in beta and we recommend it for all new development. The old Scala consumers are still supported, but they will be deprecated in the next release and will be removed in a future major release.
  • The --new-consumer/--new.consumer switch is no longer required to use tools like MirrorMaker and the Console Consumer with the new consumer; one simply needs to pass a Kafka broker to connect to instead of the ZooKeeper ensemble. In addition, usage of the Console Consumer with the old consumer has been deprecated and it will be removed in a future major release.
  • Kafka clusters can now be uniquely identified by a cluster id. It will be automatically generated when a broker is upgraded to 0.10.1.0. The cluster id is available via the kafka.server:type=KafkaServer,name=ClusterId metric and it is part of the Metadata response. Serializers, client interceptors and metric reporters can receive the cluster id by implementing the ClusterResourceListener interface (see the sketch after this list).
  • The BrokerState "RunningAsController" (value 4) has been removed. Due to a bug, a broker would only be in this state briefly before transitioning out of it and hence the impact of the removal should be minimal. The recommended way to detect if a given broker is the controller is via the kafka.controller:type=KafkaController,name=ActiveControllerCount metric.
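A minimal, hedged sketch of how a producer interceptor could pick up the cluster id via ClusterResourceListener; the class name and the logging it does are made up for illustration:

    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerInterceptor;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.ClusterResource;
    import org.apache.kafka.common.ClusterResourceListener;

    // Hypothetical interceptor that logs which cluster the client is talking to.
    public class ClusterIdLoggingInterceptor
            implements ProducerInterceptor<String, String>, ClusterResourceListener {

        @Override
        public void onUpdate(ClusterResource clusterResource) {
            // Called once the client learns the cluster id from a Metadata response.
            System.out.println("Connected to cluster " + clusterResource.clusterId());
        }

        @Override
        public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
            return record; // pass records through unchanged
        }

        @Override
        public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }

        @Override
        public void close() { }

        @Override
        public void configure(Map<String, ?> configs) { }
    }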


  • The new Java Consumer now allows users to search offsets by timestamp on partitions (see the sketch after this list).
  • The new Java Consumer now supports heartbeating from a background thread. There is a new configuration max.poll.interval.ms which controls the maximum time between poll invocations before the consumer will proactively leave the group (5 minutes by default). The value of the configuration request.timeout.ms must always be larger than max.poll.interval.ms because this is the maximum time that a JoinGroup request can block on the server while the consumer is rebalancing, so we have changed its default value to just above 5 minutes. Finally, the default value of session.timeout.ms has been adjusted down to 10 seconds, and the default value of max.poll.records has been changed to 500.
  • When using an Authorizer and a user doesn't have Describe authorization on a topic, the broker will no longer return TOPIC_AUTHORIZATION_FAILED errors to requests since this leaks topic names. Instead, the UNKNOWN_TOPIC_OR_PARTITION error code will be returned. This may cause unexpected timeouts or delays when using the producer and consumer since Kafka clients will typically retry automatically on unknown topic errors. You should consult the client logs if you suspect this could be happening.
  • Fetch responses have a size limit by default (50 MB for consumers and 10 MB for replication). The existing per-partition limits also apply (1 MB for consumers and replication). Note that neither of these limits is an absolute maximum, as explained in the next point.
  • Consumers and replicas can make progress if a message larger than the response/partition size limit is found. More concretely, if the first message in the first non-empty partition of the fetch is larger than either or both limits, the message will still be returned.
  • Overloaded constructors were added to kafka.api.FetchRequest and kafka.javaapi.FetchRequest to allow the caller to specify the order of the partitions (since order is significant in v3). The previously existing constructors were deprecated and the partitions are shuffled before the request is sent to avoid starvation issues.
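A minimal, hedged sketch covering two of the points above: looking up an offset by timestamp with the new Java consumer, and setting the new poll/timeout defaults explicitly. The broker address, group id and topic name are placeholders:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
    import org.apache.kafka.common.TopicPartition;

    public class OffsetsForTimesExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            props.put("group.id", "timestamp-lookup-demo");     // placeholder
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            // New/changed defaults described above, set explicitly for illustration.
            props.put("max.poll.interval.ms", "300000");  // 5 minutes
            props.put("request.timeout.ms", "305000");    // must exceed max.poll.interval.ms
            props.put("session.timeout.ms", "10000");     // new default: 10 seconds
            props.put("max.poll.records", "500");         // new default: 500

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("my-topic", 0);  // placeholder topic
                long oneHourAgo = System.currentTimeMillis() - 3600_000L;
                // Ask for the earliest offset whose message timestamp is >= oneHourAgo.
                Map<TopicPartition, OffsetAndTimestamp> result =
                        consumer.offsetsForTimes(Collections.singletonMap(tp, oneHourAgo));
                OffsetAndTimestamp ot = result.get(tp);
                if (ot != null) {
                    consumer.assign(Collections.singletonList(tp));
                    consumer.seek(tp, ot.offset());
                }
            }
        }
    }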


New Protocol Versions
  • ListOffsetRequest v1 supports accurate offset search based on timestamps.
  • MetadataResponse v2 introduces a new field: "cluster_id".
  • FetchRequest v3 supports limiting the response size (in addition to the existing per-partition limit), it returns messages bigger than the limits if required to make progress, and the order of partitions in the request is now significant.
  • JoinGroup v1 introduces a new field: "rebalance_timeout".


Upgrading from 0.8.x or 0.9.x to 0.10.0.0

0.10.0.0 has potential breaking changes (please review before upgrading) and possible performance impact following the upgrade. By following the recommended rolling upgrade plan below, you guarantee no downtime and no performance impact during and following the upgrade.

Note: Because new protocols are introduced, it is important to upgrade your Kafka clusters before upgrading your clients.

Notes to clients with version 0.9.0.0: Due to a bug introduced in 0.9.0.0, clients that depend on ZooKeeper (the old Scala high-level Consumer, and MirrorMaker if used with the old consumer) will not work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients should be upgraded to 0.9.0.1 before brokers are upgraded to 0.10.0.x. This step is not necessary for 0.8.X or 0.9.0.1 clients.


For a rolling upgrade:

  1. Update the server.properties file on all brokers and add the following properties:
       - inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2 or 0.9.0.0)
       - log.message.format.version=CURRENT_KAFKA_VERSION (see the potential performance impact following the upgrade below for details on what this configuration does)
  2. Upgrade the brokers. This can be done a broker at a time by simply bringing it down, updating the code, and restarting it.
  3. Once the entire cluster is upgraded, bump the protocol version by editing inter.broker.protocol.version and setting it to 0.10.0.0. NOTE: You shouldn't touch log.message.format.version yet - this parameter should only change once all consumers have been upgraded to 0.10.0.0.
  4. Restart the brokers one by one for the new protocol version to take effect.
  5. Once all consumers have been upgraded to 0.10.0, change log.message.format.version to 0.10.0 on each broker and restart them one by one.

Note: If you are willing to accept downtime, you can simply take all the brokers down, update the code and start all of them. They will start with the new protocol by default.

Note: Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after.


Potential performance impact following upgrade to 0.10.0.0

The message format in 0.10.0 includes a new timestamp field and uses relative offsets for compressed messages. The on-disk message format can be configured through log.message.format.version in the server.properties file. The default on-disk message format is 0.10.0. If a consumer client is on a version before 0.10.0.0, it only understands message formats before 0.10.0. In this case, the broker is able to convert messages from the 0.10.0 format to an earlier format before sending the response to the consumer on an older version. However, the broker can't use zero-copy transfer in this case. Reports from the Kafka community on the performance impact have shown CPU utilization going from 20% before to 100% after an upgrade, which forced an immediate upgrade of all clients to bring performance back to normal. To avoid such message conversion before consumers are upgraded to 0.10.0.0, one can set log.message.format.version to 0.8.2 or 0.9.0 when upgrading the broker to 0.10.0.0. This way, the broker can still use zero-copy transfer to send the data to the old consumers. Once consumers are upgraded, one can change the message format to 0.10.0 on the broker and enjoy the new message format that includes the new timestamp and improved compression. The conversion is supported to ensure compatibility and can be useful to support a few apps that have not updated to newer clients yet, but is impractical to support all consumer traffic on even an overprovisioned cluster. Therefore, it is critical to avoid the message conversion as much as possible when brokers have been upgraded but the majority of clients have not.
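A hedged server.properties sketch of the staging described above (the versions are illustrative, assuming the cluster previously ran on the 0.9.0 message format):

    # Phase 1: brokers run 0.10.0.0 code but keep the old on-disk format, so
    # pre-0.10.0 consumers can still be served with zero-copy transfer.
    inter.broker.protocol.version=0.10.0.0
    log.message.format.version=0.9.0

    # Phase 2: once all consumers are on 0.10.0.0 or later, switch the format
    # and restart the brokers one by one.
    # log.message.format.version=0.10.0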


For clients that are upgraded to 0.10.0.0, there is no performance impact.

Note: By setting the message format version, one certifies that all existing messages are on or below that message format version. Otherwise consumers before 0.10.0.0 might break. In particular, after the message format is set to 0.10.0, one should not change it back to an earlier format as it may break consumers on versions before 0.10.0.0.

Note: Due to the additional timestamp introduced in each message, producers sending small messages may see a message throughput degradation because of the increased overhead. Likewise, replication now transmits an additional 8 bytes per message. If you're running close to the network capacity of your cluster, it's possible that you'll overwhelm the network cards and see failures and performance issues due to the overload.

Note: If you have enabled compression on producers, you may notice reduced producer throughput and/or a lower compression rate on the broker in some cases. When receiving compressed messages, 0.10.0 brokers avoid recompressing the messages, which in general reduces the latency and improves the throughput. In certain cases, however, this may reduce the batching size on the producer, which could lead to worse throughput. If this happens, users can tune linger.ms and batch.size of the producer for better throughput. In addition, the producer buffer used for compressing messages with snappy is smaller than the one used by the broker, which may have a negative impact on the compression ratio for the messages on disk. We intend to make this configurable in a future Kafka release.
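A minimal, hedged sketch of the producer tuning mentioned in the last note. The broker address, topic name and tuning values are placeholders for illustration, not recommendations:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TunedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("compression.type", "snappy");
            // Give the producer more time and space to fill batches so that the
            // per-batch compression overhead is amortized over more records.
            props.put("linger.ms", "20");       // example value
            props.put("batch.size", "65536");   // example value, 64 KB

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("my-topic", "key", "value"));  // placeholder topic
            }
        }
    }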


Potential breaking changes in 0.10.0.0
  • Starting from Kafka 0.10.0.0, the message format version in Kafka is represented as the Kafka version. For example, message format 0.9.0 refers to the highest message version supported by Kafka 0.9.0.
  • Message format 0.10.0 has been introduced and it is used by default. It includes a timestamp field in the messages and relative offsets are used for compressed messages.
  • ProduceRequest/Response v2 has been introduced and it is used by default to support message format 0.10.0.
  • FetchRequest/Response v2 has been introduced and it is used by default to support message format 0.10.0.
  • The MessageFormatter interface was changed from def writeTo(key: Array[Byte], value: Array[Byte], output: PrintStream) to def writeTo(consumerRecord: ConsumerRecord[Array[Byte], Array[Byte]], output: PrintStream)
  • The MessageReader interface was changed from def readMessage(): KeyedMessage[Array[Byte], Array[Byte]] to def readMessage(): ProducerRecord[Array[Byte], Array[Byte]]


  • MessageFormatter's package was changed from kafka.tools to kafka.common
  • MessageReader's package was changed from kafka.tools to kafka.common
  • MirrorMakerMessageHandler no longer exposes the handle(record: MessageAndMetadata[Array[Byte], Array[Byte]]) method as it was never called.
  • The 0.7 KafkaMigrationTool is no longer packaged with Kafka. If you need to migrate from 0.7 to 0.10.0, please migrate to 0.8 first and then follow the documented upgrade process to upgrade from 0.8 to 0.10.0.
  • The new consumer has standardized its APIs to accept java.util.Collection as the sequence type for method parameters. Existing code may have to be updated to work with the 0.10.0 client library (see the sketch after this list).
  • LZ4-compressed message handling was changed to use an interoperable framing specification (LZ4f v1.5.1). To maintain compatibility with old clients, this change only applies to Message format 0.10.0 and later. Clients that Produce/Fetch LZ4-compressed messages using v0/v1 (Message format 0.9.0) should continue to use the 0.9.0 framing implementation. Clients that use Produce/Fetch protocols v2 or later should use interoperable LZ4f framing. A list of interoperable LZ4 libraries is available at http://www.lz4.org/
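A small, hedged illustration of the java.util.Collection-based signatures on the new consumer. The broker address, group id and topic names are placeholders:

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class CollectionApiExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");    // placeholder
            props.put("group.id", "collection-api-demo");        // placeholder
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // subscribe() accepts any java.util.Collection of topic names.
                consumer.subscribe(Arrays.asList("topic-a", "topic-b"));  // placeholder topics
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records)
                    System.out.println(record.key() + " -> " + record.value());
            }
        }
    }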


Notable changes in 0.10.0.0
  • Starting from Kafka 0.10.0.0, a new client library named Kafka Streams is available for stream processing on data stored in Kafka topics (a minimal sketch follows this list). This new client library only works with 0.10.x and upward versioned brokers due to the message format changes mentioned above. For more information please read this section.
  • The default value of the configuration parameter receive.buffer.bytes is now 64K for the new consumer.
  • The new consumer now exposes the configuration parameter exclude.internal.topics to restrict internal topics (such as the consumer offsets topic) from accidentally being included in regular expression subscriptions. By default, it is enabled.
  • The old Scala producer has been deprecated. Users should migrate their code to the Java producer included in the kafka-clients JAR as soon as possible.
  • The new consumer API has been marked stable.
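A minimal, hedged Kafka Streams sketch against the 0.10.0-era API (the application id, broker address and topic names are placeholders; later releases replaced KStreamBuilder with StreamsBuilder, so this is only indicative of the 0.10.x style):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    public class PassThroughStreamsExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pass-through-demo");  // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
            props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

            KStreamBuilder builder = new KStreamBuilder();
            // Copy every record from the input topic to the output topic unchanged.
            builder.stream("input-topic").to("output-topic");  // placeholder topics

            KafkaStreams streams = new KafkaStreams(builder, props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }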


Upgrading from 0.8.0, 0.8.1.X or 0.8.2.X to 0.9.0.0

0.9.0.0 has potential breaking changes (please review before upgrading) and an inter-broker protocol change from previous versions. This means that upgraded brokers and clients may not be compatible with older versions. It is important that you upgrade your Kafka cluster before upgrading your clients. If you are using MirrorMaker, downstream clusters should be upgraded first as well.

For a rolling upgrade:

  1. Update the server.properties file on all brokers and add the following property: inter.broker.protocol.version=0.8.2.X
  2. Upgrade the brokers. This can be done a broker at a time by simply bringing it down, updating the code, and restarting it.
  3. Once the entire cluster is upgraded, bump the protocol version by editing inter.broker.protocol.version and setting it to 0.9.0.0.
  4. Restart the brokers one by one for the new protocol version to take effect.

Note: If you are willing to accept downtime, you can simply take all the brokers down, update the code and start all of them. They will start with the new protocol by default.

Note: Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after.



Potential breaking changes in 0.9.0.0
  • Java 1.6 is no longer supported.
  • Scala 2.9 is no longer supported.
  • Broker IDs above 1000 are now reserved by default for automatically assigned broker IDs. If your cluster has existing broker IDs above that threshold, make sure to increase the reserved.broker.max.id broker configuration property accordingly.
  • Configuration parameter replica.lag.max.messages was removed. Partition leaders will no longer consider the number of lagging messages when deciding which replicas are in sync.
  • Configuration parameter replica.lag.time.max.ms now refers not just to the time passed since the last fetch request from a replica, but also to the time since the replica last caught up. Replicas that are still fetching messages from leaders but did not catch up to the latest messages within replica.lag.time.max.ms will be considered out of sync.
  • Compacted topics no longer accept messages without a key and an exception is thrown by the producer if this is attempted (see the producer sketch after this list). In 0.8.x, a message without a key would cause the log compaction thread to subsequently complain and quit (and stop compacting all compacted topics).
  • MirrorMaker no longer supports multiple target clusters. As a result it will only accept a single --consumer.config parameter. To mirror multiple source clusters, you will need at least one MirrorMaker instance per source cluster, each with its own consumer configuration.
  • Tools packaged under org.apache.kafka.clients.tools.* have been moved to org.apache.kafka.tools.*. All included scripts will still function as usual; only custom code directly importing these classes will be affected.
  • The default Kafka JVM performance options (KAFKA_JVM_PERFORMANCE_OPTS) have been changed in kafka-run-class.sh.
  • The kafka-topics.sh script (kafka.admin.TopicCommand) now exits with a non-zero exit code on failure.
  • The kafka-topics.sh script (kafka.admin.TopicCommand) will now print a warning when topic names risk metric collisions due to the use of a '.' or '_' in the topic name, and error in the case of an actual collision.
  • The kafka-console-producer.sh script (kafka.tools.ConsoleProducer) will use the Java producer instead of the old Scala producer by default, and users have to specify 'old-producer' to use the old producer.
  • By default, all command line tools will print all logging messages to stderr instead of stdout.
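As a small, hedged illustration of the compacted-topic rule above: every record sent to a topic with cleanup.policy=compact needs a non-null key. The broker address and topic name are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CompactedTopicProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // OK: compacted topics keep the latest value per key, so a key is required.
                producer.send(new ProducerRecord<>("compacted-topic", "user-42", "profile-v2"));
                // Not OK on a compacted topic: a null key is rejected and surfaces as an
                // error on the producer.
                // producer.send(new ProducerRecord<>("compacted-topic", null, "orphan-value"));
            }
        }
    }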


Notable changes in 0.9.0.1
  • The new broker id generation feature can be disabled by setting broker.id.generation.enable to false.
  • Configuration parameter log.cleaner.enable is now true by default. This means topics with cleanup.policy=compact will now be compacted by default, and 128 MB of heap will be allocated to the cleaner process via log.cleaner.dedupe.buffer.size. You may want to review log.cleaner.dedupe.buffer.size and the other log.cleaner configuration values based on your usage of compacted topics.
  • The default value of the configuration parameter fetch.min.bytes for the new consumer is now 1.


Deprecations in 0.9.0.0
  • Altering topic configuration from the kafka-topics.sh script (kafka.admin.TopicCommand) has been deprecated. Going forward, please use the kafka-configs.sh script (kafka.admin.ConfigCommand) for this functionality.
  • The kafka-consumer-offset-checker.sh script (kafka.tools.ConsumerOffsetChecker) has been deprecated. Going forward, please use kafka-consumer-groups.sh (kafka.admin.ConsumerGroupCommand) for this functionality.
  • The kafka.tools.ProducerPerformance class has been deprecated. Going forward, please use org.apache.kafka.tools.ProducerPerformance for this functionality (kafka-producer-perf-test.sh will also be changed to use the new class).
  • The producer config block.on.buffer.full has been deprecated and will be removed in a future release. Currently its default value has been changed to false. The KafkaProducer will no longer throw BufferExhaustedException but instead will use the max.block.ms value to block, after which it will throw a TimeoutException. If the block.on.buffer.full property is set to true explicitly, it will set max.block.ms to Long.MAX_VALUE and metadata.fetch.timeout.ms will not be honoured.


Upgrading from 0.8.1 to 0.8.2

0.8.2 is fully compatible with 0.8.1. The upgrade can be done one broker at a time by simply bringing it down, updating the code, and restarting it.

Upgrading from 0.8.0 to 0.8.1

0.8.1 is fully compatible with 0.8. The upgrade can be done one broker at a time by simply bringing it down, updating the code, and restarting it.

Upgrading from 0.7

Release 0.7 is incompatible with newer releases. Major changes were made to the API, ZooKeeper data structures, protocol, and configuration in order to add replication (which was missing in 0.7). The upgrade from 0.7 to later versions requires a special tool for migration. This migration can be done without downtime.

