Kafka Documentation (9)----0.10.1 Documentation (1): Getting Started

1. GETTING STARTED

1.1 Introduction

http://blog.csdn.net/beitiandijun/article/details/53671269


1.2 Use Cases

http://blog.csdn.net/beitiandijun/article/details/53693409


1.3 Quick Start

http://blog.csdn.net/beitiandijun/article/details/53690132



1.4 Ecosystem

There are a plethora of tools that integrate with Kafka outside the main distribution. The ecosystem page lists many of these, including stream processing systems, Hadoop integration, monitoring, and deployment tools.



1.5 Upgrading From Previous Versions

Upgrading from 0.8.x, 0.9.x or 0.10.0.X to 0.10.1.0

0.10.1.0 has wire protocol changes. By following the recommended rolling upgrade plan below, you guarantee no downtime during the upgrade. However, please review the potential breaking changes in 0.10.1.0 before upgrading.
Note: Because new protocols are introduced, it is important to upgrade your Kafka clusters before upgrading your clients (i.e. 0.10.1.x clients only support 0.10.1.x or later brokers while 0.10.1.x brokers also support older clients).




For a rolling upgrade:

  1. Update server.properties file on all brokers and add the following properties: inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2, 0.9.0.0 or 0.10.0.0) and log.message.format.version=CURRENT_KAFKA_VERSION (see the potential performance impact section below for details on what this configuration does).
  2. Upgrade the brokers one at a time: shut down the broker, update the code, and restart it.
  3. Once the entire cluster is upgraded, bump the protocol version by editing inter.broker.protocol.version and setting it to 0.10.1.0.
  4. If your previous message format is 0.10.0, change log.message.format.version to 0.10.1 (this is a no-op as the message format is the same for both 0.10.0 and 0.10.1). If your previous message format version is lower than 0.10.0, do not change log.message.format.version yet - this parameter should only change once all consumers have been upgraded to 0.10.0.0 or later.
  5. Restart the brokers one by one for the new protocol version to take effect.
  6. If log.message.format.version is still lower than 0.10.0 at this point, wait until all consumers have been upgraded to 0.10.0 or later, then change log.message.format.version to 0.10.1 on each broker and restart them one by one.



Note: If you are willing to accept downtime, you can simply take all the brokers down, update the code and start all of them. They will start with the new protocol by default.

Note: Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after.




Potential breaking changes in 0.10.1.0
  • The log retention time is no longer based on last modified time of the log segments. Instead it will be based on the largest timestamp of the messages in a log segment.
  • The log rolling time no longer depends on the log segment create time. Instead it is now based on the timestamp in the messages. More specifically, if the timestamp of the first message in the segment is T, the log will be rolled when a new message has a timestamp greater than or equal to T + log.roll.ms.
  • The number of open file handles will increase by ~33% in 0.10.0 because of the addition of a time index file for each segment.
  • The time index and offset index share the same index size configuration. Since each time index entry is 1.5x the size of an offset index entry, users may need to increase log.index.size.max.bytes to avoid potential frequent log rolling.
  • Due to the increased number of index files, on some brokers with a large number of log segments (e.g. >15K), the log loading process during broker startup could take longer. Based on our experiments, setting num.recovery.threads.per.data.dir to one may reduce the log loading time.


Notable changes in 0.10.1.0
  • The new Java consumer is no longer in beta and we recommend it for all new development. The old Scala consumers are still supported, but they will be deprecated in the next release and will be removed in a future major release.
  • The --new-consumer/--new.consumer switch is no longer required to use tools like MirrorMaker and the Console Consumer with the new consumer; one simply needs to pass a Kafka broker to connect to instead of the ZooKeeper ensemble. In addition, usage of the Console Consumer with the old consumer has been deprecated and it will be removed in a future major release.
  • Kafka clusters can now be uniquely identified by a cluster id. It will be automatically generated when a broker is upgraded to 0.10.1.0. The cluster id is available via the kafka.server:type=KafkaServer,name=ClusterId metric and it is part of the Metadata response. Serializers, client interceptors and metric reporters can receive the cluster id by implementing the ClusterResourceListener interface (see the sketch after this list).
  • The BrokerState "RunningAsController" (value 4) has been removed. Due to a bug, a broker would only be in this state briefly before transitioning out of it and hence the impact of the removal should be minimal. The recommended way to detect if a given broker is the controller is via the kafka.controller:type=KafkaController,name=ActiveControllerCount metric.
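
For illustration, here is a minimal sketch of a value serializer that also implements ClusterResourceListener to pick up the cluster id; the class name is hypothetical, and a broker on 0.10.1.0 or later is assumed so that a cluster id is actually delivered.

    import java.util.Map;
    import org.apache.kafka.common.ClusterResource;
    import org.apache.kafka.common.ClusterResourceListener;
    import org.apache.kafka.common.serialization.Serializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    // Hypothetical serializer that also records the id of the cluster it is talking to.
    public class ClusterAwareStringSerializer implements Serializer<String>, ClusterResourceListener {
        private final StringSerializer delegate = new StringSerializer();
        private volatile String clusterId = "unknown";

        @Override
        public void onUpdate(ClusterResource clusterResource) {
            // Invoked once the client has obtained metadata from a 0.10.1.0+ broker.
            clusterId = clusterResource.clusterId();
        }

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            delegate.configure(configs, isKey);
        }

        @Override
        public byte[] serialize(String topic, String data) {
            // clusterId could be attached to metrics or log lines here.
            return delegate.serialize(topic, data);
        }

        @Override
        public void close() {
            delegate.close();
        }
    }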



  • The new Java Consumer now allows users to search offsets by timestamp on partitions (see the consumer sketch after this list).
  • The new Java Consumer now supports heartbeating from a background thread. There is a new configuration max.poll.interval.ms which controls the maximum time between poll invocations before the consumer will proactively leave the group (5 minutes by default). The value of the configuration request.timeout.ms must always be larger than max.poll.interval.ms because this is the maximum time that a JoinGroup request can block on the server while the consumer is rebalancing, so we have changed its default value to just above 5 minutes. Finally, the default value of session.timeout.ms has been adjusted down to 10 seconds, and the default value of max.poll.records has been changed to 500.
  • When using an Authorizer and a user doesn't have Describe authorization on a topic, the broker will no longer return TOPIC_AUTHORIZATION_FAILED errors to requests since this leaks topic names. Instead, the UNKNOWN_TOPIC_OR_PARTITION error code will be returned. This may cause unexpected timeouts or delays when using the producer and consumer since Kafka clients will typically retry automatically on unknown topic errors. You should consult the client logs if you suspect this could be happening.
  • Fetch responses have a size limit by default (50 MB for consumers and 10 MB for replication). The existing per partition limits also apply (1 MB for consumers and replication). Note that neither of these limits is an absolute maximum as explained in the next point.
  • Consumers and replicas can make progress if a message larger than the response/partition size limit is found. More concretely, if the first message in the first non-empty partition of the fetch is larger than either or both limits, the message will still be returned.
  • Overloaded constructors were added to kafka.api.FetchRequest and kafka.javaapi.FetchRequest to allow the caller to specify the order of the partitions (since order is significant in v3). The previously existing constructors were deprecated and the partitions are shuffled before the request is sent to avoid starvation issues.
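
Below is a minimal consumer sketch, assuming a local broker at localhost:9092 and a placeholder topic and group, that uses the new offsetsForTimes() call to find the first offset at or after a given timestamp and seek to it; the explicit max.poll.interval.ms, session.timeout.ms and max.poll.records values simply restate the new defaults described above.

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
    import org.apache.kafka.common.TopicPartition;

    public class OffsetsByTimestampExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
            props.put("group.id", "timestamp-search-demo");     // placeholder group id
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            // 0.10.1.0 defaults mentioned above, spelled out only for illustration.
            props.put("max.poll.interval.ms", "300000");
            props.put("session.timeout.ms", "10000");
            props.put("max.poll.records", "500");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder topic
                long oneHourAgo = System.currentTimeMillis() - 60 * 60 * 1000L;

                // Uses ListOffsetRequest v1 under the covers (see "New Protocol Versions" below).
                Map<TopicPartition, OffsetAndTimestamp> offsets =
                        consumer.offsetsForTimes(Collections.singletonMap(tp, oneHourAgo));

                OffsetAndTimestamp found = offsets.get(tp);
                if (found != null) {
                    consumer.assign(Collections.singletonList(tp));
                    consumer.seek(tp, found.offset()); // start from the first message at/after oneHourAgo
                }
            }
        }
    }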


New Protocol Versions
  • ListOffsetRequest v1 supports accurate offset search based on timestamps.
  • MetadataResponse v2 introduces a new field: "cluster_id".
  • FetchRequest v3 supports limiting the response size (in addition to the existing per partition limit), it returns messages bigger than the limits if required to make progress and the order of partitions in the request is now significant.
  • JoinGroup v1 introduces a new field: "rebalance_timeout".


Upgrading from 0.8.x or 0.9.x to 0.10.0.0

0.10.0.0 has potential breaking changes (please review before upgrading) and possible performance impact following the upgrade. By following the recommended rolling upgrade plan below, you guarantee no downtime and no performance impact during and following the upgrade. 
Note: Because new protocols are introduced, it is important to upgrade your Kafka clusters before upgrading your clients.

Notes to clients with version 0.9.0.0: Due to a bug introduced in 0.9.0.0, clients that depend on ZooKeeper (old Scala high-level Consumer and MirrorMaker if used with the old consumer) will not work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients should be upgraded to 0.9.0.1 before brokers are upgraded to 0.10.0.x. This step is not necessary for 0.8.X or 0.9.0.1 clients.




For a rolling upgrade:

  1. Update server.properties file on all brokers and add the following properties: inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2 or 0.9.0.0) and log.message.format.version=CURRENT_KAFKA_VERSION (see the potential performance impact following the upgrade, below, for details on what this configuration does).
  2. Upgrade the brokers. This can be done a broker at a time by simply bringing it down, updating the code, and restarting it.
  3. Once the entire cluster is upgraded, bump the protocol version by editing inter.broker.protocol.version and setting it to 0.10.0.0. NOTE: You shouldn't touch log.message.format.version yet - this parameter should only change once all consumers have been upgraded to 0.10.0.0
  4. Restart the brokers one by one for the new protocol version to take effect.
  5. Once all consumers have been upgraded to 0.10.0, change log.message.format.version to 0.10.0 on each broker and restart them one by one.

Note: If you are willing to accept downtime, you can simply take all the brokers down, update the code and start all of them. They will start with the new protocol by default.

Note: Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after.




Potential performance impact following upgrade to 0.10.0.0

The message format in 0.10.0 includes a new timestamp field and uses relative offsets for compressed messages. The on disk message format can be configured through log.message.format.version in the server.properties file. The default on-disk message format is 0.10.0. If a consumer client is on a version before 0.10.0.0, it only understands message formats before 0.10.0. In this case, the broker is able to convert messages from the 0.10.0 format to an earlier format before sending the response to the consumer on an older version. However, the broker can't use zero-copy transfer in this case. Reports from the Kafka community on the performance impact have shown CPU utilization going from 20% before to 100% after an upgrade, which forced an immediate upgrade of all clients to bring performance back to normal. To avoid such message conversion before consumers are upgraded to 0.10.0.0, one can set log.message.format.version to 0.8.2 or 0.9.0 when upgrading the broker to 0.10.0.0. This way, the broker can still use zero-copy transfer to send the data to the old consumers. Once consumers are upgraded, one can change the message format to 0.10.0 on the broker and enjoy the new message format that includes new timestamp and improved compression. The conversion is supported to ensure compatibility and can be useful to support a few apps that have not updated to newer clients yet, but is impractical to support all consumer traffic on even an overprovisioned cluster. Therefore, it is critical to avoid the message conversion as much as possible when brokers have been upgraded but the majority of clients have not.




For clients that are upgraded to 0.10.0.0, there is no performance impact.

Note: By setting the message format version, one certifies that all existing messages are on or below that message format version. Otherwise consumers before 0.10.0.0 might break. In particular, after the message format is set to 0.10.0, one should not change it back to an earlier format as it may break consumers on versions before 0.10.0.0.

Note: Due to the additional timestamp introduced in each message, producers sending small messages may see a message throughput degradation because of the increased overhead. Likewise, replication now transmits an additional 8 bytes per message. If you're running close to the network capacity of your cluster, it's possible that you'll overwhelm the network cards and see failures and performance issues due to the overload.

Note: If you have enabled compression on producers, you may notice reduced producer throughput and/or lower compression rate on the broker in some cases. When receiving compressed messages, 0.10.0 brokers avoid recompressing the messages, which in general reduces the latency and improves the throughput. In certain cases, however, this may reduce the batching size on the producer, which could lead to worse throughput. If this happens, users can tune linger.ms and batch.size of the producer for better throughput. In addition, the producer buffer used for compressing messages with snappy is smaller than the one used by the broker, which may have a negative impact on the compression ratio for the messages on disk. We intend to make this configurable in a future Kafka release.
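
If batching suffers after enabling compression, the knobs named above can be tuned on the producer. The following is a hedged sketch with placeholder broker address and topic; the linger.ms and batch.size values are purely illustrative and need to be tuned per workload.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CompressedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("compression.type", "snappy");  // compression happens on the producer
            // Illustrative values: a short linger and a larger batch can win back batching lost after the upgrade.
            props.put("linger.ms", "20");
            props.put("batch.size", "65536");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
            }
        }
    }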




Potential breaking changes in 0.10.0.0
  • Starting from Kafka 0.10.0.0, the message format version in Kafka is represented as the Kafka version. For example, message format 0.9.0 refers to the highest message version supported by Kafka 0.9.0.
  • Message format 0.10.0 has been introduced and it is used by default. It includes a timestamp field in the messages and relative offsets are used for compressed messages.
  • ProduceRequest/Response v2 has been introduced and it is used by default to support message format 0.10.0
  • FetchRequest/Response v2 has been introduced and it is used by default to support message format 0.10.0
  • MessageFormatter interface was changed from def writeTo(key: Array[Byte], value: Array[Byte], output: PrintStream) to def writeTo(consumerRecord: ConsumerRecord[Array[Byte], Array[Byte]], output: PrintStream)
  • MessageReader interface was changed from def readMessage(): KeyedMessage[Array[Byte], Array[Byte]] to def readMessage(): ProducerRecord[Array[Byte], Array[Byte]]


  • MessageFormatter's package was changed from kafka.tools to kafka.common
  • MessageReader's package was changed from kafka.tools to kafka.common
  • MirrorMakerMessageHandler no longer exposes the handle(record: MessageAndMetadata[Array[Byte], Array[Byte]]) method as it was never called.
  • The 0.7 KafkaMigrationTool is no longer packaged with Kafka. If you need to migrate from 0.7 to 0.10.0, please migrate to 0.8 first and then follow the documented upgrade process to upgrade from 0.8 to 0.10.0.
  • The new consumer has standardized its APIs to accept java.util.Collection as the sequence type for method parameters. Existing code may have to be updated to work with the 0.10.0 client library (see the sketch after this list).
  • LZ4-compressed message handling was changed to use an interoperable framing specification (LZ4f v1.5.1). To maintain compatibility with old clients, this change only applies to Message format 0.10.0 and later. Clients that Produce/Fetch LZ4-compressed messages using v0/v1 (Message format 0.9.0) should continue to use the 0.9.0 framing implementation. Clients that use Produce/Fetch protocols v2 or later should use interoperable LZ4f framing. A list of interoperable LZ4 libraries is available at http://www.lz4.org/
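
A minimal sketch of the Collection-based API, with placeholder broker address, group id and topic names; callers that previously passed other sequence types now hand the client a java.util.Collection such as the List produced by Arrays.asList.

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class CollectionSubscribeExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder
            props.put("group.id", "collection-api-demo");      // placeholder
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // subscribe() now takes a java.util.Collection<String>.
                consumer.subscribe(Arrays.asList("topic-a", "topic-b"));
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s-%d@%d: %s%n",
                            record.topic(), record.partition(), record.offset(), record.value());
                }
            }
        }
    }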


Notable changes in 0.10.0.0
  • Starting from Kafka 0.10.0.0, a new client library named Kafka Streams is available for stream processing on data stored in Kafka topics. This new client library only works with 0.10.x and upward versioned brokers due to message format changes mentioned above. For more information please read this section.
  • The default value of the configuration parameter receive.buffer.bytes is now 64K for the new consumer.
  • The new consumer now exposes the configuration parameter exclude.internal.topics to restrict internal topics (such as the consumer offsets topic) from accidentally being included in regular expression subscriptions. By default, it is enabled (see the configuration sketch after this list).
  • The old Scala producer has been deprecated. Users should migrate their code to the Java producer included in the kafka-clients JAR as soon as possible.
  • The new consumer API has been marked stable.
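
A brief configuration sketch for the two consumer settings above, with placeholder broker address and group id; the receive.buffer.bytes and exclude.internal.topics values simply spell out the 0.10.0.0 defaults, and the regular-expression subscription is where exclude.internal.topics matters.

    import java.util.Collection;
    import java.util.Properties;
    import java.util.regex.Pattern;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class NewConsumerDefaultsExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder
            props.put("group.id", "defaults-demo");            // placeholder
            props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            // 0.10.0.0 defaults, written out only for illustration:
            props.put("receive.buffer.bytes", "65536");     // 64K socket receive buffer
            props.put("exclude.internal.topics", "true");   // keep internal topics out of regex matches

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                // With exclude.internal.topics=true this broad pattern will not pull in __consumer_offsets.
                consumer.subscribe(Pattern.compile(".*"), new ConsumerRebalanceListener() {
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
                });
                consumer.poll(1000);
            }
        }
    }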


Upgrading from 0.8.0, 0.8.1.X or 0.8.2.X to 0.9.0.0

0.9.0.0 has potential breaking changes (please review before upgrading) and an inter-broker protocol change from previous versions. This means that upgraded brokers and clients may not be compatible with older versions. It is important that you upgrade your Kafka cluster before upgrading your clients. If you are using MirrorMaker, downstream clusters should be upgraded first as well.

For a rolling upgrade:

  1. Update server.properties file on all brokers and add the following property: inter.broker.protocol.version=0.8.2.X
  2. Upgrade the brokers. This can be done a broker at a time by simply bringing it down, updating the code, and restarting it.
  3. Once the entire cluster is upgraded, bump the protocol version by editing inter.broker.protocol.version and setting it to 0.9.0.0.
  4. Restart the brokers one by one for the new protocol version to take effect

Note: If you are willing to accept downtime, you can simply take all the brokers down, update the code and start all of them. They will start with the new protocol by default.

Note: Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after.





Potential breaking changes in 0.9.0.0
  • Java 1.6 is no longer supported.
  • Scala 2.9 is no longer supported.
  • Broker IDs above 1000 are now reserved by default to automatically assigned broker IDs. If your cluster has existing broker IDs above that threshold make sure to increase the reserved.broker.max.id broker configuration property accordingly.
  • Configuration parameter replica.lag.max.messages was removed. Partition leaders will no longer consider the number of lagging messages when deciding which replicas are in sync.
  • Configuration parameter replica.lag.time.max.ms now refers not just to the time passed since last fetch request from replica, but also to time since the replica last caught up. Replicas that are still fetching messages from leaders but did not catch up to the latest messages in replica.lag.time.max.ms will be considered out of sync.
  • Compacted topics no longer accept messages without key and an exception is thrown by the producer if this is attempted. In 0.8.x, a message without key would cause the log compaction thread to subsequently complain and quit (and stop compacting all compacted topics). See the producer sketch after this list.
  • MirrorMaker no longer supports multiple target clusters. As a result it will only accept a single --consumer.config parameter. To mirror multiple source clusters, you will need at least one MirrorMaker instance per source cluster, each with its own consumer configuration.
  • Tools packaged under org.apache.kafka.clients.tools.* have been moved to org.apache.kafka.tools.*. All included scripts will still function as usual, only custom code directly importing these classes will be affected.
  • The default Kafka JVM performance options (KAFKA_JVM_PERFORMANCE_OPTS) have been changed in kafka-run-class.sh.
  • The kafka-topics.sh script (kafka.admin.TopicCommand) now exits with non-zero exit code on failure.
  • The kafka-topics.sh script (kafka.admin.TopicCommand) will now print a warning when topic names risk metric collisions due to the use of a '.' or '_' in the topic name, and error in the case of an actual collision.
  • The kafka-console-producer.sh script (kafka.tools.ConsoleProducer) will use the Java producer instead of the old Scala producer by default, and users have to specify 'old-producer' to use the old producer.
  • By default, all command line tools will print all logging messages to stderr instead of stdout.
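
A minimal producer sketch for the keyed-message requirement, assuming a placeholder topic that has cleanup.policy=compact; a record with a null key sent to such a topic now fails at the producer rather than breaking the log compaction thread.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CompactedTopicProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // OK: compacted topics keep the latest value per key, so every record needs a key.
                producer.send(new ProducerRecord<>("user-profiles", "user-42", "{\"name\":\"alice\"}"));

                // Not OK on a compacted topic: a null key is rejected and the send fails with an
                // exception (surfaced via the returned Future or the send callback).
                // producer.send(new ProducerRecord<>("user-profiles", null, "orphan value"));
            }
        }
    }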



Notable changes in 0.9.0.1
  • The new broker id generation feature can be disabled by setting broker.id.generation.enable to false.
  • Configuration parameter log.cleaner.enable is now true by default. This means topics with a cleanup.policy=compact will now be compacted by default, and 128 MB of heap will be allocated to the cleaner process via log.cleaner.dedupe.buffer.size. You may want to review log.cleaner.dedupe.buffer.size and the other log.cleaner configuration values based on your usage of compacted topics.
  • Default value of configuration parameter fetch.min.bytes for the new consumer is now 1 by default.


Deprecations in 0.9.0.0
  • Altering topic configuration from the kafka-topics.sh script (kafka.admin.TopicCommand) has been deprecated. Going forward, please use the kafka-configs.sh script (kafka.admin.ConfigCommand) for this functionality.
  • The kafka-consumer-offset-checker.sh (kafka.tools.ConsumerOffsetChecker) has been deprecated. Going forward, please use kafka-consumer-groups.sh (kafka.admin.ConsumerGroupCommand) for this functionality.
  • The kafka.tools.ProducerPerformance class has been deprecated. Going forward, please use org.apache.kafka.tools.ProducerPerformance for this functionality (kafka-producer-perf-test.sh will also be changed to use the new class).
  • The producer config block.on.buffer.full has been deprecated and will be removed in a future release. Currently its default value has been changed to false. The KafkaProducer will no longer throw BufferExhaustedException but instead will use the max.block.ms value to block, after which it will throw a TimeoutException. If the block.on.buffer.full property is set to true explicitly, it will set max.block.ms to Long.MAX_VALUE and metadata.fetch.timeout.ms will not be honoured.
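
A hedged sketch of the replacement behaviour: rather than setting block.on.buffer.full, bound how long send() may block with max.block.ms and handle the TimeoutException; the broker address, topic and the 5-second value are placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.TimeoutException;

    public class MaxBlockMsExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            // Instead of block.on.buffer.full=true, cap how long send() may block when the buffer is full.
            props.put("max.block.ms", "5000");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                try {
                    producer.send(new ProducerRecord<>("my-topic", "key", "value"));
                } catch (TimeoutException e) {
                    // The buffer stayed full (or metadata was unavailable) for longer than max.block.ms.
                    System.err.println("send() blocked too long: " + e.getMessage());
                }
            }
        }
    }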


Upgrading from 0.8.1 to 0.8.2

0.8.2 is fully compatible with 0.8.1. The upgrade can be done one broker at a time by simply bringing it down, updating the code, and restarting it.

Upgrading from 0.8.0 to 0.8.1

0.8.1 is fully compatible with 0.8. The upgrade can be done one broker at a time by simply bringing it down, updating the code, and restarting it.

Upgrading from 0.7

Release 0.7 is incompatible with newer releases. Major changes were made to the API, ZooKeeper data structures, protocol, and configuration in order to add replication (which was missing in 0.7). The upgrade from 0.7 to later versions requires a special tool for migration. This migration can be done without downtime.



