Kafka系列10:面试题是否有必要深入了解其背后的原理?我觉得应该刨根究底(下)

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在今天文章开始之前,想和粉丝朋友们先分享一个好消息,作者坚持以"},{"type":"text","marks":[{"type":"strong"}],"text":"原创"},{"type":"text","text":"的态度去努力写好每一篇文章,同时得到了一小部分粉丝朋友们的认可和 InfoQ 写作平台的支持。在此非常感谢粉丝朋友的支持,同时也非常感谢 InfoQ 小编的认可。接下来我会继续努力,不忘初心,用心写好每一篇文章。"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/25/259044ff007181eab9c2b679afc35973.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外最近忙着搬家和工作的事情,导致没有多余的时间来更文,希望朋友们能够多多包涵。好了,今天我们我们来继续分析 Kafka 的常见面试题。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"文章概览"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Kafka 的延迟队列你有了解吗?"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Kafka 的幂等性是怎么实现的?"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"知道 ISR、AR 是什么吗?"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Kafka 中的 HW、LEO、LSO 知道是什么意思吗?"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Kafka 消息有顺序吗?如果有其顺序性是怎么保证的?"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":6,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"有遇到过消息重复消费的情况吗?"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Kafka 的延迟队列你有了解吗?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 的延迟队列使用了一个叫“时间轮”的东西来实现的,听起来牛逼哄哄的样子,直接来看图。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bc/bc25192d526999747cf15d70bf978bd6.png","alt":null,"title":"时间轮原理图","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"boxShadow"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"TickMs:时间单位,每一格代表一个时间跨度。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"CurrentTime:当前时间,表示当前时间轮运行到的位置。CurrentTime 运行到一个位置,表示当前位置对应的队列中的任务需要被处理。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"TaskList:双端任务队列,其中每个 Task 就是一个实际要执行的任务。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"WheelSize:表示时间轮的容量大小。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Interval:表示时间轮的最长时间跨度,即最长可以存储的时间跨度有多长。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从上图可以看出,时间轮是由左侧的类似“时钟”的转盘和右侧的任务队列组成。其中左侧的转盘是由一个数组实现的循环队列。看到这里大家应该会有一个疑问,假设时间单位是毫秒,现在需要一个 30 分钟大小的延迟队列,那么时间轮的大小应该是 30 * 60 * 1000 = 1800000 大小的数组才能实现。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"相信没几个人愿意直接初始化一个这么大的数组,聪明的人类总是会想出各种办法来解决问题,所以就出现了“二级”时间轮和“三级”时间轮。我们暂且把上面的图称之为“一级”时间轮,其每个相邻位置的时间跨度为 1ms,总时间跨度为 1ms * 8 = 8ms,二级时间轮相邻位置的时间跨度以一级时间轮总时间跨度为基准,则其二级时间轮的总跨度为 8ms * 8 = 64ms,即“二级”时间轮可以容纳 64ms 内的任务,同理,“三级”时间轮以“二级”时间轮的总跨度为基准,则“三级”时间轮的总跨度为 64ms * 8 = 512ms,即三级时间轮可以容纳 512ms 内的任务。假设“一级”时间轮的 WheelSize 设置为 100,30 分钟的延迟任务只需使用三个时间轮,总容量 300 大小就可以容纳所有的任务了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需要注意的一点是,“二级”和“三级”时间轮是不直接执行任务的,当 CurrentTime 执行到时,会将其对应的任务进行降级操作,降级就是“三级”时间轮的任务降级到“二级”时间轮上,“二级”时间轮上的任务降级到“一级”时间轮上,即所有的任务实际是在“一级”时间轮上执行的。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Kafka 的幂等性是怎么实现的?"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"幂等性原理"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 kafka0.11 之前是没有幂等性的概念的,在 0.11 之后 Kafka 通过引入 PID 实现了单 Partition 的幂等性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"消息生产端生产的每条消息都会带上一个 PID 值,Broker 端也会缓存当前 Partition 对应的 PID,Broker 接收到消息以后会判断 PID 值,此时可能会产生三种情况。"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"【Broker 中缓存的 PID - 生产端发送的 PID <= 0】,此时认为消息是发生了重传,直接丢弃该条消息。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"【Broker 中缓存的 PID - 生产端发送的 PID > 1】,此时认为中间的消息发生了丢失,直接丢弃掉该条消息。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"【Broker 中缓存的 PID - 生产端发送的 PID = 1】,此时认为消息是单调递增,可以正常写入。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"开启幂等性"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"Properties props = new Properties();\nprops.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, \"localhost:9092\");\n// 注意,启用幂等性时,需要将其设置为all,否则会报错\nprops.put(ProducerConfig.ACKS_CONFIG, \"all\");\nprops.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());\nprops.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());\n// 启用幂等性\nprops.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, \"true\");\n\nKafkaProducer kafkaProducer = new KafkaProducer<>(props);\nkafkaProducer.send(new ProducerRecord(\"truman_kafka_center\", \"1\", \"hello world.\")).get();\nkafkaProducer.close();"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"知道 ISR、OSR、AR 是什么吗?"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ISR:全称 In-Sync-Replicas,表示当前正在同步的从 Partition,该列表是由主 Partition 进行维护的,在该列表中的 Partition 定期从主 Partition 上拉取数据进行同步,若在指定周期内没同步数据,则认为该从 Partition 失效,从 ISR 列表中剔除,并将其移入到 OSR 列表中。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"OSR:全称 Out-Sync-Replicas,没有进行数据同步的从 Partition 列表,其中包括失效的从 Partition 和刚刚加入进来的新 Partition。处于 OSR 列表中的 Partition 不能够进行数据的同步操作。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"AR:全称 Assigned-Replicas,一个 Partition 对应的主从所有的 Partition 列表,AR = ISR + OSR。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Kafka 中的 HW、LEO、LSO 知道是什么意思?"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bf/bfceaa15f88f7eaaef44656942ff6608.png","alt":null,"title":"位置示意图","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"boxShadow"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"HW:全称 High-Water,即“高水位”,处于 HW 之前的数据才可能被正常消费,处于 HW 之后的数据不能够被消费。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"LEO:全称 Log-End-Offset,表示下一条消息被写入的位置。注意,每次消息被写入后会更新 LEO 值,所以 LEO 值代表的不是当前最新一条消息的位置,而是下一条消息要被写入的位置。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"LSO:全称 Last-Stable-Offset,表示为 LSO 之前的消息都已经被确认,而在 LSO 之后的消息还未被确认,其主要被用于事务。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Kafka 消息有顺序吗?如果有其顺序性是怎么保证的?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 无法做到消息全局有序,只能做到 Partition 维度的有序。所以如果想要消息有序,就需要从 Partition 维度入手。一般有两种解决方案。"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"单 Partition,单 Consumer。通过此种方案强制消息全部写入同一个 Partition 内,但是同时也牺牲掉了 Kafka 高吞吐的特性了,所以一般不会采用此方案。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"多 Partition,多 Consumer,指定 key 使用特定的 Hash 策略,使其消息落入指定的 Partition 中,从而保证相同的 key 对应的消息是有序的。此方案也是有一些弊端,比如当 Partition 个数发生变化时,相同的 key 对应的消息会落入到其他的 Partition 上,所以一旦确定 Partition 个数后就不能在修改 Partition 个数了。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"有遇到过消息重复消费的情况吗?是怎么解决的?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有,发生过两次重复消费的情况。发现用户的\"xx\"计数偶现大于实际情况,排查日志发现大概意思是心跳检测异常导致 commit 还没有来得及提交,对应的 Partition 被重新分配给其他的 Consumer 消费导致消息被重复消费。"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"解决方式 1:调整降低消费端的消费速率、提高心跳检测周期。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通过方案 1 调整参数后,还是会出现重复消费的情况,只是出现的概率降低了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":"2","normalizeStart":"2"},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"解决方案 2:在业务层增加 Redis,在一定周期内,相同 key 对应的消息认为是同一条,如果 Redis 内不存在则正常消费消费,反之直接抛弃。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"总结"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"常见的Kafka面试题先总结这些,但远不止这些。本文会持续收集更新常见的面试题,同时也希望朋友们积极留言面试中见过的面试题。"}]},{"type":"horizontalrule"},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"精彩文章推荐"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?__biz=MzIzNTIzNzYyNw==&mid=2247484048&idx=1&sn=2de06942948b02842fd042e7783cbc61&chksm=e8eb7b04df9cf2121a98984fae0207d422b9d643a2c3ff4d7c91cabfe2e8b4d530a62c376035&token=1378636505&lang=zh_CN&scene=21#wechat_redirect","title":null},"content":[{"type":"text","text":"面试题是否有必要深入了解其背后的原理?我觉得应该刨根究底(上)"}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?__biz=MzIzNTIzNzYyNw==&mid=2247483919&idx=1&sn=ee4c393dabadcecd51b34380425d227a&chksm=e8eb7b9bdf9cf28da20a8083d772d6744850256e7b338aa79225889c69f8395ab6019808d946&token=1378636505&lang=zh_CN&scene=21#wechat_redirect","title":null},"content":[{"type":"text","text":"你必须要知道集群内部工作原理的一些事!"}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?__biz=MzIzNTIzNzYyNw==&mid=2247483881&idx=1&sn=6624157a86dea60abd8a6842c863ba4e&chksm=e8eb787ddf9cf16b85561dcd25144a73304709a14f615291a5374f9ca2bf05a2284b181be1af&token=1378636505&lang=zh_CN&scene=21#wechat_redirect","title":null},"content":[{"type":"text","text":"消息是如何在服务端存储与读取的,你真的知道吗?"}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?__biz=MzIzNTIzNzYyNw==&mid=2247483827&idx=1&sn=44cfb50953b0f745b2719977453c3a42&chksm=e8eb7827df9cf131ea8aa1496e9ec2d1ed22ed245225dad0613511006b9428fffb16c92cdd6a&token=1378636505&lang=zh_CN&scene=21#wechat_redirect","title":null},"content":[{"type":"text","text":"一文读懂消费者背后的那点\"猫腻\""}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"微信公众号搜索"},{"type":"text","marks":[{"type":"strong"}],"text":"【z小赵】"},{"type":"text","text":",更多系列精彩文章等你解"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/86/862b51ac70a151f66fa21406f2affc39.gif","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章