Needing to Cut System Complexity, We Migrated from Kafka to Pulsar

2021-01-26 19:53

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Key Takeaways"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Distributed messaging systems support two kinds of semantics, streaming and queuing, and each is best suited to different use cases."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar is unusual in that it supports both streaming and queuing use cases."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar's multi-layer architecture makes it easy to scale the number and size of topics, and it is simpler to operate than other messaging systems."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar offers a good balance of scalability, reliability, and features. That makes it a fit to replace RabbitMQ at Iterable and, ultimately, other messaging systems such as Kafka and Amazon SQS."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Iterable sends large volumes of marketing messages on behalf of its customers every day, including email, notifications, SMS, and in-app messages, and processes even more user data updates, events, and custom workflow states daily. Many of these messages can trigger other actions in the system, so the system keeps growing more complex and the product becomes harder to use. As the customer base grows, reducing system complexity has become urgent."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Iterable can move parts of its architecture onto a distributed messaging system whose main job is to store the messages that consumers need to process and to track each consumer's progress through them. This reduces system complexity and lets consumers concentrate on processing messages."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Iterable uses work queues to execute customer-specified marketing workflows, webhooks, and other kinds of job scheduling and progress tracking. Other components, such as user and event ingestion, use a streaming model to process ordered message streams. Distributed messaging systems generally support two kinds of semantics, streaming and queuing, and each is best suited to different use cases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Streams and Queues"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In a streaming message system, producers append data to an append-only message stream. Within each stream, messages must be processed in a specific order, and consumers mark their position in the stream. Messages can be partitioned by some strategy, such as hashing the user ID, so that each partition becomes a separate stream and parallelism increases. Because the data in each stream is immutable and a consumer only stores an offset, no message is skipped during processing. Streams suit use cases where message order matters, such as data ingestion. "},{"type":"link","attrs":{"href":"https:\/\/kafka.apache.org\/","title":"","type":null},"content":[{"type":"text","text":"Kafka"}]},{"type":"text","text":" and "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/kinesis\/","title":"","type":null},"content":[{"type":"text","text":"Amazon Kinesis"}]},{"type":"text","text":" both process messages with streaming semantics."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In a queuing message system, a queue may have multiple producers and consumers. Producers send messages to the queue and consumers receive messages from it. After receiving a message, a consumer processes it and sends an ack to the queuing system once processing is complete. Because many consumers share one queue and message order does not matter, queue-based systems make it easy to scale out consumers. Queuing systems suit work that does not have to run in a specific order, for example sending the same email to many recipients. "},{"type":"link","attrs":{"href":"https:\/\/www.rabbitmq.com\/","title":"","type":null},"content":[{"type":"text","text":"RabbitMQ"}]},{"type":"text","text":" and "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/sqs\/","title":"","type":null},"content":[{"type":"text","text":"Amazon SQS"}]},{"type":"text","text":" are both queue-based messaging systems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Queuing systems generally make it easier to handle message-level errors. For example, after an error RabbitMQ can easily route a message to a dedicated queue, hold it there for a set time, and then return it to the original queue for a retry. RabbitMQ also supports negative acknowledgements, so a message whose processing failed can be redelivered. However, most message queues do not keep messages in a backlog once they have been acked, so there is no way to find messages to replay, which makes debugging and disaster recovery harder."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A stream-based system such as Kafka can also serve queuing use cases, but doing so is somewhat awkward. Kafka offers many features, and many users choose it for queuing. But a stream requires messages to be processed strictly in order, which creates a lot of extra work for developers. If a consumer is slow to process a message or has to retry it, processing of the other messages on the same stream is held up as well. A common workaround is to republish the message to another topic for retry, but that adds state management to the application and increases complexity."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Why Iterable Needed a New Messaging System"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Iterable has relied heavily on RabbitMQ's features to process a large volume of internal messages. We use custom time-to-live (TTL) values to specify the number of retries and to implement explicit delays in message processing. For example, we may delay a marketing email so that it goes out when the recipient is most likely to read it. We also rely on negative acknowledgements to decide which failed messages to requeue for redelivery."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A simplified view of Iterable's architecture is shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/3c\/4f\/3c2f1f9238520a7a0a1cdd746b7cc24f.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When we evaluated Pulsar, we were using Kafka for ingestion and RabbitMQ for all of the queues described above. Kafka has the performance and ordering guarantees that make it a good fit for ingestion, but it lacks the queuing semantics needed for our other use cases. RabbitMQ features such as delays were critical to us, which made it harder to find an alternative."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As we scaled, we ran into the following problems with RabbitMQ:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Under high load, RabbitMQ frequently ran into flow-control problems. When memory or other resources are constrained and the broker falls behind producers, the flow-control mechanism slows the producers down. That hurts the producers, causing service latency and request failures in other parts of the system. For example, we saw flow control kick in more often when the TTL of a large number of messages expired at the same time. In that situation RabbitMQ tries to move all of the expired messages to their destination queue at once, which sharply increases the memory usage of the RabbitMQ instance, triggers flow control, and blocks producers from publishing."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RabbitMQ brokers do not retain messages after they are acked, which makes debugging harder. In other words, there is no way to set a message retention period on the broker side."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RabbitMQ's replication was not adequate for our use cases, which made messages hard to replicate and left RabbitMQ as a single point of failure for message state."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RabbitMQ struggles with large numbers of queues. Many of our use cases need dedicated queues, and we often have more than 10,000 queues in use at once. At that scale, the RabbitMQ management interface and API frequently ran into problems."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Evaluating Apache Pulsar"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Overall, "},{"type":"link","attrs":{"href":"https:\/\/pulsar.apache.org\/","title":"","type":null},"content":[{"type":"text","text":"Apache Pulsar"}]},{"type":"text","text":" supports every feature we need. Although Pulsar cloud service providers and users comparing Pulsar with Kafka tend to emphasize Pulsar's streaming capabilities, we found that Pulsar is a very good fit for queuing. Pulsar's shared subscription mode lets a topic be used as a queue, so multiple virtual queues can be offered to consumers on the same topic. Pulsar also supports delayed message delivery natively. When we first started testing Pulsar, few systems supported these features."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond these features, Pulsar's layered architecture also makes it simpler to scale the number and size of topics."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/0c\/8d\/0cb4da73eb60b529dfdbd988a0a6458d.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar's top layer consists of brokers, which receive messages from producers and deliver them to consumers but do not store them. Each topic partition is served by a single broker, but because brokers hold no topic state, ownership of a topic can move freely from one broker to another. Users can therefore add brokers to increase throughput and put them to use immediately, and Pulsar can tolerate broker failures for the same reason."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar's bottom layer is "},{"type":"link","attrs":{"href":"https:\/\/bookkeeper.apache.org\/","title":"","type":null},"content":[{"type":"text","text":"BookKeeper"}]},{"type":"text","text":", which stores topic data in segments distributed across the cluster. When more storage is needed, BookKeeper nodes (bookies) can be added to the cluster and used to store new segments. Brokers coordinate with bookies to update topic state. Using BookKeeper lets Pulsar store a very large number of topics, which matters a great deal for Iterable's current use cases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After evaluating several messaging systems, we chose Pulsar because it strikes the right balance of scalability, reliability, and features to replace RabbitMQ and, ultimately, other messaging systems such as Kafka and Amazon SQS."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"First Pulsar Use Case: Message Sends"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One of the main jobs of the Iterable platform is to send scheduled marketing email on behalf of customers. To do that, we create a separate queue for each customer, publish these messages to the corresponding queue, and then check and send them. The queuing that Pulsar provides is what finally convinced us to move away from RabbitMQ."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We chose marketing sends as our first test of Pulsar for two reasons. First, message sending accounts for most of our RabbitMQ usage. Second, it is among the more complex use cases we run on RabbitMQ. This was not a low-risk test case for Iterable, but after testing Pulsar thoroughly we found it to be a better fit for our queuing workloads."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Iterable platform handles three common kinds of marketing messages:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Sends that go to all recipients at the same time. Suppose a customer wants to email every user who has been active in the past month. We query "},{"type":"link","attrs":{"href":"https:\/\/www.elastic.co\/","title":"","type":null},"content":[{"type":"text","text":"Elasticsearch"}]},{"type":"text","text":" for the list of users, schedule the sends, and publish the messages to the corresponding Pulsar topic."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Sends with a delivery time specified per recipient. The time may be fixed (for example, 9 a.m. in the recipient's time zone) or chosen by our send-time optimization algorithm. Either way, the queued message has to go out at the specified time, which means processing must be delayed."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"User-triggered sends. A send is triggered when a user enters a custom workflow or initiates a transaction such as an online purchase."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In all of these scenarios, the number of messages sent at any one time can vary widely, so we need a messaging system that lets us scale the number of consumers up and down as conditions require."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Migrating to Apache Pulsar"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar performed well in load tests, but we were not sure it would hold up under the high load of a real production environment. This was a particular concern because we wanted to use some newer Pulsar features (such as "},{"type":"link","attrs":{"href":"https:\/\/pulsar.apache.org\/docs\/en\/concepts-messaging\/#negative-acknowledgement","title":"","type":null},"content":[{"type":"text","text":"nack"}]},{"type":"text","text":" and "},{"type":"link","attrs":{"href":"https:\/\/pulsar.apache.org\/docs\/en\/2.5.0\/concepts-messaging\/#delayed-message-delivery","title":"","type":null},"content":[{"type":"text","text":"delayed message delivery"}]},{"type":"text","text":")."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To test Pulsar's performance, we deployed a parallel pipeline that published messages to both RabbitMQ and Pulsar, and configured consumers that ack messages without actually processing them. We also simulated consumption delays to see how Pulsar would behave under production-like conditions. We used consumer-level feature flags on both the test topics and the production topics, so we could migrate consumers one at a time for testing and eventually for production."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"During testing we uncovered a few bugs in Pulsar, for example a "},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/pulsar\/pull\/5499","title":"","type":null},"content":[{"type":"text","text":"race condition associated with delayed messages"}]},{"type":"text","text":", which the Pulsar developers helped us diagnose and fix. It was the most serious issue we found: it caused consumers to appear stuck and a backlog of messages to build up."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also ran into problems with batching, which Pulsar producers enable by default. For example, Pulsar's backlog metrics report the number of batches rather than the number of messages, which makes it harder to set alert thresholds for message backlog. Later we found a more serious "},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/pulsar\/issues\/5969","title":"","type":null},"content":[{"type":"text","text":"bug"}]},{"type":"text","text":" in the interaction between nack and batching, which the Pulsar team also fixed promptly. In the end we decided not to use batching. Disabling producer batching in Pulsar is simple, and performance without it still met our needs. The fixes mentioned above are likely to be merged in upcoming Pulsar releases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Delayed delivery and nack were new Pulsar features at the time, and we expected to hit some problems with them. So at first we published only to test topics and then migrated to Pulsar gradually over several months, which let us locate and fix any problem quickly without affecting customers. Migrating the whole marketing send pipeline took about six months, during which Pulsar performed as expected and we were very satisfied."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once the migration was complete, we found that we had been able to scale the workload by adding consumers, yet operating costs had fallen by half. Our costs before the migration were probably higher because we had over-provisioned RabbitMQ instances to get better performance. Our Pulsar cluster has now been running for more than six months without any problems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Implementation and Tooling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the back end, Iterable mainly uses "},{"type":"link","attrs":{"href":"https:\/\/www.scala-lang.org\/","title":"","type":null},"content":[{"type":"text","text":"Scala"}]},{"type":"text","text":", so we needed Scala tooling that supports Pulsar. We have been using the "},{"type":"link","attrs":{"href":"https:\/\/github.com\/sksamuel\/pulsar4s","title":"","type":null},"content":[{"type":"text","text":"pulsar4s"}]},{"type":"text","text":" library and have contributed support for some newer features, such as delayed message delivery. We also contributed an "},{"type":"link","attrs":{"href":"https:\/\/doc.akka.io\/docs\/akka\/current\/stream\/index.html","title":"","type":null},"content":[{"type":"text","text":"Akka Streams-based"}]},{"type":"text","text":" connector that consumes messages as a source and supports acks."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, this is how we can consume all of the topics in a namespace:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\/\/ Create a consumer on all topics in this namespace\nval createConsumer = () => client.consumer(ConsumerConfig(\n topicPattern = \"persistent:\/\/email\/project-123\/.*\".r,\n subscription = Subscription(\"email-service\")\n))\n\n\/\/ Create an Akka streams `Source` stage for this consumer\nval pulsarSource = committableSource(createConsumer, Some(MessageId.earliest))\n\n\/\/ Materialize the source and get back a `control` to shut it down later.\nval control = pulsarSource.mapAsync(parallelism)(handleMessage).to(Sink.ignore).run()\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Subscribing with a regular expression means the consumer does not need to know about a particular topic partitioning scheme and automatically picks up newly created topics. Because Pulsar supports a very large number of topics and can create a topic automatically when a message is published to it, it is easy to create new topics for new message types or even for individual messages. This also makes it easier to rate-limit different consumers and message types."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Conclusion"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar is a fast-moving open source project, so we have to keep up with its development and understand it in depth. Its documentation is still incomplete, and we often need to ask the community for help. The community has been very welcoming, and we are glad to take part in Pulsar's development and to contribute to its new features."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With its layered architecture, Pulsar is highly scalable, highly available, and low latency, and it supports both streaming and queuing, so it can replace several of the distributed messaging systems currently used in Iterable's architecture. Pulsar covers our Kafka, RabbitMQ, and SQS use cases. After migrating, we can focus on a single unified architecture and only need to learn Pulsar's operations and tooling."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We first started working with Pulsar in early 2019. Since then Pulsar has made great progress, especially in getting-started documentation and training. Many tools have been added as well, for example "},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/pulsar-manager","title":"","type":null},"content":[{"type":"text","text":"Pulsar Manager"}]},{"type":"text","text":" for managing clusters. Several companies now offer hosted and managed Pulsar services, which makes it easier for startups and small teams to get started."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Overall, Iterable's migration to Pulsar has been a success, although not without challenges. Our use cases are still fairly uncommon. We expected some problems to come up, but testing caught most of them and kept the impact on customers to a minimum. We are confident in Pulsar and plan to use it for other new and existing components of the Iterable platform."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https:\/\/www.infoq.com\/articles\/pulsar-customer-engagement-platform\/"}]}]}
