基于Kafka技术栈构建和部署实时搜索引擎的实践

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在 Koverhoop,我们正在保险、医疗、房地产和离线分析领域建立一些大型项目。在我们其中一个"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"多租户团体保险经纪平台"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "},{"type":"link","attrs":{"href":"https:\/\/klient.ca\/","title":null,"type":null},"content":[{"type":"text","text":"klient.ca"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",我们计划构建一个"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"强大的搜索功能"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",希望能在用户输入内容的同时同步呈现搜索结果。下面是我们能够实现的效果,我将在这篇文章讨论这一功能的核心基础设施,包括如何完全自动化部署及如何快速完成构建工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/fb\/41\/fb25631ee6e143bf5593239650230241.gif","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"来自作者的动图: 
搜索能力"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"这个系列文章分为"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"两部分"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",我将分别讨论以下内容:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"第1部分"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":了解用于支持此搜索能力的技术栈,并使用 Docker 和 Docker-compose 进行部署(本文)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"第2部分"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":使用 Kubernetes 对这些服务进行可伸缩的生产部署(待发布)"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"问题定义和决策"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"为了构建一个快速、实时的搜索引擎,我们必须做出某些设计决策。我们使用 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Postgres"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
作为主数据库,因此有以下选项可以使用:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"直接在 Postgres 数据库中查询我们在搜索栏中键入的每个字符。😐"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"使用一个高效的搜索数据库,如 Elasticsearch。🤔"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"考虑到我们已经是一个"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"多租户应用程序"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",同时被搜索的实体可能需要大量的"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"关联操作"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"(如果我们使用 Postgres)且预计规模也相当大,因此我们决定不使用以前直接查询数据库的方案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"因此,我们必须决定一种可靠、高效的方式,将数据从 Postgres 
"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"实时"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"迁移到 Elasticsearch。接下来需要作出以下决定:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"使用 "},{"type":"link","attrs":{"href":"https:\/\/www.elastic.co\/logstash","title":null,"type":null},"content":[{"type":"text","text":"Logstash"}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":" 定期查询 Postgres 数据库并将数据发送到 Elasticsearch。😶"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"在我们的应用程序中使用 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}},{"type":"strong"}],"text":"Elasticsearch 客户端"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":",在 Postgres 和 Elasticsearch 中同时对数据进行 CRUD 操作。🧐 "}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"使用"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}},{"type":"strong"}],"text":"基于事件的流引擎"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":",从 Postgres 
的"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}},{"type":"strong"}],"text":"预写日志"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"中提取事件,将它们导入到流处理服务器,并将其接收到 Elasticsearch。🤯"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"选项1因为不是实时的,所以很快就被排除了,而且即使我们以较短的间隔进行查询,也会给 Postgres 服务器带来"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"明显的压力"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。对于其他两种选择,不同的公司做出的决定可能不一样。在我们的场景里如果选择选项2,我们可以预见到一些问题:如果 Elasticsearch 在确认更新时"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"速度很慢"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",这可能会减慢我们应用程序的速度,或者在"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"不一致"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"的情况下,我们要如何对单个或一组事件的插入进行重试?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"因此,我们决定构建一个基于事件队列的基础设施。还因为我们已经计划了一些适合基于事件的未来场景和服务,比如"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"通知服务、数据仓库、微服务架构"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"等。事不宜迟,让我们直接开始解决方案及所使用服务的基本介绍吧。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"服务简介"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"为了实现基于事件的流基础设施,我们决定使用 Confluent Kafka 技术栈。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"以下是我们整合的服务:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/33\/3c\/3371212b89a287e8903b080554f4f93c.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"来源:"},{"type":"link","attrs":{"href":"https:\/\/confluent.io\/","title":null,"type":null},"content":[{"type":"text","text":"Confluent"}],"marks":[{"type":"size","attrs":{"size":10}},{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 公司"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Apache Kafka:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Kafka 是 Confluent 平台的核心。它是一个基于开源的分布式事件流平台。它将是数据库事件(插入、更新和删除)的主存储区域。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Kafka Connect:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们使用 Kafka-Connect 从 
"},{"type":"link","attrs":{"href":"https:\/\/debezium.io\/documentation\/reference\/connectors\/postgresql.html","title":null,"type":null},"content":[{"type":"text","text":"Debezium"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 的 Postgres 连接器获取 Kafka 的数据,该连接器从 Postgres "},{"type":"link","attrs":{"href":"https:\/\/www.postgresql.org\/docs\/9.0\/wal-intro.html","title":null,"type":null},"content":[{"type":"text","text":"WAL"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 文件中获取事件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在接收端,我们使用 ElasticSearch 连接器处理数据并将其加载到 ElasticSearch 中。Connect 既可以作为一个独立软件运行,也可以作为一个生产环境容错且可伸缩的服务运行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"ksqlDB:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"ksqlDB 允许在 Kafka 之上构建一个流处理应用程序。它在内部使用 Kafka-streams 并在事件进来时进行转换,我们使用它来丰富特定流的事件,其中包括已经在 Kafka 持久存在的其他表的事件,这些事件可能与搜索功能相关,例如 
root表中的"},{"type":"codeinline","content":[{"type":"text","text":"tenant_id"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ea\/b3\/ea82aff73ff4f774672b55f90f58c7b3.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"自作者的图片:基于 Apache Kafka 的 ksqlDB"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"使用 
ksqlDB,只需编写"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"SQL"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"查询来"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"过滤、聚合、关联和填充"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"数据即可。例如,假设我们正在接收一个关于两个主题的事件流,其中包括与"},{"type":"codeinline","content":[{"type":"text","text":"brands"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"和"},{"type":"codeinline","content":[{"type":"text","text":"brand_products"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"相关的信息。考虑到这是一个多租户数据源,我们需要使用 "},{"type":"codeinline","content":[{"type":"text","text":"tenant_id"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 来填充 "},{"type":"codeinline","content":[{"type":"text","text":"brand_product"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",而 "},{"type":"codeinline","content":[{"type":"text","text":"tenant_id"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"目前只与"},{"type":"codeinline","content":[{"type":"text","text":"brands"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"相关联。然后,我们可以使用这些填充后的记录,并将它们以非标准化的形式保存在 Elasticsearch 
中(以便进行搜索)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们可以使用一个主题来设置 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"KStream"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":"}]},{"type":"codeblock","attrs":{"lang":"sql"},"content":[{"type":"text","text":"CREATE STREAM \"brands\"\nWITH (\n kafka_topic = 'store.public.brands', \n value_format = 'avro'\n);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"为了只使用其中几列并按 "},{"type":"codeinline","content":[{"type":"text","text":"id"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 对数据流分区,我们可以创建一个名为 "},{"type":"codeinline","content":[{"type":"text","text":"enriched_brands"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 的新数据流:"}]},{"type":"codeblock","attrs":{"lang":"sql"},"content":[{"type":"text","text":"CREATE STREAM \"enriched_brands\"\nWITH (\n kafka_topic = 'enriched_brands'\n) \nAS \n SELECT \n CAST(brand.id AS VARCHAR) as \"id\", \n brand.tenant_id as \"tenant_id\",\n brand.name as \"name\" \n FROM \n \"brands\" brand \n PARTITION BY \n CAST(brand.id AS VARCHAR) \n EMIT 
CHANGES;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"然后可以通过 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"KTable"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 中的"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"最新偏移量"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"来实现事件集合。我们使用这个功能是为了将"},{"type":"codeinline","content":[{"type":"text","text":"brand"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"事件的当前状态与其他流关联起来。"}]},{"type":"codeblock","attrs":{"lang":"sql"},"content":[{"type":"text","text":"CREATE TABLE \"brands_table\"\nAS \n SELECT \n id as \"id\", \n latest_by_offset(tenant_id) as \"tenant_id\"\n FROM \n \"brands\" group by id \n EMIT CHANGES; "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"现在我们添加了一个含有"},{"type":"codeinline","content":[{"type":"text","text":"brand_id"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 字段的 "},{"type":"codeinline","content":[{"type":"text","text":"brand_products"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
的新流,但没有"},{"type":"codeinline","content":[{"type":"text","text":"tenant_id"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 字段。"}]},{"type":"codeblock","attrs":{"lang":"sql"},"content":[{"type":"text","text":"CREATE STREAM \"brand_products\" \nWITH (\n kafka_topic = 'store.public.brand_products', \n value_format = 'avro' \n);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们可以使用以下关联查询向 "},{"type":"codeinline","content":[{"type":"text","text":"brand_products"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"填充 "},{"type":"codeinline","content":[{"type":"text","text":"tenant_id"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。"}]},{"type":"codeblock","attrs":{"lang":"sql"},"content":[{"type":"text","text":"CREATE STREAM \"enriched_brand_products\" \nWITH (\n kafka_topic = 'enriched_brand_products’ \n) AS \n SELECT \n \"brand\".\"id\" as \"brand_id\", \n \"brand\".\"tenant_id\" as \"tenant_id\", \n CAST(brand_product.id AS VARCHAR) as \"id\",\n brand_product.name AS \"name\"\n FROM \n \"brand_products\" AS brand_product \n INNER JOIN \"brands_table\" \"brand\"\n ON \n brand_product.brand_id = \"brand\".\"id\"\n PARTITION BY \n CAST(brand_product.id AS VARCHAR) \n EMIT CHANGES;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Schema 
注册表:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"它在 Kafka 的上层,用于存储你在 Kafka 中提取的事件的元数据。它基于 AVRO 模式,并提供 REST 接口来存储和查询它们。它有助于确保一些 Schema 兼容性检查及其随时间发生的演变。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"配置技术栈"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们使用 Docker 和 Docker-compose 来配置和部署服务。下面是准备用于构建服务所写的 docker-compose 文件,将运行 Postgres,Elasticsearch,和 Kafka 相关的服务。下面我还将解释提到的每一种服务。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Postgres 和 Elasticsearch"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"postgres:\n build: services\/postgres\n container_name: oeso_postgres\n volumes:\n - database:\/var\/lib\/postgresql\/data\n env_file:\n - .env\n ports:\n - 5432:5432\n networks:\n - project_network\n "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"用于 Postgres 的 Docker-compose 服务"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"elasticsearch:\n image: docker.elastic.co\/elasticsearch\/elasticsearch:7.10.0\n container_name: elasticsearch\n volumes:\n - .\/services\/elasticsearch\/config\/elasticsearch.yml:\/usr\/share\/elasticsearch\/config\/elasticsearch.yml:ro\n - elasticsearch-database:\/usr\/share\/elasticsearch\/data\n env_file:\n - .env\n ports:\n - \"9200:9200\"\n - \"9300:9300\"\n networks:\n - 
project_network"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"用于 Elasticsearch 的 Docker-compose 服务"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"为了从源数据库中流式的导出事件,我们需要启用逻辑解码以便从其日志中进行复制。在 Postgres 的例子中,这些日志被称为 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Write-Ahead Logs (WAL) "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",它们被写入一个文件中。我们需要一个逻辑解码插件,在我们的例子中,wal2json 用来提取关于持久数据库更改的易于阅读的信息,以便它可以被作为事件发送到 Kafka。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"为了配置所需的扩展,你可以参考这个 Postgres "},{"type":"link","attrs":{"href":"https:\/\/github.com\/behindthescenes-group\/oesophagus\/blob\/master\/services\/postgres\/Dockerfile","title":null,"type":null},"content":[{"type":"text","text":"Dockerfile"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"文件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"对于 Elasticsearch 和 Postgres,我们需要在环境文件中指定一些必要的变量来设置它们,如用户名、密码等。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Zookeeper"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"zookeeper:\n image: confluentinc\/cp-zookeeper:6.0.0\n hostname: zookeeper\n container_name: zookeeper\n ports:\n - \"2181:2181\"\n environment:\n ZOOKEEPER_CLIENT_PORT: 2181\n ZOOKEEPER_TICK_TIME: 2000\n networks:\n - project_network"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"总的来说,Zookeeper 扮演 Kafka 这样的分布式平台的中心服务,它存储所有元数据,如 Kafka 节点状态,并持续跟踪主题或分区。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"即便已经有了"},{"type":"link","attrs":{"href":"https:\/\/cwiki.apache.org\/confluence\/display\/KAFKA\/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum","title":null,"type":null},"content":[{"type":"text","text":"在无 zookeeper 的情况下运行 Kafka"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"的替代计划,但是目前它还是管理集群所必须的。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Kafka Broker"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"broker:\n image: 
confluentinc\/cp-enterprise-kafka:6.0.0\n hostname: broker\n container_name: broker\n depends_on:\n - zookeeper\n ports:\n - \"29092:29092\"\n environment:\n KAFKA_BROKER_ID: 1\n KAFKA_ZOOKEEPER_CONNECT: \"zookeeper:2181\"\n KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT\n KAFKA_ADVERTISED_LISTENERS: PLAINTEXT:\/\/broker:9092,PLAINTEXT_HOST:\/\/localhost:29092\n KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1\n KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0\n KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1\n KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1\n networks:\n - project_network"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"为了简单起见,我们将配置一个单节点 Kafka 集群。我将在本系列的第2部分中讨论关于多阶段集群的更多内容。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"了解我们为 Kafka Broker所做的一些配置尤其重要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"监听器(Listeners)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"因为 Kafka 被设计成一个分布式平台,我们需要提供一些明确的方式来允许 Kafka 
Broker彼此在内部通信,并基于您的网络结构与其他客户端进行外部通信。因此我们使用监听器来完成这个任务,监听器是主机、端口和协议的组合。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"KAFKA_LISTENERS"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"这是一个可以由 KAFKA 绑定的网络端口列表,由主机、端口和协议组合成。默认情况下,它被设置为 "},{"type":"codeinline","content":[{"type":"text","text":"0.0.0.0"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",即监听所有端口。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"KAFKA_ADVERTISED_LISTENERS"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"这个值同样是主机和端口的组合,客户端将使用它来连接 KAFKA Broker。因此,如果客户端在 docker 中,它可以使用 
"},{"type":"codeinline","content":[{"type":"text","text":"broker:9092"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"连接到 broker,如果在 docker 外,则返回 "},{"type":"codeinline","content":[{"type":"text","text":"localhost:9092"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"来建立和 broker 的连接。我们还需要提到监听器名称,其才能被映射到恰当的协议以建立连接。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"KAFKA_LISTENER_SECURITY_PROTOCOL_MAP"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"这里我们将用户定义的监听器名称映射到希望用于通信的协议;它可以是"},{"type":"codeinline","content":[{"type":"text","text":"PLAINTEXT"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"(未加密)或 "},{"type":"codeinline","content":[{"type":"text","text":"SSL"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" (加密的)。这些名字在 
"},{"type":"codeinline","content":[{"type":"text","text":"KAFKA_LISTENERS"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"KAFKA_ADVERTISED_LISTENERS"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 中被进一步与host\/ip 一起使用,以便使用恰当的协议。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"由于我们只配置了单节点的 Kafka 集群,因此返回的或者说发送给任何客户端的推荐地址都将是自身这"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"同一 broker"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Schema 注册(Schema-Registry)"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"schema-registry:\n image: confluentinc\/cp-schema-registry:6.0.0\n hostname: schema-registry\n container_name: schema-registry\n depends_on:\n - zookeeper\n - broker\n ports:\n - \"8081:8081\"\n environment:\n SCHEMA_REGISTRY_HOST_NAME: schema-registry\n SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: \"zookeeper:2181\"\n networks:\n - project_network"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"对於单节点 schema 注册,我们指定用来连接 zookeeper 的字符串,Kafka 用它存储与 schema 
相关的数据。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Kafka-Connect"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"connect:\n image: confluentinc\/cp-kafka-connect:6.0.0\n hostname: connect\n container_name: connect\n volumes:\n - \".\/producers\/debezium-debezium-connector-postgresql\/:\/usr\/share\/confluent-hub-components\/debezium-debezium-connector-postgresql\/\"\n - \".\/consumers\/confluentinc-kafka-connect-elasticsearch\/:\/usr\/share\/confluent-hub-components\/confluentinc-kafka-connect-elasticsearch\/\"\n depends_on:\n - zookeeper\n - broker\n - schema-registry\n ports:\n - \"8083:8083\"\n environment:\n CONNECT_BOOTSTRAP_SERVERS: \"broker:9092\"\n KAFKA_HEAP_OPTS: \"-Xms256M -Xmx512M\"\n CONNECT_REST_ADVERTISED_HOST_NAME: connect\n CONNECT_REST_PORT: 8083\n CONNECT_GROUP_ID: compose-connect-group\n CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs\n CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1\n CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000\n CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets\n CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1\n CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status\n CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1\n CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter\n CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter\n CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: http:\/\/schema-registry:8081\n CONNECT_INTERNAL_KEY_CONVERTER: \"org.apache.kafka.connect.json.JsonConverter\"\n CONNECT_INTERNAL_VALUE_CONVERTER: \"org.apache.kafka.connect.json.JsonConverter\"\n CONNECT_ZOOKEEPER_CONNECT: \"zookeeper:2181\"\n CLASSPATH: \/usr\/share\/java\/monitoring-interceptors\/monitoring-interceptors-5.5.1.jar\n CONNECT_PRODUCER_INTERCEPTOR_CLASSES: \"io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor\"\n CONNECT_CONSUMER_INTERCEPTOR_CLASSES: 
\"io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor\"\n CONNECT_PLUGIN_PATH: \"\/usr\/share\/java,\/usr\/share\/confluent-hub-components\"\n CONNECT_LOG4J_LOGGERS: org.apache.zookeeper=ERROR,org.I0Itec.zkclient=ERROR,org.reflections=ERROR\n networks:\n - project_network"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们看到一些新的参数,比如:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"CONNECT_BOOTSTRAP_SERVERS:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"一组主机和端口组合,用于建立到 Kafka 集群的初始连接"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"CONNECT_KEY_CONVERTER:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"用于将键(key)从"},{"type":"codeinline","content":[{"type":"text","text":"connect"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"格式序列化为与 Kafka 兼容的格式。类似地,对于 "},{"type":"codeinline","content":[{"type":"text","text":"CONNECT_VALUE_CONVERTER"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",我们使用 AvroConverter 
进行序列化。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"映射大量 source 和 sink 连接器插件并在 "},{"type":"codeinline","content":[{"type":"text","text":"CONNECT_PLUGIN_PATH"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 中指定它们是非常的重要。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"ksqlDB"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"ksqldb-server:\n image: confluentinc\/cp-ksqldb-server:6.0.0\n hostname: ksqldb-server\n container_name: ksqldb-server\n depends_on:\n - broker\n - schema-registry\n ports:\n - \"8088:8088\"\n volumes:\n - \".\/producers\/debezium-debezium-connector-postgresql\/:\/usr\/share\/kafka\/plugins\/debezium-debezium-connector-postgresql\/\"\n - \".\/consumers\/confluentinc-kafka-connect-elasticsearch\/:\/usr\/share\/kafka\/plugins\/confluentinc-kafka-connect-elasticsearch\/\"\n environment:\n KSQL_LISTENERS: \"http:\/\/0.0.0.0:8088\"\n KSQL_BOOTSTRAP_SERVERS: \"broker:9092\"\n KSQL_KSQL_SCHEMA_REGISTRY_URL: \"http:\/\/schema-registry:8081\"\n KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: \"true\"\n KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: \"true\"\n KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS: 2000\n KSQL_CONNECT_GROUP_ID: \"ksql-connect-cluster\"\n KSQL_CONNECT_BOOTSTRAP_SERVERS: \"broker:9092\"\n KSQL_CONNECT_KEY_CONVERTER: \"io.confluent.connect.avro.AvroConverter\"\n KSQL_CONNECT_VALUE_CONVERTER: \"io.confluent.connect.avro.AvroConverter\"\n KSQL_CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: \"http:\/\/schema-registry:8081\"\n KSQL_CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: 
\"http:\/\/schema-registry:8081\"\n KSQL_CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: \"false\"\n KSQL_CONNECT_CONFIG_STORAGE_TOPIC: \"ksql-connect-configs\"\n KSQL_CONNECT_OFFSET_STORAGE_TOPIC: \"ksql-connect-offsets\"\n KSQL_CONNECT_STATUS_STORAGE_TOPIC: \"ksql-connect-statuses\"\n KSQL_CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1\n KSQL_CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1\n KSQL_CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1\n KSQL_CONNECT_PLUGIN_PATH: \"\/usr\/share\/kafka\/plugins\"\n networks:\n - project_network"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"如果不打算使用 "},{"type":"codeinline","content":[{"type":"text","text":"Kafka-Connect"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",并且不需要独立于 "},{"type":"codeinline","content":[{"type":"text","text":"ksql"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"扩展 "},{"type":"codeinline","content":[{"type":"text","text":"Kafka-Connect"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",那么可以为 "},{"type":"codeinline","content":[{"type":"text","text":"ksql"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"设置 "},{"type":"codeinline","content":[{"type":"text","text":"embedded-connect"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"配置,这将暴露来自 
"},{"type":"codeinline","content":[{"type":"text","text":"ksqldb-server"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"的连接点。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"除此之外,还有一个环境变量需要考虑:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":在当前版本的 
ksqlDB,对于流式表关联,关联的结果可能变成不确定的,即如果在流事件之前还没有创建或更新被关联的表中的实时事件,那您可能无法关联成功。当流中的某个事件在某个特定时间戳到达时,配置这个环境变量可以做一些等待让这个事件加载到表中。这提高了关联的"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"可预测性"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",但可能会导致某些"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"性能下降"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。在"},{"type":"link","attrs":{"href":"https:\/\/cwiki.apache.org\/confluence\/display\/KAFKA\/KIP-695%3A+Further+Improve+Kafka+Streams+Timestamp+Synchronization","title":null,"type":null},"content":[{"type":"text","text":"这里"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们正在努力改善这一点。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"实际上,如果你不能清楚地理解上面的内容,我建议你现在就使用这个配置,因为它很有效;它实际上需要另一篇文章来详细讨论"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"时间同步"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",或者如果你仍然好奇,你可以观看这个由来自 Confluent 的 Matthias j. 
Sax 制作的"},{"type":"link","attrs":{"href":"https:\/\/www.confluent.io\/resources\/kafka-summit-2020\/the-flux-capacitor-of-kafka-streams-and-ksqldb\/","title":null,"type":null},"content":[{"type":"text","text":"视频"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"ksqldb-cli:\n image: confluentinc\/cp-ksqldb-cli:6.0.0\n container_name: ksqldb-cli\n depends_on:\n - broker\n - ksqldb-server\n entrypoint: \/bin\/sh\n tty: true\n networks:\n - project_network"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在测试或开发环境中,使用 "},{"type":"codeinline","content":[{"type":"text","text":"ksqldb-cli"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"服务来尝试和测试流非常方便。即使在生产环境中,如果您想探索事件流或 Ktables,或者手动创建或过滤流,也可以这样做。尽管如此,还是建议您使用 ksql 或 kafka 客户端或其 REST 
端点自动创建流、表或主题,这些我们将在下面进行讨论。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/b3\/e2\/b3820925253c5288f3fff27030d153e2.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"图片由作者提供:目前为止对我们的架构进行的更详细观察"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"初始化数据"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"流"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"streams-init:\n build: jobs\/streams-init\n container_name: streams-init\n depends_on:\n - zookeeper\n - broker\n - schema-registry\n - ksqldb-server\n - ksqldb-cli\n - postgres\n - elasticsearch\n - connect\n env_file:\n - .env\n environment:\n ZOOKEEPER_HOSTS: \"zookeeper:2181\"\n KAFKA_TOPICS: \"brands, brand_products\"\n networks:\n - project_network"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"这个服务的目的是进行流初始化和 Kafka 内部配置,以及我们正在使用的其他服务。在部署时,我们不希望在服务器上手动创建主题、流、连接等。因此,我们使用为每个服务提供的 REST 服务,并编写 shell 脚本来自动化这个过程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
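无论是手工探索还是自动化脚本,底层使用的都是同一个 ksql REST 端点。下面是一个最小化的示意脚本(并非源码库内容;服务地址沿用上文 compose 中的 `ksqldb-server:8088`,语句仅作演示),展示如何构造请求体并向该端点提交一条 ksql 语句:

```shell
#!/bin/bash
# 构造 ksqlDB REST 端点所需的 JSON 请求体;
# 语句中的双引号需要调用方自行按 JSON 规则转义
build_ksql_body() {
  printf '{"ksql": "%s", "streamsProperties": {}}' "$1"
}

# 示例:列出当前所有流(需要整套服务已经启动;失败时不中断脚本)
body=$(build_ksql_body "SHOW STREAMS;")
curl -s -X POST "http://ksqldb-server:8088/ksql" \
  -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
  -d "$body" || true
```

把要执行的语句换成 `CREATE STREAM ...` 等内容,即可得到与下文初始化脚本相同的调用模式。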
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们的配置脚本如下所示:"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"#!\/bin\/bash\n\n\n# Setup ENV variables in connectors json files\nsed -i \"s\/POSTGRES_USER\/${POSTGRES_USER}\/g\" connectors\/postgres.json\nsed -i \"s\/POSTGRES_PASSWORD\/${POSTGRES_PASSWORD}\/g\" connectors\/postgres.json\nsed -i \"s\/POSTGRES_DB\/${POSTGRES_DB}\/g\" connectors\/postgres.json\nsed -i \"s\/ELASTIC_PASSWORD\/${ELASTIC_PASSWORD}\/g\" connectors\/elasticsearch.json\n\n\n# Simply wait until original kafka container and zookeeper are started.\nexport WAIT_HOSTS=zookeeper:2181,broker:9092,schema-registry:8081,ksqldb-server:8088,elasticsearch:9200,connect:8083\nexport WAIT_HOSTS_TIMEOUT=300\n\/wait\n\n\n# Parse string of kafka topics into an array\n# https:\/\/stackoverflow.com\/a\/10586169\/4587961\nkafkatopicsArrayString=\"$KAFKA_TOPICS\"\nIFS=', ' read -r -a kafkaTopicsArray <<< \"$kafkatopicsArrayString\"\n\n\n# A separate variable for zookeeper hosts.\nzookeeperHostsValue=$ZOOKEEPER_HOSTS\n\n\n# Terminate all queries\ncurl -s -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" \\\n -H \"Content-Type: application\/vnd.ksql.v1+json; charset=utf-8\" \\\n -d '{\"ksql\": \"SHOW QUERIES;\"}' | \\\n jq '.[].queries[].id' | \\\n xargs -Ifoo curl -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" \\\n -H \"Content-Type: application\/vnd.ksql.v1+json; charset=utf-8\" \\\n -d '{\"ksql\": \"TERMINATE 'foo';\"}'\n \n\n\n# Drop All Tables\ncurl -s -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" \\\n -H \"Content-Type: application\/vnd.ksql.v1+json; charset=utf-8\" \\\n -d '{\"ksql\": \"SHOW TABLES;\"}' | \\\n jq '.[].tables[].name' | \\\n xargs -Ifoo curl -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" \\\n -H \"Content-Type: application\/vnd.ksql.v1+json; charset=utf-8\" \\\n -d 
'{\"ksql\": \"DROP TABLE \\\"foo\\\";\"}'\n\n\n\n\n# Drop All Streams\ncurl -s -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" \\\n -H \"Content-Type: application\/vnd.ksql.v1+json; charset=utf-8\" \\\n -d '{\"ksql\": \"SHOW STREAMS;\"}' | \\\n jq '.[].streams[].name' | \\\n xargs -Ifoo curl -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" \\\n -H \"Content-Type: application\/vnd.ksql.v1+json; charset=utf-8\" \\\n -d '{\"ksql\": \"DROP STREAM \\\"foo\\\";\"}'\n \n\n\n# Create kafka topic for each topic item from split array of topics.\nfor newTopic in \"${kafkaTopicsArray[@]}\"; do\n # https:\/\/kafka.apache.org\/quickstart\n curl -X DELETE http:\/\/elasticsearch:9200\/enriched_$newTopic --user elastic:${ELASTIC_PASSWORD}\n curl -X DELETE http:\/\/schema-registry:8081\/subjects\/store.public.$newTopic-value\n kafka-topics --create --topic \"store.public.$newTopic\" --partitions 1 --replication-factor 1 --if-not-exists --zookeeper \"$zookeeperHostsValue\"\n curl -X POST -H \"Content-Type: application\/vnd.schemaregistry.v1+json\" --data @schemas\/$newTopic.json http:\/\/schema-registry:8081\/subjects\/store.public.$newTopic-value\/versions\n\n\ndone\n\n\ncurl -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" -H \"Accept: application\/vnd.ksql.v1+json\" -d \n\n{ \"ksql\": \"CREATE STREAM \\\\\"brands\\\\\" WITH (kafka_topic = \\'store.public.brands\\', value_format = \\'avro\\');\", \"streamsProperties\": {} }'\ncurl -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" -H \"Accept: application\/vnd.ksql.v1+json\" -d \n\n{ \"ksql\": \"CREATE STREAM \\\\\"enriched_brands\\\\\" WITH ( kafka_topic = \\'enriched_brands\\' ) AS SELECT CAST(brand.id AS VARCHAR) as \\\\\"id\\\\\", brand.tenant_id as \\\\\"tenant_id\\\\\", brand.name as \\\\\"name\\\\\" from \\\\\"brands\\\\\" brand partition by CAST(brand.id AS VARCHAR) EMIT CHANGES;\", \"streamsProperties\": {} }'\n\n\ncurl -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" -H \"Accept: application\/vnd.ksql.v1+json\" -d 
\n\n{ \"ksql\": \"CREATE STREAM \\\\\"brand_products\\\\\" WITH ( kafka_topic = \\'store.public.brand_products\\', value_format = \\'avro\\' );\", \"streamsProperties\": {} }'\ncurl -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" -H \"Accept: application\/vnd.ksql.v1+json\" -d \n\n{ \"ksql\": \"CREATE TABLE \\\\\"brands_table\\\\\" AS SELECT id as \\\\\"id\\\\\", latest_by_offset(tenant_id) as \\\\\"tenant_id\\\\\" FROM \\\\\"brands\\\\\" group by id EMIT CHANGES;\", \"streamsProperties\": {} }'\ncurl -X \"POST\" \"http:\/\/ksqldb-server:8088\/ksql\" -H \"Accept: application\/vnd.ksql.v1+json\" -d \n\n{ \"ksql\": \"CREATE STREAM \\\\\"enriched_brand_products\\\\\" WITH ( kafka_topic = \\'enriched_brand_products\\' ) AS SELECT \\\\\"brand\\\\\".\\\\\"id\\\\\" as \\\\\"brand_id\\\\\", \\\\\"brand\\\\\".\\\\\"tenant_id\\\\\" as \\\\\"tenant_id\\\\\", CAST(brand_product.id AS VARCHAR) as \\\\\"id\\\\\", brand_product.name AS \\\\\"name\\\\\" FROM \\\\\"brand_products\\\\\" AS brand_product INNER JOIN \\\\\"brands_table\\\\\" \\\\\"brand\\\\\" ON brand_product.brand_id = \\\\\"brand\\\\\".\\\\\"id\\\\\" partition by CAST(brand_product.id AS VARCHAR) EMIT CHANGES;\", \"streamsProperties\": {} }'\n\n\ncurl -X DELETE http:\/\/connect:8083\/connectors\/enriched_writer\ncurl -X \"POST\" -H \"Content-Type: application\/json\" --data @connectors\/elasticsearch.json http:\/\/connect:8083\/connectors\n\n\ncurl -X DELETE http:\/\/connect:8083\/connectors\/event_reader\ncurl -X \"POST\" -H \"Content-Type: application\/json\" --data @connectors\/postgres.json 
http:\/\/connect:80"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"这就是我们目前的工作方式:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在运行任何任务之前,我们确保所有的服务都"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"准备好了"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":";"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们需要确保主题在 Kafka 上"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"已存在"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",或者我们创建新的主题;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"即使有 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"schema 更新"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",我们的数据流也应该是可用的;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"当底层数据 srouce 或 sink 
的密码或版本更改,需要再次创建连接。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"共享这个配置脚本的目的只是为了"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"演示"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"一种自动化这些 pipeline 的方法。完全相同的配置可能并不适合您,但是自动化工作流和避免在任何环境中进行手工部署的想法始终是一样的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"为了让这个数据基础设施能够真正快速地运行起来,请参考 GitHub 仓库:"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/github.com\/behindthescenes-group\/oesophagus","title":null,"type":null},"content":[{"type":"text","text":"behindthescenes-group\/oesophagus"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在你的终端中克隆代码库并执行以下操作:"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"cp default.env .env\ndocker-compose up 
-d"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在Postgres 数据库 "},{"type":"codeinline","content":[{"type":"text","text":"store"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"中创建 "},{"type":"codeinline","content":[{"type":"text","text":"brands"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"brand_products"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 表:"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"CREATE TABLE brands (\n id serial PRIMARY KEY,\n name VARCHAR (50),\n tenant_id INTEGER\n);\nCREATE TABLE brand_products (\n id serial PRIMARY KEY,\n brand_id INTEGER,\n name VARCHAR(50)\n);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在"},{"type":"codeinline","content":[{"type":"text","text":"brands"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"表中插入一些记录:"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"INSERT INTO brands VALUES(1, 'Brand Name 1', 1);\nINSERT INTO brands VALUES(2, 'Brand Name 2', 1);\nINSERT INTO brands 
VALUES(3, 'Brand Name 3', 2);\nINSERT INTO brands VALUES(4, 'Brand Name 4', 2);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"然后在"},{"type":"codeinline","content":[{"type":"text","text":"brand_products"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"表中插入一些记录:"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"INSERT INTO brand_products VALUES(1, 1, 'Product Name 1');\nINSERT INTO brand_products VALUES(2, 2, 'Product Name 2');\nINSERT INTO brand_products VALUES(3, 3, 'Product Name 3');\nINSERT INTO brand_products VALUES(4, 4, 'Product Name 4');\nINSERT INTO brand_products VALUES(5, 1, 'Product Name 5');"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在 Elasticsearch 中查看填充了"},{"type":"codeinline","content":[{"type":"text","text":"tenant_id"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 的"},{"type":"codeinline","content":[{"type":"text","text":"brand_products"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" :"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"curl localhost:9200\/enriched_brand_products\/_search --user elastic:your_password"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我将持续为上述代码库做出贡献:添加在 
"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Kubernetes"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 部署多节点 Kafka 基础设施的配置,编写更多连接器,使用期望的服务实现"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"即插即用"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"架构的框架。请在"},{"type":"link","attrs":{"href":"https:\/\/forms.gle\/GGg2hvnEpG6r4bgg7","title":null,"type":null},"content":[{"type":"text","text":"这里"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"自由地提交贡献,或让我知道你在当前配置中所遇到的任何"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"数据工程问题"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"下一步"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我希望这篇文章能给你一个关于部署和运行完整 Kafka 技术栈的清晰思路,这是一个构建实时流处理应用程序的基础且有效的示例。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"根据产品或公司的自身特点,部署过程可能会有所不同。我还计划在本系列的下一部分中就这样一个系统在可伸缩性方面进行探讨,那将是关于在相同使用场景下如何在 
"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Kubernetes"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 上部署这样的基础设施的讨论。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"英文原文链接"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":"},{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/enabling-a-powerful-search-capability-building-and-deploying-a-real-time-stream-processing-etl-a27ecb0ab0ae","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/towardsdatascience.com\/enabling-a-powerful-search-capability-building-and-deploying-a-real-time-stream-processing-etl-a27ecb0ab0ae"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]}]}
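作为一个补充的验证示例(下面的查询结构是笔者的假设,并非源码库的一部分),在数据进入 Elasticsearch 之后,可以在上文的 enriched_brand_products 索引上按 tenant_id 过滤、再用前缀匹配来模拟"边输入边搜索"的效果:

```shell
#!/bin/bash
# 构造一个按 tenant_id 过滤、按名称前缀匹配的查询体(结构为示意)
build_search_body() {
  local tenant_id="$1" prefix="$2"
  printf '{"query": {"bool": {"filter": [{"term": {"tenant_id": %s}}], "must": [{"match_phrase_prefix": {"name": "%s"}}]}}}' \
    "$tenant_id" "$prefix"
}

# 示例:查询租户 1 下名称以 "Product" 开头的记录(需要服务已经启动)
curl -s "localhost:9200/enriched_brand_products/_search" \
  --user "elastic:${ELASTIC_PASSWORD:-your_password}" \
  -H "Content-Type: application/json" \
  -d "$(build_search_body 1 Product)" || true
```

每次用户击键时把最新的输入作为前缀重新发起这样的查询,就是文章开头动图所展示的交互方式的一种朴素实现。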