Pulsar Flink Connector 2.5.0 Officially Released

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After sustained effort, the community has released Pulsar Flink Connector 2.5.0. Pulsar Flink Connector integrates Apache Pulsar with Apache Flink (a data processing engine), allowing Apache Flink to read data from and write data to Apache Pulsar."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Project repository: https://github.com/streamnative/pulsar-flink/tree/release-2.5.0"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The sections below describe the new features introduced in Pulsar Flink Connector 2.5.0 in detail, to help you better understand the connector."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flink is a fast-evolving distributed computing engine. Version 1.11 adds the following features:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The core engine introduces unaligned checkpoints. This is a significant improvement to Flink's fault-tolerance mechanism: it speeds up checkpointing for jobs under heavy backpressure."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A new Source interface is provided. It unifies how sources run in streaming and batch jobs and supplies common built-in functionality such as event-time handling, watermark generation, and idleness detection for parallel source instances, which greatly reduces the complexity of developing a new source."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flink SQL supports Change Data Capture (CDC). This lets Flink easily interpret and consume database changelogs through tools such as Debezium. In the Table API and SQL, the filesystem connector also supports more use cases and formats, enabling scenarios such as writing streaming data from Pulsar into Hive."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PyFlink improves performance in several areas, including support for vectorized user-defined functions (Python UDFs). These changes let the Flink Python API interoperate with common Python libraries such as Pandas and NumPy, making Flink a better fit for data processing and machine learning workloads."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After Flink 1.11 was released, we upgraded Pulsar Flink Connector so that users could start using a connector that supports Flink 1.11 as soon as possible."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The upgrade turned out to be difficult. Flink 1.11 added and removed public APIs (the basic FieldsDataType type, the package of StreamTableEnvironment, and the execute method all changed), moved Table schema checks to startup time, and converts connectors to a Catalog at runtime. Together these changes made the new and old versions incompatible."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After weighing the options, we decided to add a new pulsar-flink-1.11 module to support Flink 1.11. Many thanks to Hang Chen (陈航) and Zhanpeng Wu (吴展鹏) of the BIGO team, who contributed the technical work for Flink 1.11 compatibility to the community."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar Schema carries the type and structure information of messages, so it integrates well with Flink Table. In Flink 1.9, a SQL type could be bound to a physical type, which was then used for Pulsar's SchemaType."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the Table changes in Flink 1.11, however, a SQL type can only use its default physical type, and Pulsar's SchemaType did not support the default physical types of Flink's date and time types. We therefore added new native types to Pulsar Schema so that it can integrate with the Flink SQL type system."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"New Features in Pulsar Flink Connector"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The main features added in Pulsar Flink Connector 2.5.0 are listed below."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"pulsar-flink"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"📣 Support for Flink 1.11 and Flink SQL DDL"},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flink 1.11 is a large upgrade in which some public APIs were added or removed, so a single Pulsar connector cannot be compatible with both Flink 1.9 and Flink 1.11. This change splits the project into two modules, one for each Flink version. Hang Chen and Zhanpeng Wu of BIGO put a great deal of effort into this feature."}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Support for Flink 1.11"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"New Flink SQL DDL support"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Updated topic partition strategy for more even consumption"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar schema compatibility with Flink 1.11"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For implementation details, see PR-115: https://github.com/streamnative/pulsar-flink/pull/115"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"📣 New PulsarDeserializationSchema interface"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A PulsarDeserializationSchema interface has been abstracted out so that users can customize decoding and obtain more information from the source. For implementation details, see PR-95: https://github.com/streamnative/pulsar-flink/pull/95"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Contributor: @wuzhanpeng"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"📣 JSON support in the Flink sink"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Flink sink implementation now supports JSON as a Pulsar Schema type."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For implementation details, see PR-116: https://github.com/streamnative/pulsar-flink/pull/116"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Contributor: @jianyun8023"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"📣 PulsarCatalog is now based on GenericInMemoryCatalog"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The PulsarCatalog implementation now extends GenericInMemoryCatalog."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For implementation details, see PR-91: https://github.com/streamnative/pulsar-flink/pull/91"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Contributor: @sijie"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Pulsar Schema"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"📣 Java 8 date and time types added as native Pulsar Schema types"},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar Schema now supports commonly used Java 8 types such as Instant, LocalDate, LocalTime, and LocalDateTime."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For implementation details, see PR-7874: https://github.com/apache/pulsar/pull/7874"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Contributor: @jianyun8023"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The release of Pulsar Flink Connector 2.5.0 is a major milestone for this fast-growing project. Special thanks to Hang Chen (陈航), Zhanpeng Wu (吴展鹏), Sijie Guo (郭斯杰), and Jianyun Zhao (赵建云) for their contributions to this release."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you have ideas or would like to become a contributor, feel free to open an issue. You can also refer to our contribution guide: https://github.com/streamnative/pulsar-flink/issues"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Related Links"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s?_biz=MzU3Mzg4OTMyNQ==&mid=2247488124&idx=1&sn=002d1acbfa47887a4a28f14a4459f7ea&scene=21#wechatredirect","title":""},"content":[{"type":"text","text":"New features in Flink 1.11 (Flink-China)"}]},{"type":"text","text":" "}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pulsar Flink Connector: https://github.com/streamnative/pulsar-flink/tree/release-2.5.0"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"streamnative/pulsar-flink: https://github.com/streamnative/pulsar-flink/issues"}]}]}]}]}