一、Checkpoint failures
Cause: I had commented out the INSERT INTO statement, so there was no actual job to checkpoint.
Most other causes come down to the Flink SQL itself not being written rigorously enough.
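As a concrete illustration: in a Flink SQL script it is the INSERT INTO statement that actually submits the running job, and checkpoints are only taken for a running job. A minimal sketch (the table names here are placeholders, not from the original job):

```sql
-- If this INSERT INTO is commented out, no job graph is submitted,
-- so there is nothing for Flink to checkpoint.
-- SET 'execution.checkpointing.interval' = '60s';  -- enable periodic checkpoints

INSERT INTO doris_sink_table   -- placeholder sink table
SELECT id, dt, ts
FROM kafka_source_table;       -- placeholder source table
```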
二、Errors at runtime
java.io.IOException: Failed to deserialize consumer record due to
at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter.emitRecord(KafkaRecordEmitter.java:54)
at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter.emitRecord(KafkaRecordEmitter.java:32)
at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:143)
at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:354)
at org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to deserialize consumer record ConsumerRecord(topic = topic_test_01, partition = 2, leaderEpoch = 0, offset = 808016, CreateTime = 1682242261803, serialized key size = -1, serialized value size = 278, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = [B@25f6134a).
at org.apache.flink.connector.kafka.source.reader.deserializer.KafkaDeserializationSchemaWrapper.deserialize(KafkaDeserializationSchemaWrapper.java:57)
at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter.emitRecord(KafkaRecordEmitter.java:51)
... 14 more
Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:196)
at org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter$SourceOutputWrapper.collect(KafkaRecordEmitter.java:65)
at org.apache.flink.api.common.serialization.DeserializationSchema.deserialize(DeserializationSchema.java:84)
at org.apache.flink.streaming.connectors.kafka.table.DynamicKafkaDeserializationSchema.deserialize(DynamicKafkaDeserializationSchema.java:113)
at org.apache.flink.connector.kafka.source.reader.deserializer.KafkaDeserializationSchemaWrapper.deserialize(KafkaDeserializationSchemaWrapper.java:54)
... 15 more
Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:99)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
at StreamExecCalc$41.processElement_split3(Unknown Source)
at StreamExecCalc$41.processElement(Unknown Source)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
... 23 more
Caused by: org.apache.flink.kafka.shaded.org.apache.kafka.common.errors.TimeoutException: Topic test_tpoic_02 not present in metadata after 60000 ms.
This error message indicates that Flink ran into trouble while deserializing a consumer record from the Kafka topic "dwd_crm_mianxin_card_use_detail_rt". Note also that the innermost Caused by in the trace is a TimeoutException: "Topic test_tpoic_02 not present in metadata after 60000 ms", so before anything else, check that the topic the job writes to actually exists and that its name is spelled correctly.
This error can have several causes, but a common one is that the data in the Kafka topic is not in the expected format. Flink uses a deserializer to convert the binary data in the topic into Java objects; if the data is not in the expected format, deserialization fails.
To troubleshoot, try the following steps:
Check the data in the Kafka topic to make sure it is in the expected format. You can use a Kafka consumer tool to read records from the topic and inspect them.
Check the Kafka consumer configuration in the Flink job to make sure it uses the correct deserializer for the data in the topic. The deserialization class can be specified in the Flink Kafka consumer configuration.
Check version compatibility between the Flink and Kafka libraries: make sure the versions you are using are compatible with each other.
Check the Flink job logs for more detailed error information. The error message itself is not very specific, so the logs may contain more details that help diagnose the problem.
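If malformed records in the topic turn out to be the cause, the JSON format in Flink SQL can be told to skip unparseable records instead of failing the whole job. A sketch of a source table definition, assuming the job reads JSON (column names, brokers, and group id are placeholders):

```sql
CREATE TABLE kafka_source_table (
    id STRING,
    ts STRING
) WITH (
    'connector' = 'kafka',
    'topic' = 'topic_test_01',
    'properties.bootstrap.servers' = 'localhost:9092',  -- placeholder brokers
    'properties.group.id' = 'flink_test_group',          -- placeholder group id
    'scan.startup.mode' = 'group-offsets',
    'format' = 'json',
    -- Skip records that fail to parse instead of throwing
    -- "Failed to deserialize consumer record":
    'json.ignore-parse-errors' = 'true'
);
```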
三、Problems encountered when loading data into Doris
1、table Already Exists and load job finished,change you label prefix or restore from lates savepoint
Cause: the job was restarted without changing the value of the sink.label-prefix parameter. The Doris connector derives its Stream Load labels from this prefix, and Doris requires labels to be unique, so every time the job that loads into Doris is restarted, the value must be changed, e.g. 'sink.label-prefix'='label_ads_cdp_mianxin_card_use_detail_rt_pt_d_2023042716'. If it is not changed, the error above is reported.
2、The job runs normally with no errors, but the Doris target table receives no data
The fix is mainly to adjust the sink configuration to match the data:
change: 'sink.properties.strip_outer_array'='false',
add: 'sink.properties.read_json_by_line'='true',
(The error shown in the original screenshot, not reproduced here, was mainly caused by sink.properties.strip_outer_array=false not being set, or not taking effect.)
3、A standard CREATE TABLE statement for loading into Doris
create table if not exists ads_rt.ads_cdp_mianxin_card_use_detail_rt (
    id string,
    dt string,
    date_happened string,
    mbr_id string,
    card_id string,
    business_type string,
    card_status string,
    event_type string,
    ts string,
    balance decimal(22,2),
    change_amount decimal(22,2)
) WITH (
    'connector'='doris',
    'fenodes'='10.00.00.01:8045',
    'table.identifier'='doris_db.doris_sink_table',
    'username'='username',
    'password'='000000',
    'sink.properties.format'='json',
    'sink.properties.strip_outer_array'='false',
    'sink.enable-delete'='true',
    'sink.properties.read_json_by_line'='true',
    'sink.label-prefix'='label_doris_sink_table_2023042716'
);