A Summary of Problems Encountered in Flink Development

1. After a submitted batch job finishes, it stops showing up in the web UI after a while, or disappears on its own:

Root cause analysis:

https://blog.csdn.net/u013076044/article/details/104740792

The HistoryServer needs to be enabled.
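A minimal configuration sketch for flink-conf.yaml, assuming a hypothetical archive directory hdfs:///flink/completed-jobs/ (the path, port, and refresh interval are illustrative values, not taken from the original setup). The JobManager uploads archives of finished jobs to the first directory, and the HistoryServer polls the second one, so the two should point to the same location:

```yaml
# Where the JobManager uploads archives of completed jobs (hypothetical path)
jobmanager.archive.fs.dir: hdfs:///flink/completed-jobs/

# Directory (or comma-separated directories) the HistoryServer monitors
historyserver.archive.fs.dir: hdfs:///flink/completed-jobs/

# Port of the HistoryServer web UI and polling interval in milliseconds
historyserver.web.port: 8082
historyserver.archive.fs.refresh-interval: 10000
```

With this in place, the HistoryServer would be started separately from the cluster with bin/historyserver.sh start, and completed jobs should remain visible on its web UI.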

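2. Reading (Hive-partitioned) files on HDFS with Flink. Three approaches are known so far:

Union the partitions in a loop: this runs locally, but fails when submitted to the cluster.

Use the official API: it can read files on HDFS, but only supports the CSV format.

An approach outside the official API: still being tested.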
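For the loop-and-union approach, the list of partition directories has to be built before anything is read. The following is a minimal sketch in plain Java, assuming a hypothetical layout of one directory per day named dt=yyyy-MM-dd under a base path; both the dt partition key and the base path are illustrative assumptions and do not come from the original job.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class PartitionPaths {
    /**
     * Builds one path per day in [start, end] (both inclusive),
     * assuming a hypothetical "dt=yyyy-MM-dd" partition layout.
     */
    static List<String> dailyPartitions(String basePath, LocalDate start, LocalDate end) {
        List<String> paths = new ArrayList<>();
        for (LocalDate day = start; !day.isAfter(end); day = day.plusDays(1)) {
            // LocalDate.toString() uses the ISO-8601 form yyyy-MM-dd
            paths.add(basePath + "/dt=" + day);
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical base path and date range, for illustration only
        dailyPartitions("hdfs:///warehouse/logs", LocalDate.of(2020, 1, 30), LocalDate.of(2020, 2, 1))
                .forEach(System.out::println);
    }
}
```

Each returned path could then be read into its own DataSet and the results combined with union, which is the step reported above as working locally but failing on the cluster.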

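Reference:

Background: a Flink batch job reads logs stored on HDFS and has to recursively read the contents of every file under a directory.

Approach used: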

Configuration conf = new Configuration();

conf.setBoolean("recursive.file.enumeration", true);

DataSet<String> in = env.readTextFile(urlWithDate).withParameters(conf);

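However, because the number of log files was large, an Akka connection timeout occurred and the job could not be submitted.

Related community issue: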

https://issues.apache.org/jira/browse/FLINK-3964

The logs were later read with the following approach instead, which solved the problem:

FileInputFormat<String> fileInputFormat = new TextInputFormat(new Path(urlWithDate));

fileInputFormat.setNestedFileEnumeration(true);

DataSet<String> dataSet = env.readFile(fileInputFormat, urlWithDate);

Related mailing-list reference:

http://mail-archives.apache.org/mod_mbox/flink-user/201701.mbox/<[email protected]>

Reference code:

package com.zhisheng.sql.blink.stream.example;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.descriptors.FileSystem;
import org.apache.flink.table.descriptors.OldCsv;
import org.apache.flink.table.descriptors.Schema;
import org.apache.flink.types.Row;

/**
* Desc: Blink Stream Table Job
* Created by zhisheng on 2019/11/3 at 1:14 PM
* blog: http://www.54tianzhisheng.cn/
* WeChat official account: zhisheng
*/
public class TableExampleWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment blinkStreamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        blinkStreamEnv.setParallelism(1);
        EnvironmentSettings blinkStreamSettings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        StreamTableEnvironment blinkStreamTableEnv = StreamTableEnvironment.create(blinkStreamEnv, blinkStreamSettings);

        String path = TableExampleWordCount.class.getClassLoader().getResource("words.txt").getPath();
        blinkStreamTableEnv
                .connect(new FileSystem().path(path))
                .withFormat(new OldCsv().field("word", Types.STRING).lineDelimiter("\n"))
                .withSchema(new Schema().field("word", Types.STRING))
                .inAppendMode()
                .registerTableSource("FlieSourceTable");

        Table wordWithCount = blinkStreamTableEnv.scan("FlieSourceTable")
                .groupBy("word")
                .select("word,count(word) as _count");
        blinkStreamTableEnv.toRetractStream(wordWithCount, Row.class).print();

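        // The true/false flag in the printed output may look puzzling: why is there an extra field?
        // The sink deletes first and then inserts: false retracts the previous record, true inserts the new one.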

        blinkStreamTableEnv.execute("Blink Stream SQL Job");
    }
}

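A job submitted to the YARN cluster fails to run

Error encountered:

In detail: after the job is submitted to YARN, YARN stays in a waiting state; the JobManager prints a batch of log output and then the error above appears.

Solution:

A standalone high-availability cluster had been deployed earlier, which probably caused a port conflict. Resetting the configuration file and removing the unneeded settings fixed it. Note that the Flink cluster does not need to be started when submitting jobs to YARN.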

Problem encountered when submitting from local code:

Cannot support file system for 'hdfs' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath.

Solution: add flink-shaded-hadoop-2-uber-2.6.5-10.0.jar to the project's dependencies.

On a normal cluster this jar is already present in the lib directory.
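Before hunting for the jar, it can help to confirm that the Hadoop classes really are missing from the local classpath. The sketch below is a small standalone check; the two class names are standard Hadoop classes chosen as probes, and the choice of probes is an assumption rather than something from the original setup.

```java
class HadoopClasspathCheck {
    /** Returns true if the named class can be located on the current classpath (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but unloadable because one of its own dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String[] probes = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.hdfs.DistributedFileSystem"
        };
        for (String name : probes) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Running it with the same classpath as the job shows whether adding the shaded Hadoop jar actually took effect.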

The cluster cannot run SQL jobs

Add the following configuration to the project's pom file:

Error encountered when running the code:

org.codehaus.jackson.map.ObjectMapper.writerWithDefaultPrettyPrinter()Lorg/codehaus/jackson/map/ObjectWriter;

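This turned out to be a dependency conflict. The fix:

Anything shown in red is a conflict; click into it.

Copy:

Then go to the corresponding dependency and exclude it.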
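Independently of the IDE steps above, a quick way to see which jar a conflicting class is actually loaded from at runtime is to ask its class loader. This is a minimal sketch using only standard JDK calls; the class name WhichJar and the "bootstrap/JDK" label are illustrative.

```java
import java.security.CodeSource;

class WhichJar {
    /** Returns the jar or directory a class was loaded from, or a marker for JDK/bootstrap classes. */
    static String locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader have no code source
        return source == null ? "<bootstrap/JDK>" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the conflicting class name, e.g. org.codehaus.jackson.map.ObjectMapper
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
    }
}
```

If the printed jar is an older version than the one whose method is expected, that jar's artifact is the one to exclude in the pom.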

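Writing the result of a group by to Kafka is not supported

Classes inexplicably cannot be found when running locally: the dependencies in the pom had been declared with provided scope.

Solution:

Data consumed from Kafka looked wrong. The consumer starts from a specified timestamp, which must be given in milliseconds; the value had been written in seconds, so the problem was at first mistaken for a bug in the business logic.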
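One way to guard against the seconds-versus-milliseconds mistake is to normalize the configured value before handing it to the consumer. The sketch below uses a simple magnitude heuristic; the threshold and the helper name are assumptions for illustration, not part of the original job.

```java
import java.time.Instant;

class KafkaStartTimestamp {
    // Heuristic (assumption): values below 10^11 are taken to be seconds.
    // As milliseconds, 10^11 falls in early 1973; as seconds, it lies thousands of years ahead.
    private static final long SECONDS_UPPER_BOUND = 100_000_000_000L;

    /** Converts an epoch timestamp to milliseconds if it appears to be expressed in seconds. */
    static long toEpochMillis(long timestamp) {
        return timestamp < SECONDS_UPPER_BOUND ? timestamp * 1000L : timestamp;
    }

    public static void main(String[] args) {
        long fromConfig = 1583020800L; // hypothetical start position, mistakenly written in seconds
        long millis = toEpochMillis(fromConfig);
        // Printing the resolved instant makes a wrong unit obvious at startup
        System.out.println(millis + " -> " + Instant.ofEpochMilli(millis));
    }
}
```

The normalized value is what would be passed to the consumer's start-from-timestamp setting, for example setStartFromTimestamp on the Flink Kafka consumer, which expects milliseconds.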

Caused by: java.lang.OutOfMemoryError: Metaspace

Investigation showed that the JVM Metaspace size was capped: the default is 96M. Raise it to 512M by setting taskmanager.memory.jvm-metaspace.size: 512mb in the configuration file.
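To check how close a JVM is to its Metaspace limit, the standard management beans can be queried. This is a minimal sketch using only JDK classes; it assumes a HotSpot-style JVM that exposes a memory pool named "Metaspace" and returns null otherwise.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MetaspaceUsage {
    /** Returns the Metaspace memory pool, or null if this JVM does not report one under that name. */
    static MemoryPoolMXBean metaspacePool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MemoryPoolMXBean pool = metaspacePool();
        if (pool == null) {
            System.out.println("No Metaspace pool reported by this JVM");
            return;
        }
        MemoryUsage usage = pool.getUsage();
        // getMax() returns -1 when no upper limit is defined
        String max = usage.getMax() < 0 ? "unlimited" : (usage.getMax() / 1024) + " KiB";
        System.out.printf("Metaspace used=%d KiB, max=%s%n", usage.getUsed() / 1024, max);
    }
}
```

Logging these numbers from inside a task, for instance in an operator's open method, would show the usage on the TaskManager itself rather than on the client.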

Keep the jar submitted to the cluster as small as possible.

State TTL expiration:

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        ValueStateDescriptor<OrderStatistics> valueStateDescriptor =
                new ValueStateDescriptor<>("lastUserLogin", TypeInformation.of(new TypeHint<OrderStatistics>() {
                }));

        // Configure TTL so that the state expires automatically after 7 days
        // StateTtlConfig ttlConfig = StateTtlConfig
        //         // .newBuilder(Time.days(7))
        //         .newBuilder(Time.seconds(60))
        //         .cleanupIncrementally(10, false)
        //         .build();
        //
        // valueStateDescriptor.enableTimeToLive(ttlConfig);
        valueState = getRuntimeContext().getState(valueStateDescriptor);
    }
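As a rough mental model of what the 7-day TTL sketched in the comments above is meant to achieve, the following plain-Java class mimics expire-after-write behaviour for a single value. It is only an illustration of the semantics under that assumption and is not how Flink implements state TTL; the class TtlValue and its methods are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

class TtlValue<T> {
    private final long ttlMillis;
    private T value;
    private long lastWriteMillis;

    TtlValue(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Stores a value and restarts the TTL clock from the given time. */
    void update(T newValue, long nowMillis) {
        value = newValue;
        lastWriteMillis = nowMillis;
    }

    /** Returns the value, or null once the TTL has elapsed since the last write. */
    T value(long nowMillis) {
        if (value != null && nowMillis - lastWriteMillis >= ttlMillis) {
            value = null; // expired: drop it so it is never returned again
        }
        return value;
    }

    public static void main(String[] args) {
        long sevenDays = TimeUnit.DAYS.toMillis(7);
        TtlValue<String> state = new TtlValue<>(sevenDays);
        state.update("order-stats", 0L);
        System.out.println(state.value(sevenDays - 1)); // read just before the TTL elapses
        System.out.println(state.value(sevenDays));     // read once the TTL has elapsed
    }
}
```

The time is passed in explicitly so the behaviour can be checked without waiting; in a real job the state backend tracks the timestamps itself.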
