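WordCount is the first introductory program you meet when learning Flink. The official example implements it with anonymous classes, which makes the code verbose, so I rewrote it with lambdas. I hit a few pitfalls along the way and am recording them here.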
Flink version: 1.9
Official version
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {

        // the port to connect to
        final int port;
        try {
            final ParameterTool params = ParameterTool.fromArgs(args);
            port = params.getInt("port");
        } catch (Exception e) {
            System.err.println("No port specified. Please run 'SocketWindowWordCount --port <port>'");
            return;
        }

        // get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        DataStream<String> text = env.socketTextStream("localhost", port, "\n");

        // parse the data, group it, window it, and aggregate the counts
        DataStream<WordWithCount> windowCounts = text
            .flatMap(new FlatMapFunction<String, WordWithCount>() {
                @Override
                public void flatMap(String value, Collector<WordWithCount> out) {
                    for (String word : value.split("\\s")) {
                        out.collect(new WordWithCount(word, 1L));
                    }
                }
            })
            .keyBy("word")
            .timeWindow(Time.seconds(5), Time.seconds(1))
            .reduce(new ReduceFunction<WordWithCount>() {
                @Override
                public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                    return new WordWithCount(a.word, a.count + b.count);
                }
            });

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    // Data type for words with count
    public static class WordWithCount {

        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
Lambda version 1: POJO
package com.my.study.flink;

import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * Description:
 *
 * @author adore.chen
 * @date 2019-11-19
 */
public class SocketStreamWordCount {

    public static void main(String[] args) throws Exception {
        ParameterTool tool = ParameterTool.fromArgs(args);
        int port = tool.getInt("port");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> dataStream = env.socketTextStream("localhost", port, "\n");

        dataStream.flatMap((String value, Collector<WordCount> out) -> {
                for (String word : value.split("\\s")) {
                    if (word.trim().length() > 0) {
                        out.collect(new WordCount(word, 1));
                    }
                }
            })
            .returns(WordCount.class)
            .keyBy((WordCount wc) -> wc.word)
            .reduce((WordCount wc1, WordCount wc2) -> new WordCount(wc1.word, wc1.count + wc2.count))
            .print();

        env.execute("socket word count");
    }

    public static class WordCount {

        private String word;
        private int count;

        public WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + ":" + count;
        }
    }
}
Error 1: missing generic type parameters of Collector
InvalidTypesException: The generic type parameters of 'Collector' are missing. In many cases lambda methods don't provide enough information for automatic type extraction when Java generics are involved. An easy workaround is to use an (anonymous) class instead that implements the 'org.apache.flink.api.common.functions.FlatMapFunction' interface. Otherwise the type has to be specified explicitly using type information.
at org.apache.flink.api.java.typeutils.TypeExtractionUtils.validateLambdaType(TypeExtractionUtils.java:350)
at org.apache.flink.api.java.typeutils.TypeExtractionUtils.extractTypeFromLambda(TypeExtractionUtils.java:176)
at org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:571)
at org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:196)
at org.apache.flink.streaming.api.datastream.DataStream.flatMap(DataStream.java:611)
at com.coupang.ecfds.flink.SocketStreamWordCount.main(SocketStreamWordCount.java:24)
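Once a lambda expression is compiled, its generic type information is gone (the type parameter of Collector is erased), so Flink cannot infer the function's return type. It has to be specified explicitly with returns(TypeInformation); the POJO version above does this with .returns(WordCount.class).
For details, see: Flink TypeInformation https://www.cnblogs.com/qcloud1001/p/9626462.html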
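To see why the anonymous class works where the lambda fails, here is a small plain-Java sketch with no Flink involved. The class name `ErasureDemo` and the helper `keepsTypeArguments` are made up for illustration, and `java.util.function.Function` stands in for Flink's `FlatMapFunction`. It uses reflection to check whether a function object still records its generic type arguments, which is roughly the kind of information a type extractor needs.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Function;

public class ErasureDemo {

    // Hypothetical helper: true if the object's class still records
    // concrete type arguments for one of the interfaces it implements.
    static boolean keepsTypeArguments(Function<?, ?> f) {
        for (Type t : f.getClass().getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Anonymous class: the compiler writes Function<String, Integer>
        // into the class file, so reflection can read it back.
        Function<String, Integer> anon = new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };

        // Lambda: the runtime-generated class only implements the raw Function.
        Function<String, Integer> lambda = s -> s.length();

        System.out.println(keepsTypeArguments(anon));   // prints true
        System.out.println(keepsTypeArguments(lambda)); // prints false
    }
}
```

This is why the error message suggests falling back to an anonymous class, and why the lambda version needs a returns(...) call to supply the missing type.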
Error 2: .keyBy("word") fails because the type cannot be used as a key
InvalidProgramException: This type (GenericType<com.coupang.ecfds.flink.SocketStreamWordCount.WordCount>) cannot be used as key.
at org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:330)
at org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:337)
at com.coupang.ecfds.flink.SocketStreamWordCount.main(SocketStreamWordCount.java:32)
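At first I took this for a bug in Flink, but the GenericType in the message points elsewhere: Flink does not recognize WordCount as a POJO (its fields are private with no getters or setters, and it has no public no-argument constructor), so it falls back to a generic type, which does not support field-expression keys such as "word". The quickest fix is to drop the field expression and implement the KeySelector functional interface with a lambda.
Solution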
.keyBy((WordCount wc) -> wc.word)
Reference: KeySelector https://www.jianshu.com/p/3763854d609b
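Another possible way out, assuming the error comes from WordCount not meeting Flink's POJO requirements, is to reshape the class instead of changing the key. The sketch below is plain Java with no Flink dependency. `PojoCheckDemo`, `PojoWordCount` and `looksLikePojo` are hypothetical names, and the check is a deliberately simplified approximation, not Flink's actual analysis (Flink also accepts private fields that have getters and setters, which this ignores).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class PojoCheckDemo {

    // WordCount as written in the post: private fields, no no-argument constructor.
    public static class WordCount {
        private String word;
        private int count;

        public WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // A variant shaped like the official WordWithCount: public fields
    // plus a public no-argument constructor.
    public static class PojoWordCount {
        public String word;
        public int count;

        public PojoWordCount() {}

        public PojoWordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Simplified, hypothetical check: public class, public no-argument
    // constructor, and every instance field public.
    static boolean looksLikePojo(Class<?> clazz) {
        if (!Modifier.isPublic(clazz.getModifiers())) {
            return false;
        }
        try {
            clazz.getConstructor(); // only finds a public no-argument constructor
        } catch (NoSuchMethodException e) {
            return false;
        }
        for (Field f : clazz.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isStatic(m) || Modifier.isTransient(m) || f.isSynthetic()) {
                continue;
            }
            if (!Modifier.isPublic(m)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikePojo(WordCount.class));     // prints false
        System.out.println(looksLikePojo(PojoWordCount.class)); // prints true
    }
}
```

The official WordWithCount follows the second shape, which would explain why .keyBy("word") works in the official example.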
Lambda version 2: Tuple2
package com.coupang.ecfds.flink;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * Description:
 *
 * @author adore.chen
 * @date 2019-11-19
 */
public class SocketStreamWordCount {

    public static void main(String[] args) throws Exception {
        ParameterTool tool = ParameterTool.fromArgs(args);
        int port = tool.getInt("port");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> dataStream = env.socketTextStream("localhost", port, "\n");

        dataStream.flatMap((String value, Collector<Tuple2<String, Integer>> out) -> {
                for (String word : value.split("\\s")) {
                    if (word.trim().length() > 0) {
                        out.collect(new Tuple2<>(word, 1));
                    }
                }
            })
            .returns(Types.TUPLE(Types.STRING, Types.INT))
            .keyBy(0)
            .reduce((Tuple2<String, Integer> wc1, Tuple2<String, Integer> wc2) -> new Tuple2<>(wc1.f0, wc1.f1 + wc2.f1))
            .print();

        env.execute("socket word count");
    }
}
This feels much more concise.
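Note that both lambda versions call reduce directly on the keyed stream and drop the timeWindow(...) step of the official version, so they should print a running count for every incoming word rather than one count per window. The plain-Java sketch below imitates that behavior without Flink. `RollingCountDemo` and `rollingCounts` are hypothetical names, and a real job may prefix each line with a subtask index and interleave keys in a different order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RollingCountDemo {

    // Imitates keyBy(word) followed by a sum reduce with no window:
    // every input word immediately emits its updated count.
    static List<String> rollingCounts(String line) {
        Map<String, Integer> state = new HashMap<>(); // per-key running count
        List<String> emitted = new ArrayList<>();
        for (String word : line.split("\\s")) {
            if (word.trim().length() > 0) {
                int count = state.merge(word, 1, Integer::sum);
                emitted.add(word + ":" + count);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(rollingCounts("hello world hello"));
        // prints [hello:1, world:1, hello:2]
    }
}
```

To match the official output, the .timeWindow(Time.seconds(5), Time.seconds(1)) call could be kept between keyBy and reduce in either lambda version.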