Summary of the Flink API:
0. First, create the execution environment (the parallelism can be set here)
StreamExecutionEnvironment executionEnvironment =
        StreamExecutionEnvironment.getExecutionEnvironment();
executionEnvironment.setParallelism(1); // set the parallelism
1.1 Reading from a collection
1.2 Reading from a file
1.3 Reading from a socket port:
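The three built-in sources above can be sketched together in one job (a minimal sketch; the file path, host, and port are placeholder assumptions, and the Flink streaming API is assumed to be on the classpath):

```java
import java.util.Arrays;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceExamples {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // 1.1 From a collection: handy for quick local tests
        DataStream<Integer> fromCollection =
                env.fromCollection(Arrays.asList(1, 2, 3, 4));

        // 1.2 From a file: each line becomes one String record
        DataStream<String> fromFile = env.readTextFile("sensor.txt");

        // 1.3 From a socket: feed data with e.g. `nc -lk 7777`
        DataStream<String> fromSocket = env.socketTextStream("localhost", 7777);

        fromCollection.print();
        env.execute("source examples");
    }
}
```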
1.4 Reading from Kafka:
First, add the Maven dependency (the Kafka connector matching your Flink version)
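A sketch of the Kafka source, assuming the `flink-connector-kafka-0.11` connector that matches the tutorial's Flink version (the topic name, broker address, and group id are placeholder assumptions):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class KafkaSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "consumer-group");

        // Each Kafka record's value is deserialized into a String
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer011<>("sensor", new SimpleStringSchema(), props));

        stream.print();
        env.execute("kafka source");
    }
}
```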
1.5 Custom source, useful for generating random test data (see https://www.bilibili.com/video/BV1MK411W7o4?p=16 when needed)
val stream5 = env.addSource(new MySensorSource())
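A minimal Java sketch of such a custom source; the sensor id/value fields and their ranges are illustrative assumptions, not part of the original notes:

```java
import java.util.Random;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Emits random (sensorId, temperature) pairs until the job cancels it
public class MySensorSource implements SourceFunction<Tuple2<String, Double>> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Tuple2<String, Double>> ctx) throws Exception {
        Random random = new Random();
        while (running) {
            for (int i = 1; i <= 10; i++) {
                // temperatures fluctuate around 60 (an arbitrary test range)
                ctx.collect(new Tuple2<>("sensor_" + i, 60 + random.nextGaussian() * 20));
            }
            Thread.sleep(1000L); // one round of readings per second
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```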
2.1 slotSharingGroup can be used to put an operator into its own slot-sharing group, because some operators are computationally heavy and should not share a slot with others.
Use disableOperatorChaining() on the environment to break up all operator chains (or disableChaining() on a single operator).
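These knobs are set per operator (or globally on the environment); a sketch, where the group name, the map functions, and the socket source are placeholder assumptions:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // env.disableOperatorChaining(); // global switch: no chaining anywhere

        env.socketTextStream("localhost", 7777)
           .map(String::trim)
           .slotSharingGroup("heavy")   // heavy operator gets its own sharing group
           .map(String::toUpperCase)
           .disableChaining()           // do not chain this operator with neighbours
           .print();

        env.execute("chaining example");
    }
}
```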
==================================
3.1 Map for data cleaning (here it turns each element from a single value into a 2-tuple for counting)
SingleOutputStreamOperator<Tuple2<String, Integer>> result = str.map(new MapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> map(String s) throws Exception {
        return new Tuple2<>(s, 1);
    }
});
3.2 flatMap for splitting records apart (here the logic splits on spaces)
DataStream<String> result = str.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String value, Collector<String> collector) throws Exception {
        String[] words = value.split(" ");
        for (String word : words) {
            collector.collect(word);
        }
    }
});
3.3 Filter (here it drops blank strings)
DataStream<String> result = str.filter(new FilterFunction<String>() {
    @Override
    public boolean filter(String value) throws Exception {
        return !value.trim().equals("");
    }
});
3.4 Grouping and aggregation (keyBy partitions the stream by the hash of the key at position 0, then sum aggregates the number at position 1)
SingleOutputStreamOperator<Tuple2<String, Integer>> result = result2.keyBy(0).sum(1);
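The positional keyBy(0) works in the tutorial's Flink version but is deprecated in newer releases in favor of a KeySelector; an equivalent form (variable names follow the snippet above):

```java
SingleOutputStreamOperator<Tuple2<String, Integer>> result =
        result2.keyBy(value -> value.f0)  // key by the word itself
               .sum(1);                   // sum the counts at position 1
```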
3.5 reduce for aggregation (here it keeps the key and adds the counts pairwise)
SingleOutputStreamOperator<Tuple2<String, Integer>> result = result3.reduce(new ReduceFunction<Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1, Tuple2<String, Integer> value2) throws Exception {
        return new Tuple2<String, Integer>(value1.f0, value1.f1 + value2.f1);
    }
});
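The transformations above can be chained into one minimal word-count job; this is a sketch using lambdas (the sample input lines are placeholder assumptions, and `.returns(...)` supplies the type information that Java's type erasure hides from lambdas):

```java
import java.util.Arrays;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        env.fromCollection(Arrays.asList("hello flink", "hello world"))
           // 3.2 split each line into words
           .flatMap((String line, Collector<String> out) -> {
               for (String word : line.split(" ")) {
                   out.collect(word);
               }
           })
           .returns(Types.STRING)
           // 3.3 drop blank words
           .filter(word -> !word.trim().equals(""))
           // 3.1 turn each word into a (word, 1) pair
           .map(word -> new Tuple2<>(word, 1))
           .returns(Types.TUPLE(Types.STRING, Types.INT))
           // 3.4 group by the word and sum the counts
           .keyBy(t -> t.f0)
           .sum(1)
           .print();

        env.execute("word count");
    }
}
```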