一.多流轉換算子概述
多流轉換算子一般包括:
Split和Select (新版已經移除)
Connect和CoMap
Union
1.1 Split和Select
注:新版Flink已經不存在Split和Select這兩個API了(至少Flink1.12.1沒有!)
Split
DataStream -> SplitStream:根據某些特徵把DataStream拆分成SplitStream;
SplitStream雖然看起來像是兩個Stream,但是其實它是一個特殊的Stream;
Select
SplitStream -> DataStream:從一個SplitStream中獲取一個或者多個DataStream;
我們可以結合split&select將一個DataStream拆分成多個DataStream。
1.2 Connect和CoMap
Connect
DataStream,DataStream -> ConnectedStreams: 連接兩個保持他們類型的數據流,兩個數據流被Connect 之後,只是被放在了一個流中,內部依然保持各自的數據和形式不發生任何變化,兩個流相互獨立。
CoMap
ConnectedStreams -> DataStream: 作用於ConnectedStreams 上,功能與map和flatMap一樣,對ConnectedStreams 中的每一個Stream分別進行map和flatMap操作;
1.3 Union
DataStream -> DataStream:對兩個或者兩個以上的DataStream進行Union操作,產生一個包含多有DataStream元素的新DataStream。
問題:和Connect的區別?
- Connect 的數據類型可以不同,Connect 只能合併兩個流;
- Union可以合併多條流,Union的數據結構必須是一樣的;
二.代碼實現
數據準備:
sensor.txt
sensor_1,1547718199,35.8
sensor_6,1547718201,15.4
sensor_7,1547718202,6.7
sensor_10,1547718205,38.1
sensor_1,1547718207,36.3
sensor_1,1547718209,32.8
sensor_1,1547718212,37.1
代碼:
package org.flink.transform;
/**
* @author 只是甲
* @date 2021-08-31
* @remark Flink 基礎Transform MultipleStreams
*/
import org.flink.beans.SensorReading;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.collector.selector.OutputSelector;
import org.apache.flink.streaming.api.datastream.ConnectedStreams;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.SplitStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoMapFunction;
import java.util.Collections;
public class TransformTest4_MultipleStreams {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
// 從文件讀取數據
DataStream<String> inputStream = env.readTextFile("C:\\Users\\Administrator\\IdeaProjects\\FlinkStudy\\src\\main\\resources\\sensor.txt");
// 轉換成SensorReading
DataStream<SensorReading> dataStream = inputStream.map(line -> {
String[] fields = line.split(",");
return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
} );
// 1. 分流,按照溫度值30度爲界分爲兩條流
SplitStream<SensorReading> splitStream = dataStream.split(new OutputSelector<SensorReading>() {
@Override
public Iterable<String> select(SensorReading value) {
return (value.getTemperature() > 30) ? Collections.singletonList("high") : Collections.singletonList("low");
}
});
DataStream<SensorReading> highTempStream = splitStream.select("high");
DataStream<SensorReading> lowTempStream = splitStream.select("low");
DataStream<SensorReading> allTempStream = splitStream.select("high", "low");
highTempStream.print("high");
lowTempStream.print("low");
allTempStream.print("all");
// 2. 合流 connect,將高溫流轉換成二元組類型,與低溫流連接合並之後,輸出狀態信息
DataStream<Tuple2<String, Double>> warningStream = highTempStream.map(new MapFunction<SensorReading, Tuple2<String, Double>>() {
@Override
public Tuple2<String, Double> map(SensorReading value) throws Exception {
return new Tuple2<>(value.getId(), value.getTemperature());
}
});
ConnectedStreams<Tuple2<String, Double>, SensorReading> connectedStreams = warningStream.connect(lowTempStream);
DataStream<Object> resultStream = connectedStreams.map(new CoMapFunction<Tuple2<String, Double>, SensorReading, Object>() {
@Override
public Object map1(Tuple2<String, Double> value) throws Exception {
return new Tuple3<>(value.f0, value.f1, "high temp warning");
}
@Override
public Object map2(SensorReading value) throws Exception {
return new Tuple2<>(value.getId(), "normal");
}
});
resultStream.print();
// 3. union聯合多條流
// warningStream.union(lowTempStream);
highTempStream.union(lowTempStream, allTempStream);
env.execute();
}
}
測試記錄: