Here is the relevant code again:
// Connect to the socket and read the input data
DataStreamSource<String> text = env.socketTextStream("localhost", port, "\n");
// Process the data
DataStream<WordWithCount> windowCount = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
    private static final long serialVersionUID = 1L;

    public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
        String[] splits = value.split(" ");
        for (String word : splits) {
            out.collect(new WordWithCount(word, 1L));
        }
    }
})// Flatten: turn the words of each line into <word, count> records
.keyBy("word")// Group records that have the same word
.timeWindow(Time.seconds(4), Time.seconds(1))// Set the window size and the slide interval
.sum("count");
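The previous section analyzed how flatMap executes. This section continues with the next call in the chain, keyBy. It is invoked on the DataStream object returned by flatMap, and its code is as follows: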
/**
 * Partitions the operator state of a {@link DataStream} using field expressions.
 * A field expression is either the name of a public field or a getter method with parentheses
 * of the {@link DataStream}'s underlying type. A dot can be used to drill
 * down into objects, as in {@code "field1.getInnerField2()" }.
 *
 * @param fields
 *            One or more field expressions on which the state of the {@link DataStream} operators will be
 *            partitioned.
 * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
 **/
public KeyedStream<T, Tuple> keyBy(String... fields) {
    return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
}
getType() returns the output type of the upstream DataStream. From fields and that output type (a TypeInformation object), keyBy creates an ExpressionKeys object. The constructor of that class is as follows:
public ExpressionKeys(String[] keyExpressions, TypeInformation<T> type) {
    Preconditions.checkNotNull(keyExpressions, "Field expression cannot be null.");
    this.keyFields = new ArrayList(keyExpressions.length);
    int i;
    // If type is a composite type
    if (type instanceof CompositeType) {
        CompositeType<T> cType = (CompositeType)type;
        this.originalKeyTypes = new TypeInformation[keyExpressions.length];
        for(i = 0; i < keyExpressions.length; ++i) {
            String keyExpr = keyExpressions[i];
            if (keyExpr == null) {
                throw new InvalidProgramException("Expression key may not be null.");
            }
            keyExpr = keyExpr.trim();
            List<FlatFieldDescriptor> flatFields = cType.getFlatFields(keyExpr);
            if (flatFields.size() == 0) {
                throw new InvalidProgramException("Unable to extract key from expression '" + keyExpr + "' on key " + cType);
            }
            Iterator var7 = flatFields.iterator();
            while(var7.hasNext()) {
                FlatFieldDescriptor field = (FlatFieldDescriptor)var7.next();
                if (!field.getType().isKeyType()) {
                    throw new InvalidProgramException("This type (" + field.getType() + ") cannot be used as key.");
                }
            }
            this.keyFields.addAll(flatFields);
            String strippedKeyExpr = WILD_CARD_REGEX.matcher(keyExpr).replaceAll("");
            if (strippedKeyExpr.isEmpty()) {
                this.originalKeyTypes[i] = type;
            } else {
                this.originalKeyTypes[i] = cType.getTypeAt(strippedKeyExpr);
            }
        }
    } else { // If type is not a composite type
        if (!type.isKeyType()) {
            throw new InvalidProgramException("This type (" + type + ") cannot be used as key.");
        }
        String[] var9 = keyExpressions;
        i = keyExpressions.length;
        for(int var10 = 0; var10 < i; ++var10) {
            String keyExpr = var9[var10];
            if (keyExpr == null) {
                throw new InvalidProgramException("Expression key may not be null.");
            }
            keyExpr = keyExpr.trim();
            if (!"*".equals(keyExpr) && !"_".equals(keyExpr)) {
                throw new InvalidProgramException("Field expression must be equal to '*' or '_' for non-composite types.");
            }
            this.keyFields.add(new FlatFieldDescriptor(0, type));
        }
        this.originalKeyTypes = new TypeInformation[]{type};
    }
}
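The composite-type branch of the constructor above can be illustrated with a minimal, Flink-free sketch. It only mimics the idea (trim each expression, reject null or unknown names, record a position and a type per key field) using plain Java reflection. The names FieldExpressionSketch, FieldDescriptor and resolve are hypothetical and are not part of Flink, and sorting the fields by name is an assumption made here only so that positions are deterministic.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for the idea behind ExpressionKeys; not the Flink class.
class FieldExpressionSketch {

    // Hypothetical counterpart of FlatFieldDescriptor: a field position plus its type.
    static final class FieldDescriptor {
        final int position;
        final Class<?> type;

        FieldDescriptor(int position, Class<?> type) {
            this.position = position;
            this.type = type;
        }
    }

    // Sample POJO mirroring the WordWithCount type used in the article.
    static class WordWithCount {
        public String word;
        public long count;
    }

    // Resolves each expression to a public instance field of `type`.
    static List<FieldDescriptor> resolve(String[] keyExpressions, Class<?> type) {
        if (keyExpressions == null) {
            throw new NullPointerException("Field expression cannot be null.");
        }
        // Assumption of this sketch: sort fields by name so positions are deterministic.
        List<Field> sorted = new ArrayList<>();
        for (Field f : type.getFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                sorted.add(f);
            }
        }
        sorted.sort(Comparator.comparing(Field::getName));

        List<FieldDescriptor> keyFields = new ArrayList<>(keyExpressions.length);
        for (String expr : keyExpressions) {
            if (expr == null) {
                throw new IllegalArgumentException("Expression key may not be null.");
            }
            String name = expr.trim();
            int pos = -1;
            for (int i = 0; i < sorted.size(); i++) {
                if (sorted.get(i).getName().equals(name)) {
                    pos = i;
                    break;
                }
            }
            if (pos < 0) {
                throw new IllegalArgumentException(
                        "Unable to extract key from expression '" + name + "' on type " + type.getName());
            }
            keyFields.add(new FieldDescriptor(pos, sorted.get(pos).getType()));
        }
        return keyFields;
    }

    public static void main(String[] args) {
        List<FieldDescriptor> keys = resolve(new String[] {" word "}, WordWithCount.class);
        System.out.println(keys.get(0).position + " " + keys.get(0).type.getSimpleName());
    }
}
```

Unlike the real constructor, this sketch handles neither nested expressions such as "field1.getInnerField2()" nor the wildcard expressions "*" and "_".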
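For each key expression passed to keyBy, the call List<FlatFieldDescriptor> flatFields = cType.getFlatFields(keyExpr); yields a List<FlatFieldDescriptor>, whose elements are then added to the ExpressionKeys field private List<FlatFieldDescriptor> keyFields. The field private TypeInformation<?>[] originalKeyTypes, which the constructor above also sets on the ExpressionKeys object, stores the type of each key expression. ExpressionKeys extends Keys. Once the ExpressionKeys object has been created, the KeyedStream object is created with the following code: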
private KeyedStream<T, Tuple> keyBy(Keys<T> keys) {
    return new KeyedStream(this, (KeySelector)this.clean(KeySelectorUtil.getSelectorForKeys(keys, this.getType(), this.getExecutionConfig())));
}
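To make the role of the KeySelector more concrete, here is a rough sketch of what a selector built from field names does conceptually: given a record, it returns the values of the key fields. This is not the implementation of KeySelectorUtil.getSelectorForKeys. The names KeySelectorSketch and selectorFor are hypothetical, and a java.util.List stands in for the Tuple key type that appears in KeyedStream<T, Tuple>.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of a field-name based key selector; not Flink code.
class KeySelectorSketch {

    // Sample POJO mirroring the WordWithCount type used in the article.
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }

    // Builds a function that extracts the named public fields of a record as a List.
    static <T> Function<T, List<Object>> selectorFor(Class<T> type, String... fieldNames) {
        final Field[] fields = new Field[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            try {
                fields[i] = type.getField(fieldNames[i].trim());
            } catch (NoSuchFieldException e) {
                throw new IllegalArgumentException("Unknown key field: " + fieldNames[i], e);
            }
        }
        return record -> {
            List<Object> key = new ArrayList<>(fields.length);
            for (Field f : fields) {
                try {
                    key.add(f.get(record));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
            return key;
        };
    }

    public static void main(String[] args) {
        Function<WordWithCount, List<Object>> byWord = selectorFor(WordWithCount.class, "word");
        // Records with the same word yield equal keys, regardless of their count.
        System.out.println(byWord.apply(new WordWithCount("flink", 1L))
                .equals(byWord.apply(new WordWithCount("flink", 7L))));
    }
}
```

Because List.equals and List.hashCode compare element by element, two records with the same word produce equal keys, which is the property grouping relies on.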
The KeyedStream object is constructed as follows:
public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector) {
    this(dataStream, keySelector, TypeExtractor.getKeySelectorTypes(keySelector, dataStream.getType()));
}

public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector, TypeInformation<KEY> keyType) {
    this(dataStream, new PartitionTransformation(dataStream.getTransformation(), new KeyGroupStreamPartitioner(keySelector, 128)), keySelector, keyType);
}

@Internal
KeyedStream(DataStream<T> stream, PartitionTransformation<T> partitionTransformation, KeySelector<T, KEY> keySelector, TypeInformation<KEY> keyType) {
    // Store the execution environment and the Transformation subclass object in the KeyedStream object
    super(stream.getExecutionEnvironment(), partitionTransformation);
    this.keySelector = (KeySelector)this.clean(keySelector);
    this.keyType = this.validateKeyType(keyType);
}
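The second constructor above passes 128 to KeyGroupStreamPartitioner. As far as I understand, this value is the maximum parallelism, i.e. the number of key groups into which keys are hashed before being mapped to a downstream channel. The sketch below shows that two-step idea in plain Java, under stated assumptions: the mix function is only a placeholder bit mixer and not the hash Flink actually uses, and the names KeyGroupSketch, keyGroupFor and channelFor are hypothetical.

```java
// Hypothetical illustration of key-group based partitioning; not Flink code.
class KeyGroupSketch {

    // Placeholder bit mixer (assumption: stands in for Flink's own hash function).
    static int mix(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return h & Integer.MAX_VALUE; // clear the sign bit so the result is non-negative
    }

    // Step 1: map a key to one of maxParallelism key groups.
    static int keyGroupFor(Object key, int maxParallelism) {
        return mix(key.hashCode()) % maxParallelism;
    }

    // Step 2: map a key group to one of `parallelism` downstream channels.
    static int channelFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int parallelism = 4;
        for (String word : new String[] {"hello", "flink", "hello"}) {
            int group = keyGroupFor(word, maxParallelism);
            int channel = channelFor(group, maxParallelism, parallelism);
            System.out.println(word + " -> key group " + group + " -> channel " + channel);
        }
    }
}
```

The point of the indirection is that the same key always lands in the same key group, and consecutive ranges of key groups are assigned to the same channel, so all records for a given word reach the same downstream subtask.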
For Flink's data types and serialization mechanism, see this article: https://cloud.tencent.com/developer/article/1573059