Flink In Depth (12): The keyBy Call Flow in the DataStream Class

Here is the relevant code again:

// Connect to the socket to read the input data
		DataStreamSource<String> text = env.socketTextStream("localhost", port, "\n");

		// Process the data
		DataStream<WordWithCount> windowCount = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
			/**
			 * 
			 */
			private static final long serialVersionUID = 1L;

			public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
				String[] splits = value.split(" ");
				for (String word : splits) {
					out.collect(new WordWithCount(word, 1L));
				}
			}
		})// Flatten: turn the words of each line into <word, count> records
				.keyBy("word")// Group records that share the same word
				.timeWindow(Time.seconds(4), Time.seconds(1))// Window size (4 s) and slide interval (1 s)
				.sum("count");

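The previous section analyzed how flatMap executes. This section moves on to the next call in the chain, keyBy. It is invoked on the DataStream object returned by flatMap. The code of keyBy is as follows: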

/**
	 * Partitions the operator state of a {@link DataStream} using field expressions.
	 * A field expression is either the name of a public field or a getter method with parentheses
	 * of the {@link DataStream}'s underlying type. A dot can be used to drill
	 * down into objects, as in {@code "field1.getInnerField2()" }.
	 *
	 * @param fields
	 *            One or more field expressions on which the state of the {@link DataStream} operators will be
	 *            partitioned.
	 * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
	 **/
	public KeyedStream<T, Tuple> keyBy(String... fields) {
		return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
	}

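getType() returns the output type of the upstream DataStream, that is, the stream keyBy is called on. From fields and that output type (a TypeInformation object), an ExpressionKeys object is created. The constructor of that class is as follows: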

public ExpressionKeys(String[] keyExpressions, TypeInformation<T> type) {
            Preconditions.checkNotNull(keyExpressions, "Field expression cannot be null.");
            this.keyFields = new ArrayList(keyExpressions.length);
            int i;
            // If type is a composite type
            if (type instanceof CompositeType) {
                CompositeType<T> cType = (CompositeType)type;
                this.originalKeyTypes = new TypeInformation[keyExpressions.length];

                for(i = 0; i < keyExpressions.length; ++i) {
                    String keyExpr = keyExpressions[i];
                    if (keyExpr == null) {
                        throw new InvalidProgramException("Expression key may not be null.");
                    }

                    keyExpr = keyExpr.trim();
                    List<FlatFieldDescriptor> flatFields = cType.getFlatFields(keyExpr);
                    if (flatFields.size() == 0) {
                        throw new InvalidProgramException("Unable to extract key from expression '" + keyExpr + "' on key " + cType);
                    }

                    Iterator var7 = flatFields.iterator();

                    while(var7.hasNext()) {
                        FlatFieldDescriptor field = (FlatFieldDescriptor)var7.next();
                        if (!field.getType().isKeyType()) {
                            throw new InvalidProgramException("This type (" + field.getType() + ") cannot be used as key.");
                        }
                    }

                    this.keyFields.addAll(flatFields);
                    String strippedKeyExpr = WILD_CARD_REGEX.matcher(keyExpr).replaceAll("");
                    if (strippedKeyExpr.isEmpty()) {
                        this.originalKeyTypes[i] = type;
                    } else {
                        this.originalKeyTypes[i] = cType.getTypeAt(strippedKeyExpr);
                    }
                }
            } else { // Otherwise, type is not a composite type
                if (!type.isKeyType()) {
                    throw new InvalidProgramException("This type (" + type + ") cannot be used as key.");
                }

                String[] var9 = keyExpressions;
                i = keyExpressions.length;

                for(int var10 = 0; var10 < i; ++var10) {
                    String keyExpr = var9[var10];
                    if (keyExpr == null) {
                        throw new InvalidProgramException("Expression key may not be null.");
                    }

                    keyExpr = keyExpr.trim();
                    if (!"*".equals(keyExpr) && !"_".equals(keyExpr)) {
                        throw new InvalidProgramException("Field expression must be equal to '*' or '_' for non-composite types.");
                    }

                    this.keyFields.add(new FlatFieldDescriptor(0, type));
                }

                this.originalKeyTypes = new TypeInformation[]{type};
            }

        }

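For each key expression passed to keyBy, the call List<FlatFieldDescriptor> flatFields = cType.getFlatFields(keyExpr); returns a List<FlatFieldDescriptor>, and its elements are added to the private List<FlatFieldDescriptor> keyFields field of the ExpressionKeys object. The private TypeInformation<?>[] originalKeyTypes field, which the constructor above also sets on the ExpressionKeys object, stores the type of each corresponding key expression. ExpressionKeys extends Keys. Once the ExpressionKeys object has been created, the KeyedStream object is created. The code is as follows: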

private KeyedStream<T, Tuple> keyBy(Keys<T> keys) {
        return new KeyedStream(this, (KeySelector)this.clean(KeySelectorUtil.getSelectorForKeys(keys, this.getType(), this.getExecutionConfig())));
    }
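
To make the idea of turning key expressions into a key selector more concrete, here is a minimal sketch in plain Java. It is not Flink's implementation: Flink resolves expressions against TypeInformation and produces a Tuple key, whereas this sketch uses reflection and a List. The class FieldKeySketch, its methods, and the nested WordWithCount stand-in are all hypothetical names introduced for illustration. Only the null check and the trim step mirror the ExpressionKeys constructor shown above.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Flink: resolves key expressions to public
// fields and extracts a key from a record, loosely mirroring ExpressionKeys.
final class FieldKeySketch {

    /** Stand-in for the WordWithCount POJO used by the example job (assumed shape). */
    public static final class WordWithCount {
        public String word;
        public long count;

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }

    /** Resolves each key expression to a public field, rejecting null or unknown names. */
    static List<Field> resolveKeyFields(Class<?> type, String... keyExpressions) {
        List<Field> keyFields = new ArrayList<>(keyExpressions.length);
        for (String expr : keyExpressions) {
            if (expr == null) {
                throw new IllegalArgumentException("Expression key may not be null.");
            }
            String name = expr.trim(); // same normalization as keyExpr.trim() above
            try {
                keyFields.add(type.getField(name));
            } catch (NoSuchFieldException e) {
                throw new IllegalArgumentException(
                        "Unable to extract key from expression '" + name + "' on type " + type.getName());
            }
        }
        return keyFields;
    }

    /** Plays the role of a key selector: reads the resolved fields from one record. */
    static List<Object> extractKey(Object record, List<Field> keyFields) {
        List<Object> key = new ArrayList<>(keyFields.size());
        for (Field field : keyFields) {
            try {
                key.add(field.get(record));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read key field " + field.getName(), e);
            }
        }
        return key;
    }

    public static void main(String[] args) {
        List<Field> keyFields = resolveKeyFields(WordWithCount.class, "word");
        System.out.println(extractKey(new WordWithCount("flink", 1L), keyFields));
    }
}
```

Resolving the fields once and reusing them for every record reflects the split seen above: the expressions are validated when keyBy is called, and the resulting selector is what runs per record.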

The KeyedStream object is constructed as follows:

public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector) {
        this(dataStream, keySelector, TypeExtractor.getKeySelectorTypes(keySelector, dataStream.getType()));
    }

    public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector, TypeInformation<KEY> keyType) {
        this(dataStream, new PartitionTransformation(dataStream.getTransformation(), new KeyGroupStreamPartitioner(keySelector, 128)), keySelector, keyType);
    }

    @Internal
    KeyedStream(DataStream<T> stream, PartitionTransformation<T> partitionTransformation, KeySelector<T, KEY> keySelector, TypeInformation<KEY> keyType) {
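        // Store the execution environment and the PartitionTransformation (a Transformation subclass) in the KeyedStream object via the DataStream constructor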
        super(stream.getExecutionEnvironment(), partitionTransformation);
        this.keySelector = (KeySelector)this.clean(keySelector);
        this.keyType = this.validateKeyType(keyType);
    }
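
The second constructor wraps the key selector in a KeyGroupStreamPartitioner with a maximum parallelism of 128. The sketch below illustrates the general idea of key-group partitioning in plain Java: a key is hashed into one of maxParallelism key groups, and each key group is mapped to one of the parallel downstream channels. KeyGroupSketch and its method names are hypothetical, and spreadHash is a simple stand-in rather than the hash function Flink uses, so the concrete channel numbers will differ from a real job.

```java
// Hypothetical sketch, not part of Flink: illustrates key-group based routing.
final class KeyGroupSketch {

    /** The value passed to KeyGroupStreamPartitioner in the constructor above. */
    static final int MAX_PARALLELISM = 128;

    /** Stand-in bit mixer (assumption: NOT Flink's hash); always returns a non-negative int. */
    static int spreadHash(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h & 0x7fffffff;
    }

    /** Maps a key to a key group in the range [0, maxParallelism). */
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return spreadHash(key.hashCode()) % maxParallelism;
    }

    /** Maps a key group to a parallel operator index in the range [0, parallelism). */
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    /** Combines both steps: the downstream channel that receives records with this key. */
    static int selectChannel(Object key, int parallelism) {
        int keyGroup = assignToKeyGroup(key, MAX_PARALLELISM);
        return operatorIndexForKeyGroup(MAX_PARALLELISM, parallelism, keyGroup);
    }

    public static void main(String[] args) {
        // Records with equal keys always land on the same channel.
        for (String word : new String[] {"flink", "keyBy", "flink"}) {
            System.out.println(word + " -> channel " + selectChannel(word, 4));
        }
    }
}
```

Because the key-to-group mapping depends only on the key and the maximum parallelism, changing the actual parallelism only changes how whole key groups are spread over the channels, not which group a key belongs to.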

 

For Flink's data types and serialization mechanism, see this article: https://cloud.tencent.com/developer/article/1573059
