Flink Deep Dive (12): The keyBy Call Flow in the DataStream Class

Here is the relevant code again:

// Connect to the socket and read the input data
DataStreamSource<String> text = env.socketTextStream("localhost", port, "\n");

// Process the data
DataStream<WordWithCount> windowCount = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
	private static final long serialVersionUID = 1L;

	@Override
	public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
		// Split each line on spaces and emit one record per word
		String[] splits = value.split(" ");
		for (String word : splits) {
			out.collect(new WordWithCount(word, 1L));
		}
	}
})// flatten each line into <word, count> records
		.keyBy("word")// group records that share the same word
		.timeWindow(Time.seconds(4), Time.seconds(1))// window size of 4 seconds, sliding every 1 second
		.sum("count");
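The field expressions "word" and "count" only resolve if WordWithCount exposes fields with those names. The post does not show that class, so the following is a minimal sketch of what it presumably looks like, assuming it follows Flink's POJO conventions (public class, public no-argument constructor, public fields). The enclosing class WordCountTypes is a hypothetical name used only to keep the snippet self-contained.

```java
// Hypothetical holder class; in a real job this would be the public job class.
class WordCountTypes {

    // Assumed shape of the WordWithCount type used in the pipeline above.
    public static class WordWithCount {
        // Public fields, so the expressions "word" and "count" can be resolved by name.
        public String word;
        public long count;

        // POJO types need a public no-argument constructor.
        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
```

If the fields were private without getters and setters, the type would most likely not be analyzed as a POJO, and keyBy("word") could not find the field.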

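The previous part analyzed how flatMap executes. This part moves on to the next call in the chain, keyBy. It is invoked on the DataStream object returned by flatMap. The code of keyBy is as follows: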

/**
	 * Partitions the operator state of a {@link DataStream} using field expressions.
	 * A field expression is either the name of a public field or a getter method with parentheses
	 * of the {@link DataStream}'s underlying type. A dot can be used to drill
	 * down into objects, as in {@code "field1.getInnerField2()" }.
	 *
	 * @param fields
	 *            One or more field expressions on which the state of the {@link DataStream} operators will be
	 *            partitioned.
	 * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
	 **/
	public KeyedStream<T, Tuple> keyBy(String... fields) {
		return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
	}

getType() returns the output type of the upstream DataStream. An ExpressionKeys object is created from fields and that output type (a TypeInformation object). The constructor of ExpressionKeys is as follows:

public ExpressionKeys(String[] keyExpressions, TypeInformation<T> type) {
            Preconditions.checkNotNull(keyExpressions, "Field expression cannot be null.");
            this.keyFields = new ArrayList(keyExpressions.length);
            int i;
            //If type is a composite type
            if (type instanceof CompositeType) {
                CompositeType<T> cType = (CompositeType)type;
                this.originalKeyTypes = new TypeInformation[keyExpressions.length];

                for(i = 0; i < keyExpressions.length; ++i) {
                    String keyExpr = keyExpressions[i];
                    if (keyExpr == null) {
                        throw new InvalidProgramException("Expression key may not be null.");
                    }

                    keyExpr = keyExpr.trim();
                    List<FlatFieldDescriptor> flatFields = cType.getFlatFields(keyExpr);
                    if (flatFields.size() == 0) {
                        throw new InvalidProgramException("Unable to extract key from expression '" + keyExpr + "' on key " + cType);
                    }

                    Iterator var7 = flatFields.iterator();

                    while(var7.hasNext()) {
                        FlatFieldDescriptor field = (FlatFieldDescriptor)var7.next();
                        if (!field.getType().isKeyType()) {
                            throw new InvalidProgramException("This type (" + field.getType() + ") cannot be used as key.");
                        }
                    }

                    this.keyFields.addAll(flatFields);
                    String strippedKeyExpr = WILD_CARD_REGEX.matcher(keyExpr).replaceAll("");
                    if (strippedKeyExpr.isEmpty()) {
                        this.originalKeyTypes[i] = type;
                    } else {
                        this.originalKeyTypes[i] = cType.getTypeAt(strippedKeyExpr);
                    }
                }
            } else {//If type is not a composite type
                if (!type.isKeyType()) {
                    throw new InvalidProgramException("This type (" + type + ") cannot be used as key.");
                }

                String[] var9 = keyExpressions;
                i = keyExpressions.length;

                for(int var10 = 0; var10 < i; ++var10) {
                    String keyExpr = var9[var10];
                    if (keyExpr == null) {
                        throw new InvalidProgramException("Expression key may not be null.");
                    }

                    keyExpr = keyExpr.trim();
                    if (!"*".equals(keyExpr) && !"_".equals(keyExpr)) {
                        throw new InvalidProgramException("Field expression must be equal to '*' or '_' for non-composite types.");
                    }

                    this.keyFields.add(new FlatFieldDescriptor(0, type));
                }

                this.originalKeyTypes = new TypeInformation[]{type};
            }

        }
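The two branches of the constructor can be illustrated with a much simplified stand-in. The sketch below is not Flink code: it uses plain reflection instead of CompositeType.getFlatFields, treats String and boxed numbers as the "non-composite" case, and the names FieldExpressionSketch, resolveKeyType and Word are hypothetical. It only mirrors the rule visible above: a composite type is keyed by field name, a non-composite type only accepts the wildcards "*" or "_".

```java
class FieldExpressionSketch {

    // Hypothetical composite type with two public fields.
    static class Word {
        public String word;
        public long count;
    }

    // Returns the type of the key selected by keyExpr on the given class.
    static Class<?> resolveKeyType(Class<?> type, String keyExpr) {
        if (keyExpr == null) {
            throw new IllegalArgumentException("Expression key may not be null.");
        }
        String expr = keyExpr.trim();
        boolean wildcard = "*".equals(expr) || "_".equals(expr);
        // Assumption for this sketch: these classes play the role of non-composite types.
        boolean atomic = type.isPrimitive() || type == String.class
                || Number.class.isAssignableFrom(type);

        if (atomic) {
            if (!wildcard) {
                throw new IllegalArgumentException(
                        "Field expression must be equal to '*' or '_' for non-composite types.");
            }
            return type; // the whole value is the key
        }
        if (wildcard) {
            return type; // the whole composite value is the key
        }
        try {
            return type.getField(expr).getType(); // look up a public field by name
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(
                    "Unable to extract key from expression '" + expr + "' on key " + type.getName());
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveKeyType(Word.class, "word"));
        System.out.println(resolveKeyType(String.class, "*"));
    }
}
```

The real constructor additionally flattens nested fields and checks isKeyType() on every flat field, which this sketch leaves out.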

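For each key expression passed to keyBy, the constructor calls List<FlatFieldDescriptor> flatFields = cType.getFlatFields(keyExpr); to obtain a List<FlatFieldDescriptor>, and appends its elements to the ExpressionKeys field private List<FlatFieldDescriptor> keyFields. The ExpressionKeys field private TypeInformation<?>[] originalKeyTypes stores the type of each key expression, at the same index as the expression. ExpressionKeys extends Keys. Once the ExpressionKeys object has been created, the KeyedStream object is created. The code is as follows: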

private KeyedStream<T, Tuple> keyBy(Keys<T> keys) {
        return new KeyedStream(this, (KeySelector)this.clean(KeySelectorUtil.getSelectorForKeys(keys, this.getType(), this.getExecutionConfig())));
    }
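The important step in this method is that the Keys object is converted into a KeySelector, that is, a function from a record to its key. The following sketch shows that idea in plain Java, with java.util.function.Function standing in for Flink's KeySelector; the class KeySelectorSketch, the method groupByKey and the nested WordWithCount are hypothetical names. Flink itself does not build such a map; it only uses the selector to decide where each record is routed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class KeySelectorSketch {

    // Hypothetical record type with the same fields as in the example job.
    static class WordWithCount {
        final String word;
        final long count;

        WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }

    // Groups records by the key that the selector extracts from each record.
    static <T, K> Map<K, List<T>> groupByKey(List<T> records, Function<T, K> keySelector) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T record : records) {
            K key = keySelector.apply(record);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<WordWithCount> records = Arrays.asList(
                new WordWithCount("a", 1L),
                new WordWithCount("b", 1L),
                new WordWithCount("a", 1L));

        // Plays the role of the selector derived from keyBy("word").
        Map<String, List<WordWithCount>> groups = groupByKey(records, r -> r.word);
        for (Map.Entry<String, List<WordWithCount>> e : groups.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " record(s)");
        }
    }
}
```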

The KeyedStream object is created by the following constructors:

public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector) {
        this(dataStream, keySelector, TypeExtractor.getKeySelectorTypes(keySelector, dataStream.getType()));
    }

    public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector, TypeInformation<KEY> keyType) {
        this(dataStream, new PartitionTransformation(dataStream.getTransformation(), new KeyGroupStreamPartitioner(keySelector, 128)), keySelector, keyType);
    }

    @Internal
    KeyedStream(DataStream<T> stream, PartitionTransformation<T> partitionTransformation, KeySelector<T, KEY> keySelector, TypeInformation<KEY> keyType) {
        //Store the execution environment and the Transformation subclass instance (here the PartitionTransformation) in the KeyedStream object
        super(stream.getExecutionEnvironment(), partitionTransformation);
        this.keySelector = (KeySelector)this.clean(keySelector);
        this.keyType = this.validateKeyType(keyType);
    }
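The literal 128 passed to KeyGroupStreamPartitioner in this decompiled code appears to be the default lower bound for the maximum parallelism, that is, the number of key groups. A rough sketch of how a key could be mapped to a key group and then to a parallel subtask is shown below. The class and method names are hypothetical, and the hash is a plain floorMod stand-in: as far as I know Flink applies a murmur hash to key.hashCode() before taking the modulus, so real key group numbers will differ. The subtask formula keyGroup * parallelism / maxParallelism is assumed to match what Flink uses.

```java
class KeyGroupSketch {

    // Stand-in for the key-to-key-group step (NOT Flink's actual hash function).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Maps a key group to the index of the parallel subtask that owns it.
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // number of key groups, as in the constructor above
        int parallelism = 4;      // assumed number of parallel subtasks

        for (String word : new String[] {"flink", "keyBy", "flink"}) {
            int keyGroup = assignToKeyGroup(word, maxParallelism);
            int subtask = operatorIndexForKeyGroup(maxParallelism, parallelism, keyGroup);
            System.out.println(word + " -> key group " + keyGroup + " -> subtask " + subtask);
        }
    }
}
```

Because the mapping depends only on the key, every record with the same word is sent to the same subtask, which is what lets the later window and sum operate on per-key state.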

 

For Flink's data types and serialization mechanism, see this article: https://cloud.tencent.com/developer/article/1573059
