Preface
Wishing all women a happy holiday~
Quick Q&A
- What are the uses and characteristics of RichFunction in the Flink DataStream API?
- What is the RuntimeContext obtained from a RichFunction used for?
- Does every Function have a corresponding RichFunction variant?
- Can every Flink streaming operator accept a RichFunction?
The first two questions can really be merged into one. What makes a RichFunction richer than a plain Function is lifecycle management (the open() and close() methods) plus access to its runtime context, the RuntimeContext. A RuntimeContext is associated with each parallel instance of the Function (i.e. one sub-task), and through it you can further obtain:
- static runtime information, such as the task name, parallelism, max parallelism, the index of the current sub-task, the current classloader, and so on;
- the global data structures, namely accumulators, broadcast variables and the distributed cache;
- handles to the various kinds of state, i.e. the familiar get***State(StateDescriptor) methods (see the sketch right after this list).
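To make this concrete, here is a minimal sketch of a rich function. The class name, state name and accumulator name (MyRichMapper, last-seen, processed) are made up for illustration, and the keyed-state handle only works when the function runs downstream of keyBy():

import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;

// Hypothetical example: a RichMapFunction using the lifecycle hooks and RuntimeContext.
public class MyRichMapper extends RichMapFunction<String, String> {
    private transient IntCounter processed;        // accumulator (global data structure)
    private transient ValueState<Long> lastSeen;   // keyed state handle, needs a keyed stream

    @Override
    public void open(Configuration parameters) throws Exception {
        // Lifecycle hook: runs once per parallel instance (sub-task) before any element arrives.
        processed = new IntCounter();
        getRuntimeContext().addAccumulator("processed", processed);
        lastSeen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("last-seen", Long.class));
        // Static runtime information is available too, e.g. getRuntimeContext().getIndexOfThisSubtask().
    }

    @Override
    public String map(String value) throws Exception {
        processed.add(1);
        lastSeen.update(System.currentTimeMillis());
        return value;
    }

    @Override
    public void close() throws Exception {
        // Lifecycle hook: runs once when the sub-task shuts down.
    }
}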
As for the third question: yes. The fourth: no.
Scenarios where RichFunction cannot be used
A simple windowed aggregation:
dataStream.keyBy(x -> x.getKey())
.window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
.reduce(new MyRichReduceFunction<>())
This code compiles fine, but blows up at runtime with an UnsupportedOperationException saying ReduceFunction of reduce can not be a RichFunction (the check fires as soon as reduce() is called). Switching to the aggregate() method with a RichAggregateFunction runs into the same problem, this time with the message This aggregation function cannot be a RichFunction. The corresponding implementations in WindowedStream show that this road is simply closed:
public SingleOutputStreamOperator<T> reduce(ReduceFunction<T> function) {
if (function instanceof RichFunction) {
throw new UnsupportedOperationException(
"ReduceFunction of reduce can not be a RichFunction. "
+ "Please use reduce(ReduceFunction, WindowFunction) instead.");
}
// clean the closure
function = input.getExecutionEnvironment().clean(function);
return reduce(function, new PassThroughWindowFunction<>());
}
public <ACC, R> SingleOutputStreamOperator<R> aggregate(AggregateFunction<T, ACC, R> function) {
checkNotNull(function, "function");
if (function instanceof RichFunction) {
throw new UnsupportedOperationException(
"This aggregation function cannot be a RichFunction.");
}
TypeInformation<ACC> accumulatorType =
TypeExtractor.getAggregateFunctionAccumulatorType(
function, input.getType(), null, false);
TypeInformation<R> resultType =
TypeExtractor.getAggregateFunctionReturnType(
function, input.getType(), null, false);
return aggregate(function, accumulatorType, resultType);
}
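The exception message already hints at the way out: keep the incremental aggregation as a plain ReduceFunction and put whatever needs the RuntimeContext into the second argument, which is allowed to be rich (ProcessWindowFunction itself extends AbstractRichFunction). A rough sketch continuing the snippet above, where Event, Result and their getKey() / getValue() accessors are made-up placeholder types:

dataStream.keyBy(x -> x.getKey())
        .window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
        .reduce(
                // the incremental ReduceFunction stays "plain"
                (a, b) -> a.getValue() >= b.getValue() ? a : b,
                // the window function may carry rich-function features
                new ProcessWindowFunction<Event, Result, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context ctx,
                                        Iterable<Event> elements, Collector<Result> out) {
                        // only the single pre-reduced element reaches this point
                        Event max = elements.iterator().next();
                        out.collect(new Result(key, max.getValue(), ctx.window().getEnd()));
                    }
                });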
Why can't a Rich[Reduce / Aggregate]Function be used?
The answer is not hard to find: unlike operators such as FlatMap and Filter, reduce and aggregate come with well-defined state semantics of their own. The user is not supposed to touch that state by hand (and would very likely break things if allowed to), nor do they need lifecycle management, since their lifecycle always starts with the first element and ends with the last one.
Taking the reduce logic as an example (aggregate works the same way), let's look a bit further at how the corresponding window operator is built.
public <R> WindowOperator<K, T, ?, R, W> reduce(
ReduceFunction<T> reduceFunction, WindowFunction<T, R, K, W> function) {
Preconditions.checkNotNull(reduceFunction, "ReduceFunction cannot be null");
Preconditions.checkNotNull(function, "WindowFunction cannot be null");
if (reduceFunction instanceof RichFunction) {
throw new UnsupportedOperationException(
"ReduceFunction of apply can not be a RichFunction.");
}
if (evictor != null) {
return buildEvictingWindowOperator(
new InternalIterableWindowFunction<>(
new ReduceApplyWindowFunction<>(reduceFunction, function)));
} else {
ReducingStateDescriptor<T> stateDesc =
new ReducingStateDescriptor<>(
WINDOW_STATE_NAME, reduceFunction, inputType.createSerializer(config));
return buildWindowOperator(
stateDesc, new InternalSingleValueWindowFunction<>(function));
}
}
Note that a ReducingStateDescriptor is created here (the ReduceFunction happens to be one of its constructor arguments), and the built-in ReducingState handle is obtained from it in the end. Admittedly, in day-to-day DataStream API programming users rarely reach for ReducingState (or AggregatingState) directly. Even so, the descriptor constructors carry the same hard check that rejects a RichFunction, to protect the determinism of the state.
public ReducingStateDescriptor(
String name, ReduceFunction<T> reduceFunction, Class<T> typeClass) {
super(name, typeClass, null);
this.reduceFunction = checkNotNull(reduceFunction);
if (reduceFunction instanceof RichFunction) {
throw new UnsupportedOperationException(
"ReduceFunction of ReducingState can not be a RichFunction.");
}
}
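For comparison, here is a minimal sketch (hypothetical RunningMax class and running-max state name) of that rare case where a ReducingState is built by hand inside a keyed rich function; the ReduceFunction handed to the descriptor must stay plain, which is exactly what the check above enforces:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Hypothetical example: per-key running maximum backed by a hand-built ReducingState.
// Must run downstream of keyBy(), since ReducingState is keyed state.
public class RunningMax extends RichFlatMapFunction<Long, Long> {
    private transient ReducingState<Long> maxState;

    @Override
    public void open(Configuration parameters) {
        // A plain (non-rich) ReduceFunction, exactly as the constructor above requires.
        ReducingStateDescriptor<Long> descriptor =
                new ReducingStateDescriptor<>("running-max", (a, b) -> Math.max(a, b), Long.class);
        maxState = getRuntimeContext().getReducingState(descriptor);
    }

    @Override
    public void flatMap(Long value, Collector<Long> out) throws Exception {
        maxState.add(value);          // the ReduceFunction folds the new value into the state
        out.collect(maxState.get());  // emit the current maximum for this key
    }
}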
That said, Rich[Reduce / Aggregate]Function has never seen any real use inside the Flink codebase or its examples, so we can probably write it off as a leftover from Flink's evolution (laughs).
The End
Good night, good night.