Flink reduce: what it does, with an example

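What reduce does: it merges two values of the same type into one. It is applied consecutively to all values in a group until only a single value remains.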

package reduce;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

/**
 * @Author you guess
 * @Date 2020/6/17 20:52
 * @Version 1.0
 * @Desc
 */
public class DataStreamReduceTest {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<Tuple3<String, String, Integer>> src1 = env.addSource(new SourceFunction<Tuple3<String, String, Integer>>() {
            @Override
            public void run(SourceContext<Tuple3<String, String, Integer>> ctx) throws Exception {
                ctx.collect(Tuple3.of("Lisi", "Math", 1));
                ctx.collect(Tuple3.of("Lisi", "English", 2));
                ctx.collect(Tuple3.of("Lisi", "Chinese", 3));

                ctx.collect(Tuple3.of("Zhangsan", "Math", 4));
                ctx.collect(Tuple3.of("Zhangsan", "English", 5));
                ctx.collect(Tuple3.of("Zhangsan", "Chinese", 6));
            }

            @Override
            public void cancel() {

            }
        }, "source1");

//        src1.print();
//        7> (Zhangsan,Chinese,6)
//        4> (Lisi,Chinese,3)
//        2> (Lisi,Math,1)
//        5> (Zhangsan,Math,4)
//        3> (Lisi,English,2)
//        6> (Zhangsan,English,5)


        /**
         * Code snippet 2: reduce with an anonymous ReduceFunction
         * (uncommenting it requires: import org.apache.flink.api.common.functions.ReduceFunction;)
         */
//        src1.keyBy(0).reduce(new ReduceFunction<Tuple3<String, String, Integer>>() {
//            @Override
//            public Tuple3<String, String, Integer> reduce(Tuple3<String, String, Integer> value1, Tuple3<String, String, Integer> value2) throws Exception {
//                return Tuple3.of(value1.f0, "Total:", value1.f2 + value2.f2);
//            }
//        }).print();
//        1> (Lisi,Math,1)
//        11> (Zhangsan,Math,4)
//        1> (Lisi,Total:,3)
//        11> (Zhangsan,Total:,9)
//        1> (Lisi,Total:,6)
//        11> (Zhangsan,Total:,15)


        /**
         * Code snippet 3: equivalent to snippet 2, written as a lambda
         */
        src1.keyBy(0).reduce((value1, value2) -> Tuple3.of(value1.f0, "Total:", value1.f2 + value2.f2)).print();
//        1> (Lisi,Math,1)
//        11> (Zhangsan,Math,4)
//        1> (Lisi,Total:,3)
//        11> (Zhangsan,Total:,9)
//        1> (Lisi,Total:,6)
//        11> (Zhangsan,Total:,15)

        env.execute("Flink DataStreamReduceTest by Java");
    }


}
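
The printed results in the listing show one output per input record: the first record of each key appears unchanged, and every later record produces an updated running result. The following is a minimal plain-Java sketch of that behavior, not Flink code. The class name `KeyedRollingReduceSketch`, the helper `rollingReduce`, and the simplified (name, score) records are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model of a keyed rolling reduce. This is NOT Flink code;
// all names here are hypothetical and exist only for illustration.
class KeyedRollingReduceSketch {

    // Feeds {key, score} records one at a time and returns what is emitted per record.
    static List<String> rollingReduce(String[][] records, BinaryOperator<Integer> reducer) {
        Map<String, Integer> state = new HashMap<>(); // last reduced value per key
        List<String> emitted = new ArrayList<>();
        for (String[] record : records) {
            String key = record[0];
            int score = Integer.parseInt(record[1]);
            Integer previous = state.get(key);
            // First record of a key: kept as-is, the reducer is not called.
            // Later records: combined with the stored value for that key.
            int current = (previous == null) ? score : reducer.apply(previous, score);
            state.put(key, current);
            emitted.add(key + "=" + current);
        }
        return emitted;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"Lisi", "1"}, {"Lisi", "2"}, {"Lisi", "3"},
            {"Zhangsan", "4"}, {"Zhangsan", "5"}, {"Zhangsan", "6"}
        };
        System.out.println(rollingReduce(records, Integer::sum));
        // prints [Lisi=1, Lisi=3, Lisi=6, Zhangsan=4, Zhangsan=9, Zhangsan=15]
    }
}
```

The per-key values 1, 3, 6 and 4, 9, 15 match the score column of the output in the listing; only the interleaving across keys differs, since this sketch processes records in a single thread.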

The aggregations covered earlier are fairly specialized operations; the more general way to process grouped data is the reduce operator.

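The figure above illustrates how the reduce operator works: reduce takes effect on a data stream grouped by the same key. It accepts two inputs and produces one output, that is, it combines elements pairwise into a new element of the same type.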
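
To make the "more general" claim concrete, here is a small plain-Java sketch (not Flink code) in which sum, max, and min are each just one choice of pairwise function, and a custom rule fits the same shape. The class `ReduceGeneralitySketch` and the helper `reduceAll` are hypothetical names introduced for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java model; NOT Flink code. Names are hypothetical.
class ReduceGeneralitySketch {

    // Applies the reducer pairwise, left to right, until one value remains.
    static <T> T reduceAll(List<T> values, BinaryOperator<T> reducer) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("cannot reduce an empty group");
        }
        T acc = values.get(0);
        for (T value : values.subList(1, values.size())) {
            acc = reducer.apply(acc, value);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(4, 5, 6);
        Integer sum = reduceAll(scores, Integer::sum); // behaves like sum
        Integer max = reduceAll(scores, Integer::max); // behaves like max
        Integer min = reduceAll(scores, Integer::min); // behaves like min

        // A custom rule with the same shape: keep the longer string (first one wins ties).
        List<String> subjects = Arrays.asList("Math", "English", "Chinese");
        String longest = reduceAll(subjects, (a, b) -> b.length() > a.length() ? b : a);

        System.out.println(sum + " " + max + " " + min + " " + longest);
        // prints 15 6 4 English
    }
}
```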

https://mp.weixin.qq.com/s/2vcKteQIyj31sVrSg1R_2Q

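DataStreamSource has no aggregation operations (min, minBy, max, maxBy, sum, etc.) and no reduce operation, which is why the example calls keyBy before reduce;

KeyedStream, AllWindowedStream, and DataSet have both the aggregation operations (min, minBy, max, maxBy, sum, etc.) and reduce.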

Flink Min, MinBy, Max, MaxBy, and sum examples

Flink 1.9.2, Java 1.8

Source code (note the comments):


/**
 * Base interface for Reduce functions. Reduce functions combine groups of elements to
 * a single value, by taking always two elements and combining them into one. Reduce functions
 * may be used on entire data sets, or on grouped data sets. In the latter case, each group is reduced
 * individually.
 *
 * <p>For a reduce functions that work on an entire group at the same time (such as the
 * MapReduce/Hadoop-style reduce), see {@link GroupReduceFunction}. In the general case,
 * ReduceFunctions are considered faster, because they allow the system to use more efficient
 * execution strategies.
 *
 * <p>The basic syntax for using a grouped ReduceFunction is as follows:
 * <pre>{@code
 * DataSet<X> input = ...;
 *
 * DataSet<X> result = input.groupBy(<key-definition>).reduce(new MyReduceFunction());
 * }</pre>
 *
 * <p>Like all functions, the ReduceFunction needs to be serializable, as defined in {@link java.io.Serializable}.
 *
 * @param <T> Type of the elements that this function processes.
 */
@Public
@FunctionalInterface
public interface ReduceFunction<T> extends Function, Serializable {

	/**
	 * The core method of ReduceFunction, combining two values into one value of the same type.
	 * The reduce function is consecutively applied to all values of a group until only a single value remains.
	 *
	 * @param value1 The first value to combine.
	 * @param value2 The second value to combine.
	 * @return The combined value of both input values.
	 *
	 * @throws Exception This method may throw exceptions. Throwing an exception will cause the operation
	 *                   to fail and may trigger recovery.
	 */
	T reduce(T value1, T value2) throws Exception;
}
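
Because the interface is annotated with `@FunctionalInterface` and declares a single method, a named class and a lambda are interchangeable, which is why snippets 2 and 3 in the example behave the same. The sketch below shows this with a local stand-in interface so it runs without Flink on the classpath. `Reducer`, `SumScores`, and `ReduceFunctionShapeSketch` are hypothetical names and are not part of Flink.

```java
import java.io.Serializable;

// Plain-Java illustration; NOT Flink code. All names are hypothetical.
class ReduceFunctionShapeSketch {

    // Stand-in that mirrors the shape of the ReduceFunction<T> interface quoted above.
    @FunctionalInterface
    interface Reducer<T> extends Serializable {
        T reduce(T value1, T value2) throws Exception;
    }

    // Named-class form, comparable to the anonymous class in snippet 2.
    static class SumScores implements Reducer<Integer> {
        @Override
        public Integer reduce(Integer value1, Integer value2) {
            return value1 + value2;
        }
    }

    public static void main(String[] args) throws Exception {
        Reducer<Integer> asClass = new SumScores();
        // Lambda form, comparable to snippet 3.
        Reducer<Integer> asLambda = (value1, value2) -> value1 + value2;

        System.out.println(asClass.reduce(4, 5));  // prints 9
        System.out.println(asLambda.reduce(4, 5)); // prints 9
    }
}
```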

In the DataSet API:

 
