Flink's Accumulator is very similar to Spark's Accumulator: both are useful for observing how data changes inside tasks while a job is running.
You can update an accumulator from the operator functions of a Flink job, but the final result only becomes available after the job has finished executing.
Using an accumulator in Flink is very simple:
1: Create the accumulator: val acc = new IntCounter();
2: Register the accumulator: getRuntimeContext().addAccumulator("accumulator", acc);
3: Use the accumulator: this.acc.add(1);
4: Fetch the accumulator's result: myJobExecutionResult.getAccumulatorResult("accumulator")
Here is a complete demo:
package flink

import org.apache.flink.api.common.accumulators.IntCounter
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

/**
  * Accumulator usage in Flink
  */
object flinkBatch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val text = env.fromElements("Hello Jason What are you doing Hello world")
    val counts = text
      .flatMap(_.toLowerCase.split(" "))
      .map(new RichMapFunction[String, String] {
        // Create the accumulator
        val acc = new IntCounter()

        override def open(parameters: Configuration): Unit = {
          super.open(parameters)
          // Register the accumulator
          getRuntimeContext.addAccumulator("accumulator", acc)
        }

        override def map(in: String): String = {
          // Use the accumulator
          this.acc.add(1)
          in
        }
      })
      .map((_, 1))
      .groupBy(0)
      .sum(1)
    counts.writeAsText("d:/test.txt").setParallelism(1)
    val res = env.execute("Accumulator Test")
    // Fetch the accumulator result
    val num = res.getAccumulatorResult[Int]("accumulator")
    println(num)
  }
}
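As a side note, a job may register more than one accumulator, and the JobExecutionResult returned by env.execute() can fetch them all at once via getAllAccumulatorResults. A minimal sketch, assuming a job like the demo above that has (hypothetically) also registered a second counter named "errors":

```scala
// Sketch only: assumes accumulators named "accumulator" and,
// hypothetically, "errors" were registered before env.execute().
val res = env.execute("Accumulator Test")

// Fetch a single accumulator by name, as in the demo above:
val words = res.getAccumulatorResult[Int]("accumulator")

// Or fetch every registered accumulator at once; the result is a
// java.util.Map[String, Object] keyed by accumulator name:
val all = res.getAllAccumulatorResults
all.forEach((name, value) => println(s"$name = $value"))
```

This is handy when a job tracks several counters (e.g. processed vs. skipped records) and you want to print them together after the job completes.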
After submitting the job to the cluster, you can see the information for our registered accumulator in the web UI, as shown in the figure below:
If anything here is written incorrectly, corrections are welcome; if you have any questions, feel free to join QQ group 340297350. Thanks!