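1. Introduction
When you need to split a data stream, the usual approach is to copy the stream and then filter out of each copy the records you do not want. The split and side output operations handle this more cleanly.
split and select are used together: split labels each record according to a condition, and select retrieves the records that carry a given label. The limitation is that a stream can be split only once; splitting an already split stream (two-level splitting) is not supported.
Side outputs do support two-level splitting.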
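The copy-and-filter approach mentioned above can be sketched roughly as follows. This is only an illustration: the SplitAccess definition is an assumption (the post uses the type but never shows it), and the object name, predicate names, and sample records are made up for the example.

```scala
import org.apache.flink.streaming.api.scala._

// Assumed shape of the record type; the original post does not show its definition.
case class SplitAccess(province: String, city: String)

object FilterSplitSketch {
  // One predicate per branch (hypothetical helper names)
  val isGuangdong: SplitAccess => Boolean = _.province == "廣東省"
  val isJiangsu: SplitAccess => Boolean = _.province == "江蘇省"

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Made-up sample data
    val accessStream: DataStream[SplitAccess] = env.fromElements(
      SplitAccess("廣東省", "深圳市"),
      SplitAccess("廣東省", "廣州市"),
      SplitAccess("江蘇省", "南京市"))

    // Each branch applies its own filter to the full stream
    val guangdongStream = accessStream.filter(isGuangdong)
    val jiangsuStream = accessStream.filter(isJiangsu)

    guangdongStream.print("guangdong")
    jiangsuStream.print("jiangsu")

    env.execute("filter-split-sketch")
  }
}
```

Every filter sees every record, so each additional branch adds another pass over the same data. That is the cost split and side outputs avoid.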
2. Practice
2.1 The split approach
Splitting once works, but splitting a second time fails with: Consecutive multiple splits are not supported. Splits are deprecated. Please use side-outputs.
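import java.{lang, util}

import org.apache.flink.streaming.api.collector.selector.OutputSelector
import org.apache.flink.streaming.api.scala._

/**
 * split does not support multi-level splitting; use side outputs instead.
 * Applying split twice in a row fails with:
 * java.lang.IllegalStateException:
 * Consecutive multiple splits are not supported. Splits are deprecated. Please use side-outputs.
 *
 * SplitAccess is the record type used throughout this post; it has province and city fields.
 *
 * @param accessStream the access records to split
 * @return the sink created by print()
 */
def splitTest(accessStream: DataStream[SplitAccess]) = {
  // First level: label each record with its province
  val splitProvinceStream = accessStream.split(new OutputSelector[SplitAccess] {
    override def select(splitAccess: SplitAccess): lang.Iterable[String] = {
      val list = new util.ArrayList[String]()
      if ("廣東省" == splitAccess.province) list.add("廣東省")
      else if ("江蘇省" == splitAccess.province) list.add("江蘇省")
      list
    }
  })
  //splitProvinceStream.select("廣東省").print()

  // Second level: label the 廣東省 records with their city.
  // This second split is what triggers the IllegalStateException.
  val splitCityStream = splitProvinceStream.select("廣東省").split(new OutputSelector[SplitAccess] {
    override def select(splitAccess: SplitAccess): lang.Iterable[String] = {
      val list = new util.ArrayList[String]()
      if ("深圳市" == splitAccess.city) list.add("深圳市")
      else if ("廣州市" == splitAccess.city) list.add("廣州市")
      list
    }
  })
  splitCityStream.select("深圳市").print()
}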
2.2 The side outputs approach
Side outputs support multi-level splitting.
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

/**
 * Two-level splitting with side outputs: first by province, then by city.
 *
 * @param accessStream the access records to split
 * @return the sink created by print()
 */
def sideOutputsTest(accessStream: DataStream[SplitAccess]) = {
  // 1. Split by province
  val guangdongTag = new OutputTag[SplitAccess]("guangdong")
  val jiangsuTag = new OutputTag[SplitAccess]("jiangsu")
  val provinceSplitStream = accessStream.process(new ProcessFunction[SplitAccess, SplitAccess] {
    override def processElement(value: SplitAccess,
                                ctx: ProcessFunction[SplitAccess, SplitAccess]#Context,
                                out: Collector[SplitAccess]): Unit = {
      // Nothing is sent to the main output (out); every record goes to a side output.
      if ("廣東省" == value.province) {
        ctx.output(guangdongTag, value)
      } else {
        // Every record that is not 廣東省 ends up under the jiangsu tag
        ctx.output(jiangsuTag, value)
      }
    }
  })
  //provinceSplitStream.getSideOutput(guangdongTag).print("廣東省分流結果:")

  // 2. Split the 廣東省 records by city
  val shenzhenTag = new OutputTag[SplitAccess]("shenzhen")
  val guangzhouTag = new OutputTag[SplitAccess]("guangzhou")
  val guangdongDataStream = provinceSplitStream.getSideOutput(guangdongTag)
  val citySplitStream = guangdongDataStream.process(new ProcessFunction[SplitAccess, SplitAccess] {
    override def processElement(value: SplitAccess,
                                ctx: ProcessFunction[SplitAccess, SplitAccess]#Context,
                                out: Collector[SplitAccess]): Unit = {
      if ("深圳市" == value.city) {
        ctx.output(shenzhenTag, value)
      } else {
        // Every 廣東省 record that is not 深圳市 ends up under the guangzhou tag
        ctx.output(guangzhouTag, value)
      }
    }
  })
  citySplitStream.getSideOutput(shenzhenTag).print("廣東省深圳市分流結果:")
}
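The two-level example above chains two process calls. When all the target categories are known up front, the same routing could also be done in a single pass, with unmatched records going to the main output. The following is a sketch under assumptions, not the author's code: the SplitAccess definition, the object name, and the route and split helpers are all hypothetical.

```scala
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// Assumed shape of the record type; the original post does not show its definition.
case class SplitAccess(province: String, city: String)

object SinglePassSideOutputs {
  val shenzhenTag = new OutputTag[SplitAccess]("shenzhen")
  val guangzhouTag = new OutputTag[SplitAccess]("guangzhou")

  // Hypothetical helper: decides which side output a record belongs to.
  // None means "leave the record on the main output".
  def route(access: SplitAccess): Option[OutputTag[SplitAccess]] = access match {
    case SplitAccess("廣東省", "深圳市") => Some(shenzhenTag)
    case SplitAccess("廣東省", "廣州市") => Some(guangzhouTag)
    case _ => None
  }

  // Applies route to every record in one ProcessFunction
  def split(accessStream: DataStream[SplitAccess]): DataStream[SplitAccess] =
    accessStream.process(new ProcessFunction[SplitAccess, SplitAccess] {
      override def processElement(value: SplitAccess,
                                  ctx: ProcessFunction[SplitAccess, SplitAccess]#Context,
                                  out: Collector[SplitAccess]): Unit =
        route(value) match {
          case Some(tag) => ctx.output(tag, value) // side output
          case None => out.collect(value)          // main output
        }
    })
}
```

With this sketch, split(accessStream).getSideOutput(SinglePassSideOutputs.shenzhenTag) would yield the 深圳市 records, while the stream returned by split itself carries everything that matched no tag. Chaining process calls, as in the original example, remains the more natural choice when the second level depends on the result of the first.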
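3. Summary
- If you only need to split a stream once, either split or side outputs will work. Note that split is deprecated, as its own error message says.
- If you need to split more than once, use side outputs.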