What does it do?
Consume data from Kafka and count, in real time, the number of ad clicks per province, per city, and per ad.
Requirement analysis
The data arriving from Kafka is the same as in requirement six, and the approach is the same: turn each record into (key, 1L), then update the running totals.
The differences are:
- The key now becomes (date_province_city_adid)
- The counts can no longer be aggregated with reduceByKey; instead we use Spark Streaming's updateStateByKey, which accumulates the current batch's counts onto the state carried over from earlier batches, so we avoid querying the database on every batch (the sliding-window operations serve a similar purpose)
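The per-key semantics of updateStateByKey can be sketched as a plain function, independent of Spark (the function name here is illustrative):

```scala
// A minimal sketch of the update function updateStateByKey applies per key:
// `values` holds this key's counts from the current batch,
// `state` holds the running total carried over from previous batches.
def updateFunc(values: Seq[Long], state: Option[Long]): Option[Long] =
  Some(state.getOrElse(0L) + values.sum)
```

Spark calls this function once per key per batch; returning Some keeps the key's state alive for the next batch.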
Step-by-step:
- Data acquisition is the same as in the previous requirement
- Map the data to (key, 1L)
val key2ProvinceCityDStream = adRealTimeFilterDStream.map { log =>
  val logSplit = log.split(" ")
  val timeStamp = logSplit(0).toLong
  // dateKey : yy-mm-dd
  val dateKey = DateUtils.formatDateKey(new Date(timeStamp))
  val province = logSplit(1)
  val city = logSplit(2)
  // field 3 (the user id) is not needed for this statistic
  val adid = logSplit(4)
  val key = dateKey + "_" + province + "_" + city + "_" + adid
  (key, 1L)
}
- Use Spark Streaming's updateStateByKey to accumulate onto the historical totals
// Use the updateStateByKey operator to maintain the running state
val key2StateDStream = key2ProvinceCityDStream.updateStateByKey[Long] {
  (values: Seq[Long], state: Option[Long]) =>
    // start from the previous total (0L for a new key) and add this batch's counts
    var newValue = state.getOrElse(0L)
    for (v <- values) newValue += v
    Some(newValue)
}
- Update the database
key2StateDStream.foreachRDD { rdd =>
  rdd.foreachPartition { items =>
    val adStatArray = new ArrayBuffer[AdStat]()
    // key: date_province_city_adid
    for ((key, count) <- items) {
      val keySplit = key.split("_")
      val date = keySplit(0)
      val province = keySplit(1)
      val city = keySplit(2)
      val adid = keySplit(3).toLong
      adStatArray += AdStat(date, province, city, adid, count)
    }
    AdStatDAO.updateBatch(adStatArray.toArray)
    adStatArray.foreach(println)
  }
}
Complete code
def provinceCityClickStat(adRealTimeFilterDStream: DStream[String]) = {
  val key2ProvinceCityDStream = adRealTimeFilterDStream.map { log =>
    val logSplit = log.split(" ")
    val timeStamp = logSplit(0).toLong
    // dateKey : yy-mm-dd
    val dateKey = DateUtils.formatDateKey(new Date(timeStamp))
    val province = logSplit(1)
    val city = logSplit(2)
    val adid = logSplit(4)
    val key = dateKey + "_" + province + "_" + city + "_" + adid
    (key, 1L)
  }
  // Use the updateStateByKey operator to maintain the running state
  val key2StateDStream = key2ProvinceCityDStream.updateStateByKey[Long] {
    (values: Seq[Long], state: Option[Long]) =>
      var newValue = state.getOrElse(0L)
      for (v <- values) newValue += v
      Some(newValue)
  }
  key2StateDStream.foreachRDD { rdd =>
    rdd.foreachPartition { items =>
      val adStatArray = new ArrayBuffer[AdStat]()
      // key: date_province_city_adid
      for ((key, count) <- items) {
        val keySplit = key.split("_")
        val date = keySplit(0)
        val province = keySplit(1)
        val city = keySplit(2)
        val adid = keySplit(3).toLong
        adStatArray += AdStat(date, province, city, adid, count)
      }
      AdStatDAO.updateBatch(adStatArray.toArray)
      adStatArray.foreach(println)
    }
  }
}
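Note that updateStateByKey only works when checkpointing is enabled on the StreamingContext, since the accumulated state must be persisted across batches. A minimal sketch of the required setup (the variable name and checkpoint path below are illustrative placeholders, not part of the code above):

```scala
// updateStateByKey persists its per-key state via Spark's checkpoint mechanism;
// without a checkpoint directory the job fails at runtime.
streamingContext.checkpoint("./spark-streaming-checkpoint")
```

In production the checkpoint directory should point at fault-tolerant storage such as HDFS rather than a local path.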