E-commerce Platform Analytics ---- Requirement 7: Real-Time Ad Click Counts by Province and City

What does it do?

It consumes data from Kafka and computes, in real time, the click count of each ad in each city of each province.

Requirement analysis

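The data received from Kafka is the same as in Requirement 6, and so is the approach: map each record to (key, 1L), then update the total count.
The differences are: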

  • The key is now date_province_city_adid.

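  • Counting no longer uses reduceByKey, which only aggregates within a single batch. Instead it uses Spark Streaming's updateStateByKey, which merges each key's values from the current batch into the state accumulated from earlier batches. Spark keeps the running totals, so the job does not have to keep querying the database. (Windowed, i.e. sliding-window, operations are a related alternative.)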

  • For more on updateStateByKey and windowed (sliding) operations, see this article.
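The following plain-Scala sketch makes the difference concrete. It runs without Spark and mimics the two operators on in-memory batches. `reduceBatch` returns only the current batch's totals, as `reduceByKey(_ + _)` would. `updateState` folds each batch into the previous state, as `updateStateByKey` does, including calling the update function for keys that have no new values in the batch. The object and method names are made up for illustration; only the shape of `updateFunc` matches the function that `updateStateByKey` takes.

```scala
object StateVsBatchDemo {
  // Same shape as the function passed to updateStateByKey:
  // this batch's values for a key + previous state => new state
  def updateFunc(values: Seq[Long], state: Option[Long]): Option[Long] =
    Some(state.getOrElse(0L) + values.sum)

  // Per-batch totals only, like reduceByKey(_ + _) on a single batch
  def reduceBatch(batch: Seq[(String, Long)]): Map[String, Long] =
    batch.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2).sum }

  // Folds one batch into the running state, mimicking updateStateByKey:
  // every key already in the state is updated too (with an empty Seq if it had no new values)
  def updateState(state: Map[String, Long], batch: Seq[(String, Long)]): Map[String, Long] = {
    val grouped: Map[String, Seq[Long]] =
      batch.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
    (state.keySet ++ grouped.keySet).flatMap { k =>
      updateFunc(grouped.getOrElse(k, Seq.empty[Long]), state.get(k)).map(total => k -> total)
    }.toMap
  }
}
```

Suppose one key receives two clicks in the first batch and one in the second. `reduceBatch` on the second batch reports 1, while the state after both batches holds 3.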

Steps:

  1. The data is obtained the same way as in the previous requirement.
  2. Map each record to (key, 1L):
    val key2ProvinceCityDStream = adRealTimeFilterDStream.map{
      case log =>
        val logSplit = log.split(" ")
        val timeStamp = logSplit(0).toLong
        // dateKey : yy-mm-dd
        val dateKey = DateUtils.formatDateKey(new Date(timeStamp))
        val province = logSplit(1)
        val city = logSplit(2)
        val adid = logSplit(4)

        val key = dateKey + "_" + province + "_" + city + "_" + adid
        (key, 1L)
    }
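The mapping above can be pulled out into a pure function and tested without a cluster. This sketch rests on two assumptions. First, it assumes the log line layout is `timestamp province city userid adid`; the code above only tells us what fields 0, 1, 2 and 4 hold. Second, `DateUtils.formatDateKey` is not shown in this post, so the sketch replaces it with a hypothetical `yyyyMMdd` formatter in the Asia/Shanghai time zone. Whatever the real date format is, it must not contain `_`, because the key is split on `_` again when it is written to the database.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object ClickKey {
  // Hypothetical stand-in for DateUtils.formatDateKey; the real pattern may differ
  private val dateKeyFormat =
    DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneId.of("Asia/Shanghai"))

  // Assumed log layout: "timestamp province city userid adid", space-separated
  def toKeyedClick(log: String): (String, Long) = {
    val fields = log.split(" ")
    val dateKey = dateKeyFormat.format(Instant.ofEpochMilli(fields(0).toLong))
    // Key format: date_province_city_adid
    val key = Seq(dateKey, fields(1), fields(2), fields(4)).mkString("_")
    (key, 1L)
  }
}
```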

  3. Use Spark Streaming's updateStateByKey to add each batch's clicks to the counts from earlier batches:

    // Use updateStateByKey to maintain the running count for each key
    val key2StateDStream = key2ProvinceCityDStream.updateStateByKey[Long]{
      (values: Seq[Long], state: Option[Long]) => {
        var newValues = state.getOrElse(0L)
        for (v <- values) newValues += v
        Some(newValues)
      }
    }
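The update function only adds this batch's values to the previous total, so it can also be written as a single expression. The sketch below keeps it as a standalone function value so it can be checked in isolation; the object name `ClickStateUpdate` is hypothetical. The snippet above also leaves out one requirement. `updateStateByKey` is a stateful operator, so Spark requires a checkpoint directory to be set on the `StreamingContext` (via `ssc.checkpoint(...)`) before the job starts.

```scala
object ClickStateUpdate {
  // Same logic as the for-loop version: previous total (0 if the key is new)
  // plus the sum of this batch's 1L values for the key
  val update: (Seq[Long], Option[Long]) => Option[Long] =
    (values, state) => Some(state.getOrElse(0L) + values.sum)

  // Hypothetical wiring inside the streaming job (needs Spark, so shown as comments):
  //   ssc.checkpoint("<checkpoint-dir>")  // required for updateStateByKey; placeholder path
  //   val key2StateDStream =
  //     key2ProvinceCityDStream.updateStateByKey[Long](ClickStateUpdate.update)
}
```

Because this function always returns `Some`, it never drops a key from the state. Returning `None` is how `updateStateByKey` removes a key, and that could be used to expire keys from earlier dates.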
  4. Write the results to the database:
    key2StateDStream.foreachRDD{
      rdd => rdd.foreachPartition{
        items =>
          val adStatArray = new ArrayBuffer[AdStat]()
          // key: date province city adid
          for((key, count) <- items){
            val keySplit = key.split("_")
            val date = keySplit(0)
            val province = keySplit(1)
            val city = keySplit(2)
            val adid = keySplit(3).toLong

            adStatArray += AdStat(date, province, city, adid, count)
          }
          AdStatDAO.updateBatch(adStatArray.toArray)
          adStatArray.foreach(println);
      }
    }
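Each value that comes out of `updateStateByKey` is already the running total. The database write therefore has to overwrite the stored count for the key, not add to it; adding would count earlier clicks again on every batch. `AdStatDAO.updateBatch` is not shown in this post, so the sketch below uses a hypothetical in-memory table as a stand-in to illustrate this overwrite (upsert) behavior. The `AdStat` field names are also assumed.

```scala
import scala.collection.mutable

// Same fields, in the same order, as the AdStat used above (field names assumed)
case class AdStat(date: String, province: String, city: String, adid: Long, clickCount: Long)

object AdStatUpsertSketch {
  // Hypothetical in-memory stand-in for the ad stats table, keyed by (date, province, city, adid)
  val table = mutable.Map.empty[(String, String, String, Long), Long]

  // Splits the "date_province_city_adid" key back into its parts
  def toAdStat(key: String, count: Long): AdStat = {
    val keySplit = key.split("_")
    AdStat(keySplit(0), keySplit(1), keySplit(2), keySplit(3).toLong, count)
  }

  // Overwrites rather than adds: each incoming count is already the cumulative total
  def updateBatch(stats: Array[AdStat]): Unit =
    stats.foreach(s => table((s.date, s.province, s.city, s.adid)) = s.clickCount)
}
```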

Complete code

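  import java.util.Date
  import org.apache.spark.streaming.dstream.DStream
  import scala.collection.mutable.ArrayBuffer
  // DateUtils, AdStat and AdStatDAO are the project's own classes from the earlier requirements

  def provinceCityClickStat(adRealTimeFilterDStream: DStream[String]): Unit = {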
    val key2ProvinceCityDStream = adRealTimeFilterDStream.map{
      case log =>
        val logSplit = log.split(" ")
        val timeStamp = logSplit(0).toLong
        // dateKey : yy-mm-dd
        val dateKey = DateUtils.formatDateKey(new Date(timeStamp))
        val province = logSplit(1)
        val city = logSplit(2)
        val adid = logSplit(4)

        val key = dateKey + "_" + province + "_" + city + "_" + adid
        (key, 1L)
    }

    // Use updateStateByKey to maintain the running count for each key
    val key2StateDStream = key2ProvinceCityDStream.updateStateByKey[Long]{
      (values: Seq[Long], state: Option[Long]) => {
        var newValues = state.getOrElse(0L)
        for (v <- values) newValues += v
        Some(newValues)
      }
    }
    key2StateDStream.foreachRDD{
      rdd => rdd.foreachPartition{
        items =>
          val adStatArray = new ArrayBuffer[AdStat]()
          // key: date province city adid
          for((key, count) <- items){
            val keySplit = key.split("_")
            val date = keySplit(0)
            val province = keySplit(1)
            val city = keySplit(2)
            val adid = keySplit(3).toLong

            adStatArray += AdStat(date, province, city, adid, count)
          }
          AdStatDAO.updateBatch(adStatArray.toArray)
          adStatArray.foreach(println)
      }
    }
  }
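As a usage example, the following plain-Scala simulation runs the same map → updateStateByKey steps over two micro-batches. It returns what `foreachRDD` would receive after each batch. It is a sketch, not the Spark job: to avoid time zones, each hypothetical log line carries a date key directly in place of the timestamp, and `ProvinceCityClickSim` is a made-up name. The second batch's output still contains the Shijiazhuang key even though that key had no new clicks. The state DStream carries every key, so every batch writes all rows again.

```scala
object ProvinceCityClickSim {
  // Assumed log layout for this sketch: "dateKey province city userid adid"
  def keyOf(log: String): String = {
    val f = log.split(" ")
    Seq(f(0), f(1), f(2), f(4)).mkString("_")
  }

  // Applies map -> updateStateByKey to each micro-batch in turn and returns
  // the full state after each batch (what foreachRDD would iterate over)
  def run(batches: Seq[Seq[String]]): Seq[Map[String, Long]] =
    batches.scanLeft(Map.empty[String, Long]) { (state, batch) =>
      // Clicks per key in this batch (the (key, 1L) pairs summed)
      val counts = batch.groupBy(keyOf).map { case (k, logs) => k -> logs.size.toLong }
      // Add to previous totals; keys without new clicks keep their old total
      state ++ counts.map { case (k, n) => k -> (state.getOrElse(k, 0L) + n) }
    }.tail
}
```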