org.apache.spark.SparkException: Task not serializable

Running the following code in spark-shell:

    // Collect the single row holding the per-column maxima onto the driver
    val max_array = max_read_fav_share_vote.collect
    val max_read = max_array(0)(0).toString.toDouble
    val max_fav = max_array(0)(1).toString.toDouble
    val max_share = max_array(0)(2).toString.toDouble
    val max_vote = max_array(0)(3).toString.toDouble

    // Score each row: a weighted sum of the four metrics, each normalised by its maximum
    val id_hot = serviceid_read_fav_share_vote.map { x =>
      val id = x.getString(0)
      val read = x.getLong(1).toDouble
      val fav = x.getLong(2).toDouble
      val share = x.getLong(3).toDouble
      val vote = x.getLong(4).toDouble

      val hot = 0.1 * (read / max_read) + 0.2 * (fav / max_fav) + 0.3 * (share / max_share) + 0.4 * (vote / max_vote)
      (id, hot)
    }.toDF("id", "hot")

The following error is thrown:
(screenshot of the stack trace: org.apache.spark.SparkException: Task not serializable)
This happens because the map/filter closure references external variables. To execute a task, Spark has to ship the closure, together with every object it references, to the nodes where the data is distributed, and it must serialize those objects before sending them. Some of the referenced objects cannot be serialized (in spark-shell, a captured variable typically drags in the enclosing REPL line object along with it), hence the exception.
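To see the mechanism in isolation, here is a minimal sketch with hypothetical names: a closure that references a field of a non-serializable class (in spark-shell, captured variables live in the REPL's generated line objects, which play the same role):

    import org.apache.spark.sql.SparkSession

    // Hypothetical class, not from the original post: it is not Serializable,
    // yet the first lambda below references one of its fields.
    class Scorer(spark: SparkSession) {
      val factor = 2.0  // referencing this field captures the whole `this`

      def scaled() =
        spark.sparkContext.parallelize(1 to 5).map(_ * factor)
        // throws org.apache.spark.SparkException: Task not serializable

      // Classic workaround: copy the field into a local val, so the closure
      // captures only a plain Double, which serializes fine.
      def scaledOk() = {
        val f = factor
        spark.sparkContext.parallelize(1 to 5).map(_ * f)
      }
    }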

The fix:
Do not ship the objects that cannot be serialized; let each node work with the values locally instead. Replacing map with mapPartitions (or a similar per-partition method) resolves the error. The modified code:

    val id_hot = serviceid_read_fav_share_vote.mapPartitions { partition =>
      // All per-row work happens inside the partition iterator
      partition.map { x =>
        val id = x.getString(0)
        val read = x.getLong(1).toDouble
        val fav = x.getLong(2).toDouble
        val share = x.getLong(3).toDouble
        val vote = x.getLong(4).toDouble

        val hot = 0.1 * (read / max_read) + 0.2 * (fav / max_fav) + 0.3 * (share / max_share) + 0.4 * (vote / max_vote)
        (id, hot)
      }
    }.toDF("id", "hot")
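
As an aside, another standard way to hand read-only values such as these maxima to every task is a broadcast variable, which Spark serializes and ships to each executor once. A sketch, assuming the same DataFrames and column layout as above:

    // Sketch: assumes the DataFrames from the post above.
    // Broadcast the four maxima once instead of capturing driver-side vals.
    val row = max_read_fav_share_vote.collect()(0)
    val maxima = spark.sparkContext.broadcast(
      (0 to 3).map(i => row(i).toString.toDouble)  // maxima of read, fav, share, vote
    )

    val id_hot2 = serviceid_read_fav_share_vote.map { x =>
      val Seq(mRead, mFav, mShare, mVote) = maxima.value
      val hot = 0.1 * (x.getLong(1) / mRead) + 0.2 * (x.getLong(2) / mFav) +
                0.3 * (x.getLong(3) / mShare) + 0.4 * (x.getLong(4) / mVote)
      (x.getString(0), hot)
    }.toDF("id", "hot")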