ERROR BatchJobMain: Task not serializable

Spark reports a "Task not serializable" error when log4j is used inside a class

Error message

21/06/16 11:45:22 ERROR BatchJobMain: Task not serializable
org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
	at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:341)
	at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:340)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.filter(RDD.scala:340)
	at com.winner.clu.spark.batch.analysis.AccPresetConditionData.mainFun(AccPresetConditionData.scala:110)
	at com.winner.clu.spark.batch.BatchJobMain$.main(BatchJobMain.scala:53)
	at com.winner.clu.spark.batch.BatchJobMain.main(BatchJobMain.scala)
Caused by: java.io.NotSerializableException: org.apache.log4j.Logger
Serialization stack:
	- object not serializable (class: org.apache.log4j.Logger, value: org.apache.log4j.Logger@2d728a9c)
	- field (class: com.winner.clu.spark.batch.analysis.AccPresetConditionData, name: log, type: class org.apache.log4j.Logger)
	- object (class com.winner.clu.spark.batch.analysis.AccPresetConditionData, com.winner.clu.spark.batch.analysis.AccPresetConditionData@67599bae)
	- field (class: com.winner.clu.spark.batch.analysis.AccPresetConditionData$$anonfun$9, name: $outer, type: class com.winner.clu.spark.batch.analysis.AccPresetConditionData)
	- object (class com.winner.clu.spark.batch.analysis.AccPresetConditionData$$anonfun$9, <function1>)
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
	... 12 more

Problem description

When debugging the program in IDEA, declaring a log4j logger as a field of a class causes a serialization error, yet using log4j inside an object works without any problem.

Root cause

This part of the logic runs in parallel on the executors, so the closure passed to `RDD.filter` must be serialized and shipped to them. The closure references the `log` field, which forces Spark to serialize the whole enclosing `AccPresetConditionData` instance (the `$outer` reference in the stack trace), and `org.apache.log4j.Logger` does not implement `java.io.Serializable`, so serialization fails. When the logger lives in an `object` its members are accessed statically and are never captured into the closure, which is why that case works.
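The failure can be reproduced with plain JDK serialization, no Spark required. In this minimal sketch, `FakeLogger`, `Broken`, `Fixed`, and `SerializationDemo` are illustrative names standing in for the real log4j `Logger` and the job class:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// FakeLogger stands in for org.apache.log4j.Logger: it does NOT implement
// java.io.Serializable, so any instance field holding one breaks serialization.
class FakeLogger

// Plain val field: the FakeLogger is serialized together with the instance.
class Broken extends Serializable {
  val log = new FakeLogger
}

// @transient lazy val: the field is skipped during serialization and
// re-created on first access after deserialization.
class Fixed extends Serializable {
  @transient lazy val log = new FakeLogger
}

object SerializationDemo {
  // Returns true if Java serialization of `o` succeeds.
  def canSerialize(o: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream).writeObject(o)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val fixed = new Fixed
    fixed.log // force the lazy val to initialize; @transient still skips it
    println(canSerialize(new Broken)) // false: non-serializable field
    println(canSerialize(fixed))      // true: @transient field is skipped
  }
}
```

This mirrors what Spark's `ClosureCleaner` checks before shipping a task: the `Broken` case corresponds to the class in the stack trace above, the `Fixed` case to the solution below.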

Solution

Declare the log4j logger as a `@transient lazy val`: `@transient` excludes the field from serialization, and `lazy` re-creates it on first use on each executor:

class AccPresetConditionData extends Serializable {

  // @transient: skipped when the instance is serialized;
  // lazy: re-initialized on first access on each executor
  @transient lazy val log = Logger.getLogger(this.getClass.getSimpleName)

  var siteKey: String = _
  var execDate: String = _

  // business analysis
  def mainFun(args: Array[String], sc: SparkContext): Unit = {}
}
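The same fix can be packaged once as a reusable trait, so every class whose instances travel to executors gets a serialization-safe logger by mixing it in. A minimal sketch using `java.util.logging` to stay dependency-free; the `LazyLogging` trait and `Worker` class are illustrative names, not part of the original code:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.util.logging.Logger

// Reusable pattern: mix this trait into any class that is shipped to
// executors. The logger is never serialized (@transient) and is
// re-created on first use in each JVM (lazy).
trait LazyLogging extends Serializable {
  @transient lazy val log: Logger = Logger.getLogger(getClass.getName)
}

class Worker extends LazyLogging {
  def process(x: Int): Int = {
    log.fine(s"processing $x") // safe to call inside an RDD closure
    x * 2
  }
}

object TraitDemo {
  def main(args: Array[String]): Unit = {
    val w = new Worker
    w.log // initialize the logger; serialization still skips the field
    new ObjectOutputStream(new ByteArrayOutputStream).writeObject(w)
    println(w.process(21)) // 42
  }
}
```

Another common alternative is to move the logger into the class's companion object: object members are resolved statically, so the closure never drags a `Logger` field in through `$outer`.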