Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 198 in stage 4.0 failed 4 times, most recent failure: Lost task 198.3 in stage 4.0 (TID 1722, hadoop-slave-17, executor 22): ExecutorLostFailure (executor 22 exited caused by one of the running tasks) Reason: Container marked as failed: container_e209_1579608513692_34016_01_000030 on host: hadoop-slave-17. Exit status: 143. Diagnostics: [2020-03-05 17:17:55.532]Container killed on request. Exit code is 143
[2020-03-05 17:17:55.532]Container exited with a non-zero exit code 143.
[2020-03-05 17:17:55.532]Killed by external signal
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1524)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1512)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1511)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1511)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1739)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1694)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1683)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2031)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:184)
... 23 more
Error analysis
The Hive error shows the container was killed because it hit its physical memory limit.
The failure timestamp falls in the reduce phase, so the reduce containers most likely ran out of memory.
Solution
1. If the job runs on Hive (MapReduce), first check the container memory configuration:
hive (default)> SET mapreduce.map.memory.mb;
mapreduce.map.memory.mb=4096
hive (default)> SET mapreduce.reduce.memory.mb;
mapreduce.reduce.memory.mb=4096
hive (default)> SET yarn.nodemanager.vmem-pmem-ratio;
yarn.nodemanager.vmem-pmem-ratio=4.2
Each map and reduce task is therefore allotted 4 GB of physical memory, with a virtual memory ceiling of 4 × 4.2 = 16.8 GB.
A single reduce task processed more data than the 4 GB physical limit allows; raising the limit with SET mapreduce.reduce.memory.mb=8192; resolved the error.
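The limits above can be double-checked with a quick calculation, using the values from the SET output:

```python
# Container memory limits as reported by Hive above.
reduce_memory_mb = 4096   # mapreduce.reduce.memory.mb
vmem_pmem_ratio = 4.2     # yarn.nodemanager.vmem-pmem-ratio

# Physical memory per reduce container, in GB.
physical_gb = reduce_memory_mb / 1024
# Virtual memory ceiling YARN enforces per container.
virtual_gb = physical_gb * vmem_pmem_ratio

print(physical_gb)  # 4.0
print(virtual_gb)   # 16.8
```

Exit code 143 here corresponds to the physical limit (4 GB), not the virtual one, which is why doubling mapreduce.reduce.memory.mb helps.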
2. If the job runs on Spark
Simply allocating more executor memory fixes it (verified in practice):
--executor-memory 4G
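For context, that flag is passed to spark-submit. The sketch below is a hypothetical invocation; the class and JAR names are placeholders, and the memoryOverhead setting is an optional extra (YARN counts off-heap usage against the container limit too):

```shell
# Hypothetical submit command; substitute your own class and JAR.
spark-submit \
  --master yarn \
  --executor-memory 8G \
  --conf spark.executor.memoryOverhead=2048 \
  --class com.example.MyJob \
  my-job.jar
```

If executors keep dying with exit code 143 after raising --executor-memory, increasing spark.executor.memoryOverhead is usually the next knob to try.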
Reference:
http://stackoverflow.com/questions/29001702/why-yarn-java-heap-space-memory-error?answertab=oldest#tab-top