Spark SQL INSERT fails with "Container killed on request. Exit code is 143"

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 198 in stage 4.0 failed 4 times, most recent failure: Lost task 198.3 in stage 4.0 (TID 1722, hadoop-slave-17, executor 22): ExecutorLostFailure (executor 22 exited caused by one of the running tasks) Reason: Container marked as failed: container_e209_1579608513692_34016_01_000030 on host: hadoop-slave-17. Exit status: 143. Diagnostics: [2020-03-05 17:17:55.532]Container killed on request. Exit code is 143
[2020-03-05 17:17:55.532]Container exited with a non-zero exit code 143. 
[2020-03-05 17:17:55.532]Killed by external signal
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1524)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1512)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1511)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1511)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1739)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1694)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1683)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2031)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:184)
	... 23 more

Error analysis

Judging from the Hive-side error, the container was killed because it hit its physical memory limit.
The failure happened while the reduce stage was running, so the likely cause is that a container in the reduce phase did not have enough memory.
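
To confirm the diagnosis (a common check, not part of the original post), the NodeManager diagnostics for the killed container can be pulled from the aggregated YARN logs. The application ID below is derived from the container ID in the error message; the exact diagnostic wording varies by Hadoop version:

yarn logs -applicationId application_1579608513692_34016 > app.log
# look for the memory diagnostics, typically something like "is running beyond physical memory limits"
grep -B 2 -A 2 "physical memory" app.log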

Solution

First, check the container memory configuration:

hive (default)> SET mapreduce.map.memory.mb;
mapreduce.map.memory.mb=4096
hive (default)> SET mapreduce.reduce.memory.mb;
mapreduce.reduce.memory.mb=4096
hive (default)> SET yarn.nodemanager.vmem-pmem-ratio;
yarn.nodemanager.vmem-pmem-ratio=4.2

So each map and reduce task is allocated 4 GB of physical memory, and the virtual memory limit is 4 × 4.2 = 16.8 GB.

A single reduce task was processing more data than the 4 GB memory limit allowed. Setting mapreduce.reduce.memory.mb=8192 fixed it (see the sketch below).
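
A minimal Hive session sketch of that fix. The table names are placeholders, and the mapreduce.reduce.java.opts line is an extra, commonly paired setting (keeping the JVM heap below the new container size) rather than something from the original post:

hive (default)> SET mapreduce.reduce.memory.mb=8192;
hive (default)> SET mapreduce.reduce.java.opts=-Xmx6553m;
hive (default)> INSERT OVERWRITE TABLE target_table SELECT * FROM source_table;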

If the job is executed with Spark

Simply increase the executor resources (tested and confirmed to work):

--executor-memory 4G
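
For a Spark-on-YARN job submitted from the command line, the flag goes on spark-submit. A hedged sketch follows; everything other than --executor-memory 4G (master, deploy mode, executor count, class and jar names) is an illustrative placeholder:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 4G \
  --num-executors 20 \
  --class com.example.YourJob \
  your-job.jar

If executors are still killed with exit code 143, raising the off-heap headroom (spark.yarn.executor.memoryOverhead on Spark 2.x, spark.executor.memoryOverhead on newer releases) is another common knob.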

Reference:
http://stackoverflow.com/questions/29001702/why-yarn-java-heap-space-memory-error?answertab=oldest#tab-top
 
