>>> 16/10/15 20:07:35 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 1
16/10/15 20:07:35 INFO ExecutorAllocationManager: Removing executor 1 because it has been idle for 60 seconds (new desired total will be 0)
16/10/15 20:07:36 ERROR YarnScheduler: Lost executor 1 on hadoop05: remote Rpc client disassociated
16/10/15 20:07:36 INFO DAGScheduler: Executor lost: 1 (epoch 0)
16/10/15 20:07:36 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
16/10/15 20:07:36 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, hadoop05, 41258)
16/10/15 20:07:36 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
16/10/15 20:07:36 INFO ExecutorAllocationManager: Existing executor 1 has been removed (new total is 0)
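Every so often the driver log reports the error "ERROR YarnScheduler: Lost executor 1 on hadoop05: remote Rpc client disassociated".
After some investigation, this turned out to be a bug in Spark 1.5. With dynamic allocation enabled, idle executors are reclaimed (here, executor 1 after 60 idle seconds), and this error is logged even though the executor was released correctly.
The error does not affect the cluster or the results of the computation.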
Jira issue: https://issues.apache.org/jira/browse/SPARK-4134
The best fix is to upgrade Spark to 1.6.
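
Until an upgrade is possible, it can help to separate these harmless messages from real executor failures. The sketch below is only an illustration, not part of Spark: it assumes the driver log has the same line format as the excerpt above, and the function name classify_lost_executors is hypothetical. It marks a "Lost executor" error as expected when ExecutorAllocationManager had already announced that it was removing the same executor for being idle.

```python
import re

# Patterns derived from the log lines in the excerpt above (an assumption:
# other Spark versions or log layouts may format these messages differently).
IDLE_REMOVAL = re.compile(
    r"ExecutorAllocationManager: Removing executor (\d+) because it has been idle"
)
LOST_EXECUTOR = re.compile(
    r"ERROR YarnScheduler: Lost executor (\d+) on ([^:\s]+): (.*)"
)


def classify_lost_executors(log_lines):
    """Return a list of (executor_id, host, expected) tuples.

    One tuple is produced per 'Lost executor' line. `expected` is True when
    an idle-removal line for the same executor id appeared earlier in the log.
    """
    idle_removed = set()
    results = []
    for line in log_lines:
        match = IDLE_REMOVAL.search(line)
        if match:
            # Dynamic allocation announced that it is releasing this executor.
            idle_removed.add(match.group(1))
            continue
        match = LOST_EXECUTOR.search(line)
        if match:
            executor_id, host = match.group(1), match.group(2)
            results.append((executor_id, host, executor_id in idle_removed))
    return results


if __name__ == "__main__":
    sample = [
        "16/10/15 20:07:35 INFO ExecutorAllocationManager: Removing executor 1 "
        "because it has been idle for 60 seconds (new desired total will be 0)",
        "16/10/15 20:07:36 ERROR YarnScheduler: Lost executor 1 on hadoop05: "
        "remote Rpc client disassociated",
    ]
    for executor_id, host, expected in classify_lost_executors(sample):
        label = "expected (idle removal)" if expected else "unexpected"
        print(f"executor {executor_id} on {host}: {label}")
```

A "Lost executor" line with no preceding idle-removal line for the same executor is flagged as unexpected and is worth investigating separately.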