Spark on Kubernetes job submission fails with: Expected HTTP 101 response but was '403 Forbidden'

Environment:

Spark version: 2.4.3

Kubernetes version: v1.16.2

Problem:

When example.jar is submitted to the k8s cluster with spark-submit in cluster mode, the driver pod fails with the following error:

19/11/06 07:06:54 INFO ExecutorPodsAllocator: Going to request 5 executors from Kubernetes.
19/11/06 07:06:54 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
	at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
19/11/06 07:06:54 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/11/06 07:06:54 ERROR SparkContext: Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException:
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
	at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Cause:

Some digging showed that this is the known issue in which EKS security patches cause Apache Spark jobs to fail with a permissions error.

Reference: the Stack Overflow question "EKS security patches cause Apache Spark jobs to fail with permissions error".

Upstream Spark patches:

https://github.com/apache/spark/pull/25641

https://github.com/apache/spark/pull/25640

Solution:

Option 1: The issue is fixed in the spark-2.4.4 release and later. In a test environment, simply switch to a fixed Spark version, or cherry-pick the relevant commits onto 2.4.3 (as sketched below).
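If you need to stay on 2.4.3, here is a minimal cherry-pick sketch, assuming a fresh clone of apache/spark; GitHub exposes each PR head as a fetchable ref, and the local branch names fix-25640/fix-25641 are arbitrary (if a PR contains several commits, cherry-pick the whole range instead):

git clone https://github.com/apache/spark.git && cd spark
# Start a patch branch from the 2.4.3 release tag
git checkout -b branch-2.4.3-patched v2.4.3
# Fetch the heads of the two upstream PRs as local branches
git fetch origin pull/25640/head:fix-25640 pull/25641/head:fix-25641
# Apply the fixes on top of 2.4.3 (resolve conflicts if any), then rebuild the distribution
git cherry-pick fix-25640 fix-25641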

Option 2: The root cause is in the fabric8 kubernetes-client jars that Spark depends on, so the following three jars under spark/jars can instead be replaced with version 4.4.0 or later:

kubernetes-client-4.4.2.jar
kubernetes-model-4.4.2.jar
kubernetes-model-common-4.4.2.jar

The jars can be downloaded from the Maven central repository, e.g.:

wget https://repo1.maven.org/maven2/io/fabric8/kubernetes-model/4.4.2/kubernetes-model-4.4.2.jar
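Putting the whole swap together, a minimal sketch assuming SPARK_HOME points at the Spark 2.4.3 distribution (the old jars are removed by wildcard since their exact versions vary by build):

# Remove the old fabric8 jars shipped with Spark 2.4.3
rm -f $SPARK_HOME/jars/kubernetes-client-*.jar $SPARK_HOME/jars/kubernetes-model*.jar
# Fetch the fixed 4.4.2 versions into spark/jars
cd $SPARK_HOME/jars
wget https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.4.2/kubernetes-client-4.4.2.jar
wget https://repo1.maven.org/maven2/io/fabric8/kubernetes-model/4.4.2/kubernetes-model-4.4.2.jar
wget https://repo1.maven.org/maven2/io/fabric8/kubernetes-model-common/4.4.2/kubernetes-model-common-4.4.2.jar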

Additional notes:

1. After replacing the jars and rebuilding and pushing the image, resubmitting the job with spark-submit still produced the same error (a rebuild/push sketch follows after this list).

2. The likely cause: the local image on the node had not been refreshed, so the old image was still being used.

3. Adding --conf spark.kubernetes.container.image.pullPolicy=Always to the spark-submit command forces the cluster to pull the rebuilt image, which resolved the problem.
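For reference, a sketch of the rebuild-and-push step against the Dockerfile that ships in the Spark 2.4.3 binary distribution (the image name here just mirrors the submit command below; substitute your own repository and tag):

# Rebuild the image from the patched distribution; run from $SPARK_HOME
cd $SPARK_HOME
docker build -t merrily01/repo:spark-2.4.3-image-merrily01 \
    -f kubernetes/dockerfiles/spark/Dockerfile .
# Push so that every node pulls the updated image
docker push merrily01/repo:spark-2.4.3-image-merrily01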

With that, the complete submit command for the official Spark on Kubernetes demo is:

spark-submit \
    --master k8s://https://172.16.192.128:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=merrily01/repo:spark-2.4.3-image-merrily01 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar
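
The command above assumes a spark service account with sufficient RBAC permissions already exists in the target namespace; per the Spark on Kubernetes documentation, it can be created like this (default namespace assumed):

kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
    --serviceaccount=default:spark --namespace=default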

 
