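When Spark runs a job, many ports on the Executor must be reachable. These ports are random by default, and the Executor is addressed by hostname, so the Kubernetes environment and the big data environment cannot easily reach each other directly. The configuration below lets the big data cluster reach a Spark Executor running in Kubernetes.
1. A Spark Executor listens on many random ports at runtime. When it runs in K8S, these ports must be fixed, and they must fall within the port range that the K8S cluster allocates for NodePort: 30000-32767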
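# Port the driver listens on. Used to communicate with the executors and the standalone master (random by default)
spark_driver_port: 30920
# Port the driver's file server listens on (random by default)
spark_fileserver_port: 30921
# Port the driver's HTTP broadcast server listens on (random by default)
spark_broadcast_port: 30922
# Port the driver's HTTP class server listens on (random by default)
spark_replClassServer_port: 30923
# Port the block manager listens on. It exists on both the driver and the executors (random by default)
spark_blockManager_port: 30924
# Port the executor listens on. Used to communicate with the driver (random by default)
spark_executor_port: 30925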
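The fixed ports above can be sanity-checked before deployment. The following is a minimal sketch, not part of the original setup: the helper name `check_nodeport_ports` is hypothetical, and the range 30000-32767 is the NodePort range stated in step 1 (a cluster may be configured with a different range).

```python
# Hypothetical helper: verify that every fixed Spark port lies inside the
# NodePort range and that no port is assigned twice.
NODEPORT_MIN = 30000  # lower bound quoted in step 1
NODEPORT_MAX = 32767  # upper bound quoted in step 1

# Port values copied from the configuration in step 1.
SPARK_PORTS = {
    "spark_driver_port": 30920,
    "spark_fileserver_port": 30921,
    "spark_broadcast_port": 30922,
    "spark_replClassServer_port": 30923,
    "spark_blockManager_port": 30924,
    "spark_executor_port": 30925,
}


def check_nodeport_ports(ports, low=NODEPORT_MIN, high=NODEPORT_MAX):
    """Return a list of problems found; an empty list means the ports are fine."""
    problems = []
    seen = {}  # port number -> first setting name that used it
    for name, port in ports.items():
        if not low <= port <= high:
            problems.append(f"{name}={port} is outside {low}-{high}")
        if port in seen:
            problems.append(f"{name} and {seen[port]} both use port {port}")
        else:
            seen[port] = name
    return problems


if __name__ == "__main__":
    for problem in check_nodeport_ports(SPARK_PORTS):
        print(problem)
```

With the values from step 1 the script prints nothing, because every port is unique and inside the range.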
2. Create a StatefulSet for the Spark Executor. This gives the pod a stable DNS name: $(podname).$(headless service name).$(namespace).svc.cluster.local
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-executor-statefulset
  namespace: [namespace]
  labels:
    app: my-executor-statefulset
spec:
  serviceName: my-executor
  replicas: 1
  selector:
    matchLabels:
      app: my-executor-pod
      version: [version]
  template:
    metadata:
      labels:
        app: my-executor-pod
        version: [version]
    spec:
      containers:
        - name: my-executor-pod
          image: 192.168.0.12:9090/eyes/my-executor-[namespace]:[version]-[ru]
          imagePullPolicy: Always
          ports:
            - containerPort: 5011
      hostAliases:
        - hostnames:
            - hadoop-master01
          ip: 192.168.0.10
        - hostnames:
            - hadoop-slave02
          ip: 192.168.0.11
3. Create a NodePort Service for the Spark Executor and expose the fixed ports configured in step 1
apiVersion: v1
kind: Service
metadata:
  name: my-executor-svc
  namespace: [namespace]
  labels:
    app: my-executor-pod
spec:
  ports:
    - port: 5011
      name: tcp-port
      protocol: TCP
    - port: 4040
      name: spark-http-port
      protocol: TCP
      nodePort: 30028
    - port: 30920
      name: spark-driver-port
      protocol: TCP
      nodePort: 30920
    - port: 30921
      name: spark-fileserver-port
      protocol: TCP
      nodePort: 30921
    - port: 30922
      name: spark-broadcast-port
      protocol: TCP
      nodePort: 30922
    - port: 30923
      name: spark-replclassserver-port
      protocol: TCP
      nodePort: 30923
    - port: 30924
      name: spark-blockmanager-port
      protocol: TCP
      nodePort: 30924
    - port: 30925
      name: spark-executor-port
      protocol: TCP
      nodePort: 30925
  selector:
    app: my-executor-pod
  type: NodePort
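Once the Service exists, it can help to confirm from a machine in the big data environment that the NodePorts accept TCP connections. The sketch below is an illustration only: the function name `port_is_reachable` is hypothetical, and the node IP 192.168.0.12 is assumed to be a K8S node address. A port will typically show as unreachable until the Spark process inside the pod is actually listening on it.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable hostnames.
        return False


if __name__ == "__main__":
    node_ip = "192.168.0.12"  # assumption: IP of a node in the K8S cluster
    # nodePort values taken from the Service definition in step 3.
    for port in (30028, 30920, 30921, 30922, 30923, 30924, 30925):
        state = "open" if port_is_reachable(node_ip, port) else "unreachable"
        print(f"{node_ip}:{port} {state}")
```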
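4. On every machine in the big data environment, add a hosts entry for the StatefulSet DNS name, $(podname).$(headless service name).$(namespace).svc.cluster.local. The IP address can be that of any node in the K8S cluster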
192.168.0.12 my-executor-statefulset-0.my-executor.test2.svc.cluster.local
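The hosts entry follows directly from the StatefulSet naming pattern, so it can be generated rather than typed by hand. This is a minimal sketch under stated assumptions: the function names `statefulset_pod_fqdn` and `hosts_line` are hypothetical, the cluster domain is assumed to be the default `cluster.local`, and the namespace `test2` is taken from the example entry above.

```python
def statefulset_pod_fqdn(statefulset, ordinal, service, namespace,
                         cluster_domain="cluster.local"):
    """Build the DNS name of one StatefulSet pod.

    Pods of a StatefulSet are named <statefulset>-<ordinal>, and the name
    is qualified by the serviceName set in the StatefulSet spec.
    """
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


def hosts_line(node_ip, statefulset, ordinal, service, namespace):
    """Return one /etc/hosts line mapping the pod DNS name to a K8S node IP."""
    fqdn = statefulset_pod_fqdn(statefulset, ordinal, service, namespace)
    return f"{node_ip} {fqdn}"


if __name__ == "__main__":
    # Values from steps 2 and 4: StatefulSet name, serviceName, namespace.
    print(hosts_line("192.168.0.12", "my-executor-statefulset", 0,
                     "my-executor", "test2"))
```

Run as a script, this prints the same line as the hosts entry shown in step 4.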