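When Spark runs a job, many ports on the Spark Executor must be reachable. These ports are random by default and are addressed by host name, so the Kubernetes environment and the big data environment cannot easily talk to each other directly. The configuration below lets the big data cluster reach a Spark Executor running in Kubernetes.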
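1. A Spark Executor opens many random ports at run time. When it runs in Kubernetes, these ports must be pinned to fixed values that fall inside the cluster's NodePort range (30000-32767 by default):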
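# Port the driver listens on; used to communicate with the executors and the standalone master (random by default)
spark_driver_port: 30920
# Port the driver's file server listens on (random by default)
spark_fileserver_port: 30921
# Port the driver's HTTP broadcast server listens on (random by default)
spark_broadcast_port: 30922
# Port the driver's HTTP class server listens on (random by default)
spark_replClassServer_port: 30923
# Port the block managers listen on; these exist on both the driver and the executors (random by default)
spark_blockManager_port: 30924
# Port the executor listens on; used to communicate with the driver (random by default)
spark_executor_port: 30925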
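The keys above use the author's own naming. Spark itself reads dotted property names such as spark.driver.port and spark.blockManager.port. The following is a minimal sketch of passing the same fixed ports to Spark through a spark-defaults.conf carried in a ConfigMap. The ConfigMap name my-executor-spark-conf is hypothetical, and the spark.driver.host line is an assumption that is not part of the original setup. It makes the driver advertise the StatefulSet DNS name from step 2, which the hosts entries in step 4 rely on, instead of the pod IP. The fileserver, broadcast, replClassServer and executor port properties belong to the Spark 1.x networking stack and have no effect on Spark 2.x and later.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-executor-spark-conf        # hypothetical name
  namespace: [namespace]
data:
  spark-defaults.conf: |
    # Assumption: advertise the stable StatefulSet DNS name to the executors
    spark.driver.host            my-executor-statefulset-0.my-executor.[namespace].svc.cluster.local
    # Driver RPC port that executors (and a standalone master) connect to
    spark.driver.port            30920
    # Block manager port on both the driver and the executors
    spark.blockManager.port      30924
    # Spark 1.x only; unused by Spark 2.x and later
    spark.fileserver.port        30921
    spark.broadcast.port         30922
    spark.replClassServer.port   30923
    spark.executor.port          30925
```

The ConfigMap would still need to be mounted into the pod at the Spark configuration directory (for example $SPARK_HOME/conf). That mount is not shown here.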
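2. Create a StatefulSet for the Spark Executor. Each of its pods then gets a stable DNS name of the form $(podname).$(headless service name).$(namespace).svc.cluster.local. The hostAliases entries let the pod resolve the host names of the Hadoop cluster: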
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-executor-statefulset
  namespace: [namespace]
  labels:
    app: my-executor-statefulset
spec:
  serviceName: my-executor   # headless Service that provides the per-pod DNS names
  replicas: 1
  selector:
    matchLabels:
      app: my-executor-pod
      version: [version]
  template:
    metadata:
      labels:
        app: my-executor-pod
        version: [version]
    spec:
      containers:
        - name: my-executor-pod
          image: 192.168.0.12:9090/eyes/my-executor-[namespace]:[version]-[ru]
          imagePullPolicy: Always
          ports:
            - containerPort: 5011
      # Written into the pod's /etc/hosts so it can resolve the Hadoop nodes
      hostAliases:
        - hostnames:
            - hadoop-master01
          ip: 192.168.0.10
        - hostnames:
            - hadoop-slave02
          ip: 192.168.0.11
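The per-pod DNS record only exists if a headless Service exists whose name matches the StatefulSet's serviceName (my-executor). The NodePort Service in step 3 has a different name and does not serve this purpose. If no such Service is defined yet, a minimal sketch would be the following, assuming the same pod labels as above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-executor          # must equal spec.serviceName of the StatefulSet
  namespace: [namespace]
spec:
  clusterIP: None            # headless: DNS returns pod records instead of a virtual IP
  selector:
    app: my-executor-pod
  ports:
    - name: tcp-port
      port: 5011
      protocol: TCP
```

With this Service in place, pod 0 resolves inside the cluster as my-executor-statefulset-0.my-executor.[namespace].svc.cluster.local.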
3. Create a NodePort Service for the Spark Executor that exposes the fixed ports configured in step 1:
apiVersion: v1
kind: Service
metadata:
  name: my-executor-svc
  namespace: [namespace]
  labels:
    app: my-executor-pod
spec:
  ports:
    - port: 5011
      name: tcp-port
      protocol: TCP
    # Spark UI
    - port: 4040
      name: spark-http-port
      protocol: TCP
      nodePort: 30028
    # Fixed Spark ports from step 1
    - port: 30920
      name: spark-driver-port
      protocol: TCP
      nodePort: 30920
    - port: 30921
      name: spark-fileserver-port
      protocol: TCP
      nodePort: 30921
    - port: 30922
      name: spark-broadcast-port
      protocol: TCP
      nodePort: 30922
    - port: 30923
      name: spark-replclassserver-port
      protocol: TCP
      nodePort: 30923
    - port: 30924
      name: spark-blockmanager-port
      protocol: TCP
      nodePort: 30924
    - port: 30925
      name: spark-executor-port
      protocol: TCP
      nodePort: 30925
  selector:
    app: my-executor-pod
  type: NodePort
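The driver tells the executors the port it listens on (30920 for the driver port here), and a process outside the cluster must be able to dial that same number on a node. This is why every Spark port entry in the Service above uses one number for port and nodePort. It is also why step 1 keeps the fixed ports inside the NodePort range. Each nodePort must also be unused by any other Service in the cluster. The driver-port entry with the implicit targetPort spelled out would look like this:

```yaml
# port:       the Service's cluster-internal port
# targetPort: the container port Spark actually listens on (defaults to port when omitted)
# nodePort:   the port opened on every node for traffic from outside the cluster
- name: spark-driver-port
  protocol: TCP
  port: 30920
  targetPort: 30920
  nodePort: 30920
```

If any of the three numbers differed, an executor dialing the advertised port from outside the cluster would not reach the port Spark is listening on.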
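4. On every machine in the big data cluster, add a hosts entry that maps the StatefulSet pod's DNS name, $(podname).$(headless service name).$(namespace).svc.cluster.local, to the IP of any Kubernetes node. A NodePort is opened on every node, so any node works. For example, for pod my-executor-statefulset-0 in namespace test2: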
192.168.0.12 my-executor-statefulset-0.my-executor.test2.svc.cluster.local