Submitting Spark Jobs via the Hidden REST API

When developing Spark applications, there are two ways to submit a job to the cluster for execution: the spark-submit script, which is the method documented on the official Spark website, and Spark's hidden REST API.

I. Spark Submit

# Use spark-submit to run your application
$ YOUR_SPARK_HOME/bin/spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  target/scala-2.12/simple-project_2.12-1.0.jar
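
The same tool can also submit in cluster deploy mode against a standalone master, which is the spark-submit counterpart of the REST submission shown next. A minimal sketch; the master URL, class, and jar path are placeholders matching the REST example below:

# Hypothetical cluster-mode submission to a standalone master.
# The master URL, main class, and jar path are placeholders, not real endpoints.
$ YOUR_SPARK_HOME/bin/spark-submit \
  --class com.mycompany.MyJob \
  --master spark://spark-cluster-ip:7077 \
  --deploy-mode cluster \
  file:/myfilepath/spark-job-1.0.jar myAppArgument1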

II. REST API from outside the Spark cluster

1. Submitting a job to the Spark cluster

curl -X POST http://spark-cluster-ip:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "myAppArgument1" ],
  "appResource" : "file:/myfilepath/spark-job-1.0.jar",
  "clientSparkVersion" : "2.4.4",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.mycompany.MyJob",
  "sparkProperties" : {
    "spark.jars" : "file:/myfilepath/spark-job-1.0.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyJob",
    "spark.eventLog.enabled": "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://spark-cluster-ip:7077"
  }
}'

Parameter notes (a scripted example follows the list):

spark-cluster-ip: the Spark master address. The default REST service port is 6066; if that port is taken, Spark tries 6067, 6068, and so on. (Depending on the Spark version, the standalone REST server may also need to be enabled on the master via spark.master.rest.enabled=true.)
"action" : "CreateSubmissionRequest": identifies the request as a job submission; this is a fixed value.
"appArgs" : [ "args1", "args2", ... ]: the arguments the application jar expects, e.g. a Kafka topic or the model to use. (Note: if the application takes no arguments, write "appArgs": [] rather than omitting the field; otherwise the field following appResource may be parsed as appArgs and cause obscure errors.)
"appResource" : "file:/spark.jar": the path to the application jar.
"clientSparkVersion" : "2.4.4": the Spark version.
"environmentVariables" : {"SPARK_ENV_LOADED" : "1"}: whether the Spark environment variables are loaded. (This field is required; omitting it causes a NullPointerException.)
"mainClass" : "mainClass": the application's main class, i.e. the class containing the main method.
"sparkProperties" : {...}: Spark configuration properties.
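
Since the status and kill endpoints below all take the returned submissionId, it is convenient to capture it at submission time. A minimal sketch, assuming jq is installed and reusing the placeholder host, jar path, and class from above:

#!/bin/bash
# Minimal sketch: submit the job and capture the submissionId for later use.
# The host, jar path, and main class are the placeholders from above; requires jq.
SUBMISSION_ID=$(curl -s -X POST http://spark-cluster-ip:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
    "action" : "CreateSubmissionRequest",
    "appArgs" : [ "myAppArgument1" ],
    "appResource" : "file:/myfilepath/spark-job-1.0.jar",
    "clientSparkVersion" : "2.4.4",
    "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },
    "mainClass" : "com.mycompany.MyJob",
    "sparkProperties" : {
      "spark.jars" : "file:/myfilepath/spark-job-1.0.jar",
      "spark.submit.deployMode" : "cluster",
      "spark.master" : "spark://spark-cluster-ip:7077"
    }
  }' | jq -r '.submissionId')
echo "Submitted as ${SUBMISSION_ID}"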

Response:

{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20200115102452-0000",
  "serverSparkVersion" : "2.4.4",
  "submissionId" : "driver-20200115102452-0000",
  "success" : true
}

2. Checking the status of a submitted job

curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20200115102452-0000

Here driver-20200115102452-0000 is the submission ID returned by the create request; it can also be read off the Spark web UI, from the submission ID of a running or completed driver.

Response:

{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "2.4.4",
  "submissionId" : "driver-20200115102452-0000",
  "success" : true,
  "workerHostPort" : "128.96.104.10:37588",
  "workerId" : "worker-20201016084158-128.96.104.10-37588"
}

driverState indicates the job's execution state and takes one of the following values (a polling sketch follows the list):

ERROR (the submission failed; the error message is included),
SUBMITTED (submitted but not yet running),
RUNNING (currently running),
FAILED (execution failed; an exception is reported),
FINISHED (completed successfully)
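
For unattended use, the status endpoint can be polled until the driver reaches a terminal state. A minimal sketch, assuming jq and the SUBMISSION_ID captured in the earlier script:

# Minimal polling sketch: wait until the driver reaches a terminal state.
# SUBMISSION_ID and the master host are the placeholders used above; requires jq.
while true; do
  STATE=$(curl -s http://spark-cluster-ip:6066/v1/submissions/status/${SUBMISSION_ID} \
    | jq -r '.driverState')
  echo "driverState: ${STATE}"
  case "${STATE}" in
    FINISHED|FAILED|ERROR) break ;;
  esac
  sleep 10
done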

3. Killing a submitted job

curl -X POST http://spark-cluster-ip:6066/v1/submissions/kill/driver-20200115102452-0000

Response:

{
  "action" : "KillSubmissionResponse",
  "message" : "Kill request for driver-20181016102452-0000 submitted",
  "serverSparkVersion" : "2.4.4",
  "submissionId" : "driver-20200115102452-0000",
  "success" : true
}
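
Note that success here only means the kill request was accepted; checking the status endpoint afterwards confirms the driver actually stopped. A sketch, with the same placeholders as above:

# Sketch: kill a submission, then confirm its state via the status endpoint.
curl -s -X POST http://spark-cluster-ip:6066/v1/submissions/kill/${SUBMISSION_ID}
sleep 5
curl -s http://spark-cluster-ip:6066/v1/submissions/status/${SUBMISSION_ID} \
  | jq -r '.driverState'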

4. Viewing the Spark cluster's worker information

curl http://spark-cluster-ip:8080/json/
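
This hits the master web UI's JSON endpoint (port 8080 by default) and returns the cluster state, including the registered workers. A sketch for pulling out just the worker IDs and their states; the "workers", "id", and "state" field names are assumptions and may differ across Spark versions:

# Sketch: list worker IDs and states from the master's JSON endpoint.
# Assumes the response has a "workers" array with "id" and "state" fields.
curl -s http://spark-cluster-ip:8080/json/ | jq -r '.workers[] | "\(.id) \(.state)"'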