An Explanation of the Parameters for Submitting Spark Jobs

Options:


--master MASTER_URL spark://host:port, mesos://host:port, yarn, or local.

Sets the mode in which Spark runs:

  • local / local[k] (local mode)
    • Submits the job using one or more worker threads; well suited to debugging code locally on small amounts of data.
  • spark://host:port (standalone mode)
    • Spark's built-in cluster mode. Configuration is fairly cumbersome, so it is generally used for learning rather than as a production environment.
  • yarn (YARN mode)
    • Uses YARN as the framework that schedules Spark tasks; it comes in yarn-client and yarn-cluster variants and is the most common choice in production. See the sketch below.
  • mesos (Mesos mode)
    • Mesos is an open-source distributed resource-management framework under Apache, similar to YARN.
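
A minimal sketch of submitting the same application under each mode; the jar path and main class here are hypothetical placeholders:

    # Local mode: run with 4 worker threads (paths and class are placeholders)
    spark-submit --master local[4] --class com.example.Main /opt/app/app.jar

    # Standalone mode: point at the standalone master's host and default port
    spark-submit --master spark://master-host:7077 --class com.example.Main /opt/app/app.jar

    # YARN mode: the cluster location comes from the Hadoop configuration
    spark-submit --master yarn --class com.example.Main /opt/app/app.jar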

       


--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or

on one of the worker machines inside the cluster ("cluster")

(Default: client).

For the two modes below, the master parameter must be set to yarn:

  • client (yarn-client mode)
    • yarn-client mode uses YARN to schedule resources and run the job, but the driver runs locally, which puts considerable network pressure on the submitting machine. The upside is that logs print locally, which makes debugging convenient; suited to testing.
  • cluster (yarn-cluster mode)
    • yarn-cluster mode likewise uses YARN to schedule resources and run the job, but the driver runs on a NodeManager, with each run assigned to a random NodeManager machine; suited to production. See the sketch below.
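
A sketch of the two submission styles; the jar and class names are again placeholders:

    # yarn-client: the driver runs on the submitting machine, logs print locally
    spark-submit --master yarn --deploy-mode client \
      --class com.example.Main /opt/app/app.jar

    # yarn-cluster: the driver runs on a NodeManager inside the cluster
    spark-submit --master yarn --deploy-mode cluster \
      --class com.example.Main /opt/app/app.jar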

--class CLASS_NAME Your application's main class (for Java / Scala apps).

The main class to run, e.g. --class com.report.AttackDetailReport


--name NAME A name of your application.

The name of the job. When submitted in yarn-client mode, the app name is whatever the code sets; when submitted in yarn-cluster mode, the app name becomes the fully qualified name of the class being run, e.g. com.aa.bb.Main


--jars JARS Comma-separated list of local jars to include on the driver

and executor classpaths.

Paths to the (local) jars the job depends on, with multiple paths separated by commas, e.g. --jars /opt/c.jar,/opt/d.jar


--packages Comma-separated list of maven coordinates of jars to include

on the driver and executor classpaths. Will search the local

maven repo, then maven central and any additional remote

repositories given by --repositories. The format for the

coordinates should be groupId:artifactId:version.

Maven coordinates of the jars to depend on. --repositories supplies the Maven repository address for a package such as mysql-connector-java; if it is not given, the artifacts are downloaded from the default sources of the Maven installation on the machine.
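
For instance, pulling the MySQL connector by Maven coordinate instead of shipping a local jar might look like this; the repository URL is an illustrative assumption:

    # Resolve mysql-connector-java by coordinate; search order is the local Maven
    # repo, then Maven Central, then any --repositories entries
    spark-submit --master yarn \
      --packages mysql:mysql-connector-java:5.1.47 \
      --repositories https://repo.example.com/maven2 \
      --class com.example.Main /opt/app/app.jar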


--exclude-packages Comma-separated list of groupId:artifactId, to exclude while

resolving the dependencies provided in --packages to avoid

dependency conflicts.

Excludes Maven dependencies that could otherwise cause conflicts.


--repositories Comma-separated list of additional remote repositories to

search for the maven coordinates given with --packages.

Maven repository addresses, with multiple addresses separated by commas.


--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place

on the PYTHONPATH for Python apps.

External files to load, for Python apps, with multiple paths separated by commas.


--files FILES Comma-separated list of files to be placed in the working

directory of each executor.

External files to load, for Java/Scala apps, with multiple paths separated by commas.
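
A sketch of shipping a config file to every executor; the file name is a placeholder. Because the file lands in each executor's working directory, task code can open it by its bare name:

    # Ship app.properties into the working directory of each executor
    spark-submit --master yarn \
      --files /opt/conf/app.properties \
      --class com.example.Main /opt/app/app.jar
    # Executor-side code can then open "app.properties" by relative path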


--conf PROP=VALUE Arbitrary Spark configuration property.

Passes parameters into the Spark configuration as key=value pairs.

Example: print the driver's GC information: --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"


--properties-file FILE Path to a file from which to load extra properties. If not

specified, this will look for conf/spark-defaults.conf.

Properties defined in the --properties-file no longer need to be passed to spark-submit. For example, if conf/spark-defaults.conf defines spark.master, you can omit --master.
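
A minimal sketch of what such a properties file might contain; the values are illustrative assumptions, not tuning advice:

    # conf/spark-defaults.conf: whitespace-separated key/value pairs
    spark.master              yarn
    spark.submit.deployMode   cluster
    spark.driver.memory       1g
    spark.executor.memory     2g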


--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 1024M).

Memory for the Spark driver (e.g. 1000M, 2G), 1 GB by default, including under yarn-client and yarn-cluster. If a container exits for lack of memory in yarn-cluster mode, consider whether the driver memory is too small.


--driver-java-options Extra Java options to pass to the driver.

Extra JVM options to pass to the driver.


--driver-library-path Extra library path entries to pass to the driver.

Extra library path entries (e.g. for native libraries) to pass to the driver.


--driver-class-path Extra class path entries to pass to the driver. Note that

jars added with --jars are automatically included in the

classpath.

Adds dependency jars to the driver's classpath. This is commonly used to add the MySQL connector jar when MySQL is involved, e.g. --driver-class-path /opt/gttx/spark/task/lib/mysql-connector-java-5.1.47.jar
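
A common pattern when writing to MySQL over JDBC, sketched here with assumed paths: ship the connector to the executors with --jars and pin it on the driver's own classpath as well. Per the note above, jars from --jars are added to the classpath automatically, so the explicit --driver-class-path is mainly useful when the driver JVM needs the jar at startup:

    # Connector visible to the driver JVM and shipped to the executors
    spark-submit --master yarn --deploy-mode client \
      --driver-class-path /opt/gttx/spark/task/lib/mysql-connector-java-5.1.47.jar \
      --jars /opt/gttx/spark/task/lib/mysql-connector-java-5.1.47.jar \
      --class com.example.Main /opt/app/app.jar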


--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).

Memory per executor, 1 GB by default. Under YARN, executors run inside containers, and the maximum memory a YARN container can request is configurable and has an upper limit.


--proxy-user NAME User to impersonate when submitting the application.

The user to impersonate when submitting the application.


--help, -h Show this help message and exit

Shows the help message.


--verbose, -v Print additional debug output

Prints more detailed debug output while running.


--version, Print the version of current Spark

Prints the version of the current Spark installation.

Spark standalone with cluster deploy mode only:


--driver-cores NUM Cores for driver (Default: 1).

Number of cores for the driver, default 1; used only in standalone cluster deploy mode.

Spark standalone or Mesos with cluster deploy mode only:


--supervise If given, restarts the driver on failure.

Restarts the driver if it fails; used under Mesos or standalone.


--kill SUBMISSION_ID If given, kills the driver specified.

Kills the specified driver process.


--status SUBMISSION_ID If given, requests the status of the driver specified.

Requests the status of the specified driver process.

Spark standalone and Mesos only:


--total-executor-cores NUM Total cores for all executors.

Total number of cores across all executors; used only under Mesos or standalone.

Spark standalone and YARN only:


--executor-cores NUM Number of cores per executor. (Default: 1 in YARN mode,

or all available cores on the worker in standalone mode)

Number of cores per executor; used only under YARN or standalone. Defaults to 1 in YARN mode, and to all available cores on the worker in standalone mode.

YARN-only:


--driver-cores NUM Number of cores used by the driver, only in cluster mode

(Default: 1).

Number of driver cores, default 1; YARN cluster mode only.


--queue QUEUE_NAME The YARN queue to submit to (Default: "default").

The YARN queue the job is submitted to; YARN mode only.


--num-executors NUM Number of executors to launch (Default: 2).

Number of executors to launch, default 2; YARN mode only. See the sketch below.
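
Putting the resource flags together, a production-style yarn-cluster submission might look like the following; the sizes, queue, and paths are illustrative assumptions, not tuning advice:

    # Hypothetical production submission: 4 executors x 2 cores x 4g, plus a 2g driver
    spark-submit --master yarn --deploy-mode cluster \
      --name attack-detail-report \
      --queue default \
      --num-executors 4 \
      --executor-cores 2 \
      --executor-memory 4g \
      --driver-memory 2g \
      --class com.report.AttackDetailReport /opt/app/app.jar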


--archives ARCHIVES Comma separated list of archives to be extracted into the

working directory of each executor.

Comma-separated list of archive files that are extracted into the working directory of each executor; YARN mode only.
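
One common use, sketched with placeholder file names: shipping a bundle of resources (or a packed Python environment). On YARN, a trailing #alias names the directory the archive is unpacked under:

    # deps.zip is unpacked under ./deps in each executor's working directory
    spark-submit --master yarn --deploy-mode cluster \
      --archives /opt/bundles/deps.zip#deps \
      --class com.example.Main /opt/app/app.jar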


--principal PRINCIPAL Principal to be used to login to KDC, while running on

secure HDFS.

Security-related: the Kerberos principal used to log in to the KDC when running on secure HDFS; YARN mode only.


--keytab KEYTAB The full path to the file that contains the keytab for the

principal specified above. This keytab will be copied to

the node running the Application Master via the Secure

Distributed Cache, for renewing the login tickets and the

delegation tokens periodically.

Security-related: the full path to the keytab file for the principal above; YARN mode only.

 
