spark-submit: submission script parameters explained

Original

2019-08-03 08:29

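After packaging a project, you may need to submit it to a big data platform to run, and that is where a submission script comes in. This post is mainly about running Spark jobs… Some commonly used submission parameters are listed below: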

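Parameter	Description
--master	Address of the master, i.e. where the job is submitted to run, for example spark://host:port, yarn, local
--deploy-mode	Whether to start the driver locally (client) or on the cluster (cluster); the default is client
--name	Name of the application
--class	Main class of the application; only applies to Java or Scala applications
--jars	Comma-separated list of local jars; once set, these jars are added to the classpath of the driver and the executors
--packages	Comma-separated Maven coordinates (groupId:artifactId:version) of jars to add to the driver and executor classpath
--exclude-packages	Comma-separated groupId:artifactId entries to leave out when resolving --packages, to avoid dependency conflicts
--repositories	Comma-separated additional remote repositories to search for the coordinates given in --packages
--conf	Sets a Spark configuration property, in the form PROP=VALUE
--properties-file	Configuration file to load; defaults to conf/spark-defaults.conf
--driver-memory	Driver memory; defaults to 1G
--driver-java-options	Extra Java options passed to the driver
--driver-library-path	Extra library path entries passed to the driver
--driver-class-path	Extra classpath entries passed to the driver
--driver-cores	Number of cores for the driver; defaults to 1. Used on YARN or standalone, in cluster deploy mode
--executor-memory	Memory per executor; defaults to 1G
--total-executor-cores	Total number of cores across all executors. Only used on Mesos or standalone
--num-executors	Number of executors to start; defaults to 2. Used on YARN
--executor-cores	Number of cores per executor. Used on YARN or standalone; defaults to 1 on YARN, or all available cores on the worker in standalone mode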
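Because --num-executors, --executor-cores and --executor-memory multiply together, it can help to work out what a given combination asks the cluster for. The following is a rough sketch with illustrative values (they are assumptions, not recommendations), and it ignores the extra per-container memory overhead that Spark requests on top of the heap sizes:

```shell
#!/bin/sh
# Rough estimate of what a YARN submission requests, derived from flag values.
# All numbers are illustrative; memory overhead per container is not included.
num_executors=1          # --num-executors
executor_cores=2         # --executor-cores
executor_memory_mb=2048  # --executor-memory 2048M
driver_memory_mb=512     # --driver-memory 512M

# Cores used by all executors together
total_executor_cores=$(( num_executors * executor_cores ))

# Heap memory of all executors plus the driver, in MB
total_heap_mb=$(( num_executors * executor_memory_mb + driver_memory_mb ))

echo "executor cores in total: ${total_executor_cores}"
echo "heap memory in total (MB): ${total_heap_mb}"
```

On standalone or Mesos the total core count is given directly through --total-executor-cores instead of being derived from --num-executors.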

Example submission script:

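spark2-submit \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --queue xxxx.xxx \
  --proxy-user xxx \
  --master yarn \
  --deploy-mode cluster \
  --class xxxx.xxx.xxxTask \
  --name xxxTask \
  --conf kafka.version=0.10 \
  --executor-cores 2 \
  --executor-memory 2048M \
  --driver-memory 512M \
  --num-executors 1 \
  hdfs://xxxx

Here --class takes the fully qualified name of the main class (package path plus class name), and the last argument is the full path of the application jar, which is stored in the corresponding HDFS directory. Every line except the last ends with a backslash so that the shell reads the whole command as one. The older --master yarn-cluster shorthand has been deprecated since Spark 2.0 in favour of --master yarn with --deploy-mode cluster. Note also that spark-submit only picks up --conf keys that start with spark.; a key such as kafka.version is ignored with a warning.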
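A submission like the one above can also be assembled step by step, which leaves room for a comment next to each group of flags and makes it easy to print the command before actually running it. This is only a sketch: the jar path, the class name and the RUN switch are hypothetical placeholders introduced for illustration, and spark-submit is assumed to be on the PATH when RUN=1.

```shell
#!/bin/sh
# Dry-run sketch: build the spark-submit argument list one group at a time.
# APP_JAR and MAIN_CLASS are placeholders, not real locations or classes.
APP_JAR="hdfs:///path/to/app.jar"
MAIN_CLASS="com.example.ExampleTask"

set --                                             # start with an empty argument list
set -- "$@" --master yarn --deploy-mode cluster    # where the job runs
set -- "$@" --name ExampleTask --class "$MAIN_CLASS"
set -- "$@" --num-executors 1 --executor-cores 2   # executor count and cores each
set -- "$@" --executor-memory 2048M --driver-memory 512M
set -- "$@" "$APP_JAR"                             # the application jar goes last

if [ "${RUN:-0}" = "1" ]; then
    # Actually submit the job
    spark-submit "$@"
else
    # Only print the command that would be run
    cmd="spark-submit $*"
    echo "$cmd"
fi
```

Running the script as is prints the command; setting RUN=1 in the environment submits it.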
