Spark's deploy modes

Spark has two deploy modes: client and cluster.

A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. the master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process, which acts as a client to the cluster. The input and output of the application are attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. the Spark shell).

Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for Mesos clusters, and currently only YARN supports cluster mode for Python applications [1].
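As a rough illustration of the difference (the master URL, class name, and jar below are placeholders, not taken from this post), the same application can be submitted in either mode just by changing --deploy-mode:

    # Client mode: the driver is launched inside this spark-submit process,
    # so the application's input and output stay attached to this console.
    spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode client \
      --class com.example.MyApp \
      my-app.jar

    # Cluster mode: spark-submit only hands the application off to the cluster;
    # the driver is launched on one of the worker machines inside the cluster.
    spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      my-app.jar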

When I look up the help for spark-submit with the command spark-submit --help, I get:

--deploy-mode: Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (default: client)

Clearly, if we submit the application on the master node of the cluster, the driver program runs on the master; this is client mode.

But what if we submit the application from some other node in the cluster?

This comes down to the question of where the driver program will run.

So what is a driver program?

The driver program is the process that runs the application's main() function and creates the SparkContext.

In client mode, the driver program runs on whichever node you submit from; in cluster mode, it is launched on one of the worker machines inside the cluster.
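To make the term concrete, here is a minimal sketch of a driver program in Scala (the object name and the trivial job are illustrative only, not from this post): the process that executes this main() and constructs the SparkContext is the driver, and in client mode that process lives on whatever node ran spark-submit.

    // MinimalDriver.scala - illustrative Spark 1.x-style driver program.
    import org.apache.spark.{SparkConf, SparkContext}

    object MinimalDriver {
      def main(args: Array[String]): Unit = {
        // Creating the SparkContext is what makes this process the driver.
        val conf = new SparkConf().setAppName("MinimalDriver")
        val sc = new SparkContext(conf)

        // A trivial job so the executors have something to do.
        val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
        println("Even numbers: " + evens)

        sc.stop()
      }
    }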


References:

[1] Submitting Applications, http://spark.apache.org/docs/latest/submitting-applications.html (accessed 2016-06-02)

[2] Cluster Mode Overview, http://spark.apache.org/docs/latest/cluster-overview.html (accessed 2016-06-02)
