Spark-deploy
@(spark)[deploy|yarn]
A few words up front
Please refer to Spark源碼分析之-deploy模塊 (a Spark source analysis of the deploy module). Although it is a 2013 article, its author explains all of this far better than I do, so I gave up halfway through writing this one...
The earlier post Spark源碼分析之-scheduler模塊 mentioned that Spark takes the same approach to resource management and scheduling as Hadoop YARN: an outer resource manager plus a task scheduler inside each application, and it analyzed the task scheduling module within a Spark application. This post looks at Spark's outer resource manager, the deploy module, to explore how Spark coordinates resource scheduling and management across applications.
Spark originally handed resource management to Mesos. To let more users run Spark, including those who had never touched Mesos, its developers added the Standalone deployment mode, i.e. the deploy module. The deploy module therefore applies only to deployments that do not use Mesos for resource management.
Overall architecture of the deploy module
The deploy module consists of three sub-modules: master, worker, and client. Each of them extends Actor, and they communicate with one another through actor messages.
Master: the master accepts worker registrations and manages all workers; it also accepts applications submitted by clients, schedules the waiting applications (FIFO), and assigns them to workers.
Worker: the worker registers itself with the master, sets up a process environment according to the application configuration the master sends, and launches StandaloneExecutorBackend.
Client: the client registers the application with the master and monitors it. When a user creates a SparkContext, a SparkDeploySchedulerBackend is instantiated, and instantiating SparkDeploySchedulerBackend starts the client; handed the launch parameters and application information, the client asks the master to register the application and to launch StandaloneExecutorBackend on the slave nodes. A toy version of this message flow is sketched below.
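To make the register-and-acknowledge flow concrete, here is a minimal, self-contained sketch using classic Akka actors. All names here (ToyMaster, ToyWorker, RegisterWorker, ...) are made-up stand-ins for the real messages in org.apache.spark.deploy.DeployMessages, not Spark's actual implementation:

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Simplified stand-ins for Spark's real deploy messages (illustrative only).
case class RegisterWorker(id: String, cores: Int, memory: Int)
case class RegisteredWorker(masterUrl: String)
case class RegisterApplication(appName: String)

// Toy master: accepts worker registrations, acknowledges them, and would
// schedule registered applications onto the known workers.
class ToyMaster extends Actor {
  private var workers = Map.empty[String, ActorRef]
  def receive = {
    case RegisterWorker(id, cores, memory) =>
      workers += id -> sender()
      sender() ! RegisteredWorker("spark://master:7077")
    case RegisterApplication(name) =>
      println(s"application '$name' registered; ${workers.size} worker(s) available")
  }
}

// Toy worker: registers itself with the master on startup.
class ToyWorker(master: ActorRef) extends Actor {
  override def preStart(): Unit = master ! RegisterWorker(self.path.name, 4, 1024)
  def receive = {
    case RegisteredWorker(url) => println(s"${self.path.name} registered with $url")
  }
}

object ToyDeploy extends App {
  val system = ActorSystem("toy-deploy")
  val master = system.actorOf(Props[ToyMaster], "master")
  system.actorOf(Props(new ToyWorker(master)), "worker-1")
  master ! RegisterApplication("demo-app")
  // (actor system left running; terminate it to exit)
}
```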
ClientArguments
Command-line parser for the driver client.
var master: String = ""
var jarUrl: String = ""
var mainClass: String = ""
var supervise: Boolean = DEFAULT_SUPERVISE
var memory: Int = DEFAULT_MEMORY
var cores: Int = DEFAULT_CORES
private var _driverOptions = ListBuffer[String]()
def driverOptions = _driverOptions.toSeq
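These fields are filled in from argv by the class's parsing logic. A hypothetical, stripped-down sketch of that pattern follows; the real code lives in ClientArguments.parse and handles more options plus the launch/kill subcommands, so treat the option names and defaults here as illustrative:

```scala
import scala.collection.mutable.ListBuffer

object ClientArgsSketch {
  var master = ""; var jarUrl = ""; var mainClass = ""
  var supervise = false
  var memory = 512                  // stand-in for DEFAULT_MEMORY (MB)
  var cores = 1                     // stand-in for DEFAULT_CORES
  private val _driverOptions = ListBuffer[String]()

  @annotation.tailrec
  def parse(args: List[String]): Unit = args match {
    case "--supervise" :: tail   => supervise = true; parse(tail)
    case "--memory" :: m :: tail => memory = m.toInt; parse(tail)
    case "--cores" :: c :: tail  => cores = c.toInt; parse(tail)
    case "launch" :: m :: jar :: clazz :: rest =>
      master = m; jarUrl = jar; mainClass = clazz
      _driverOptions ++= rest     // everything after the main class goes to the driver
    case Nil                     => // done
    case other                   => sys.error(s"unrecognized arguments: $other")
  }
}

// e.g. ClientArgsSketch.parse(List("--cores", "2", "launch",
//   "spark://host:7077", "hdfs:///app.jar", "my.Main", "driverArg1"))
```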
Command
private[spark] case class Command(
mainClass: String,
arguments: Seq[String],
environment: Map[String, String],
classPathEntries: Seq[String],
libraryPathEntries: Seq[String],
javaOpts: Seq[String]) {
}
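Command is just a serializable recipe for starting a JVM process. For example, the command a master forwards to a worker so it can launch an executor backend (named StandaloneExecutorBackend in early Spark versions, CoarseGrainedExecutorBackend later) could look roughly like this; all values are made up for illustration:

```scala
// Illustrative values only; the real command is assembled on the driver side
// with placeholders that the worker substitutes before launching the process.
val executorCommand = Command(
  mainClass = "org.apache.spark.executor.CoarseGrainedExecutorBackend",
  arguments = Seq("--driver-url", "<driver-actor-url>"),
  environment = Map("SPARK_LOCAL_DIRS" -> "/tmp/spark"),
  classPathEntries = Seq.empty,
  libraryPathEntries = Seq.empty,
  javaOpts = Seq("-Xmx1g"))
```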
ApplicationDescription
private[spark] class ApplicationDescription(
val name: String,
val maxCores: Option[Int],
val memoryPerSlave: Int,
val command: Command,
var appUiUrl: String,
val eventLogDir: Option[URI] = None,
// short name of compression codec used when writing event logs, if any (e.g. lzf)
val eventLogCodec: Option[String] = None)
extends Serializable {
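Continuing the example, this is roughly the description the client registers with the master; all values are illustrative:

```scala
// Illustrative: what AppClient sends when registering an application.
val appDesc = new ApplicationDescription(
  name = "demo-app",
  maxCores = Some(8),          // None typically means "no cap on total cores"
  memoryPerSlave = 1024,       // MB of memory per executor
  command = executorCommand,   // the Command built above
  appUiUrl = "http://driver-host:4040")
```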
DriverDescription
private[spark] class DriverDescription(
val jarUrl: String,
val mem: Int,
val cores: Int,
val supervise: Boolean,
val command: Command)
extends Serializable {
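DriverDescription plays the same role for cluster-deploy-mode drivers: it tells the master everything a worker needs to launch the driver JVM itself. A hedged sketch (DriverWrapper is the launcher class Spark's worker uses, but the argument layout below is from memory and varies by version):

```scala
// Illustrative: a driver submission, as assembled by the driver client.
val driverDesc = new DriverDescription(
  jarUrl = "hdfs:///jars/app.jar",
  mem = 512,                 // MB for the driver JVM
  cores = 1,
  supervise = true,          // ask the master to restart the driver on failure
  command = Command(
    "org.apache.spark.deploy.worker.DriverWrapper",
    Seq("<worker-url>", "my.Main"),  // placeholder layout, version-dependent
    Map.empty, Seq.empty, Seq.empty, Seq.empty))
```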
ExecutorState
private[spark] object ExecutorState extends Enumeration {
val LAUNCHING, LOADING, RUNNING, KILLED, FAILED, LOST, EXITED = Value
type ExecutorState = Value
def isFinished(state: ExecutorState): Boolean = Seq(KILLED, FAILED, LOST, EXITED).contains(state)
}
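isFinished separates terminal states from live ones, e.g. (note the object is private[spark], so this only compiles inside the org.apache.spark namespace):

```scala
ExecutorState.isFinished(ExecutorState.RUNNING)  // false: executor still alive
ExecutorState.isFinished(ExecutorState.KILLED)   // true: resources can be reclaimed
```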
ExecutorDescription
private[spark] class ExecutorDescription(
val appId: String,
val execId: Int,
val cores: Int,
val state: ExecutorState.Value)
extends Serializable {
SparkSubmitArguments
/**
 * Parses and encapsulates arguments from the spark-submit script.
 * The env argument is used for testing.
 */
private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, String] = sys.env) {
SparkSubmitDriverBootstrapper
/**
* Launch an application through Spark submit in client mode with the appropriate classpath,
* library paths, java options and memory. These properties of the JVM must be set before the
* driver JVM is launched. The sole purpose of this class is to avoid handling the complexity
* of parsing the properties file for such relevant configs in Bash.
*
* Usage: org.apache.spark.deploy.SparkSubmitDriverBootstrapper <submit args>
*/
private[spark] object SparkSubmitDriverBootstrapper {
JsonProtocol
private[spark] object JsonProtocol {
Converts the various XXXInfo and XXXDescription objects into JSON (this backs the JSON endpoints of the master and worker web UIs).
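A sketch of the pattern: JsonProtocol builds the output with the json4s DSL, one write* method per deploy-side class. The field names below are from memory and should be treated as illustrative:

```scala
import org.json4s.JValue
import org.json4s.JsonDSL._

object JsonProtocolSketch {
  // One method per deploy-side class, each producing a JSON object.
  def writeExecutorDescription(d: ExecutorDescription): JValue =
    ("appid" -> d.appId) ~
    ("execid" -> d.execId) ~
    ("cores" -> d.cores) ~
    ("state" -> d.state.toString)
}
```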
Client
AppClientListener
/**
* Callbacks invoked by deploy client when various events happen. There are currently four events:
* connecting to the cluster, disconnecting, being given an executor, and having an executor
* removed (either due to failure or due to revocation).
*
* Users of this API should *not* block inside the callback methods.
*/
private[spark] trait AppClientListener {
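A hypothetical implementation that just logs each event (in Spark the real implementer is SparkDeploySchedulerBackend). The signatures below match the four events named in the comment, but some Spark versions declare additional callbacks, e.g. dead():

```scala
// Illustrative listener: log every cluster event, never block.
class LoggingAppClientListener extends AppClientListener {
  def connected(appId: String): Unit =
    println(s"connected to master, application id = $appId")
  def disconnected(): Unit =
    println("disconnected from master (it may come back)")
  def executorAdded(fullId: String, workerId: String, hostPort: String,
                    cores: Int, memory: Int): Unit =
    println(s"executor $fullId granted: $cores cores, $memory MB on $hostPort")
  def executorRemoved(fullId: String, message: String, exitStatus: Option[Int]): Unit =
    println(s"executor $fullId removed: $message (exit status $exitStatus)")
}
```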
AppClient
/**
* Interface allowing applications to speak with a Spark deploy cluster. Takes a master URL,
* an app description, and a listener for cluster events, and calls back the listener when various
* events occur.
*
* @param masterUrls Each url should look like spark://host:port.
*/
private[spark] class AppClient(
actorSystem: ActorSystem,
masterUrls: Array[String],
appDescription: ApplicationDescription,
listener: AppClientListener,
conf: SparkConf)
extends Logging {
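Wiring the pieces together (illustrative; this mirrors what SparkDeploySchedulerBackend does when the SparkContext starts):

```scala
import akka.actor.ActorSystem
import org.apache.spark.SparkConf

val actorSystem = ActorSystem("driver")
val client = new AppClient(
  actorSystem,
  Array("spark://master:7077"),    // several masters may be listed for HA
  appDesc,                         // the ApplicationDescription from above
  new LoggingAppClientListener,    // the listener sketched above
  new SparkConf())
client.start()                     // registers the application with a master
```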
ClientActor
/**
* Proxy that relays messages to the driver.
*/
private class ClientActor(driverArgs: ClientArguments, conf: SparkConf)
extends Actor with ActorLogReceive with Logging {
PythonRunner
/**
* A main class used to launch Python applications. It executes python as a
* subprocess and then has it connect back to the JVM to access system properties, etc.
*/
object PythonRunner {
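The core mechanism is ordinary subprocess plumbing. A stripped-down sketch of the pattern follows; PYSPARK_GATEWAY_PORT is the environment variable PySpark conventionally uses to tell the script where to connect back, but treat the details as illustrative:

```scala
import scala.collection.JavaConverters._

object PyLaunchSketch extends App {
  val gatewayPort = 25333  // e.g. the port of a Py4J GatewayServer in this JVM

  // Start python as a child process and hand it the JVM's callback port
  // through the environment, so the script can connect back.
  val builder = new ProcessBuilder(Seq("python", "app.py").asJava)
  builder.environment().put("PYSPARK_GATEWAY_PORT", gatewayPort.toString)
  builder.redirectErrorStream(true)  // merge the child's stderr into stdout
  val process = builder.start()
  sys.exit(process.waitFor())        // propagate python's exit code
}
```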