Spark-deploy


Preface

Please refer to "Spark源碼分析之-deploy模塊" (a 2013 source-code analysis of the deploy module). Although the article is old, its author explains things far more clearly than I do, so I only wrote this one halfway and left it at that...

The earlier post "Spark源碼分析之-scheduler模塊" noted that Spark follows the same split as Hadoop YARN for resource management and scheduling: an outer resource manager plus a task scheduler inside each application. That post analyzed the task-scheduling module within a Spark application; this one looks at Spark's outer resource manager, the deploy module, to explore how Spark coordinates resource scheduling and management across applications.

Spark originally delegated resource management to Mesos. To open Spark up to more users, including those who had never touched Mesos, the Spark developers added the Standalone deployment mode, which is the deploy module. The deploy module therefore only covers deployments that do not use Mesos for resource management.

Overall architecture of the deploy module

The deploy module consists of three sub-modules: master, worker, and client. Each extends Actor, and they communicate with one another through actor messages.

Master: the master receives registrations from workers and manages all of them, accepts applications submitted by clients, and schedules waiting applications (FIFO) onto the workers.

Worker: the worker registers itself with the master, sets up the process environment according to the application configuration the master sends, and launches StandaloneExecutorBackend.

Client: the client registers the application with the master and monitors it. When the user creates a SparkContext, a SparkDeploySchedulerBackend is instantiated, which in turn starts the client. Given the launch parameters and application information, the client asks the master to register the application and to start StandaloneExecutorBackend on the slave nodes. A sketch of the messages these actors exchange follows below.
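The actors exchange serializable case-class messages. Below is a minimal, illustrative sketch of a few of them, modeled on the message names in org.apache.spark.deploy.DeployMessages; the field lists are assumptions (the real classes carry more fields, and they vary across Spark versions). ApplicationDescription and ExecutorState are the classes shown later in this post.

// Illustrative only: simplified deploy-module messages. The names match
// DeployMessages, but the field lists are assumptions for illustration.

// Worker -> Master: announce this worker and the resources it offers
case class RegisterWorker(id: String, host: String, port: Int, cores: Int, memory: Int)

// Client -> Master: submit an application for scheduling
case class RegisterApplication(appDescription: ApplicationDescription)

// Master -> Client: the application was accepted and assigned an id
case class RegisteredApplication(appId: String)

// Master -> Worker: start an executor for the given application
case class LaunchExecutor(appId: String, execId: Int, appDesc: ApplicationDescription, cores: Int, memory: Int)

// Worker -> Master: report an executor's lifecycle transitions
case class ExecutorStateChanged(appId: String, execId: Int, state: ExecutorState.Value, message: Option[String], exitStatus: Option[Int])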

ClientArguments

/**
 * Command-line parser for the driver client.
 */
private[spark] class ClientArguments(args: Array[String]) {
  var master: String = ""
  var jarUrl: String = ""
  var mainClass: String = ""
  var supervise: Boolean = DEFAULT_SUPERVISE
  var memory: Int = DEFAULT_MEMORY
  var cores: Int = DEFAULT_CORES
  private var _driverOptions = ListBuffer[String]()
  def driverOptions = _driverOptions.toSeq
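For context, here is a hypothetical sketch of how such a parser might consume its arguments; the flag names and structure are invented for illustration and are not the actual ClientArguments implementation.

import scala.collection.mutable.ListBuffer

// Hypothetical, simplified parser in the style of ClientArguments.
class SimpleDriverArgs(args: Array[String]) {
  var master = ""
  var memory = 512
  private val driverOpts = ListBuffer[String]()

  private def parse(as: List[String]): Unit = as match {
    case "--master" :: value :: tail => master = value; parse(tail)
    case "--memory" :: value :: tail => memory = value.toInt; parse(tail)
    case opt :: tail                 => driverOpts += opt; parse(tail)  // pass through to the driver
    case Nil                         =>
  }
  parse(args.toList)
}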

Command

private[spark] case class Command(
    mainClass: String,
    arguments: Seq[String],
    environment: Map[String, String],
    classPathEntries: Seq[String],
    libraryPathEntries: Seq[String],
    javaOpts: Seq[String]) {
}
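As an illustration, this is roughly how a Command for launching executors could be assembled. The main class matches what the standalone scheduler backend launches, but the arguments and values here are assumptions made up for the example.

// Illustrative: a Command a worker could run to start an executor backend.
// The {{...}} placeholders and option values are assumptions.
val executorCommand = Command(
  mainClass = "org.apache.spark.executor.CoarseGrainedExecutorBackend",
  arguments = Seq("--driver-url", "{{DRIVER_URL}}", "--executor-id", "{{EXECUTOR_ID}}"),
  environment = Map("SPARK_HOME" -> "/opt/spark"),
  classPathEntries = Seq.empty,
  libraryPathEntries = Seq.empty,
  javaOpts = Seq("-Xms1g", "-Xmx1g"))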

ApplicationDescription

private[spark] class ApplicationDescription(
    val name: String,
    val maxCores: Option[Int],
    val memoryPerSlave: Int,
    val command: Command,
    var appUiUrl: String,
    val eventLogDir: Option[URI] = None,
    // short name of compression codec used when writing event logs, if any (e.g. lzf)
    val eventLogCodec: Option[String] = None)
  extends Serializable {
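Putting the two together, a client could describe an application like this; every value below is invented for the example.

// Illustrative: describing an application to submit to the master.
val appDesc = new ApplicationDescription(
  name = "my-spark-app",
  maxCores = Some(8),        // optional cap on total cores for the app
  memoryPerSlave = 1024,     // MB of memory per executor
  command = Command(         // what each worker runs (see Command above)
    "org.apache.spark.executor.CoarseGrainedExecutorBackend",
    Seq.empty, Map.empty, Seq.empty, Seq.empty, Seq.empty),
  appUiUrl = "http://driver-host:4040")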

DriverDescription

private[spark] class DriverDescription(
    val jarUrl: String,
    val mem: Int,
    val cores: Int,
    val supervise: Boolean,
    val command: Command)
  extends Serializable {

ExecutorState

private[spark] object ExecutorState extends Enumeration {

  val LAUNCHING, LOADING, RUNNING, KILLED, FAILED, LOST, EXITED = Value

  type ExecutorState = Value

  def isFinished(state: ExecutorState): Boolean = Seq(KILLED, FAILED, LOST, EXITED).contains(state)
}
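For example, a caller can use isFinished to decide whether an executor's resources can be reclaimed:

// Terminal states (KILLED/FAILED/LOST/EXITED) free the executor's slot;
// LAUNCHING, LOADING and RUNNING do not.
val state = ExecutorState.EXITED
if (ExecutorState.isFinished(state)) {
  // release the cores and memory this executor was holding
}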

ExecutorDescription

private[spark] class ExecutorDescription(
    val appId: String,
    val execId: Int,
    val cores: Int,
    val state: ExecutorState.Value)
  extends Serializable {

SparkSubmitArguments

/**
 * Parses and encapsulates arguments from the spark-submit script.
 * The env argument is used for testing.
 */
private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, String] = sys.env) {

SparkSubmitDriverBootstrapper

/**
 * Launch an application through Spark submit in client mode with the appropriate classpath,
 * library paths, java options and memory. These properties of the JVM must be set before the
 * driver JVM is launched. The sole purpose of this class is to avoid handling the complexity
 * of parsing the properties file for such relevant configs in Bash.
 *
 * Usage: org.apache.spark.deploy.SparkSubmitDriverBootstrapper <submit args>
 */
private[spark] object SparkSubmitDriverBootstrapper {

JsonProtocol

private[spark] object JsonProtocol {

Converts the various XXXInfo and XXXDescription classes into JSON.
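Here is a minimal sketch of the kind of conversion JsonProtocol performs, written with the json4s JSON DSL that Spark uses; the choice of fields is an assumption, not the actual method body.

import org.json4s.JValue
import org.json4s.JsonDSL._

// Illustrative JsonProtocol-style conversion; the field selection is assumed.
def writeApplicationDescription(desc: ApplicationDescription): JValue = {
  ("name" -> desc.name) ~
  ("cores" -> desc.maxCores) ~
  ("memoryperslave" -> desc.memoryPerSlave) ~
  ("command" -> desc.command.toString)
}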

Client

AppClientListener

/**
 * Callbacks invoked by deploy client when various events happen. There are currently four events:
 * connecting to the cluster, disconnecting, being given an executor, and having an executor
 * removed (either due to failure or due to revocation).
 *
 * Users of this API should *not* block inside the callback methods.
 */
private[spark] trait AppClientListener {
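A minimal sketch of what implementing this trait could look like. The method signatures are assumptions modeled on the Spark 1.x trait; in Spark itself the implementer is SparkDeploySchedulerBackend.

// Illustrative listener that only logs cluster events; signatures assumed.
class LoggingAppClientListener extends AppClientListener {
  def connected(appId: String): Unit =
    println(s"Registered with master as application $appId")

  def disconnected(): Unit =
    println("Disconnected from master; a reconnection may follow")

  def dead(reason: String): Unit =
    println(s"Connection to the cluster was lost for good: $reason")

  def executorAdded(fullId: String, workerId: String, hostPort: String,
      cores: Int, memory: Int): Unit =
    println(s"Granted executor $fullId on $hostPort ($cores cores, $memory MB)")

  def executorRemoved(fullId: String, message: String, exitStatus: Option[Int]): Unit =
    println(s"Lost executor $fullId: $message")
}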

AppClient

/**
 * Interface allowing applications to speak with a Spark deploy cluster. Takes a master URL,
 * an app description, and a listener for cluster events, and calls back the listener when various
 * events occur.
 *
 * @param masterUrls Each url should look like spark://host:port.
 */
private[spark] class AppClient(
    actorSystem: ActorSystem,
    masterUrls: Array[String],
    appDescription: ApplicationDescription,
    listener: AppClientListener,
    conf: SparkConf)
  extends Logging {
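Wiring it together, this is roughly how a scheduler backend could start a deploy client. start() is the entry point of the Spark 1.x AppClient, but treat the rest as a sketch: actorSystem, appDesc and conf are assumed to already exist in scope.

// Illustrative wiring, loosely following SparkDeploySchedulerBackend.
val masters = Array("spark://master-host:7077")
val listener = new LoggingAppClientListener   // the sketch listener above
val client = new AppClient(actorSystem, masters, appDesc, listener, conf)
client.start()   // registers the application with the master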

ClientActor

/**
 * Proxy that relays messages to the driver.
 */
private class ClientActor(driverArgs: ClientArguments, conf: SparkConf)
  extends Actor with ActorLogReceive with Logging {

PythonRunner

/**
 * A main class used to launch Python applications. It executes python as a
 * subprocess and then has it connect back to the JVM to access system properties, etc.
 */
object PythonRunner {
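The gist of PythonRunner: fork a python subprocess for the user's script with an environment that lets it find PySpark, and let the subprocess bridge calls back into this JVM. Below is a hedged sketch of the subprocess launch; the environment handling is an assumption, not the exact PythonRunner code.

import scala.collection.JavaConverters._

// Illustrative sketch of launching a Python script as a subprocess, in the
// spirit of PythonRunner; environment details are assumptions.
def runPythonScript(pythonExec: String, script: String, args: Seq[String]): Int = {
  val builder = new ProcessBuilder((Seq(pythonExec, script) ++ args).asJava)
  // The real runner also passes a py4j gateway port through the environment
  // so the script can call back into this JVM.
  builder.environment().put("PYTHONPATH", sys.env.getOrElse("PYTHONPATH", ""))
  builder.redirectErrorStream(true)   // merge stdout and stderr
  val process = builder.start()
  scala.io.Source.fromInputStream(process.getInputStream).getLines().foreach(println)
  process.waitFor()                   // return the script's exit code
}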