Big Data: Spark Standalone Cluster Scheduling (1): Starting from Remote Debugging to Explain Application Creation

Remote debugging, especially against a cluster, is a very convenient way to understand how the code actually runs, and it is the approach most developers prefer.

Although Scala's syntax differs from Java's, Scala runs on the JVM: Scala code is ultimately compiled to bytecode and executed by the JVM, so remote debugging of Scala is simply ordinary JVM remote debugging.

On the server side, add the following JVM options:

-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y 

The client can then attach over a socket and debug the code remotely.
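
To attach, point your IDE's remote-debug configuration at the server's host and port, or use jdb from the command line, for example (assuming the JVM above runs on the host raintungmaster):

jdb -attach raintungmaster:7001

On JDK 5 and later, the equivalent modern form of the server-side options is -agentlib:jdwp=transport=dt_socket,server=y,address=7001,suspend=y.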

1. Debugging the submit, master, and worker code

1.1 Debugging Submit

The client runs the submit step; I won't describe that in detail here. A Spark job is usually submitted with

spark-submit

which launches a Spark application.

Under the hood this is essentially equivalent to a command like the following:

/usr/java/jdk1.8.0_111/bin/java -cp /work/spark-2.1.0-bin-hadoop2.7/conf/:/work/spark-2.1.0-bin-hadoop2.7/jars/* -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7000,suspend=y -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://raintungmaster:7077 --class rfcexample --jars /work/spark-2.1.0-bin-hadoop2.7/examples/jars/scopt_2.11-3.3.0.jar,/work/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar /tmp/machinelearning.jar

It invokes the org.apache.spark.deploy.SparkSubmit class to submit the job, so to debug it you simply append the debug JVM options to this command, as done above with -Xdebug -Xrunjdwp on port 7000.

1.2 Debug settings for the master and worker

export SPARK_WORKER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=8000,suspend=n"
export SPARK_MASTER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=8001,suspend=n"
Just set these environment variables (typically in conf/spark-env.sh) before starting the master and worker.

2. Debugging the executor code

After setting the worker's debug options, however, I still could not debug the code running inside the Spark executor. Since the executor runs on the worker, remote debugging should of course be possible, so why can't the executor be debugged this way?

3. Spark Standalone cluster scheduling

Since the executor cannot be debugged like this, we need to work out the scheduling relationship between submit, master, and worker.

3.1 Submit: submitting the job

As described above, submit actually bootstraps the SparkSubmit class, and SparkSubmit's main method calls runMain:

try {
      mainMethod.invoke(null, childArgs.toArray)
    } catch {
      case t: Throwable =>
        findCause(t) match {
          case SparkUserAppException(exitCode) =>
            System.exit(exitCode)

          case t: Throwable =>
            throw t
        }
    }

The key step is invoking the main method of the class we submitted, specified in the example above by the argument
--class rfcexample
so the main method of rfcexample is called.

When writing the class that a Spark job runs, we usually start by initializing the Spark context:
val sc = new SparkContext(conf)
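
For reference, a minimal driver class might look like the sketch below; the object name rfcexample and the job body are placeholders standing in for whatever was passed via --class:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the class passed to --class. Any Spark driver
// follows the same pattern: build a SparkConf, create the SparkContext
// (which starts the scheduler), run some jobs, then stop the context.
object rfcexample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rfcexample")
    val sc = new SparkContext(conf)   // the scheduler described below is created here
    val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
    println(s"even numbers: $evens")
    sc.stop()
  }
}
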
When the SparkContext is initialized, it creates and starts the task scheduler:
// Create and start the scheduler
    val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    _schedulerBackend = sched
    _taskScheduler = ts
    _dagScheduler = new DAGScheduler(this)
    _heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

    // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
    // constructor
    _taskScheduler.start()

In standalone mode this eventually calls the start method of StandaloneSchedulerBackend.scala:

    val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
      appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
    client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
    client.start()
    launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
    waitForRegistration()
    launcherBackend.setState(SparkAppHandle.State.RUNNING)

It builds an ApplicationDescription and starts a StandaloneAppClient, which connects to the master.

3.2 Master: assigning the application

The submit side created a client, built an application description, and registered the application with the master. The master's dispatcher receives the RegisterApplication message:
case RegisterApplication(description, driver) =>
      // TODO Prevent repeated registrations from some driver
      if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
      } else {
        logInfo("Registering app " + description.name)
        val app = createApplication(description, driver)
        registerApplication(app)
        logInfo("Registered app " + description.name + " with ID " + app.id)
        persistenceEngine.addApplication(app)
        driver.send(RegisteredApplication(app.id, self))
        schedule()
      }

The master creates a new application ID and registers the application. An application is bound to a single client endpoint: the same client ip:port can register only one application. Then, in schedule(), the master computes the application's memory and core requirements and assigns executors to the eligible workers (a simplified sketch of this allocation idea follows the launchExecutor snippet below):
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
    logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
    worker.addExecutor(exec)
    worker.endpoint.send(LaunchExecutor(masterUrl,
      exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
    exec.application.driver.send(
      ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
  }

launchExecutor sends a serialized LaunchExecutor message to the worker's endpoint.
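
The allocation performed by schedule() boils down to spreading the application's requested cores over workers that still have enough free cores and memory. The snippet below is a self-contained simplification of that idea, not the actual Master code; the names FreeWorker and assignExecutors are made up for illustration:

// Simplified sketch of the executor-allocation idea behind Master.schedule().
// NOT the real Spark code: it only shows how requested cores are spread across
// workers that still have enough free cores and memory.
case class FreeWorker(id: String, var freeCores: Int, var freeMemoryMb: Int)

def assignExecutors(workers: Seq[FreeWorker],
                    coresWanted: Int,
                    coresPerExecutor: Int,
                    memoryPerExecutorMb: Int): Map[String, Int] = {
  val assigned = scala.collection.mutable.Map[String, Int]().withDefaultValue(0)
  var remaining = coresWanted
  var progress = true
  // Round-robin over the workers so executors are spread out across the cluster.
  while (remaining >= coresPerExecutor && progress) {
    progress = false
    for (w <- workers if remaining >= coresPerExecutor &&
                         w.freeCores >= coresPerExecutor &&
                         w.freeMemoryMb >= memoryPerExecutorMb) {
      w.freeCores -= coresPerExecutor
      w.freeMemoryMb -= memoryPerExecutorMb
      assigned(w.id) += coresPerExecutor
      remaining -= coresPerExecutor
      progress = true
    }
  }
  assigned.toMap
}

// Example: request 6 cores, with 2 cores and 1024 MB per executor, on two workers.
val plan = assignExecutors(
  Seq(FreeWorker("worker-1", 4, 4096), FreeWorker("worker-2", 4, 4096)),
  coresWanted = 6, coresPerExecutor = 2, memoryPerExecutorMb = 1024)
// plan == Map(worker-1 -> 4, worker-2 -> 2)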

3.3 Worker: launching the executor

In Worker.scala the dispatcher receives the LaunchExecutor message:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
      if (masterUrl != activeMasterUrl) {
        logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
      } else {
        try {
          logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

          // Create the executor's working directory
          val executorDir = new File(workDir, appId + "/" + execId)
          if (!executorDir.mkdirs()) {
            throw new IOException("Failed to create directory " + executorDir)
          }

          // Create local dirs for the executor. These are passed to the executor via the
          // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
          // application finishes.
          val appLocalDirs = appDirectories.getOrElse(appId,
            Utils.getOrCreateLocalRootDirs(conf).map { dir =>
              val appDir = Utils.createDirectory(dir, namePrefix = "executor")
              Utils.chmod700(appDir)
              appDir.getAbsolutePath()
            }.toSeq)
          appDirectories(appId) = appLocalDirs
          val manager = new ExecutorRunner(
            appId,
            execId,
            appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
            cores_,
            memory_,
            self,
            workerId,
            host,
            webUi.boundPort,
            publicAddress,
            sparkHome,
            executorDir,
            workerUri,
            conf,
            appLocalDirs, ExecutorState.RUNNING)
          executors(appId + "/" + execId) = manager
          manager.start()
          coresUsed += cores_
          memoryUsed += memory_
          sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
        } catch {
          case e: Exception =>
            logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
            if (executors.contains(appId + "/" + execId)) {
              executors(appId + "/" + execId).kill()
              executors -= appId + "/" + execId
            }
            sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
              Some(e.toString), None))
        }
      }
The worker creates a working directory for the executor and starts an ExecutorRunner:
private[worker] def start() {
    workerThread = new Thread("ExecutorRunner for " + fullId) {
      override def run() { fetchAndRunExecutor() }
    }
    workerThread.start()
    // Shutdown hook that kills actors on shutdown.
    shutdownHook = ShutdownHookManager.addShutdownHook { () =>
      // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
      // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
      if (state == ExecutorState.RUNNING) {
        state = ExecutorState.FAILED
      }
      killProcess(Some("Worker shutting down")) }
  }

In ExecutorRunner.scala's start method, a thread named "ExecutorRunner for xxx" is started to run the executor. Does that mean the application's code runs inside this thread?
private def fetchAndRunExecutor() {
    try {
      // Launch the process
      val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf),
        memory, sparkHome.getAbsolutePath, substituteVariables)
      val command = builder.command()
      val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
      .....
      process = builder.start()
      ......
      val exitCode = process.waitFor()
      state = ExecutorState.EXITED
      val message = "Command exited with code " + exitCode
      worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
    } catch {
      ......
    }
  }

Looking at fetchAndRunExecutor, we find builder.start(). The builder is a ProcessBuilder, which means the current thread launches a child process to run the command.

This is why we cannot debug the executor by attaching to the worker: the executor is a separate process.
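
A tiny stand-alone example (not Spark code) illustrates the point: the thread only builds the command and calls start(), and the resulting executor runs in a brand-new OS process with its own JVM, which a debugger attached to the parent never sees. Here "java -version" stands in for the real executor command built by CommandUtils.buildProcessBuilder:

import scala.collection.JavaConverters._

// Minimal illustration of what ExecutorRunner effectively does: launch a
// separate JVM as a child process of the current one.
object ChildProcessDemo {
  def main(args: Array[String]): Unit = {
    val builder = new ProcessBuilder(Seq("java", "-version").asJava)
    builder.inheritIO()                // forward the child's output to ours
    val process = builder.start()      // new OS process, new JVM
    val exitCode = process.waitFor()
    println(s"Command exited with code $exitCode")
  }
}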

4. Debugging the executor process

Following the code path just now, we saw that from the master receiving the RegisterApplication message to it sending the LaunchExecutor message that schedules the worker, the command itself is never modified: the command that the child process eventually runs comes from the command field of the ApplicationDescription. We also know that the ApplicationDescription is created by the submit side in section 3.1, so let's go back to the start method of StandaloneSchedulerBackend.scala:

val driverUrl = RpcEndpointAddress(
      sc.conf.get("spark.driver.host"),
      sc.conf.get("spark.driver.port").toInt,
      CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
    val args = Seq(
      "--driver-url", driverUrl,
      "--executor-id", "{{EXECUTOR_ID}}",
      "--hostname", "{{HOSTNAME}}",
      "--cores", "{{CORES}}",
      "--app-id", "{{APP_ID}}",
      "--worker-url", "{{WORKER_URL}}")
    val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
      .map(Utils.splitCommandString).getOrElse(Seq.empty)
    val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
      .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
    val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
      .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)

    // When testing, expose the parent class path to the child. This is processed by
    // compute-classpath.{cmd,sh} and makes all needed jars available to child processes
    // when the assembly is built with the "*-provided" profiles enabled.
    val testingClassPath =
      if (sys.props.contains("spark.testing")) {
        sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
      } else {
        Nil
      }

    // Start executors with a few necessary configs for registering with the scheduler
    val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
    val javaOpts = sparkJavaOpts ++ extraJavaOpts
We can see that the executor's JVM options are controlled by javaOpts, and in particular by
val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
so they come from the spark.executor.extraJavaOptions parameter. Going back to the Spark documentation (a bit late, admittedly):
spark.executor.extraJavaOptions (default: none): A string of extra JVM options to pass to executors, for instance GC settings or other logging. Note that it is illegal to set Spark properties or maximum heap size (-Xmx) settings with this option. Spark properties should be set using a SparkConf object or the spark-defaults.conf file used with the spark-submit script. Maximum heap size settings can be set with spark.executor.memory.

So according to the documentation, we can set JVM options for the executors by passing a --conf flag to spark-submit:
--conf "spark.executor.extraJavaOptions=-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y"
The full spark-submit command then becomes:
/usr/java/jdk1.8.0_111/bin/java -cp /work/spark-2.1.0-bin-hadoop2.7/conf/:/work/spark-2.1.0-bin-hadoop2.7/jars/* -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7000,suspend=y -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://raintungmaster:7077 --class rfcexample --jars /work/spark-2.1.0-bin-hadoop2.7/examples/jars/scopt_2.11-3.3.0.jar,/work/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar --conf "spark.executor.extraJavaOptions=-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y" /tmp/machinelearning.jar
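
If you prefer not to pass this on the command line, the documentation quoted above says the same property can also be set in spark-defaults.conf, or programmatically on the SparkConf in the driver before the SparkContext is created. A minimal sketch using the same debug options:

import org.apache.spark.{SparkConf, SparkContext}

// Equivalent to the --conf flag above. The property must be set before the
// SparkContext is created, because that is when executors are requested.
val conf = new SparkConf()
  .setAppName("rfcexample")
  .set("spark.executor.extraJavaOptions",
       "-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y")
val sc = new SparkContext(conf)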

Note: with this setting a worker cannot launch more than one executor, since only one process on a given machine can listen on the same debug port.
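
If you do need several executors on the same machine, one possible workaround (my own suggestion, not something the documentation above covers) is to let each executor JVM pick its own port by omitting the address and not suspending on startup:

--conf "spark.executor.extraJavaOptions=-Xdebug -Xrunjdwp:server=y,transport=dt_socket,suspend=n"

The JDWP agent then chooses a free port and prints a line like "Listening for transport dt_socket at address: <port>" in that executor's stderr log, and you can attach to that port.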



