Remote debugging is a very convenient way to understand how code actually runs, especially in a cluster deployment, and it is a technique most developers are fond of.
Although Scala's syntax differs from Java's, Scala runs on the JVM: it is compiled to bytecode and executed by the JVM, so remote debugging Scala is simply JVM remote debugging.
On the server side, start the JVM with the debug agent:
-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y
(On current JVMs the equivalent -agentlib:jdwp=transport=dt_socket,server=y,address=7001,suspend=y form is preferred; the older flags above still work on JDK 8.) The client can then attach to this socket and debug the code remotely.
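For example, a debugger can attach over the network with the JDK's jdb (an IDE remote-debug configuration pointing at the same host and port works just as well); assuming the JVM above runs on raintungmaster:
jdb -connect com.sun.jdi.SocketAttach:hostname=raintungmaster,port=7001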
1. Debugging the submit, master, and worker code
1.1 Debugging Submit
The client runs Submit; I will not go into detail here, since a Spark job is normally submitted with spark-submit.
In essence it boils down to a command like the following:
/usr/java/jdk1.8.0_111/bin/java -cp /work/spark-2.1.0-bin-hadoop2.7/conf/:/work/spark-2.1.0-bin-hadoop2.7/jars/* -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7000,suspend=y -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://raintungmaster:7077 --class rfcexample --jars /work/spark-2.1.0-bin-hadoop2.7/examples/jars/scopt_2.11-3.3.0.jar,/work/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar /tmp/machinelearning.jar
This invokes the SparkSubmit class to submit the job, and the debug options are simply appended to that command line (here on port 7000).
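If you go through the spark-submit script instead of invoking SparkSubmit by hand, the same options can be passed with --driver-java-options; a sketch using the same example jar:

./bin/spark-submit \
  --master spark://raintungmaster:7077 \
  --class rfcexample \
  --jars /work/spark-2.1.0-bin-hadoop2.7/examples/jars/scopt_2.11-3.3.0.jar,/work/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar \
  --driver-java-options "-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7000,suspend=y" \
  /tmp/machinelearning.jar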
1.2 Setting up debugging for the master and worker
export SPARK_WORKER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=8000,suspend=n"
export SPARK_MASTER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=8001,suspend=n"
Just set these environment variables (typically in conf/spark-env.sh) before starting the master and worker.
2. Debugging the executor code
Executors are not started with the cluster: they are launched on demand by the workers once a job is scheduled. So before we can attach a debugger to one, we first have to understand how a standalone cluster schedules and launches executors, which is what the next two sections walk through.
3. Cluster scheduling in Spark standalone mode
3.1 Submitting the job
try {
  mainMethod.invoke(null, childArgs.toArray)
} catch {
  case t: Throwable =>
    findCause(t) match {
      case SparkUserAppException(exitCode) =>
        System.exit(exitCode)
      case t: Throwable =>
        throw t
    }
}
The core of it is a call to the main method of the class we submitted, which in the example above is the one named by the parameter
--class rfcexample
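What that class contains is ordinary Spark user code; here is a minimal hypothetical sketch (the object name and the job body are invented for illustration):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the rfcexample class submitted above.
object rfcexample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rfcexample")
    val sc = new SparkContext(conf)            // the cluster machinery starts here
    println(sc.parallelize(1 to 100).sum())    // some trivial distributed work
    sc.stop()
  }
}

The call that matters for scheduling is new SparkContext(conf):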
val sc = new SparkContext(conf)

Inside the SparkContext constructor, the scheduler is created and started:

// Create and start the scheduler
val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)
_heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
_taskScheduler.start()
In standalone mode, this eventually calls the start method of StandaloneSchedulerBackend.scala:
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
  appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
client.start()
launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
waitForRegistration()
launcherBackend.setState(SparkAppHandle.State.RUNNING)
This builds the ApplicationDescription for the application and starts a StandaloneAppClient, which connects to the master.
3.2 The Master allocates executors
case RegisterApplication(description, driver) =>
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
The master creates a new application ID and registers the application. An application is bound to a single client port: the same client ip:port can register only one application. schedule() then takes the application's memory and core requirements and assigns executors to the eligible workers.
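The real logic in Master.schedule() and scheduleExecutorsOnWorkers() is more involved (spread-out versus consolidated placement, coresPerExecutor, and so on); the following is only a simplified sketch of the core idea, with invented types, not the actual Spark code:

import scala.collection.mutable

// Invented, simplified model of a worker's free resources.
case class WorkerFree(id: String, freeCores: Int, freeMemoryMb: Int)

// Spread an application's remaining core demand round-robin over the workers
// that have enough memory for an executor, one core at a time.
def assignCores(workers: Seq[WorkerFree], coresWanted: Int, memPerExecutorMb: Int): Map[String, Int] = {
  val usable = workers.filter(w => w.freeMemoryMb >= memPerExecutorMb && w.freeCores > 0)
  val assigned = mutable.Map[String, Int]().withDefaultValue(0)
  var remaining = coresWanted
  var i = 0
  while (remaining > 0 && usable.exists(w => w.freeCores > assigned(w.id))) {
    val w = usable(i % usable.size)
    if (w.freeCores > assigned(w.id)) { assigned(w.id) += 1; remaining -= 1 }
    i += 1
  }
  assigned.toMap
}

For each executor granted this way, the master calls launchExecutor: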
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
This sends a serialized LaunchExecutor message to the worker's endpoint, and an ExecutorAdded message back to the application's driver.
3.3 The Worker launches executors
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  if (masterUrl != activeMasterUrl) {
    logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
  } else {
    try {
      logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

      // Create the executor's working directory
      val executorDir = new File(workDir, appId + "/" + execId)
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }

      // Create local dirs for the executor. These are passed to the executor via the
      // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
      // application finishes.
      val appLocalDirs = appDirectories.getOrElse(appId,
        Utils.getOrCreateLocalRootDirs(conf).map { dir =>
          val appDir = Utils.createDirectory(dir, namePrefix = "executor")
          Utils.chmod700(appDir)
          appDir.getAbsolutePath()
        }.toSeq)
      appDirectories(appId) = appLocalDirs
      val manager = new ExecutorRunner(
        appId,
        execId,
        appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
        cores_,
        memory_,
        self,
        workerId,
        host,
        webUi.boundPort,
        publicAddress,
        sparkHome,
        executorDir,
        workerUri,
        conf,
        appLocalDirs, ExecutorState.RUNNING)
      executors(appId + "/" + execId) = manager
      manager.start()
      coresUsed += cores_
      memoryUsed += memory_
      sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
    } catch {
      case e: Exception =>
        logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
          Some(e.toString), None))
    }
  }
manager.start() hands off to ExecutorRunner:

private[worker] def start() {
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() { fetchAndRunExecutor() }
  }
  workerThread.start()

  // Shutdown hook that kills actors on shutdown.
  shutdownHook = ShutdownHookManager.addShutdownHook { () =>
    // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
    // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
    if (state == ExecutorState.RUNNING) {
      state = ExecutorState.FAILED
    }
    killProcess(Some("Worker shutting down"))
  }
}
In the start method of ExecutorRunner.scala, a thread named "ExecutorRunner for xxx" is started to run the executor. Does that mean the application's code runs inside this thread?
private def fetchAndRunExecutor() {
  try {
    // Launch the process
    val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    // ...
    process = builder.start()
    // ...
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    // ...
  }
}
Looking inside fetchAndRunExecutor we find builder.start(): builder is a java.lang.ProcessBuilder, so the current thread launches a child process to run the command. The executor is therefore a separate JVM process, not a thread inside the worker, which is why none of the debug ports we opened so far reach it.
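As a standalone illustration of what happens here (this is not Spark code, just the same java.lang.ProcessBuilder pattern in isolation):

import scala.collection.JavaConverters._

object ChildProcessDemo {
  def main(args: Array[String]): Unit = {
    val cmd = Seq("java", "-version")      // stand-in for the generated executor command
    val builder = new ProcessBuilder(cmd.asJava)
    builder.inheritIO()                    // let the child write to our console
    val process = builder.start()          // forks a child process
    val exitCode = process.waitFor()       // block until the child exits
    println("Command exited with code " + exitCode)
  }
}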
4. Debugging the executor process
val driverUrl = RpcEndpointAddress(
  sc.conf.get("spark.driver.host"),
  sc.conf.get("spark.driver.port").toInt,
  CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
val args = Seq(
  "--driver-url", driverUrl,
  "--executor-id", "{{EXECUTOR_ID}}",
  "--hostname", "{{HOSTNAME}}",
  "--cores", "{{CORES}}",
  "--app-id", "{{APP_ID}}",
  "--worker-url", "{{WORKER_URL}}")
val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
  .map(Utils.splitCommandString).getOrElse(Seq.empty)
val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
  .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
  .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)

// When testing, expose the parent class path to the child. This is processed by
// compute-classpath.{cmd,sh} and makes all needed jars available to child processes
// when the assembly is built with the "*-provided" profiles enabled.
val testingClassPath =
  if (sys.props.contains("spark.testing")) {
    sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
  } else {
    Nil
  }

// Start executors with a few necessary configs for registering with the scheduler
val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
val javaOpts = sparkJavaOpts ++ extraJavaOpts
We can see that the executor's JVM arguments are controlled by javaOpts, which includes extraJavaOpts read via sc.conf.getOption("spark.executor.extraJavaOptions").
So the controlling property is spark.executor.extraJavaOptions. Going back to the Spark documentation (a little late) confirms it:

spark.executor.extraJavaOptions | (none) | A string of extra JVM options to pass to executors. For instance, GC settings or other logging. Note that it is illegal to set Spark properties or maximum heap size (-Xmx) settings with this option. Spark properties should be set using a SparkConf object or the spark-defaults.conf file used with the spark-submit script. Maximum heap size settings can be set with spark.executor.memory.
--conf "spark.executor.extraJavaOptions=-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y"
The full submit command for the debugging session then becomes:

/usr/java/jdk1.8.0_111/bin/java -cp /work/spark-2.1.0-bin-hadoop2.7/conf/:/work/spark-2.1.0-bin-hadoop2.7/jars/* -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7000,suspend=y -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://raintungmaster:7077 --class rfcexample --jars /work/spark-2.1.0-bin-hadoop2.7/examples/jars/scopt_2.11-3.3.0.jar,/work/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar --conf "spark.executor.extraJavaOptions=-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y" /tmp/machinelearning.jar
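One thing to keep in mind: with suspend=y every executor JVM blocks at startup until a debugger attaches to port 7001, and because the address is fixed, two executors launched on the same host would collide on the same port. While debugging it is easiest to keep one executor per host, or to switch to suspend=n.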