An Analysis of YARN DistributedShell

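The Hadoop 2.0 source tree ships two YARN-based applications. One is MapReduce; the other is DistributedShell, a sample program that shows how to write a YARN application. You can think of it as the WordCount example of YARN.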

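As its name suggests, DistributedShell runs shell commands in a distributed fashion: the user submits a shell command string or a shell script, and the ApplicationMaster has it executed in a number of separate containers.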

The DistributedShell source code lives in "hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell".

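It covers the three interactions that every application has to implement:

Client and RM (Client.java)
The client submits the application.

AM and RM (ApplicationMaster.java)
The AM registers itself and requests containers.

AM and NM (ApplicationMaster.java)
The AM launches the containers.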

Run it with:
hadoop jar hadoop-yarn-applications-distributedshell-2.0.5-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar hadoop-yarn-applications-distributedshell-2.0.5-alpha.jar -shell_command '/bin/date' -num_containers 10
This starts 10 containers, each of which runs the `date` command.

Execution flow:
1. The client submits the application to the RM through org.apache.hadoop.yarn.applications.distributedshell.Client, supplying an ApplicationSubmissionContext.
2. org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster requests containers and runs the user's command in each of them; the command is passed in ContainerLaunchContext.commands.

Client (Client.java):
1. YarnClient.getNewApplication
2. Fill in the ApplicationSubmissionContext and the ContainerLaunchContext (for the container that launches the AM)
3. YarnClient.submitApplication
4. Call YarnClient.getApplicationReport at regular intervals to get the application status
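Step 4 is an ordinary polling loop. The sketch below illustrates the idea using only the JDK: `AppState`, `monitor` and the `Supplier` are hypothetical stand-ins, not Hadoop types. In the real client the state would come from the ApplicationReport returned by YarnClient.getApplicationReport.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

class MonitorSketch {
    // Hypothetical subset of application states, for illustration only.
    enum AppState { ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    /**
     * Polls reportSource until the application reaches a terminal state.
     * reportSource stands in for a call such as YarnClient.getApplicationReport.
     * If the thread is interrupted while waiting, the last observed state is returned.
     */
    static AppState monitor(Supplier<AppState> reportSource, long intervalMillis) {
        while (true) {
            AppState state = reportSource.get();
            if (state == AppState.FINISHED || state == AppState.FAILED
                    || state == AppState.KILLED) {
                return state;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return state;
            }
        }
    }

    public static void main(String[] args) {
        // Canned sequence of reports instead of a live ResourceManager.
        Iterator<AppState> canned = Arrays.asList(
                AppState.ACCEPTED, AppState.RUNNING, AppState.FINISHED).iterator();
        System.out.println(monitor(canned::next, 10L));
    }
}
```

A real client would usually also give up after a timeout and kill the application, which this sketch leaves out.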
	// Excerpt from Client.java (steps 2 and 3)
	// Create the launch context for the AM container
	ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
	// Local resources: the AppMaster jar and log4j.properties
	amContainer.setLocalResources(localResources);
	// Environment variables: the HDFS location of the shell script, CLASSPATH
	amContainer.setEnvironment(env);
	// Build the command and arguments that start the AM
	Vector<CharSequence> vargs = new Vector<CharSequence>(30);
	vargs.add("${JAVA_HOME}" + "/bin/java");
	vargs.add("-Xmx" + amMemory + "m");
	// AM main class
	vargs.add("org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster");
	vargs.add("--container_memory " + String.valueOf(containerMemory));
	vargs.add("--num_containers " + String.valueOf(numContainers));
	vargs.add("--priority " + String.valueOf(shellCmdPriority));
	if (!shellCommand.isEmpty()) {
		vargs.add("--shell_command " + shellCommand + "");
	}
	if (!shellArgs.isEmpty()) {
		vargs.add("--shell_args " + shellArgs + "");
	}
	for (Map.Entry<String, String> entry : shellEnv.entrySet()) {
		vargs.add("--shell_env " + entry.getKey() + "=" + entry.getValue());
	}
	// Redirect the AM's stdout and stderr into the container log directory
	vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stdout");
	vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stderr");

	// Join the arguments into a single command line (this step was missing from the excerpt)
	StringBuilder command = new StringBuilder();
	for (CharSequence str : vargs) {
		command.append(str).append(" ");
	}
	List<String> commands = new ArrayList<String>();
	commands.add(command.toString());
	amContainer.setCommands(commands);
	// Set the resource requirement; only memory is set for now
	Resource capability = Records.newRecord(Resource.class);
	capability.setMemory(amMemory);
	amContainer.setResource(capability);
	appContext.setAMContainerSpec(amContainer);
	// Submit the application to the RM
	super.submitApplication(appContext);
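To see what the AM launch command ends up looking like, the argument assembly can be reproduced outside Hadoop. This is a minimal sketch using only the JDK: `buildAmCommand` and the sample values passed in `main` are hypothetical, and the `<LOG_DIR>` placeholder is assumed to match the value of ApplicationConstants.LOG_DIR_EXPANSION_VAR, which the NodeManager replaces with the container's log directory at launch time.

```java
import java.util.ArrayList;
import java.util.List;

class AmCommandSketch {
    // Assumed value of ApplicationConstants.LOG_DIR_EXPANSION_VAR.
    static final String LOG_DIR = "<LOG_DIR>";

    /** Builds the single command line that starts the AM (illustrative subset of options). */
    static String buildAmCommand(int amMemory, int containerMemory,
                                 int numContainers, String shellCommand) {
        List<String> vargs = new ArrayList<String>();
        vargs.add("${JAVA_HOME}/bin/java");
        vargs.add("-Xmx" + amMemory + "m");
        vargs.add("org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster");
        vargs.add("--container_memory " + containerMemory);
        vargs.add("--num_containers " + numContainers);
        if (!shellCommand.isEmpty()) {
            vargs.add("--shell_command " + shellCommand);
        }
        // stdout and stderr of the AM go to files in the container log directory.
        vargs.add("1>" + LOG_DIR + "/AppMaster.stdout");
        vargs.add("2>" + LOG_DIR + "/AppMaster.stderr");
        // The container launch context takes the whole thing as one shell command string.
        return String.join(" ", vargs);
    }

    public static void main(String[] args) {
        // Sample values only; they are not defaults taken from the source.
        System.out.println(buildAmCommand(10, 10, 10, "/bin/date"));
    }
}
```

Because the result is a single string that a shell interprets on the node, `${JAVA_HOME}` and the `1>`/`2>` redirections are resolved on the NodeManager host, not on the client machine.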

ApplicationMaster (ApplicationMaster.java)
1. AMRMClient.registerApplicationMaster
2. Pass a ContainerRequest to AMRMClient.addContainerRequest
3. Obtain containers through AMRMClient.allocate
4. Hand each allocated container to a new LaunchContainerRunnable thread
5. Create a ContainerLaunchContext and set the localResources, shellCommand, shellArgs and other container launch information
6. ContainerManager.startContainer(startReq)
7. Read the completed containers from the response to the next RPC call: AMResponse.getCompletedContainersStatuses
8. AMRMClient.unregisterApplicationMaster
	// Create the AMRMClient. 2.1-beta adds an asynchronous AMRMClient; this version still uses the synchronous one
	resourceManager = new AMRMClientImpl(appAttemptID);
	resourceManager.init(conf);
	resourceManager.start();
	// Register this AM with the RM
	RegisterApplicationMasterResponse response = resourceManager
		.registerApplicationMaster(appMasterHostname, appMasterRpcPort,
			appMasterTrackingUrl);
	while (numCompletedContainers.get() < numTotalContainers && !appDone) {
		// Build the container request and set its resource requirement (only memory is set here)
		ContainerRequest containerAsk = setupContainerAskForRM(askCount);
		resourceManager.addContainerRequest(containerAsk);

		// Send the request to RM
		LOG.info("Asking RM for containers" + ", askCount=" + askCount);
		AMResponse amResp = sendContainerAskToRM();

		// Retrieve list of allocated containers from the response
		List<Container> allocatedContainers = amResp.getAllocatedContainers();
		for (Container allocatedContainer : allocatedContainers) {
			// Launch each container from its own thread so that the main loop is not blocked
			LaunchContainerRunnable runnableLaunchContainer = new LaunchContainerRunnable(
				allocatedContainer);
			Thread launchThread = new Thread(runnableLaunchContainer);
			launchThreads.add(launchThread);
			launchThread.start();
		}
		// Statuses of the containers that have finished since the previous call
		List<ContainerStatus> completedContainers = amResp.getCompletedContainersStatuses();
	}
	// Unregister this AM from the RM
	resourceManager.unregisterApplicationMaster(appStatus, appMessage, null);
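The bookkeeping in this loop is easier to follow in isolation. The sketch below is a simplified, JDK-only simulation and not the actual ApplicationMaster code: `FakeRm`, `Response` and `run` are hypothetical, and the fake RM is assumed to grant a fixed number of containers per call and to report each container as completed on the following call. The counters are modeled on the `askCount`, `requested` and `completed` values that the AM prints in its log.

```java
import java.util.ArrayList;
import java.util.List;

class AmLoopSketch {
    /** Hypothetical result of one allocate call. */
    static class Response {
        final List<Integer> allocated = new ArrayList<Integer>();
        final List<Integer> completed = new ArrayList<Integer>();
    }

    /** Hypothetical in-memory stand-in for the RM side of the allocate call. */
    static class FakeRm {
        private final int perCall;           // max containers granted per call
        private int pending = 0;             // asked for but not yet granted
        private int nextId = 1;
        private final List<Integer> running = new ArrayList<Integer>();

        FakeRm(int perCall) { this.perCall = perCall; }

        Response allocate(int ask) {
            Response resp = new Response();
            pending += ask;
            // Containers granted on the previous call are reported as completed now.
            resp.completed.addAll(running);
            running.clear();
            while (pending > 0 && resp.allocated.size() < perCall) {
                resp.allocated.add(nextId++);
                pending--;
            }
            running.addAll(resp.allocated);
            return resp;
        }
    }

    /** Runs the ask/allocate/complete loop and returns the number of iterations. */
    static int run(int numTotalContainers, FakeRm rm) {
        int requested = 0;
        int completed = 0;
        int loops = 0;
        while (completed < numTotalContainers) {
            // Ask only for containers that have not been requested yet.
            int askCount = numTotalContainers - requested;
            requested += askCount;
            Response resp = rm.allocate(askCount);
            // The real AM would start one launch thread per allocated container here.
            completed += resp.completed.size();
            loops++;
            System.out.println("loop=" + loops + ", askCount=" + askCount
                    + ", allocated=" + resp.allocated.size()
                    + ", completed=" + completed);
        }
        return loops;
    }

    public static void main(String[] args) {
        run(10, new FakeRm(4));
    }
}
```

After the first iteration `askCount` drops to 0 because all containers have already been requested, yet the loop keeps calling the RM: the same call is what delivers newly allocated containers and completion statuses.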

AM log output:

containerNode=dev81.hadoop:56100, containerNodeURI=dev81.hadoop:8042, containerStateNEW, containerResourceMemory1024
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Current available resources in the cluster <memory:26624, vCores:-5>
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=5
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1376966186147_0006_01_000007, state=COMPLETE, exitStatus=0, diagnostics=
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1376966186147_0006_01_000007
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1376966186147_0006_01_000008, state=COMPLETE, exitStatus=0, diagnostics=
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Connecting to ContainerManager at dev81.hadoop:56100
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1376966186147_0006_01_000008
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1376966186147_0006_01_000009, state=COMPLETE, exitStatus=0, diagnostics=
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1376966186147_0006_01_000009
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1376966186147_0006_01_000006, state=COMPLETE, exitStatus=0, diagnostics=
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1376966186147_0006_01_000011
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1376966186147_0006_01_000006
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Setting user in ContainerLaunchContext to: hadoop
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1376966186147_0006_01_000010, state=COMPLETE, exitStatus=0, diagnostics=
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1376966186147_0006_01_000010
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Current application state: loop=3, appDone=false, total=10, requested=10, completed=9, failed=0, currentAllocated=10
13/08/26 17:15:09 INFO distributedshell.ApplicationMaster: Current application state: loop=4, appDone=false, total=10, requested=10, completed=9, failed=0, currentAllocated=10
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Asking RM for containers, askCount=0
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Sending request to RM for containers, progress=0.9
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=0
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Current available resources in the cluster <memory:26624, vCores:-5>
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1376966186147_0006_01_000011, state=COMPLETE, exitStatus=0, diagnostics=
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1376966186147_0006_01_000011
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Current application state: loop=4, appDone=true, total=10, requested=10, completed=10, failed=0, currentAllocated=10
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM
13/08/26 17:15:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.AMRMClientImpl is stopped.
13/08/26 17:15:10 INFO distributedshell.ApplicationMaster: Application Master completed successfully. exiting


Reference examples:


Original article: http://blog.csdn.net/lalaguozhe/article/details/10361367. Please credit the source when reposting.

