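Preface
In the previous article, Yarn Source Code Analysis (2) --- spark-submit, we followed a Spark job submitted through spark-submit all the way from submission to YARN, through the resource request, to startup. This article covers how the ApplicationMaster (AM below) is started during that process.
AM startup and Container allocation
1. In Yarn Source Code Analysis (2), yarnClient ends by calling submitApplication to submit the job. The argument it passes carries the AM launch context, so the AM startup is driven from this YARN method.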
val containerContext = createContainerLaunchContext(newAppResponse) // wrap the AM launch context
val appContext = createApplicationSubmissionContext(newApp, containerContext) // the application's submission context
yarnClient.submitApplication(appContext) // submit the job
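2. Starting the AM is unusually complex and the code is long, so I only excerpt the important parts below. Spark packages the context the AM will run with; YARN's event-handling machinery eventually executes that context and calls back into Spark's AM class. Client mode and cluster mode run different classes, as shown here: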
val amClass =
if (isClusterMode) { // cluster mode
Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
} else { // client mode
Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
}
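3. The natural place to start on the Hadoop side is yarnClient.submitApplication(appContext), to see how YARN uses this context to start the AM that Spark packaged. The interface is implemented by YarnClientImpl, whose method in turn calls ApplicationClientProtocol.submitApplication. That protocol is central to YARN's RPC communication, and I won't cover it further here. After submitting, the client enters a while (true) polling loop that waits until the submission completes.
// request carries the overall parameters and the launch script of our application; it is submitted to the RM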
rmClient.submitApplication(request);
int pollCount = 0;
long startTime = System.currentTimeMillis();
EnumSet<YarnApplicationState> waitingStates =
EnumSet.of(YarnApplicationState.NEW,
YarnApplicationState.NEW_SAVING,
YarnApplicationState.SUBMITTED);
EnumSet<YarnApplicationState> failToSubmitStates =
EnumSet.of(YarnApplicationState.FAILED,
YarnApplicationState.KILLED);
while (true) {
try {
ApplicationReport appReport = getApplicationReport(applicationId);
YarnApplicationState state = appReport.getYarnApplicationState();
if (!waitingStates.contains(state)) {
if(failToSubmitStates.contains(state)) {
throw new YarnException("Failed to submit " + applicationId +
" to YARN : " + appReport.getDiagnostics());
}
LOG.info("Submitted application " + applicationId);
break;
}
}
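The polling pattern in step 3 can be sketched in isolation. This is a minimal sketch, not the YarnClientImpl code: SubmitPollSketch, its AppState enum, and the Supplier that stands in for getApplicationReport are all hypothetical names introduced for illustration, and the poll interval is an arbitrary parameter.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.function.Supplier;

final class SubmitPollSketch {
    // Hypothetical stand-in for YarnApplicationState; only the states used here.
    enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, FAILED, KILLED }

    static final EnumSet<AppState> WAITING =
        EnumSet.of(AppState.NEW, AppState.NEW_SAVING, AppState.SUBMITTED);
    static final EnumSet<AppState> FAILED_TO_SUBMIT =
        EnumSet.of(AppState.FAILED, AppState.KILLED);

    /** Polls until the state leaves WAITING and returns the number of polls made. */
    static int waitForSubmission(Supplier<AppState> report, long pollIntervalMs) {
        int polls = 0;
        while (true) {
            AppState state = report.get();
            polls++;
            if (!WAITING.contains(state)) {
                if (FAILED_TO_SUBMIT.contains(state)) {
                    throw new IllegalStateException("Failed to submit, state=" + state);
                }
                return polls; // submission finished
            }
            try {
                Thread.sleep(pollIntervalMs); // wait before asking again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Scripted sequence of states, as if returned by successive report calls.
        Iterator<AppState> states = Arrays.asList(
            AppState.NEW, AppState.NEW_SAVING, AppState.SUBMITTED, AppState.ACCEPTED).iterator();
        System.out.println("polls: " + waitForSubmission(states::next, 1L));
    }
}
```

The loop exits on the first state outside the waiting set, and only then decides between success and failure, which mirrors the two EnumSets in the excerpt above.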
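4. On the server side, submitApplication is implemented by ClientRMService. I pasted almost the whole method, so the analysis is in the code comments.
// For safety the ApplicationSubmissionContext is validated here: fields that are independent of the RM's
// configuration are checked at this point, and those that depend on it are validated in RMAppManager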
String user = null;
try {
user = UserGroupInformation.getCurrentUser().getShortUserName();
} catch (IOException ie) {
LOG.warn("Unable to get the current user.", ie);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
ie.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId);
throw RPCUtil.getRemoteException(ie);
}
// If the app is already present in rmContext it was submitted earlier, so just return a response
if (rmContext.getRMApps().get(applicationId) != null) {
LOG.info("This is an earlier submitted application: " + applicationId);
return SubmitApplicationResponse.newInstance();
}
// Fall back to the default queue if none was given
if (submissionContext.getQueue() == null) {
submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
}
// Fall back to the default application name if none was given
if (submissionContext.getApplicationName() == null) {
submissionContext.setApplicationName(
YarnConfiguration.DEFAULT_APPLICATION_NAME);
}
// Default the application type, or truncate it if it is too long
if (submissionContext.getApplicationType() == null) {
submissionContext
.setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
} else {
if (submissionContext.getApplicationType().length() >
YarnConfiguration.APPLICATION_TYPE_LENGTH) {
submissionContext.setApplicationType(submissionContext
.getApplicationType().substring(0,
YarnConfiguration.APPLICATION_TYPE_LENGTH));
}
}
try {
// call RMAppManager to submit application directly
// For more on the ApplicationManager, see my chapter on the basic components
rmAppManager.submitApplication(submissionContext,
System.currentTimeMillis(), user);
LOG.info("Application with id " + applicationId.getId() +
" submitted by user " + user);
RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
"ClientRMService", applicationId);
} catch (YarnException e) {
LOG.info("Exception in submitting application with id " +
applicationId.getId(), e);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
e.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId);
throw e;
}
SubmitApplicationResponse response = recordFactory
.newRecordInstance(SubmitApplicationResponse.class);
return response;
}
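The defaulting and truncation rules in step 4 are easy to check on their own. The sketch below is a hypothetical helper, not ClientRMService code; the constant values are assumptions chosen for illustration, and the real ones are defined in YarnConfiguration.

```java
final class SubmissionDefaultsSketch {
    // Assumed illustrative values; the real constants are defined in YarnConfiguration.
    static final String DEFAULT_QUEUE = "default";
    static final String DEFAULT_NAME = "N/A";
    static final String DEFAULT_TYPE = "YARN";
    static final int MAX_TYPE_LENGTH = 20;

    /** A missing queue falls back to the default queue. */
    static String normalizeQueue(String queue) {
        return queue == null ? DEFAULT_QUEUE : queue;
    }

    /** A missing application name falls back to the default name. */
    static String normalizeName(String name) {
        return name == null ? DEFAULT_NAME : name;
    }

    /** A missing type falls back to the default; an overlong type is cut to the limit. */
    static String normalizeType(String type) {
        if (type == null) {
            return DEFAULT_TYPE;
        }
        return type.length() > MAX_TYPE_LENGTH ? type.substring(0, MAX_TYPE_LENGTH) : type;
    }

    public static void main(String[] args) {
        System.out.println(normalizeQueue(null));
        System.out.println(normalizeType("SPARK"));
        System.out.println(normalizeType("a-very-long-application-type-name"));
    }
}
```

Note that none of these checks rejects the submission: missing values are filled in and an overlong type is silently shortened.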
5. From the code above, the next stop is rmAppManager.submitApplication(). Inside it is createAndPopulateNewRMApp(), so let's look at that:
private RMAppImpl createAndPopulateNewRMApp(
ApplicationSubmissionContext submissionContext, long submitTime,
String user, boolean isRecovery) throws YarnException {
ApplicationId applicationId = submissionContext.getApplicationId();
// Validate the AM's resource request
ResourceRequest amReq =
validateAndCreateResourceRequest(submissionContext, isRecovery);
// Create RMApp
// This wraps a state machine, one of YARN's core mechanisms: each component acts as its state changes
RMAppImpl application =
new RMAppImpl(applicationId, rmContext, this.conf,
submissionContext.getApplicationName(), user,
submissionContext.getQueue(),
submissionContext, this.scheduler, this.masterService,
submitTime, submissionContext.getApplicationType(),
submissionContext.getApplicationTags(), amReq);
return application;
}
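6. Finally, rmAppManager.submitApplication() submits a START event. We just saw an RMAppImpl being created, and this event fires its state machine. The registration below handles START: the application moves from NEW to NEW_SAVING and RMAppNewlySavingTransition runs.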
.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
RMAppEventType.START, new RMAppNewlySavingTransition())
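To make the addTransition idea concrete, here is a minimal table-driven state machine. It is a sketch under my own assumptions, not YARN's StateMachineFactory: StateMachineSketch and its State and Event enums are hypothetical, and the hooks only append a label to a log so the order of execution is visible. The three arcs follow the NEW, NEW_SAVING, SUBMITTED, ACCEPTED path described in this article.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class StateMachineSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED }
    enum Event { START, APP_NEW_SAVED, APP_ACCEPTED }

    /** One registered arc: the target state plus a hook that runs during the move. */
    private static final class Arc {
        final State to;
        final Consumer<StringBuilder> hook;
        Arc(State to, Consumer<StringBuilder> hook) {
            this.to = to;
            this.hook = hook;
        }
    }

    private final Map<String, Arc> arcs = new HashMap<>();
    private State current = State.NEW;
    final StringBuilder log = new StringBuilder(); // records which hooks ran

    /** Registers "in state from, on event on, run hook and move to state to". */
    StateMachineSketch addTransition(State from, State to, Event on,
                                     Consumer<StringBuilder> hook) {
        arcs.put(from + "/" + on, new Arc(to, hook));
        return this;
    }

    /** Looks up the arc for (current state, event), runs its hook, then moves. */
    State handle(Event event) {
        Arc arc = arcs.get(current + "/" + event);
        if (arc == null) {
            throw new IllegalStateException("Invalid event " + event + " at " + current);
        }
        arc.hook.accept(log);
        current = arc.to;
        return current;
    }

    /** Builds the application path walked through in this article. */
    static StateMachineSketch appLifecycle() {
        return new StateMachineSketch()
            .addTransition(State.NEW, State.NEW_SAVING, Event.START,
                l -> l.append("store;"))
            .addTransition(State.NEW_SAVING, State.SUBMITTED, Event.APP_NEW_SAVED,
                l -> l.append("addToScheduler;"))
            .addTransition(State.SUBMITTED, State.ACCEPTED, Event.APP_ACCEPTED,
                l -> l.append("startAttempt;"));
    }
}
```

The point of the design is that the event handler itself contains no branching on state: all behavior lives in the registration table, which is why reading the addTransition calls is the quickest way to follow YARN's control flow.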
7. So the next thing to read is RMAppNewlySavingTransition. Its code is short:
private static final class RMAppNewlySavingTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
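// If recovery is enabled, store the application information in a non-blocking call,
// so the RM is sure to have saved what it needs to restart the AM after an RM restart without contacting the client again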
LOG.info("Storing application with id " + app.applicationId);
app.rmContext.getStateStore().storeNewApplication(app);
}
}
8. Let's see what this store method does. It fires a STORE_APP event, which StoreAppTransition handles:
public void storeNewApplication(RMApp app) {
ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
assert context instanceof ApplicationSubmissionContextPBImpl;
ApplicationStateData appState =
ApplicationStateData.newInstance(
app.getSubmitTime(), app.getStartTime(), context, app.getUser());
dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));
}
9. Inside StoreAppTransition, the following code notifies the RM by submitting an APP_NEW_SAVED event, which AddApplicationToSchedulerTransition handles:
try {
store.storeApplicationStateInternal(appId, appState);
store.notifyApplication(new RMAppEvent(appId,
RMAppEventType.APP_NEW_SAVED));
} catch (Exception e) {
LOG.error("Error storing app: " + appId, e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
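10. The name says it all: this transition hands the application to the scheduler by submitting an APP_ADDED event. We analyze the default scheduler, the Capacity Scheduler, so the next code to read is in CapacityScheduler: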
case APP_ADDED:
{
AppAddedSchedulerEvent appAddedEvent = (AppAddedSchedulerEvent) event;
String queueName =
resolveReservationQueueName(appAddedEvent.getQueue(),
appAddedEvent.getApplicationId(),
appAddedEvent.getReservationID());
if (queueName != null) {
if (!appAddedEvent.getIsAppRecovering()) {
// Tell the queue that an application was submitted; the queue keeps count of its applications
addApplication(appAddedEvent.getApplicationId(), queueName,
appAddedEvent.getUser());
} else {
addApplicationOnRecovery(appAddedEvent.getApplicationId(), queueName,
appAddedEvent.getUser());
}
}
// (excerpt from inside addApplication) the APP_ACCEPTED event is submitted here
queue.getMetrics().submitApp(user);
SchedulerApplication<FiCaSchedulerApp> application =
new SchedulerApplication<FiCaSchedulerApp>(queue, user);
applications.put(applicationId, application);
LOG.info("Accepted application " + applicationId + " from user: " + user
+ ", in queue: " + queueName);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
}
11. As the code above shows, the Capacity Scheduler submits APP_ACCEPTED. The state goes from SUBMITTED to ACCEPTED and StartAppAttemptTransition runs:
.addTransition(RMAppState.SUBMITTED, RMAppState.ACCEPTED,
RMAppEventType.APP_ACCEPTED, new StartAppAttemptTransition())
12. This transition creates a new RMAppAttempt and then submits RMAppAttemptEventType.START, which triggers AttemptStartedTransition. The attempt object was only just created, so its state machine starts in NEW; the START event moves it to SUBMITTED.
// Transitions from NEW State
.addTransition(RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED,
RMAppAttemptEventType.START, new AttemptStartedTransition())
13. Now look at AttemptStartedTransition. The part that matters is the call to registerAppAttempt:
// Register the attempt with the AM service
appAttempt.masterService
.registerAppAttempt(appAttempt.applicationAttemptId);
// Add the applicationAttempt to the scheduler and inform the scheduler
// whether to transfer the state from previous attempt.
appAttempt.eventHandler.handle(new AppAttemptAddedSchedulerEvent(
appAttempt.applicationAttemptId, transferStateFromPreviousAttempt));
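14. Inside registerAppAttempt, the response id is set to -1. This id is incremented on every later exchange with the AM and is used to tell whether a request is a duplicate, a new request, or a stale one.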
AllocateResponse response =
recordFactory.newRecordInstance(AllocateResponse.class);
// set response id to -1 before application master for the following
// attemptID get registered
response.setResponseId(-1);
LOG.info("Registering app attempt : " + attemptId);
responseMap.put(attemptId, new AllocateResponseLock(response));
rmContext.getNMTokenSecretManager().registerApplicationAttempt(attemptId);
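How a counter like this can separate new, duplicate, and stale requests is shown below. This is a simplified model of the idea and not the ApplicationMasterService logic: ResponseIdSketch, the Kind enum, and the exact comparisons are assumptions made for illustration, and the counter starts at 0 here as if the AM had already registered.

```java
final class ResponseIdSketch {
    enum Kind { NEW, DUPLICATE, STALE }

    // Hypothetical: id of the last response the server sent to the AM.
    private int lastResponseId = 0;

    /** Classifies a request by the response id the AM echoes back. */
    Kind classify(int requestResponseId) {
        if (requestResponseId == lastResponseId) {
            return Kind.NEW;        // AM saw our last response and is moving on
        }
        if (requestResponseId + 1 == lastResponseId) {
            return Kind.DUPLICATE;  // AM is resending the request we already answered
        }
        if (requestResponseId < lastResponseId) {
            return Kind.STALE;      // request is more than one step behind
        }
        throw new IllegalArgumentException("Response id ahead of server: " + requestResponseId);
    }

    /** Handles a request; only a NEW request advances the counter. */
    Kind handle(int requestResponseId) {
        Kind kind = classify(requestResponseId);
        if (kind == Kind.NEW) {
            lastResponseId++;
        }
        return kind;
    }
}
```

In this model a duplicate can be answered by replaying the cached last response, which is why the excerpt above stores the response object in responseMap alongside the id.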
15. Back in AttemptStartedTransition: it ends by submitting SchedulerEventType.APP_ATTEMPT_ADDED, which goes back to the Capacity Scheduler:
case APP_ATTEMPT_ADDED:
{
AppAttemptAddedSchedulerEvent appAttemptAddedEvent =
(AppAttemptAddedSchedulerEvent) event;
addApplicationAttempt(appAttemptAddedEvent.getApplicationAttemptId(),
appAttemptAddedEvent.getTransferStateFromPreviousAttempt(),
appAttemptAddedEvent.getIsAttemptRecovering());
}
16. Naturally we step into addApplicationAttempt. I picked out part of the code: it creates a FiCaSchedulerApp, which holds the resource information for launching the AM.
FiCaSchedulerApp attempt =
new FiCaSchedulerApp(applicationAttemptId, application.getUser(),
queue, queue.getActiveUsersManager(), rmContext);
17. After that setup it submits the RMAppAttemptEventType.ATTEMPT_ADDED event:
rmContext.getDispatcher().getEventHandler().handle(
new RMAppAttemptEvent(applicationAttemptId,
RMAppAttemptEventType.ATTEMPT_ADDED));
ATTEMPT_ADDED moves the attempt out of SUBMITTED. The target is either LAUNCHED_UNMANAGED_SAVING or SCHEDULED, and the state machine continues according to whichever state the transition returns.
.addTransition(RMAppAttemptState.SUBMITTED,
EnumSet.of(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
RMAppAttemptState.SCHEDULED),
RMAppAttemptEventType.ATTEMPT_ADDED,
new ScheduleTransition())
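18. Next is ScheduleTransition. The if condition reads subCtx.getUnmanagedAM(), which tells whether the AM is unmanaged. If it is true, the RM neither allocates a container for the AM nor launches it; the default is false. So the state returned here is SCHEDULED.
ApplicationSubmissionContext subCtx = appAttempt.submissionContext;
// getUnmanagedAM(): if true, the RM does not allocate a container for the AM or launch it; the default is false
if (!subCtx.getUnmanagedAM()) {
// The container count must be reset before a new attempt is created, because this request is passed to the scheduler, which deducts the number once the AM container is allocated
appAttempt.amReq.setNumContainers(1);
appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
/* ANY means any node */
appAttempt.amReq.setResourceName(ResourceRequest.ANY);
appAttempt.amReq.setRelaxLocality(true);
// Ask the scheduler to allocate resources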
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
Collections.singletonList(appAttempt.amReq),
EMPTY_CONTAINER_RELEASE_LIST, null, null);
if (amContainerAllocation != null
&& amContainerAllocation.getContainers() != null) {
assert (amContainerAllocation.getContainers().size() == 0);
}
return RMAppAttemptState.SCHEDULED;
} else {
// save state and then go to LAUNCHED state
appAttempt.storeAttempt();
return RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING;
}
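Steps 17 and 18 use a transition with several possible target states: the registration lists the allowed targets, and the transition's return value picks one. The sketch below illustrates that shape only; ScheduleDecisionSketch and its enum are hypothetical and none of the scheduler interaction is modeled.

```java
import java.util.EnumSet;

final class ScheduleDecisionSketch {
    enum AttemptState { SUBMITTED, SCHEDULED, LAUNCHED_UNMANAGED_SAVING }

    // Targets declared at registration time; anything else is a programming error.
    static final EnumSet<AttemptState> ALLOWED_TARGETS =
        EnumSet.of(AttemptState.LAUNCHED_UNMANAGED_SAVING, AttemptState.SCHEDULED);

    /** Picks the post-state for ATTEMPT_ADDED based on the unmanaged-AM flag. */
    static AttemptState onAttemptAdded(boolean unmanagedAm) {
        AttemptState next = unmanagedAm
            ? AttemptState.LAUNCHED_UNMANAGED_SAVING // RM does not launch the AM itself
            : AttemptState.SCHEDULED;                // RM asked the scheduler for an AM container
        if (!ALLOWED_TARGETS.contains(next)) {
            throw new IllegalStateException("Undeclared target state " + next);
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(onAttemptAdded(false));
        System.out.println(onAttemptAdded(true));
    }
}
```

For a Spark job submitted the way this series describes, the flag is false, so the first branch of the real code runs and the attempt ends up in SCHEDULED.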
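19. One line in the code above stands out: appAttempt.scheduler.allocate(), which performs the resource scheduling. I won't analyze it in detail here; the AM calls the same interface later to request the remaining Containers, and a later article covers it in depth. ScheduleTransition returned SCHEDULED, so the attempt now sits in that state. When a CONTAINER_ALLOCATED event arrives there, the registration below applies and AMContainerAllocatedTransition runs: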
.addTransition(RMAppAttemptState.SCHEDULED,
EnumSet.of(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.SCHEDULED),
RMAppAttemptEventType.CONTAINER_ALLOCATED,
new AMContainerAllocatedTransition())
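20. The details are in the code comments below. allocate is called here as well, but with an empty request list. allocate checks for this case: with an empty request it only tries to fetch containers that were already allocated, rather than running another round of scheduling.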
// Acquire the AM container from the scheduler.
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST, null,
null);
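// At least one container must have been allocated, because a CONTAINER_ALLOCATED event is emitted
// only after an RMContainer has been constructed and put into SchedulerApplication#newlyAllocatedContainers.
// Note that YarnScheduler#allocate is not guaranteed to be able to fetch it,
// because the container may not be fetchable for some reason (for example, DNS is unavailable so the container token was not generated).
// In that case we return to the previous state and keep retrying until the AM container is fetched.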
if (amContainerAllocation.getContainers().size() == 0) {
appAttempt.retryFetchingAMContainer(appAttempt);
return RMAppAttemptState.SCHEDULED;
}
// Set the masterContainer
appAttempt.setMasterContainer(amContainerAllocation.getContainers()
.get(0));
RMContainerImpl rmMasterContainer = (RMContainerImpl)appAttempt.scheduler
.getRMContainer(appAttempt.getMasterContainer().getId());
rmMasterContainer.setAMContainer(true);
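// The node set in NMTokenSecretManager marks whether an NMToken for a node has already been issued to the AM.
// When the AM container was allocated to the RM itself, the node that hosts it was marked as having had its NMToken sent.
// So clear this node set, so that later allocate requests from the AM can retrieve the corresponding NMToken.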
appAttempt.rmContext.getNMTokenSecretManager()
.clearNodeSetForAttempt(appAttempt.applicationAttemptId);
appAttempt.getSubmissionContext().setResource(
appAttempt.getMasterContainer().getResource());
appAttempt.storeAttempt();
return RMAppAttemptState.ALLOCATED_SAVING;
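The fetch-or-retry behavior in step 20 can be sketched with a stub scheduler. This is an illustration under stated assumptions, not the RM code: AmContainerFetchSketch, SchedulerStub, and the string container ids are hypothetical, and the retry timer behind retryFetchingAMContainer is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class AmContainerFetchSketch {
    enum AttemptState { SCHEDULED, ALLOCATED_SAVING }

    /** Hypothetical scheduler stub: holds containers allocated since the last pull. */
    static final class SchedulerStub {
        private final Deque<String> newlyAllocated = new ArrayDeque<>();

        /** Simulates the scheduler assigning a container during a node heartbeat. */
        void nodeHeartbeatAllocates(String containerId) {
            newlyAllocated.add(containerId);
        }

        /** An empty ask triggers no new scheduling; it only drains what is ready. */
        List<String> allocateWithEmptyAsk() {
            List<String> out = new ArrayList<>(newlyAllocated);
            newlyAllocated.clear();
            return out;
        }
    }

    String masterContainer; // set once the AM container has been acquired

    /** Tries to pull the AM container; stays in SCHEDULED when nothing is fetchable yet. */
    AttemptState onContainerAllocated(SchedulerStub scheduler) {
        List<String> containers = scheduler.allocateWithEmptyAsk();
        if (containers.isEmpty()) {
            return AttemptState.SCHEDULED; // caller is expected to retry later
        }
        masterContainer = containers.get(0);
        return AttemptState.ALLOCATED_SAVING;
    }
}
```

Returning the same state on an empty result is what lets the state machine loop safely: the attempt is never left half-initialized, and the next attempt to fetch starts from a known state.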
21. The transition returns ALLOCATED_SAVING. As before, a transition is registered on that state; it fires when the ATTEMPT_NEW_SAVED event arrives:
.addTransition(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.ALLOCATED,
RMAppAttemptEventType.ATTEMPT_NEW_SAVED, new AttemptStoredTransition())
In this transition we finally reach the launch call: launchAttempt() submits a LAUNCH event.
private static final class AttemptStoredTransition extends BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
appAttempt.registerClientToken();
appAttempt.launchAttempt();
}
}
22. This brings us to the class we have been looking for, ApplicationMasterLauncher. A LAUNCH event was just submitted, so its handler takes the launch() branch:
AMLauncherEventType event = appEvent.getType();
RMAppAttempt application = appEvent.getAppAttempt();
switch (event) {
case LAUNCH:
launch(application);
break;
case CLEANUP:
cleanup(application);
break;
default:
break;
}
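23. First, a word on how ApplicationMasterLauncher is initialized and started. It is a child service of the RM; as noted in Yarn Source Code Analysis (1) --- RM and NM service startup and heartbeat communication, the RM initializes and starts its child services one by one. The key point here is that serviceStart() starts a thread to handle the launcher events, so let's look at that thread's run method: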
@Override
protected void serviceInit(Configuration conf) throws Exception {
int threadCount = conf.getInt(
YarnConfiguration.RM_AMLAUNCHER_THREAD_COUNT,
YarnConfiguration.DEFAULT_RM_AMLAUNCHER_THREAD_COUNT);
ThreadFactory tf = new ThreadFactoryBuilder()
.setNameFormat("ApplicationMasterLauncher #%d")
.build();
launcherPool = new ThreadPoolExecutor(threadCount, threadCount, 1,
TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
launcherPool.setThreadFactory(tf);
Configuration newConf = new YarnConfiguration(conf);
newConf.setInt(CommonConfigurationKeysPublic.
IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SOCKET_TIMEOUTS_KEY,
conf.getInt(YarnConfiguration.RM_NODEMANAGER_CONNECT_RETIRES,
YarnConfiguration.DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES));
setConfig(newConf);
super.serviceInit(newConf);
}
@Override
protected void serviceStart() throws Exception {
launcherHandlingThread.start();
super.serviceStart();
}
The run method takes events from the masterEvents queue one at a time and hands each to the launcher thread pool:
while (!this.isInterrupted()) {
Runnable toLaunch;
try {
toLaunch = masterEvents.take();
launcherPool.execute(toLaunch);
} catch (InterruptedException e) {
LOG.warn(this.getClass().getName() + " interrupted. Returning.");
return;
}
}
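24. Now back to the launch() method from step 22. It calls createRunnableLauncher, which creates an AMLauncher carrying the AMLauncherEventType.LAUNCH event. The AMLauncher is added to the queue and eventually executed through the ApplicationMasterLauncher handling thread.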
private void launch(RMAppAttempt application) {
Runnable launcher = createRunnableLauncher(application,
AMLauncherEventType.LAUNCH);
masterEvents.add(launcher);
}
protected Runnable createRunnableLauncher(RMAppAttempt application,
AMLauncherEventType event) {
Runnable launcher =
new AMLauncher(context, application, event, getConfig());
return launcher;
}
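Steps 23 and 24 together form a producer, queue, dispatcher, and worker pool. The sketch below reproduces that shape with plain java.util.concurrent classes. It is a minimal sketch, not the ApplicationMasterLauncher implementation: LauncherLoopSketch, the pool size of 2, and the runDemo helper are assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class LauncherLoopSketch {
    private final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<>();
    private final ExecutorService launcherPool = Executors.newFixedThreadPool(2);
    private final Thread handlingThread = new Thread(this::loop, "launcher-handler");

    /** Dispatcher loop: block on the queue, hand each task to the pool. */
    private void loop() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Runnable toLaunch = masterEvents.take();
                launcherPool.execute(toLaunch);
            } catch (InterruptedException e) {
                return; // interrupted while waiting: stop the dispatcher
            }
        }
    }

    void start() {
        handlingThread.start();
    }

    /** Producer side: enqueue a launch task and return immediately. */
    void launch(Runnable launcher) {
        masterEvents.add(launcher);
    }

    void stop() {
        handlingThread.interrupt();
        launcherPool.shutdown();
    }

    /** Enqueues n tasks and reports whether all of them ran within five seconds. */
    static boolean runDemo(int n) {
        LauncherLoopSketch sketch = new LauncherLoopSketch();
        CountDownLatch done = new CountDownLatch(n);
        sketch.start();
        for (int i = 0; i < n; i++) {
            sketch.launch(done::countDown);
        }
        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            sketch.stop();
        }
    }
}
```

The queue decouples the event handler from the slow work: the caller of launch() returns at once, while the RPC to the NodeManager happens later on a pool thread.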
25. That triggers AMLauncher's run method, which calls its own launch() and then submits RMAppAttemptEventType.LAUNCHED. That event exists to start the AM monitoring thread, so I skip it and focus on launch():
case LAUNCH:
try {
LOG.info("Launching master" + application.getAppAttemptId());
launch();
handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),
RMAppAttemptEventType.LAUNCHED));
}
26. Here the AM launch context that came in with spark-submit is finally taken out and placed in a StartContainerRequest, and then startContainers is called:
ContainerLaunchContext launchContext =
createAMContainerLaunchContext(applicationContext, masterContainerID);
StartContainerRequest scRequest =
StartContainerRequest.newInstance(launchContext,
masterContainer.getContainerToken());
List<StartContainerRequest> list = new ArrayList<StartContainerRequest>();
list.add(scRequest);
StartContainersRequest allRequests =
StartContainersRequest.newInstance(list);
StartContainersResponse response =
containerMgrProxy.startContainers(allRequests);
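27. Now the Container that hosts the AM is actually started. startContainers is implemented by ContainerManagerImpl. It first runs some checks and then calls the key method startContainerInternal(nmTokenIdentifier, containerTokenIdentifier, request). That method is long, so again I pick only the key parts. It submits an INIT_APPLICATION event; following the code, we eventually reach RequestResourcesTransition. I won't analyze resource localization here; look it up if you are interested. That transition is long too, so only the key code is shown: container.sendLaunchEvent() submits a ContainersLauncherEventType.LAUNCH_CONTAINER event, which the ContainersLauncher class handles.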
container.sendLaunchEvent();
container.metrics.endInitingContainer();
return ContainerState.LOCALIZED;
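28. containerLauncher is a thread-pool object. The code plainly creates a new ContainerLaunch object and submits it to the pool. As mentioned earlier, this is where the AM launch context packaged by Spark is passed in to start the AM.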
case LAUNCH_CONTAINER:
Application app =
context.getApplications().get(
containerId.getApplicationAttemptId().getApplicationId());
ContainerLaunch launch =
new ContainerLaunch(context, getConfig(), dispatcher, exec, app,
event.getContainer(), dirsHandler, containerManager);
containerLauncher.submit(launch);
running.put(containerId, launch);
break;
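The submit-and-track pattern in step 28 looks like this in miniature. It is a sketch only: ContainersLauncherSketch and LaunchTask are hypothetical stand-ins, the task merely pretends to run a command, and the cached thread pool is my choice for the example rather than something taken from the NodeManager source.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ContainersLauncherSketch {
    /** Hypothetical stand-in for ContainerLaunch: "runs" a command, returns an exit code. */
    static final class LaunchTask implements Callable<Integer> {
        final String command;

        LaunchTask(String command) {
            this.command = command;
        }

        @Override
        public Integer call() {
            // A real launcher would prepare and execute the container's command here.
            return command.isEmpty() ? 1 : 0;
        }
    }

    private final ExecutorService containerLauncher = Executors.newCachedThreadPool();
    final Map<String, LaunchTask> running = new ConcurrentHashMap<>();

    /** Handles a LAUNCH_CONTAINER-style event: submit the task and remember it by id. */
    Future<Integer> onLaunchContainer(String containerId, String command) {
        LaunchTask launch = new LaunchTask(command);
        Future<Integer> result = containerLauncher.submit(launch);
        running.put(containerId, launch);
        return result;
    }

    void shutdown() {
        containerLauncher.shutdown();
    }

    /** Launches one task and waits for its exit code. */
    static int demoExitCode(String command) {
        ContainersLauncherSketch sketch = new ContainersLauncherSketch();
        try {
            return sketch.onLaunchContainer("container_01", command).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            sketch.shutdown();
        }
    }
}
```

Keeping the task in a map keyed by container id is what makes a later cleanup possible: the launcher can look the task up again when it needs to stop that container.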
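29. At this point the AM startup is essentially complete. For what the ContainerLaunch task actually does, read its call() method; I won't go into it here.
Summary
This article walked through the whole AM startup. The code involved is genuinely complex and touches many other modules. I did not analyze all of it, because covering everything at once would muddle the thread. Modules I find interesting, such as the state machine and RPC communication, I will study gradually in later YARN articles. The next article covers how the AM registers with the RM, how the AM requests Containers, and how those Containers are started.
Author: 蛋撻
Date: 2018.08.28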