A Detailed Look at How a Job Is Submitted to YARN

Main components:

     YARN is a resource-management and job-scheduling framework. It consists of three main modules: ResourceManager, NodeManager, and ApplicationMaster.

     ResourceManager: the resource manager, which acts as coordinator, scheduler, and manager of the entire cluster's resources.

     NodeManager: the NM is the per-node resource and task manager. It periodically reports the node's resource usage and the running state of each Container to the RM, and it receives and handles Container start/stop requests from the AM.

      ApplicationMaster: responsible for monitoring the application, tracking its execution state, and restarting failed tasks.

      Container: the Container is YARN's resource abstraction. It encapsulates a node's multi-dimensional resources, such as memory, CPU, disk, and network. When the AM requests resources from the RM, the resources the RM returns are expressed as Containers. YARN allocates one Container per task, and that task may only use the resources described by its Container.
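
To make the Container abstraction concrete, here is a minimal sketch of an ApplicationMaster describing the resources it wants through the AMRMClient API. This is a fragment, not a complete AM (registration and the allocate heartbeat are omitted), and all values are illustrative:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerRequestSketch {
  public static void main(String[] args) {
    // A Container bundles resources (here memory and vcores) on some node;
    // the AM only describes what it needs, the RM decides where it runs.
    Resource capability = Resource.newInstance(1024, 2); // 1024 MB, 2 vcores
    ContainerRequest request = new ContainerRequest(
        capability, null /* any node */, null /* any rack */,
        Priority.newInstance(0));

    AMRMClient<ContainerRequest> client = AMRMClient.createAMRMClient();
    client.addContainerRequest(request); // the RM later answers with Containers
  }
}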

How an MR job is submitted to and runs on YARN:

     1: When we submit a job on the client side, waitForCompletion(true) calls the submit() method. Depending on the boolean argument, it then either enables verbose mode and periodically reports the job's progress, or just periodically polls whether the job has completed.
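
For context, here is a minimal driver sketch showing where that call sits; WordCountMapper and WordCountReducer are hypothetical user classes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);   // hypothetical Mapper
    job.setReducerClass(WordCountReducer.class); // hypothetical Reducer
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // true = verbose: submit() is called, then progress is printed
    // periodically; false = the client just polls until the job finishes
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}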


     2: Inside submit(), we can see the method does four main things:

              1: make sure the job has not already been submitted

              2: decide from the configuration whether to use the new API

              3: connect to the cluster: connect() creates a Cluster object, and in Cluster's constructor initialize() creates whichever ClientProtocol the configuration calls for, which is simply either a LocalJobRunner or a YARNRunner

              4: create a JobSubmitter object and call its submitJobInternal method; because cluster is passed in, the submitter also gains the ability to talk to the RM remotely, and submit() ultimately returns submitter.submitJobInternal(...), the method in which all the later uploading of job resources to HDFS gets done (see the abridged source below)
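
For reference, here is Job.submit() itself, lightly abridged from the Hadoop 2.x source, with the four steps marked:

public void submit()
    throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);  // 1: refuse a job that was already submitted
  setUseNewAPI();                // 2: pick old vs. new API from the configuration
  connect();                     // 3: create the Cluster (LocalJobRunner or YARNRunner)
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient()); // 4: build the JobSubmitter
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run() throws IOException, InterruptedException,
        ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster); // the real work happens here
    }
  });
  state = JobState.RUNNING;
}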


    3: Now let's step into submitter.submitJobInternal():

JobStatus submitJobInternal(Job job, Cluster cluster) 
  throws ClassNotFoundException, InterruptedException, IOException {

    //validate the jobs output specs 
    checkSpecs(job);                // sanity-check the output spec and related configuration

    Configuration conf = job.getConfiguration();
    addMRFrameworkToDistributedCache(conf);

    Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf); // get the staging directory path
    //configure the command line options correctly on the submitting dfs
    InetAddress ip = InetAddress.getLocalHost(); // get this node's IP address
    if (ip != null) {
      submitHostAddress = ip.getHostAddress();  // this node's IP address as a string
      submitHostName = ip.getHostName();     // this node's hostname
      conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
      conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
    }
    JobID jobId = submitClient.getNewJobID();  // generate a new job ID
    job.setJobID(jobId);    // store the job ID in the Job object
    Path submitJobDir = new Path(jobStagingArea, jobId.toString());
    JobStatus status = null;
    try {
      conf.set(MRJobConfig.USER_NAME,
          UserGroupInformation.getCurrentUser().getShortUserName());
      conf.set("hadoop.http.filter.initializers", 
          "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
      conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
      LOG.debug("Configuring job " + jobId + " with " + submitJobDir 
          + " as the submit dir");
      // get delegation token for the dir
      TokenCache.obtainTokensForNamenodes(job.getCredentials(),
          new Path[] { submitJobDir }, conf);   // obtain the tokens needed to talk to the namenodes
      
      populateTokenCache(conf, job.getCredentials());

      // generate a secret to authenticate shuffle transfers
      if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
        KeyGenerator keyGen;        // generate a secret key for the map-to-reduce shuffle
        try {
          keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
          keyGen.init(SHUFFLE_KEY_LENGTH);
        } catch (NoSuchAlgorithmException e) {
          throw new IOException("Error generating shuffle secret key", e);
        }
        SecretKey shuffleKey = keyGen.generateKey();
        TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
            job.getCredentials());
      }
      if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
        conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
        LOG.warn("Max job attempts set to 1 since encrypted intermediate" +
                "data spill is enabled");
      }

      copyAndConfigureFiles(job, submitJobDir);    // copy the job jar and other resources to HDFS

      Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);
      
      // Create the splits for the job
      LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
      int maps = writeSplits(job, submitJobDir);    // compute the input splits; their count determines the number of mappers
      conf.setInt(MRJobConfig.NUM_MAPS, maps);
      LOG.info("number of splits:" + maps);

      // write "queue admins of the queue to which job is being submitted"
      // to job file.
      String queue = conf.get(MRJobConfig.QUEUE_NAME,
          JobConf.DEFAULT_QUEUE_NAME);    // the scheduling queue defaults to "default"
      AccessControlList acl = submitClient.getQueueAdmins(queue);
      conf.set(toFullPropertyName(queue,
          QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

      // removing jobtoken referrals before copying the jobconf to HDFS
      // as the tasks don't need this setting, actually they may break
      // because of it if present as the referral will point to a
      // different job.
      TokenCache.cleanUpTokenReferral(conf);

      if (conf.getBoolean(
          MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
          MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
        // Add HDFS tracking ids
        ArrayList<String> trackingIds = new ArrayList<String>();
        for (Token<? extends TokenIdentifier> t :
            job.getCredentials().getAllTokens()) {
          trackingIds.add(t.decodeIdentifier().getTrackingId());
        }
        conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
            trackingIds.toArray(new String[trackingIds.size()]));
      }

      // Set reservation info if it exists
      ReservationId reservationId = job.getReservationId();
      if (reservationId != null) {
        conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
      }

      // Write job file to submit dir
      writeConf(conf, submitJobFile);   // write the conf contents to an XML file (job.xml)
      
      //
      // Now, actually submit the job (using the submit name)
      //
      printTokens(jobId, job.getCredentials());
      status = submitClient.submitJob(
          jobId, submitJobDir.toString(), job.getCredentials());  // submit and get the job status
      if (status != null) {
        return status;               // on success, return the status
      } else {
        throw new IOException("Could not launch job");  // on failure, throw
      }
    } finally {
      if (status == null) {           // if submission failed, clean up the directory created earlier
        LOG.info("Cleaning up the staging area " + submitJobDir);
        if (jtFs != null && submitJobDir != null)
          jtFs.delete(submitJobDir, true);  

      }
    }
  }
The two most important calls inside this method are the following:
copyAndConfigureFiles(job, submitJobDir);  // copy the job jar and other resources to HDFS
writeSplits(job, submitJobDir);    // compute the input splits; their count determines the number of mappers

Let's start with the first method and see what it actually does and which files it copies into HDFS.
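
Abridged from the Hadoop 2.7-era JobSubmitter source, it simply delegates the copying to a JobResourceUploader:

private void copyAndConfigureFiles(Job job, Path jobSubmitDir)
    throws IOException {
  // delegate the actual upload work to JobResourceUploader
  JobResourceUploader rUploader = new JobResourceUploader(jtFs);
  rUploader.uploadFiles(job, jobSubmitDir);

  // Get the working directory. If not set, sets it to filesystem working dir
  job.getWorkingDirectory();
}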


Stepping further into the uploadFiles() method:

public void uploadFiles(Job job, Path submitJobDir) throws IOException {
  Configuration conf = job.getConfiguration();
  short replication =                        // replication factor for submitted files, default 10
      (short) conf.getInt(Job.SUBMIT_REPLICATION,
          Job.DEFAULT_SUBMIT_REPLICATION);

  if (!(conf.getBoolean(Job.USED_GENERIC_PARSER, false))) {
    LOG.warn("Hadoop command-line option parsing not performed. "
        + "Implement the Tool interface and execute your application "
        + "with ToolRunner to remedy this.");
  }

  // get all the command line arguments passed in by the user conf
  String files = conf.get("tmpfiles");        // these four entries are what this method uploads,
  String libjars = conf.get("tmpjars");       // each written with the replication factor above;
  String archives = conf.get("tmparchives");  // the rest of the method (elided here) does the copying
  String jobJar = job.getJar();
  // ...
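
The warning above fires because those four values are filled in by the GenericOptionsParser (-files, -libjars, -archives). A minimal sketch of the recommended pattern; MyJob and its run body are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Implementing Tool means ToolRunner parses -files/-libjars/-archives
// (which populate tmpfiles/tmpjars/tmparchives) before run() is called.
public class MyJob extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // build and submit the Job here, using getConf()
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // e.g. hadoop jar myjob.jar MyJob -files lookup.txt -libjars extra.jar in out
    System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
  }
}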

Next, let's step into writeSplits to see how the input splits are actually computed. For jobs using the new API it delegates to writeNewSplits, shown below.
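
Abridged from the Hadoop 2.x source: writeNewSplits instantiates the job's InputFormat by reflection, asks it for the splits (for file-based jobs, getSplits is the method FileInputFormat overrides), sorts them largest first so the biggest splits are scheduled earliest, and writes the split files into the submit directory:

private <T extends InputSplit>
int writeNewSplits(JobContext job, Path jobSubmitDir) throws IOException,
    InterruptedException, ClassNotFoundException {
  Configuration conf = job.getConfiguration();
  InputFormat<?, ?> input =
      ReflectionUtils.newInstance(job.getInputFormatClass(), conf);

  List<InputSplit> splits = input.getSplits(job);
  T[] array = (T[]) splits.toArray(new InputSplit[splits.size()]);

  // sort the splits into order based on size, so that the biggest go first
  Arrays.sort(array, new SplitComparator());
  JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
      jobSubmitDir.getFileSystem(conf), array);
  return array.length;   // the number of splits = the number of map tasks
}

Inside FileInputFormat.getSplits, the size of each split ultimately comes from computeSplitSize: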

protected long computeSplitSize(long blockSize, long minSize,
                                long maxSize) {
  return Math.max(minSize, Math.min(maxSize, blockSize)); // with the defaults minSize = 1 and maxSize = Long.MAX_VALUE, the split size equals blockSize (128 MB)
}
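
A quick worked example of that formula (the values are illustrative; the knobs behind minSize and maxSize are the properties mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize):

public class SplitSizeDemo {
  // same formula as FileInputFormat.computeSplitSize
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long block = 128L << 20; // a 128 MB HDFS block
    // defaults: one split per block
    System.out.println(computeSplitSize(block, 1L, Long.MAX_VALUE));         // 134217728
    // maxSize below the block size shrinks splits -> more mappers
    System.out.println(computeSplitSize(block, 1L, 64L << 20));              // 67108864
    // minSize above the block size grows splits -> fewer mappers
    System.out.println(computeSplitSize(block, 256L << 20, Long.MAX_VALUE)); // 268435456
  }
}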

In the end, the files, job jar, libjars, and archives (each written with the submit replication of 10) together with the split information are all uploaded to HDFS, into a directory named after the job ID under the staging directory.

    4: The client submits the application to the ResourceManager (RM).

    5: On receiving the submitApplication() message, the RM hands the request to the YARN scheduler. The scheduler allocates a container, and the RM then launches the ApplicationMaster process in that container under the NodeManager's management.

  6: The MRAppMaster initializes the job (initializeJob).

  7: The MRAppMaster retrieves the input split information from HDFS.

  8: The MRAppMaster calls the ResourceManager to allocate resources for the tasks.

 9: The MRAppMaster starts a container: under the NodeManager's management a task JVM is launched inside it, and a YarnChild process runs in that JVM.

 10: The YarnChild retrieves the job resources from HDFS.

 11: The map tasks run. By default, once 5% of the maps have completed, the reduce tasks are started; this threshold is configurable and is commonly raised to 80%.
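
That threshold is the mapreduce.job.reduce.slowstart.completedmaps property (default 0.05). A minimal sketch of raising it in the driver:

Configuration conf = new Configuration();
// don't launch reducers until 80% of the maps have finished, so they
// don't sit idle holding containers on a busy cluster
conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);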
