Introduction to the main components:
YARN is a framework for resource management and job scheduling. It consists of three main modules: ResourceManager, NodeManager, and ApplicationMaster.
ResourceManager: the resource manager; the coordinator, scheduler, and manager of the resources of the entire cluster.
NodeManager: the NM is the per-node resource and task manager. It periodically reports the node's resource usage and the running state of each Container to the RM, and it receives and handles Container start/stop requests from the AM.
ApplicationMaster: responsible for monitoring the application, tracking its execution state, restarting failed tasks, and so on.
Container: the Container is YARN's resource abstraction. It encapsulates the multi-dimensional resources of a node, such as memory, CPU, disk, and network. When the AM requests resources from the RM, the resources the RM returns to the AM are expressed as Containers. YARN assigns each task a Container, and the task may only use the resources described by that Container.
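To make the AM/RM/Container relationship concrete, here is a minimal sketch (not from the original text) of an ApplicationMaster asking the RM for one Container through the YARN client API. The 1024 MB / 1 vcore numbers are illustrative assumptions, and the registerApplicationMaster() handshake a real AM must do first is omitted for brevity.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerRequestSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
    amRmClient.init(new Configuration());
    amRmClient.start();
    // a real AM would call amRmClient.registerApplicationMaster(...) here first
    // a Container is described by multi-dimensional resources: here 1024 MB of memory and 1 vcore
    Resource capability = Resource.newInstance(1024, 1);
    ContainerRequest request =
        new ContainerRequest(capability, null, null, Priority.newInstance(0));
    // the RM answers with allocated Containers on subsequent allocate() heartbeats
    amRmClient.addContainerRequest(request);
  }
}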
The process of an MR job submitted to YARN:
1: When we submit a job on the client side, the waitForCompletion(true) method calls submit(); depending on the argument passed in, it decides whether to enable verbose mode and periodically report the job's progress, or otherwise just periodically poll whether the job has completed.
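As a reference point for step 1, here is a minimal, self-contained driver sketch (illustrative, not from the original text). It uses the identity Mapper/Reducer so it compiles on its own; the input/output paths come from the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IdentityJobDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "identity job");
    job.setJarByClass(IdentityJobDriver.class);
    job.setMapperClass(Mapper.class);           // identity mapper
    job.setReducerClass(Reducer.class);         // identity reducer
    job.setOutputKeyClass(LongWritable.class);  // TextInputFormat emits (LongWritable offset, Text line)
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // true -> verbose: submit() runs internally, then progress is polled and printed until the job finishes
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}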
2: Stepping into the submit() method, we can see that it mainly does four things:
1: make sure the job has not already been submitted
2: decide, from the configuration, whether to use the new API
3: connect to the cluster: the connect() method creates a Cluster object, cluster. In the Cluster constructor the initialize() method is called, which creates the ClientProtocol the configuration asks for; it boils down to either a LocalJobRunner or a YARNRunner (see the sketch after this list)
4: create a JobSubmitter object, submitter, passing in the cluster parameter, so that submitter also gains the ability to communicate with the RM remotely, and call its submitJobInternal() method
In the end submit() returns submitter.submitJobInternal(...); the subsequent upload of the job resources to HDFS is all handled inside this method.
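The following is a simplified sketch of that decision (a paraphrase, not the literal Cluster.initialize() source, which discovers ClientProtocolProvider implementations via a ServiceLoader): the value of mapreduce.framework.name determines which ClientProtocol the client ends up talking to.

import org.apache.hadoop.conf.Configuration;

public class FrameworkChoiceSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // "local" -> LocalJobRunner (runs the job in a single JVM)
    // "yarn"  -> YARNRunner (talks to the ResourceManager)
    String framework = conf.get("mapreduce.framework.name", "local");
    if ("yarn".equals(framework)) {
      System.out.println("ClientProtocol = YARNRunner");
    } else {
      System.out.println("ClientProtocol = LocalJobRunner");
    }
  }
}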
3: Let's step into submitter.submitJobInternal():
JobStatus submitJobInternal(Job job, Cluster cluster)
    throws ClassNotFoundException, InterruptedException, IOException {

  // validate the jobs output specs
  checkSpecs(job);  // check that the output specification and related configuration are sane

  Configuration conf = job.getConfiguration();
  addMRFrameworkToDistributedCache(conf);

  Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);  // get the staging directory path
  // configure the command line options correctly on the submitting dfs
  InetAddress ip = InetAddress.getLocalHost();  // get this node's IP address
  if (ip != null) {
    submitHostAddress = ip.getHostAddress();  // string form of this node's IP address
    submitHostName = ip.getHostName();        // this node's host name
    conf.set(MRJobConfig.JOB_SUBMITHOST, submitHostName);
    conf.set(MRJobConfig.JOB_SUBMITHOSTADDR, submitHostAddress);
  }
  JobID jobId = submitClient.getNewJobID();  // generate a job ID
  job.setJobID(jobId);                       // write the job ID into the Job object
  Path submitJobDir = new Path(jobStagingArea, jobId.toString());
  JobStatus status = null;
  try {
    conf.set(MRJobConfig.USER_NAME,
        UserGroupInformation.getCurrentUser().getShortUserName());
    conf.set("hadoop.http.filter.initializers",
        "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
    conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
    LOG.debug("Configuring job " + jobId + " with " + submitJobDir
        + " as the submit dir");
    // get delegation token for the dir
    TokenCache.obtainTokensForNamenodes(job.getCredentials(),
        new Path[] { submitJobDir }, conf);  // obtain the credentials needed to talk to the NameNode

    populateTokenCache(conf, job.getCredentials());

    // generate a secret to authenticate shuffle transfers
    if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
      KeyGenerator keyGen;  // generate the secret that secures map-to-reduce data transfer
      try {
        keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
        keyGen.init(SHUFFLE_KEY_LENGTH);
      } catch (NoSuchAlgorithmException e) {
        throw new IOException("Error generating shuffle secret key", e);
      }
      SecretKey shuffleKey = keyGen.generateKey();
      TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
          job.getCredentials());
    }
    if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
      conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
      LOG.warn("Max job attempts set to 1 since encrypted intermediate" +
          "data spill is enabled");
    }

    copyAndConfigureFiles(job, submitJobDir);  // copy the executable files and other resources to HDFS

    Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

    // Create the splits for the job
    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    int maps = writeSplits(job, submitJobDir);  // compute the splits; the number of splits decides the number of mappers
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    LOG.info("number of splits:" + maps);

    // write "queue admins of the queue to which job is being submitted"
    // to job file.
    String queue = conf.get(MRJobConfig.QUEUE_NAME,
        JobConf.DEFAULT_QUEUE_NAME);  // the scheduling queue defaults to "default"
    AccessControlList acl = submitClient.getQueueAdmins(queue);
    conf.set(toFullPropertyName(queue,
        QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

    // removing jobtoken referrals before copying the jobconf to HDFS
    // as the tasks don't need this setting, actually they may break
    // because of it if present as the referral will point to a
    // different job.
    TokenCache.cleanUpTokenReferral(conf);

    if (conf.getBoolean(
        MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
        MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
      // Add HDFS tracking ids
      ArrayList<String> trackingIds = new ArrayList<String>();
      for (Token<? extends TokenIdentifier> t :
          job.getCredentials().getAllTokens()) {
        trackingIds.add(t.decodeIdentifier().getTrackingId());
      }
      conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
          trackingIds.toArray(new String[trackingIds.size()]));
    }

    // Set reservation info if it exists
    ReservationId reservationId = job.getReservationId();
    if (reservationId != null) {
      conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
    }

    // Write job file to submit dir
    writeConf(conf, submitJobFile);  // write the contents of conf into an XML file

    //
    // Now, actually submit the job (using the submit name)
    //
    printTokens(jobId, job.getCredentials());
    status = submitClient.submitJob(
        jobId, submitJobDir.toString(), job.getCredentials());  // submit the job and get its submission status
    if (status != null) {
      return status;  // submission succeeded: return the status
    } else {
      throw new IOException("Could not launch job");  // submission failed: throw an exception
    }
  } finally {
    if (status == null) {  // if submission failed, delete the staging directory created earlier
      LOG.info("Cleaning up the staging area " + submitJobDir);
      if (jtFs != null && submitJobDir != null)
        jtFs.delete(submitJobDir, true);
    }
  }
}
The two most important methods inside it are the following two:
copyAndConfigureFiles(job, submitJobDir);  // copies the executable files and other resources to HDFS
writeSplits(job, submitJobDir);  // computes the splits; the number of splits decides the number of mappers
Let's go into the first method to see what it actually does and which files it copies to HDFS.
Stepping further in, we reach the uploadFiles() method:
public void uploadFiles(Job job, Path submitJobDir) throws IOException {
  Configuration conf = job.getConfiguration();
  short replication = (short) conf.getInt(Job.SUBMIT_REPLICATION,
      Job.DEFAULT_SUBMIT_REPLICATION);  // replication factor for the submitted files, 10 by default
  if (!(conf.getBoolean(Job.USED_GENERIC_PARSER, false))) {
    LOG.warn("Hadoop command-line option parsing not performed. " +
        "Implement the Tool interface and execute your application " +
        "with ToolRunner to remedy this.");
  }
  // get all the command line arguments passed in by the user conf
  // these four kinds of resources are what this method uploads, each with 10
  // replicas; of course there is a lot more code below....
  String files = conf.get("tmpfiles");
  String libjars = conf.get("tmpjars");
  String archives = conf.get("tmparchives");
  String jobJar = job.getJar();
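The warning in that method points at the ToolRunner pattern. Here is a minimal sketch of it (illustrative class name): GenericOptionsParser, which ToolRunner invokes, is what turns -files/-libjars/-archives on the command line into the tmpfiles/tmpjars/tmparchives settings that uploadFiles() reads.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyTool extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // by the time run() is called, the generic options have been parsed into getConf()
    System.out.println("tmpfiles = " + getConf().get("tmpfiles"));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // e.g. hadoop jar mytool.jar MyTool -files lookup.txt -libjars dep.jar <app args>
    System.exit(ToolRunner.run(new Configuration(), new MyTool(), args));
  }
}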
Now let's step into the writeSplits() method to see how the splits are actually divided.
Going further into the writeNewSplits() method, we find that the getSplits() method invoked on the InputFormat object input is the one overridden by FileInputFormat, which in the end calls computeSplitSize() to return the size of each split:
protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
  // maxSize is the configured maximum, blockSize = 128 MB by default, minSize = 1
  return Math.max(minSize, Math.min(maxSize, blockSize));
}
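A small worked example of the formula (illustrative numbers): with the defaults minSize = 1 and maxSize = Long.MAX_VALUE, the split size is simply the block size; lowering maxSize shrinks the splits, and raising minSize grows them.

public class SplitSizeExample {
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024;  // 128 MB default HDFS block size
    System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE));     // 134217728 (= 128 MB, the block size)
    System.out.println(computeSplitSize(blockSize, 1L, 64L * 1024 * 1024));  // 67108864  (= 64 MB, capped by maxSize)
    System.out.println(computeSplitSize(blockSize, 256L * 1024 * 1024,
        Long.MAX_VALUE));                                                    // 268435456 (= 256 MB, raised by minSize)
  }
}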
In the end the files, jobJar, libjars, and archives (10 replicas of each) together with the split information are all uploaded to HDFS, into a directory named after the job ID created under the stagingDir directory.
4: Submit the application to the ResourceManager (RM).
5: After the RM receives the submitApplication() message, it hands the request to the YARN scheduler. The scheduler allocates a container (start container), and the RM then launches the ApplicationMaster process in that container under the NodeManager's management.
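For reference, this is roughly what the submitApplication() call in steps 4-5 looks like through the generic YARN client API (a sketch with an illustrative application name; MapReduce's YARNRunner does the equivalent internally):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

public class SubmitApplicationSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("demo");
    // a real client must also set the AM ContainerLaunchContext and resource request here
    yarnClient.submitApplication(ctx);  // the submitApplication() message from steps 4-5
  }
}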
6: The MRAppMaster initializes the job (initializeJob).
7: The MRAppMaster retrieves the split information from HDFS (retrieve input splits).
8: The MRAppMaster requests resources from the ResourceManager (allocateResource).
9: The MRAppMaster starts a container: under the NodeManager's management a task JVM is launched in the container, and a YarnChild is started inside that JVM.
10: The YarnChild fetches the job resources from HDFS.
11: Run the map tasks. By default the reduce tasks are started once 5% of the maps have completed, but this threshold is tunable and is often set to 80% (see the sketch below).
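A minimal sketch of tuning that reduce "slow start" threshold; mapreduce.job.reduce.slowstart.completedmaps is the property in question (default 0.05, i.e. 5%), and the job name is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowStartExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // start the reducers only after 80% of the map tasks have completed
    conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);
    Job job = Job.getInstance(conf, "slow-start demo");
  }
}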