Hadoop JobTracker Source Code Analysis



Preface

JobTracker is an important daemon in Hadoop: it is responsible for scheduling and assigning tasks, and it is also involved in job submission from the client side. This article focuses on two things: the JobTracker-side implementation of the JobTracker/TaskTracker heartbeat mechanism, and how a job submitted by a client is processed and then handed out to TaskTrackers.

JobTracker Startup

JobTracker.java lives in the org.apache.hadoop.mapred package under src/mapred in hadoop-1.1.2. As for startup: the JobTracker runs as a standalone process, so JobTracker.java contains a main() method. Entering main():

  public static void main(String argv[]) throws IOException, InterruptedException {
    // log a startup message
    StringUtils.startupShutdownMessage(JobTracker.class, argv, LOG);
    try {
      if(argv.length == 0) {
        JobTracker tracker = startTracker(new JobConf()); // the core startup call; several key objects are instantiated inside
        tracker.offerService(); // starts the key objects: the RPC server and the HTTP server
      }
      else {
        if ("-dumpConfiguration".equals(argv[0]) && argv.length == 1) {
          dumpConfiguration(new PrintWriter(System.out));
        }
        else {
          System.out.println("usage: JobTracker [-dumpConfiguration]");
          System.exit(-1);
        }
      }
    } catch (Throwable e) {
      LOG.fatal(StringUtils.stringifyException(e));
      System.exit(-1);
    }
  }

Stepping into startTracker: after a couple of overloads that fill in default arguments, we end up in the following startTracker:

  public static JobTracker startTracker(JobConf conf, String identifier, boolean initialize)
      throws IOException, InterruptedException {
    DefaultMetricsSystem.initialize("JobTracker");
    JobTracker result = null;
    while (true) {
      try {
        result = new JobTracker(conf, identifier); // the JobTracker constructor; identifier is derived from the current date
        result.taskScheduler.setTaskTrackerManager(result); // set the scheduler's taskTrackerManager to the
                                                            // JobTracker itself; why this matters is
                                                            // explained in the code further below
        break;
      } catch (VersionMismatch e) {
        throw e;
      } catch (BindException e) {
        throw e;
      } catch (UnknownHostException e) {
        throw e;
      } catch (AccessControlException ace) {
        // in case of jobtracker not having right access
        // bail out
        throw ace;
      } catch (IOException e) {
        LOG.warn("Error starting tracker: " +
            StringUtils.stringifyException(e));
      }
      Thread.sleep(1000);
    }
    if (result != null) {
      JobEndNotifier.startNotifier(); // starts the job-end notifier: internally, a thread watching a
                                      // BlockingQueue<JobEndStatusInfo>. When a TaskTracker tells the
                                      // JobTracker over RPC that a job has finished, the job's end status
                                      // is enqueued somewhere along that path, and this watcher thread
                                      // then sends an HTTP notification to the URL configured for the job
      MBeans.register("JobTracker", "JobTrackerInfo", result);
      if (initialize == true) {
        result.setSafeModeInternal(SafeModeAction.SAFEMODE_ENTER);
        result.initializeFilesystem(); // initializes the filesystem: essentially `return FileSystem.get(conf);`.
                                       // FileSystem keeps an internal cache keyed by (uri, conf); on a hit the
                                       // cached instance is returned, otherwise one is created via reflection.
                                       // The uri comes from the fs.default.name property; if unset, file:/// is used
        result.setSafeModeInternal(SafeModeAction.SAFEMODE_LEAVE);
        result.initialize(); // initialization: mainly sets up JobHistory, configures the HttpServer's
                             // attributes and starts an HDFS monitor thread. Not the focus here, so we skip it
      }
    }
    return result;
  }

The most important call in the snippet above is the JobTracker constructor: key objects such as taskScheduler, interTrackerServer and the HttpServer are all instantiated inside it.

  Class<? extends TaskScheduler> schedulerClass
    = conf.getClass("mapred.jobtracker.taskScheduler",
        JobQueueTaskScheduler.class, TaskScheduler.class); // checks whether the conf specifies a different
                                                           // TaskScheduler; the default is JobQueueTaskScheduler,
                                                           // which is also the one this article uses as the example
  taskScheduler = (TaskScheduler) ReflectionUtils.newInstance(schedulerClass, conf); // instantiate the scheduler via reflection
  int handlerCount = conf.getInt("mapred.job.tracker.handler.count", 10); // number of Handlers, 10 by default;
                                                                          // Handlers are covered later
  this.interTrackerServer =
    RPC.getServer(this, addr.getHostName(), addr.getPort(), handlerCount,
        false, conf, secretManager); // interTrackerServer is an RPC server; internally it holds a listener,
                                     // readers, a responder and handlers, which together accept, process
                                     // and answer RPC requests
  infoServer = new HttpServer("job", infoBindAddress, tmpInfoPort,
      tmpInfoPort == 0, conf, aclsManager.getAdminsAcl()); // the HttpServer serving the JobTracker web UI
                                                           // (port 50030 by default); outside the scope of this
                                                           // article -- knowing it is an HTTP server is enough
  infoServer.setAttribute("job.tracker", this);
  infoServer.addServlet("reducegraph", "/taskgraph", TaskGraphServlet.class);
  infoServer.start();
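
As an aside, the conf.getClass + ReflectionUtils.newInstance pair seen above is Hadoop's generic pattern for pluggable implementations. Here is a minimal sketch of the same pattern, using a hypothetical Plugin interface and config key rather than the real TaskScheduler classes:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.util.ReflectionUtils;

  public class PluggableSketch {
    // Hypothetical plugin interface, standing in for TaskScheduler.
    public interface Plugin { String name(); }

    public static class DefaultPlugin implements Plugin {
      public String name() { return "default"; }
    }

    public static Plugin load(Configuration conf) {
      // Reads a class name from the config, falling back to DefaultPlugin,
      // and rejects classes that do not implement Plugin.
      Class<? extends Plugin> cls =
          conf.getClass("sketch.plugin.class", DefaultPlugin.class, Plugin.class);
      // newInstance also injects conf if the class implements Configurable.
      return ReflectionUtils.newInstance(cls, conf);
    }

    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // conf.set("sketch.plugin.class", "com.example.MyPlugin"); // hypothetical swap-in
      System.out.println(load(conf).name());
    }
  }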

Key objects inside JobTracker

taskScheduler: the task scheduler. The default JobQueueTaskScheduler serves as the example here. Internally, JobQueueTaskScheduler sets up a JobQueueJobInProgressListener and an EagerTaskInitializationListener. The EagerTaskInitializationListener starts an internal init thread that watches the jobInitQueue; whenever it finds a job waiting to be initialized, it takes the job off the queue and calls initJob on its ttm (TaskTrackerManager) -- and since that ttm is in fact the JobTracker itself, this ends up invoking JobTracker.initJob. The JobQueueJobInProgressListener, on the other hand, has its jobAdded method called when the JobTracker receives a submitJob request from a client, adding the job to its jobQueue; later, when tasks are handed out during the TaskTracker/JobTracker heartbeat, jobs are taken from that jobQueue.
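
To make the init-thread idea concrete, here is a stripped-down sketch of the EagerTaskInitializationListener pattern. The names are simplified, and the real listener hands each dequeued job to a small thread pool instead of calling initJob inline, but the queue-plus-daemon-thread structure is the same:

  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  public class JobInitSketch {
    interface TaskTrackerManager { void initJob(String job); } // stands in for the JobTracker

    private final BlockingQueue<String> jobInitQueue = new LinkedBlockingQueue<String>();
    private final TaskTrackerManager ttm;

    JobInitSketch(TaskTrackerManager ttm) { this.ttm = ttm; }

    // Called from jobAdded() when a client submits a job.
    void jobAdded(String job) { jobInitQueue.add(job); }

    void start() {
      Thread initThread = new Thread() {
        public void run() {
          try {
            while (true) {
              // Blocks until a submitted job shows up, then hands it to the
              // manager -- which, in the real code, is the JobTracker itself.
              ttm.initJob(jobInitQueue.take());
            }
          } catch (InterruptedException ie) { /* shutting down */ }
        }
      };
      initThread.setDaemon(true);
      initThread.start();
    }
  }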

interTrackerServer: essentially an RPC server, but one that implements non-blocking, asynchronous IO on top of NIO through its listener, reader, handler and responder objects. The listener handles accept events, the readers handle read events, the handlers process the content of each RPC request, and the responder sends each call's result back to the client.

HttpServer: the server behind the JobTracker web UI; it is not analyzed in detail here.

interTrackerServer analysis

interTrackerServer is started in the offerService function via interTrackerServer.start(), which simply brings the RPC server up. To understand what interTrackerServer does, and how this RPC server accepts and processes requests, start from interTrackerServer = RPC.getServer:

  public static Server getServer(final Object instance, final String bindAddress, final int port,
                                 final int numHandlers,
                                 final boolean verbose, Configuration conf)
    throws IOException {
    return getServer(instance, bindAddress, port, numHandlers, verbose, conf, null);
  }

  /** Construct a server for a protocol implementation instance listening on a
   * port and address, with a secret manager. */
  public static Server getServer(final Object instance, final String bindAddress, final int port,
                                 final int numHandlers,
                                 final boolean verbose, Configuration conf,
                                 SecretManager<? extends TokenIdentifier> secretManager)
    throws IOException {
    return new Server(instance, conf, bindAddress, port, numHandlers, verbose, secretManager);
  }

  // This constructor belongs to RPC's inner class Server; its superclass is ipc.Server
  public Server(Object instance, Configuration conf, String bindAddress, int port,
                int numHandlers, boolean verbose,
                SecretManager<? extends TokenIdentifier> secretManager)
    throws IOException {
    super(bindAddress, port, Invocation.class, numHandlers, conf,
        classNameBase(instance.getClass().getName()), secretManager); // note the third argument, Invocation.class:
                                                                      // it is the class that carries an RPC call's
                                                                      // method name and parameters. This super call
                                                                      // instantiates the listener and the responder
    this.instance = instance;
    this.verbose = verbose;
  }

As you can see, getServer eventually calls new Server(instance, conf, bindAddress, port, numHandlers, verbose, secretManager) to build the RPC server. Note the first parameter, instance: it is the JobTracker instance itself, and RPC.Server.call will use it later.

JobTracker's offerService function then calls interTrackerServer.start(). Since RPC.Server extends ipc.Server, this is really ipc.Server.start():

  /** Starts the service. Must be called before any calls will be handled. */
  public synchronized void start() {
    responder.start(); // start the response thread
    listener.start();  // start the listening/accepting thread
    handlers = new Handler[handlerCount]; // the actual processing threads
    for (int i = 0; i < handlerCount; i++) { // handlerCount comes from the conf
      handlers[i] = new Handler(i);
      handlers[i].start();
    }
  }

The listener, readers, handlers and responder cooperate through the selector mechanism plus BlockingQueues that pass Call objects between them.
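
Before looking at each class, a toy example may help. The sketch below compresses the whole arrangement into a single IO thread (the real server splits the accepting listener from a pool of readers) plus handler threads draining a shared BlockingQueue; the port number and the byte handling are placeholders:

  import java.io.IOException;
  import java.net.InetSocketAddress;
  import java.nio.ByteBuffer;
  import java.nio.channels.SelectionKey;
  import java.nio.channels.Selector;
  import java.nio.channels.ServerSocketChannel;
  import java.nio.channels.SocketChannel;
  import java.util.Iterator;
  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  public class MiniNioServer {
    static final BlockingQueue<byte[]> callQueue = new LinkedBlockingQueue<byte[]>();

    public static void main(String[] args) throws IOException {
      // Handler threads: block on the queue and process "calls" off the IO path.
      for (int i = 0; i < 2; i++) {
        Thread h = new Thread() {
          public void run() {
            try {
              while (true) {
                byte[] call = callQueue.take();
                System.out.println("handled " + call.length + " bytes");
              }
            } catch (InterruptedException e) { }
          }
        };
        h.setDaemon(true);
        h.start();
      }

      Selector selector = Selector.open();
      ServerSocketChannel acceptChannel = ServerSocketChannel.open();
      acceptChannel.configureBlocking(false);
      acceptChannel.socket().bind(new InetSocketAddress(9000)); // placeholder port
      acceptChannel.register(selector, SelectionKey.OP_ACCEPT);

      ByteBuffer buf = ByteBuffer.allocate(4096);
      while (true) {
        selector.select();
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
          SelectionKey key = it.next();
          it.remove();
          if (key.isAcceptable()) {                 // the doAccept part
            SocketChannel ch = acceptChannel.accept();
            ch.configureBlocking(false);
            ch.register(selector, SelectionKey.OP_READ);
          } else if (key.isReadable()) {            // the doRead part
            SocketChannel ch = (SocketChannel) key.channel();
            buf.clear();
            int n = ch.read(buf);
            if (n < 0) { key.cancel(); ch.close(); continue; }
            byte[] call = new byte[n];
            buf.flip();
            buf.get(call);
            callQueue.add(call);                    // hand off to the handlers
          }
        }
      }
    }
  }
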
Listener:

  public Listener() throws IOException {
    address = new InetSocketAddress(bindAddress, port);
    // Create a new server socket and set to non blocking mode
    acceptChannel = ServerSocketChannel.open(); // open the ServerSocketChannel
    acceptChannel.configureBlocking(false); // non-blocking mode, so connect/read/write will not block
    // Bind the server socket to the local host and port
    bind(acceptChannel.socket(), address, backlogLength); // bind to the address
    port = acceptChannel.socket().getLocalPort(); // Could be an ephemeral port
    // create a selector;
    selector = Selector.open(); // initialize the selector
    readers = new Reader[readThreads]; // the readers handle OP_READ events
    readPool = Executors.newFixedThreadPool(readThreads); // reads are handled by a thread pool
    for (int i = 0; i < readThreads; i++) {
      Selector readSelector = Selector.open();
      Reader reader = new Reader(readSelector);
      readers[i] = reader;
      readPool.execute(reader);
    }
    // Register accepts on the server socket with the selector.
    acceptChannel.register(selector, SelectionKey.OP_ACCEPT); // register interest in ACCEPT events
    this.setName("IPC Server listener on " + port);
    this.setDaemon(true); // daemon thread: exits when the main thread exits
  }

  // The listener thread handles accept events; doAccept is its handler for them
  void doAccept(SelectionKey key) throws IOException, OutOfMemoryError {
    Connection c = null;
    ServerSocketChannel server = (ServerSocketChannel) key.channel();
    SocketChannel channel;
    while ((channel = server.accept()) != null) {
      channel.configureBlocking(false);
      channel.socket().setTcpNoDelay(tcpNoDelay);
      Reader reader = getReader(); // pick the next reader
      try {
        reader.startAdd(); // parks the reader; once the channel's interest set and attachment are
                           // configured, finishAdd() below wakes it up so it can get on with reading
        SelectionKey readKey = reader.registerChannel(channel); // register OP_READ for this channel
        c = new Connection(readKey, channel, System.currentTimeMillis()); // create the Connection object
        readKey.attach(c); // attach the connection to the readKey
        synchronized (connectionList) {
          connectionList.add(numConnections, c);
          numConnections++;
        }
        if (LOG.isDebugEnabled())
          LOG.debug("Server connection from " + c.toString() +
              "; # active connections: " + numConnections +
              "; # queued calls: " + callQueue.size());
      } finally {
        reader.finishAdd(); // wake the reader and let it return to its event loop
      }
    }
  }

Reader:

  Reader(Selector readSelector) {
    this.readSelector = readSelector;
  }

  public void run() {
    LOG.info("Starting SocketReader");
    synchronized (this) {
      while (running) { // loop forever
        SelectionKey key = null;
        try {
          readSelector.select(); // blocks until readSelector.wakeup() is called or a channel
                                 // has an event of interest
          while (adding) { // the listener's startAdd() sets adding to true
            this.wait(1000);
          }
          Iterator<SelectionKey> iter = readSelector.selectedKeys().iterator();
          while (iter.hasNext()) {
            key = iter.next();
            iter.remove();
            if (key.isValid()) {
              if (key.isReadable()) {
                doRead(key); // reads the RPC header, performs checks and validation, and
                             // deserializes the ConnectionHeader; then puts a Call(id, param,
                             // connection) into callQueue, where param is the deserialized
                             // Invocation object. The handler threads watch callQueue, take
                             // calls off it, and invoke them
              }
            }
            key = null;
          }
        } catch (InterruptedException e) {
          if (running) { // unexpected -- log it
            LOG.info(getName() + " caught: " +
                StringUtils.stringifyException(e));
          }
        } catch (IOException ex) {
          LOG.error("Error in Reader", ex);
        }
      }
    }
  }
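
One subtlety in run() deserves a note: SelectableChannel.register can block while another thread is sitting in select() on the same selector, which is why the listener first wakes the selector and parks the reader before registering. A minimal sketch of that handshake (simplified: the real code spreads it across startAdd/registerChannel/finishAdd):

  import java.io.IOException;
  import java.nio.channels.SelectionKey;
  import java.nio.channels.Selector;
  import java.nio.channels.SocketChannel;

  public class ReaderSketch implements Runnable {
    private final Selector sel;
    private volatile boolean adding = false;

    ReaderSketch(Selector sel) { this.sel = sel; }

    // Called by the listener thread to hand a new channel to this reader.
    public synchronized SelectionKey addChannel(SocketChannel ch) throws IOException {
      adding = true;
      sel.wakeup();                    // kick the reader out of select()
      try {
        return ch.register(sel, SelectionKey.OP_READ); // safe: the reader is parked
      } finally {
        adding = false;
        notify();                      // let the reader resume select()ing
      }
    }

    public void run() {
      while (true) {
        try {
          sel.select();
          synchronized (this) {
            while (adding) wait(1000); // parked while addChannel() registers
          }
          // ... iterate selectedKeys() and doRead(key), as in the real Reader
        } catch (IOException e) {
          break;
        } catch (InterruptedException e) {
          break;
        }
      }
    }
  }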

Handler:

  @Override
  public void run() {
    LOG.info(getName() + ": starting");
    SERVER.set(Server.this);
    ByteArrayOutputStream buf =
        new ByteArrayOutputStream(INITIAL_RESP_BUF_SIZE);
    while (running) {
      try {
        final Call call = callQueue.take(); // pop the queue; maybe blocked here
                                            // (takes the next Call object off the queue)
        if (LOG.isDebugEnabled())
          LOG.debug(getName() + ": has #" + call.id + " from " +
              call.connection);
        String errorClass = null;
        String error = null;
        Writable value = null;
        CurCall.set(call);
        try {
          // Make the call as the user via Subject.doAs, thus associating
          // the call with the Subject
          if (call.connection.user == null) {
            value = call(call.connection.protocol, call.param,
                call.timestamp); // call.connection.protocol comes from the ConnectionHeader; for
                                 // TaskTracker-to-JobTracker calls it is InterTrackerProtocol, and
                                 // for a client submitting a job it is JobSubmissionProtocol
          } else {
            value =
                call.connection.user.doAs
                    (new PrivilegedExceptionAction<Writable>() {
                      @Override
                      public Writable run() throws Exception {
                        // make the call
                        return call(call.connection.protocol,
                            call.param, call.timestamp);
                      }
                    }
                    );
          }
        } catch (Throwable e) {
          LOG.info(getName()+", call "+call+": error: " + e, e);
          errorClass = e.getClass().getName();
          error = StringUtils.stringifyException(e);
        }
        CurCall.set(null);
        synchronized (call.connection.responseQueue) {
          // setupResponse() needs to be sync'ed together with
          // responder.doResponse() since setupResponse may use
          // SASL to encrypt response data and SASL enforces
          // its own message ordering.
          setupResponse(buf, call,
              (error == null) ? Status.SUCCESS : Status.ERROR,
              value, errorClass, error); // serializes value + call.id + status into a ByteBuffer
                                         // and sets it as the call's response object
          // Discard the large buf and reset it back to
          // smaller size to freeup heap
          if (buf.size() > maxRespSize) {
            LOG.warn("Large response size " + buf.size() + " for call " +
                call.toString());
            buf = new ByteArrayOutputStream(INITIAL_RESP_BUF_SIZE);
          }
          responder.doRespond(call); // hand the result back to the client
        }
      } catch (InterruptedException e) {
        if (running) { // unexpected -- log it
          LOG.info(getName() + " caught: " +
              StringUtils.stringifyException(e));
        }
      } catch (Exception e) {
        LOG.info(getName() + " caught: " +
            StringUtils.stringifyException(e));
      }
    }
    LOG.info(getName() + ": exiting");
  }

  // the call() method invoked above
  public Writable call(Class<?> protocol, Writable param, long receivedTime)
      throws IOException {
    try {
      Invocation call = (Invocation)param; // downcast the Invocation object deserialized from the RPC stream
      if (verbose) log("Call: " + call);
      Method method =
          protocol.getMethod(call.getMethodName(),
              call.getParameterClasses()); // look up the Method object via reflection
      method.setAccessible(true); // make it accessible; for background on setAccessible, see my
                                  // other post on the Java security manager
      long startTime = System.currentTimeMillis();
      Object value = method.invoke(instance, call.getParameters()); // invoke the JobTracker's RPC method
                                                                    // through Java reflection
      int processingTime = (int) (System.currentTimeMillis() - startTime);
      int qTime = (int) (startTime-receivedTime);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Served: " + call.getMethodName() +
            " queueTime= " + qTime +
            " procesingTime= " + processingTime);
      }
      rpcMetrics.addRpcQueueTime(qTime);
      rpcMetrics.addRpcProcessingTime(processingTime);
      rpcMetrics.addRpcProcessingTime(call.getMethodName(), processingTime);
      if (verbose) log("Return: "+value);
      return new ObjectWritable(method.getReturnType(), value); // the result handed back after the invocation
    } catch (InvocationTargetException e) {
      Throwable target = e.getTargetException();
      if (target instanceof IOException) {
        throw (IOException)target;
      } else {
        IOException ioe = new IOException(target.toString());
        ioe.setStackTrace(target.getStackTrace());
        throw ioe;
      }
    } catch (Throwable e) {
      if (!(e instanceof IOException)) {
        LOG.error("Unexpected throwable object ", e);
      }
      IOException ioe = new IOException(e.toString());
      ioe.setStackTrace(e.getStackTrace());
      throw ioe;
    }
  }
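
Stripped of metrics and serialization, the dispatch at the heart of call() is plain Java reflection on the instance handed to getServer. A self-contained sketch, using a hypothetical Greeter protocol in place of a real Hadoop one:

  import java.lang.reflect.InvocationTargetException;
  import java.lang.reflect.Method;

  public class ReflectiveDispatch {
    interface Greeter { String greet(String who); }   // stands in for a protocol interface
    static class GreeterImpl implements Greeter {     // stands in for the JobTracker instance
      public String greet(String who) { return "hello " + who; }
    }

    // The deserialized Invocation supplies methodName/paramTypes/params in the real server.
    static Object call(Class<?> protocol, Object instance,
                       String methodName, Class<?>[] paramTypes, Object[] params)
        throws Exception {
      Method m = protocol.getMethod(methodName, paramTypes);
      m.setAccessible(true);
      try {
        return m.invoke(instance, params);
      } catch (InvocationTargetException e) {
        Throwable t = e.getTargetException();
        if (t instanceof Exception) throw (Exception) t; // unwrap, as the real code does
        throw e;
      }
    }

    public static void main(String[] args) throws Exception {
      Object result = call(Greeter.class, new GreeterImpl(), "greet",
          new Class<?>[] { String.class }, new Object[] { "tasktracker" });
      System.out.println(result); // -> hello tasktracker
    }
  }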

Responder:
Handler.run, shown earlier, calls responder.doRespond(call). Internally it looks like this; note the synchronized block:

  void doRespond(Call call) throws IOException {
    synchronized (call.connection.responseQueue) { // synchronized because multiple handlers may respond on
                                                   // the same connection: the call is appended to the
                                                   // connection's responseQueue and processResponse is
                                                   // invoked. The queue exists for partial writes: if a
                                                   // response cannot be written to the channel in one go, it
                                                   // stays queued, and the responder thread later takes it
                                                   // from the queue and finishes writing it asynchronously
      call.connection.responseQueue.addLast(call);
      if (call.connection.responseQueue.size() == 1) {
        processResponse(call.connection.responseQueue, true);
      }
    }
  }

  // The responder's run() mostly watches channels for writability and calls processResponse,
  // which simply writes call.response out to the corresponding channel. If a write does not
  // complete in one pass, the responder is notified to finish it asynchronously; oversized
  // responses go out as multiple chunks.
  @Override
  public void run() {
    LOG.info(getName() + ": starting");
    SERVER.set(Server.this);
    long lastPurgeTime = 0; // last check for old calls. A purge interval is used: on each cycle,
                            // if an async write has been pending too long, or a channel has not
                            // turned writable for a long time, those connections are closed
    while (running) {
      try {
        waitPending(); // If a channel is being registered, wait.
        writeSelector.select(PURGE_INTERVAL); // wait for writable channels, with PURGE_INTERVAL as the timeout
        Iterator<SelectionKey> iter = writeSelector.selectedKeys().iterator();
        while (iter.hasNext()) {
          SelectionKey key = iter.next();
          iter.remove();
          try {
            if (key.isValid() && key.isWritable()) {
              doAsyncWrite(key); // the async write path; internally just calls processResponse
            }
          } catch (IOException e) {
            LOG.info(getName() + ": doAsyncWrite threw exception " + e);
          }
        }
        long now = System.currentTimeMillis();
        if (now < lastPurgeTime + PURGE_INTERVAL) {
          continue;
        }
        lastPurgeTime = now;
        //
        // If there were some calls that have not been sent out for a
        // long time, discard them.
        //
        LOG.debug("Checking for old call responses.");
        ArrayList<Call> calls;
        // get the list of channels from list of keys.
        synchronized (writeSelector.keys()) {
          calls = new ArrayList<Call>(writeSelector.keys().size());
          iter = writeSelector.keys().iterator();
          while (iter.hasNext()) {
            SelectionKey key = iter.next();
            Call call = (Call)key.attachment();
            if (call != null && key.channel() == call.connection.channel) {
              calls.add(call);
            }
          }
        }
        for (Call call : calls) {
          try {
            doPurge(call, now); // closes timed-out connections, including the channel inside each connection
          } catch (IOException e) {
            LOG.warn("Error in purging old calls " + e);
          }
        }
      } catch (OutOfMemoryError e) {
        //
        // we can run out of memory if we have too many threads
        // log the event and sleep for a minute and give
        // some thread(s) a chance to finish
        //
        LOG.warn("Out of Memory in server select", e);
        try { Thread.sleep(60000); } catch (Exception ie) {}
      } catch (Exception e) {
        LOG.warn("Exception in Responder " +
            StringUtils.stringifyException(e));
      }
    }
    LOG.info("Stopping " + this.getName());
  }
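
The per-connection responseQueue logic can be boiled down to a few lines. The sketch below (a hypothetical class with no selector wiring) shows the key invariant: a handler writes inline only when its call is first in line, and whatever a non-blocking write leaves behind waits for the responder's OP_WRITE path:

  import java.io.IOException;
  import java.nio.ByteBuffer;
  import java.nio.channels.SocketChannel;
  import java.util.ArrayDeque;
  import java.util.Deque;

  public class ResponseQueueSketch {
    private final Deque<ByteBuffer> responseQueue = new ArrayDeque<ByteBuffer>();

    // Called by a handler thread with a fully serialized response.
    synchronized boolean doRespond(SocketChannel ch, ByteBuffer response)
        throws IOException {
      responseQueue.addLast(response);
      if (responseQueue.size() == 1) { // nobody else is mid-write: try inline
        return processResponse(ch);
      }
      return false;                    // left for the async writer
    }

    // Also called from the responder thread when OP_WRITE fires.
    synchronized boolean processResponse(SocketChannel ch) throws IOException {
      while (!responseQueue.isEmpty()) {
        ByteBuffer buf = responseQueue.peekFirst();
        ch.write(buf);                 // non-blocking: may write only part of buf
        if (buf.hasRemaining()) {
          return false;                // socket buffer full; the responder retries later
        }
        responseQueue.removeFirst();   // this response is fully sent
      }
      return true;                     // queue drained
    }
  }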

TaskScheduler analysis

Let's look directly at JobQueueTaskScheduler:

  public JobQueueTaskScheduler() {
    this.jobQueueJobInProgressListener = new JobQueueJobInProgressListener(); // the listener for job-submission
                                                                              // events: it maintains a jobQueue,
                                                                              // and jobs are added to it when a
                                                                              // client submits one. It also has
                                                                              // jobUpdated/jobRemoved methods
  }

  @Override
  public synchronized void start() throws IOException {
    super.start();
    taskTrackerManager.addJobInProgressListener(jobQueueJobInProgressListener);
    eagerTaskInitializationListener.setTaskTrackerManager(taskTrackerManager); // the taskTrackerManager is the
                                                                               // JobTracker itself
    eagerTaskInitializationListener.start(); // eagerTaskInitializationListener initializes jobs; it eventually
                                             // calls JobTracker.initJob (code not shown here)
    taskTrackerManager.addJobInProgressListener(
        eagerTaskInitializationListener);
  }

  // JobTracker.initJob: essentially a direct call to job.initTasks, plus job status updates
  public void initJob(JobInProgress job) {
    if (null == job) {
      LOG.info("Init on null job is not valid");
      return;
    }
    try {
      JobStatus prevStatus = (JobStatus)job.getStatus().clone();
      LOG.info("Initializing " + job.getJobID());
      job.initTasks(); // the core initialization routine: it reads the split meta info and, based on
                       // the map and reduce counts, builds structures such as nonRunningMapCache and
                       // nonRunningReduces
      // Inform the listeners if the job state has changed
      // Note : that the job will be in PREP state.
      JobStatus newStatus = (JobStatus)job.getStatus().clone();
      if (prevStatus.getRunState() != newStatus.getRunState()) {
        JobStatusChangeEvent event =
            new JobStatusChangeEvent(job, EventType.RUN_STATE_CHANGED, prevStatus,
                newStatus);
        synchronized (JobTracker.this) {
          updateJobInProgressListeners(event);
        }
      }
    } catch (KillInterruptedException kie) {
      // If job was killed during initialization, job state will be KILLED
      LOG.error("Job initialization interrupted:\n" +
          StringUtils.stringifyException(kie));
      killJob(job);
    } catch (Throwable t) {
      String failureInfo =
          "Job initialization failed:\n" + StringUtils.stringifyException(t);
      // If the job initialization is failed, job state will be FAILED
      LOG.error(failureInfo);
      job.getStatus().setFailureInfo(failureInfo);
      failJob(job);
    }
  }

job.initTasks is the core initialization routine. It mainly instantiates nonRunningMapCache (a Map) and nonRunningReduces (a Set). Later, when tasks are assigned during the TaskTracker/JobTracker heartbeat and the JobTracker calls scheduler.assignTasks, these structures are consulted internally: a Task is generated from a TaskInProgress and a task id, wrapped into a TaskAction, and sent back to the TaskTracker.
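
For orientation, the shape of the assignment loop is roughly the sketch below. It is heavily reduced: the real JobQueueTaskScheduler.assignTasks also computes a cluster-wide load factor to spread tasks, handles map and reduce slots separately, and walks locality levels (node-local, rack-local, off-switch):

  import java.util.ArrayList;
  import java.util.List;

  public class AssignSketch {
    interface Job { String obtainNewMapTask(String tracker); } // returns null if it has nothing to run

    static List<String> assignTasks(String tracker, int freeMapSlots, List<Job> jobQueue) {
      List<String> assigned = new ArrayList<String>();
      for (int slot = 0; slot < freeMapSlots; slot++) {
        String task = null;
        for (Job job : jobQueue) {     // FIFO scan over the listener's jobQueue
          task = job.obtainNewMapTask(tracker);
          if (task != null) break;     // first job with a runnable TIP wins the slot
        }
        if (task == null) break;       // nothing left for this tracker this heartbeat
        assigned.add(task);            // the heartbeat wraps these in LaunchTaskActions
      }
      return assigned;
    }
  }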

  public synchronized void initTasks()
      throws IOException, KillInterruptedException, UnknownHostException {
    if (tasksInited || isComplete()) {
      return;
    }
    synchronized(jobInitKillStatus){
      if(jobInitKillStatus.killed || jobInitKillStatus.initStarted) {
        return;
      }
      jobInitKillStatus.initStarted = true;
    }
    LOG.info("Initializing " + jobId);
    final long startTimeFinal = this.startTime;
    // log job info as the user running the job
    try {
      userUGI.doAs(new PrivilegedExceptionAction<Object>() {
        @Override
        public Object run() throws Exception {
          JobHistory.JobInfo.logSubmitted(getJobID(), conf, jobFile,
              startTimeFinal, hasRestarted());
          return null;
        }
      });
    } catch(InterruptedException ie) {
      throw new IOException(ie);
    }
    // log the job priority
    setPriority(this.priority);
    //
    // generate security keys needed by Tasks
    //
    generateAndStoreTokens();
    //
    // read input splits and create a map per a split
    //
    TaskSplitMetaInfo[] splits = createSplits(jobId);
    if (numMapTasks != splits.length) {
      throw new IOException("Number of maps in JobConf doesn't match number of " +
          "recieved splits for job " + jobId + "! " +
          "numMapTasks=" + numMapTasks + ", #splits=" + splits.length);
    }
    numMapTasks = splits.length;
    // Sanity check the locations so we don't create/initialize unnecessary tasks
    for (TaskSplitMetaInfo split : splits) {
      NetUtils.verifyHostnames(split.getLocations());
    }
    jobtracker.getInstrumentation().addWaitingMaps(getJobID(), numMapTasks);
    jobtracker.getInstrumentation().addWaitingReduces(getJobID(), numReduceTasks);
    this.queueMetrics.addWaitingMaps(getJobID(), numMapTasks);
    this.queueMetrics.addWaitingReduces(getJobID(), numReduceTasks);
    maps = new TaskInProgress[numMapTasks];
    for(int i=0; i < numMapTasks; ++i) {
      inputLength += splits[i].getInputDataLength();
      maps[i] = new TaskInProgress(jobId, jobFile,
          splits[i],
          jobtracker, conf, this, i, numSlotsPerMap);
    }
    LOG.info("Input size for job " + jobId + " = " + inputLength
        + ". Number of splits = " + splits.length);
    // Set localityWaitFactor before creating cache
    localityWaitFactor =
        conf.getFloat(LOCALITY_WAIT_FACTOR, DEFAULT_LOCALITY_WAIT_FACTOR);
    if (numMapTasks > 0) {
      nonRunningMapCache = createCache(splits, maxLevel);
    }
    // set the launch time
    this.launchTime = jobtracker.getClock().getTime();
    //
    // Create reduce tasks
    //
    this.reduces = new TaskInProgress[numReduceTasks];
    for (int i = 0; i < numReduceTasks; i++) {
      reduces[i] = new TaskInProgress(jobId, jobFile,
          numMapTasks, i,
          jobtracker, conf, this, numSlotsPerReduce);
      nonRunningReduces.add(reduces[i]);
    }
    // Calculate the minimum number of maps to be complete before
    // we should start scheduling reduces
    completedMapsForReduceSlowstart =
        (int)Math.ceil(
            (conf.getFloat("mapred.reduce.slowstart.completed.maps",
                DEFAULT_COMPLETED_MAPS_PERCENT_FOR_REDUCE_SLOWSTART) *
                numMapTasks));
    // ... use the same for estimating the total output of all maps
    resourceEstimator.setThreshhold(completedMapsForReduceSlowstart);
    // create cleanup two cleanup tips, one map and one reduce.
    cleanup = new TaskInProgress[2];
    // cleanup map tip. This map doesn't use any splits. Just assign an empty
    // split.
    TaskSplitMetaInfo emptySplit = JobSplit.EMPTY_TASK_SPLIT;
    cleanup[0] = new TaskInProgress(jobId, jobFile, emptySplit,
        jobtracker, conf, this, numMapTasks, 1);
    cleanup[0].setJobCleanupTask();
    // cleanup reduce tip.
    cleanup[1] = new TaskInProgress(jobId, jobFile, numMapTasks,
        numReduceTasks, jobtracker, conf, this, 1);
    cleanup[1].setJobCleanupTask();
    // create two setup tips, one map and one reduce.
    setup = new TaskInProgress[2];
    // setup map tip. This map doesn't use any split. Just assign an empty
    // split.
    setup[0] = new TaskInProgress(jobId, jobFile, emptySplit,
        jobtracker, conf, this, numMapTasks + 1, 1);
    setup[0].setJobSetupTask();
    // setup reduce tip.
    setup[1] = new TaskInProgress(jobId, jobFile, numMapTasks,
        numReduceTasks + 1, jobtracker, conf, this, 1);
    setup[1].setJobSetupTask();
    synchronized(jobInitKillStatus){
      jobInitKillStatus.initDone = true;
      // set this before the throw to make sure cleanup works properly
      tasksInited = true;
      if(jobInitKillStatus.killed) {
        throw new KillInterruptedException("Job " + jobId + " killed in init");
      }
    }
    JobHistory.JobInfo.logInited(profile.getJobID(), this.launchTime,
        numMapTasks, numReduceTasks);
    // Log the number of map and reduce tasks
    LOG.info("Job " + jobId + " initialized successfully with " + numMapTasks
        + " map tasks and " + numReduceTasks + " reduce tasks.");
  }
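
As a quick worked example of the slowstart threshold computed above (assuming the 1.x default of 0.05 for mapred.reduce.slowstart.completed.maps): with 200 map tasks, reduces become schedulable once ceil(0.05 * 200) = 10 maps have completed.

  public class SlowstartExample {
    public static void main(String[] args) {
      int numMapTasks = 200;
      float slowstart = 0.05f; // DEFAULT_COMPLETED_MAPS_PERCENT_FOR_REDUCE_SLOWSTART
      System.out.println((int) Math.ceil(slowstart * numMapTasks)); // -> 10
    }
  }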

Summary

In my view, the better way to read the Hadoop source is to first understand its architecture and communication mechanisms and keep a grip on the big picture, then pay close attention to the details in the places that matter: job submission, the exact flow of task assignment, the asynchronous machinery in the code, and so on. Every layer of wrapping and every non-essential class does not need a careful read, unless you actually plan to do secondary development on top of Hadoop.
