Giraph Execution Flow (Part 1)

Preface

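This article analyzes how a Job is submitted and initialized in Giraph 1.3-SNAPSHOT. The submission analysis is based on Hadoop running in Standalone mode and covers only the code executed in a local run, whereas the initialization analysis is based mainly on cluster mode.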

Example Job

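This part is not Giraph source code; it is an example Job that makes the analysis easier to follow. Its configuration and execution are described in the article "Giraph Programming Practice and Source Code Compilation and Debugging". A Job is typically submitted as follows: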

GiraphConfiguration conf = new GiraphConfiguration(new Configuration());
// Set the computation class
conf.setComputationClass(Shortestpath.class);
// Set the input and output formats
conf.setVertexInputFormatClass(JsonLongDoubleFloatDoubleVertexInputFormat.class);
conf.setVertexOutputFormatClass(IdWithValueTextOutputFormat.class);
// Enable local test mode, which makes it easier to debug and step through the source
conf.setLocalTestMode(true);
// Configure the Workers
conf.setWorkerConfiguration(1, 1, 100);
// In local mode the Master and the Worker are not split into separate tasks
GiraphConstants.SPLIT_MASTER_WORKER.set(conf, false);

GiraphJob job = new GiraphJob(conf, Shortestpath.class.getSimpleName());
// Set the input and output paths
GiraphTextInputFormat.setVertexInputPath(conf, new Path(INPUT_PATH));
GiraphTextOutputFormat.setOutputPath(job.getInternalJob(), new Path(OUTPUT_PATH));
••••••
// Submit the Job to Giraph
job.run(true);

The code first sets a series of parameters and then calls job.run(true) to submit the Job to Giraph.

Giraph Submits the Job to Hadoop

Giraph is built on top of Hadoop, so after a Job is submitted to Giraph, Giraph in turn submits a Job to Hadoop. This part analyzes how that happens, starting with the run method:

org.apache.giraph.job.GiraphJob#run

/**
 * Runs the actual graph application through Hadoop Map-Reduce.
 *
 * @param verbose If true, provide verbose output, false otherwise
 * @return True if success, false otherwise
 * @throws ClassNotFoundException
 * @throws InterruptedException
 * @throws IOException
 */
public final boolean run(boolean verbose)
  throws IOException, InterruptedException, ClassNotFoundException {
  // Change the limit on the number of counters for the Job
  setIntConfIfDefault("mapreduce.job.counters.limit", 512);

  // Set the memory limit for a Giraph Worker or Master
  setIntConfIfDefault("mapred.job.map.memory.mb", 1024);
  setIntConfIfDefault("mapred.job.reduce.memory.mb", 0);

  // Speculative execution doesn't make sense for Giraph
  giraphConfiguration.setBoolean(
      "mapred.map.tasks.speculative.execution", false);

  // Set the ping interval to 5 minutes instead of one minute
  Client.setPingInterval(giraphConfiguration, 60000 * 5);

  // Prefer classes from the Jar uploaded by the user
  giraphConfiguration.setBoolean("mapreduce.user.classpath.first", true);
  giraphConfiguration.setBoolean("mapreduce.job.user.classpath.first", true);

  // Without checkpointing, cap task attempts at 1 so that an unrecoverable Job ends sooner
  if (giraphConfiguration.getCheckpointFrequency() == 0) {
    int oldMaxTaskAttempts = giraphConfiguration.getMaxTaskAttempts();
    giraphConfiguration.setMaxTaskAttempts(1);
    
    ••••••
  }

  
  ImmutableClassesGiraphConfiguration conf =
      new ImmutableClassesGiraphConfiguration(giraphConfiguration);
  checkLocalJobRunnerConfiguration(conf);

  int tryCount = 0;
  // Defaults to org.apache.giraph.job.DefaultGiraphJobRetryChecker
  GiraphJobRetryChecker retryChecker = conf.getJobRetryChecker();
  while (true) {
    ••••••

    tryCount++;
    // Create a Hadoop Job
    Job submittedJob = new Job(conf, jobName);
    if (submittedJob.getJar() == null) {
      submittedJob.setJarByClass(getClass());
    }
    // Giraph does not need any Reduce tasks
    submittedJob.setNumReduceTasks(0);
    // Set the Mapper
    submittedJob.setMapperClass(GraphMapper.class);
    // Set the input format
    submittedJob.setInputFormatClass(BspInputFormat.class);
    // Set the output format; the default is org.apache.giraph.bsp.BspOutputFormat
    submittedJob.setOutputFormatClass(
        GiraphConstants.HADOOP_OUTPUT_FORMAT_CLASS.get(conf));
    ••••••
    // Submit the Job
    submittedJob.submit();
    
    ••••••
    // Get the result of the Job
    boolean passed = submittedJob.waitForCompletion(verbose);
    
    ••••••

    // If the Job failed, try to restart it
    if (!passed) {
      // By default (no JobRetryChecker specified) this returns null, so the Job is never restarted
      String restartFrom = retryChecker.shouldRestartCheckpoint(submittedJob);
      if (restartFrom != null) {
        GiraphConstants.RESTART_JOB_ID.set(conf, restartFrom);
        continue;
      }
    }

    // Return if the Job succeeded, or if it failed and no retry is wanted (the default never retries)
    if (passed || !retryChecker.shouldRetry(submittedJob, tryCount)) {
      return passed;
    }
    •••••••
  }
}

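The run method first configures Hadoop and Giraph and then creates a Hadoop Job object. After setting the MapperClass, the input and output formats, and other related information on the Hadoop Job, it calls submit to hand the Job to Hadoop. As the code shows, the whole process is essentially the same as submitting an ordinary Hadoop Job.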
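The retry loop at the end of run is easier to see in isolation. The following is a minimal sketch rather than Giraph code: RetryPolicy, RetryLoopSketch, and runWithRetries are hypothetical names, RetryPolicy stands in for GiraphJobRetryChecker, and the Hadoop Job is reduced to a BooleanSupplier that reports success or failure.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for GiraphJobRetryChecker: decides whether to resubmit. */
interface RetryPolicy {
  boolean shouldRetry(int tryCount);
}

class RetryLoopSketch {
  /** Number of submissions made by the most recent call (kept only for illustration). */
  static int lastTryCount;

  /** Submits the job until it passes or the policy declines another attempt. */
  static boolean runWithRetries(BooleanSupplier job, RetryPolicy policy) {
    int tryCount = 0;
    while (true) {
      tryCount++;
      boolean passed = job.getAsBoolean();
      lastTryCount = tryCount;
      if (passed || !policy.shouldRetry(tryCount)) {
        return passed;
      }
    }
  }

  public static void main(String[] args) {
    // A policy that never retries, mirroring the default behaviour described above.
    boolean passed = runWithRetries(() -> false, tryCount -> false);
    System.out.println("passed=" + passed + ", tries=" + lastTryCount);
  }
}
```

With a policy that never retries, a failing job is submitted exactly once, which matches the default described in the comments of run.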

Inside Hadoop

After Giraph calls submit, execution moves into Hadoop. The key thing to understand in this part is how Hadoop starts Giraph's MapTask.

Submitting the Job Internally

org.apache.hadoop.mapreduce.Job#submit

public void submit() throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  // Use the new API
  setUseNewAPI();
  connect();
  final JobSubmitter submitter = 
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  // Submit the Job to the system
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
      public JobStatus run() throws IOException, InterruptedException, 
      ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
      }
  });
  state = JobState.RUNNING;
  ••••••
}

The submit method creates a JobSubmitter object and then submits the Job through its submitJobInternal method.

org.apache.hadoop.mapreduce.JobSubmitter#submitJobInternal

JobStatus submitJobInternal(Job job, Cluster cluster) 
throws ClassNotFoundException, InterruptedException, IOException {

    ••••••

    Configuration conf = job.getConfiguration();
    addMRFrameworkToDistributedCache(conf);

    // Get the staging directory; by default it is created under /tmp/hadoop/mapred/staging
    Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
    ••••••
    // Generate the Job ID
    JobID jobId = submitClient.getNewJobID();
    // Set the Job ID
    job.setJobID(jobId);
    // Get the submission directory for the Job
    Path submitJobDir = new Path(jobStagingArea, jobId.toString());
    JobStatus status = null;
    ••••••
    
    ••••••
    // Actually submit the Job
    status = submitClient.submitJob(
        jobId, submitJobDir.toString(), job.getCredentials());
    ••••••
}

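In submitJobInternal, Hadoop actually submits the Job through submitClient. submitClient is typed as the ClientProtocol interface, which has two implementations. Because Hadoop is running in Standalone mode when the Job is submitted, the implementation used here is LocalJobRunner.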
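The idea of one interface with a local and a cluster implementation can be sketched as follows. None of these types are Hadoop's: SubmitClient, LocalRunnerSketch, ClusterRunnerSketch, and SubmitClientSelector are hypothetical, and the configuration is a plain Map. The sketch assumes that the choice is driven by the mapreduce.framework.name property with "local" as its default, which is how Standalone mode ends up on the local runner.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, heavily simplified stand-in for Hadoop's ClientProtocol. */
interface SubmitClient {
  String submitJob(String jobId);
}

/** Plays the role of LocalJobRunner: runs the job inside the client JVM. */
class LocalRunnerSketch implements SubmitClient {
  @Override
  public String submitJob(String jobId) {
    return "local:" + jobId;
  }
}

/** Plays the role of the cluster-side implementation. */
class ClusterRunnerSketch implements SubmitClient {
  @Override
  public String submitJob(String jobId) {
    return "cluster:" + jobId;
  }
}

class SubmitClientSelector {
  /** Picks an implementation from the framework name, defaulting to "local". */
  static SubmitClient select(Map<String, String> conf) {
    String framework = conf.getOrDefault("mapreduce.framework.name", "local");
    return "local".equals(framework)
        ? new LocalRunnerSketch()
        : new ClusterRunnerSketch();
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>(); // nothing set: Standalone mode
    System.out.println(select(conf).submitJob("job_0001"));
  }
}
```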

Starting the MapTask

org.apache.hadoop.mapred.LocalJobRunner#submitJob

public org.apache.hadoop.mapreduce.JobStatus submitJob(
    org.apache.hadoop.mapreduce.JobID jobid, String jobSubmitDir,
    Credentials credentials) throws IOException {
  Job job = new Job(JobID.downgrade(jobid), jobSubmitDir);
  job.job.setCredentials(credentials);
  return job.status;
}

org.apache.hadoop.mapred.LocalJobRunner.Job#Job

public Job(JobID jobid, String jobSubmitDir) throws IOException {
    ••••••

    this.start();
}

submitJob creates a Job object. This Job is an inner class of LocalJobRunner that extends Thread. As its constructor shows, creating the Job also starts the thread, so the next method to look at is Job#run.

org.apache.hadoop.mapred.LocalJobRunner.Job#run

@Override
public void run() {
    JobID jobId = profile.getJobID();
    JobContext jContext = new JobContextImpl(job, jobId);
    
    ••••••

    Map<TaskAttemptID, MapOutputFile> mapOutputFiles =
        Collections.synchronizedMap(new HashMap<TaskAttemptID, MapOutputFile>());
    
    // Get the tasks that need to run
    List<RunnableWithThrowable> mapRunnables = getMapTaskRunnables(
        taskSplitMetaInfos, jobId, mapOutputFiles);
            
    initCounters(mapRunnables.size(), numReduceTasks);
    ExecutorService mapService = createMapExecutor();
    // Run the tasks
    runTasks(mapRunnables, mapService, "map");

    ••••••
    // delete the temporary directory in output directory
    outputCommitter.commitJob(jContext);
    status.setCleanupProgress(1.0f);

    ••••••
}

org.apache.hadoop.mapred.LocalJobRunner.Job#getMapTaskRunnables

protected List<RunnableWithThrowable> getMapTaskRunnables(
        TaskSplitMetaInfo [] taskInfo, JobID jobId,
        Map<TaskAttemptID, MapOutputFile> mapOutputFiles) {

    int numTasks = 0;
    ArrayList<RunnableWithThrowable> list =
        new ArrayList<RunnableWithThrowable>();
    // Create the corresponding number of MapTaskRunnable objects
    for (TaskSplitMetaInfo task : taskInfo) {
        list.add(new MapTaskRunnable(task, numTasks++, jobId,
            mapOutputFiles));
    }

    return list;
}

org.apache.hadoop.mapred.LocalJobRunner.Job#runTasks

private void runTasks(List<RunnableWithThrowable> runnables,
        ExecutorService service, String taskType) throws Exception {
    // Submit the tasks
    for (Runnable r : runnables) {
        service.submit(r);
    }

    try {
        service.shutdown(); // Instructs queue to drain.

        // Wait for tasks to finish; do not use a time-based timeout.
        // (See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6179024)
        LOG.info("Waiting for " + taskType + " tasks");
        service.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    } catch (InterruptedException ie) {
        // Cancel all threads.
        service.shutdownNow();
        throw ie;
    }
    ••••••
}

In Job#run, the parts to focus on are the creation and execution of the MapTaskRunnable objects. Hadoop calls getMapTaskRunnables to create one MapTaskRunnable per assigned Task and then calls runTasks to submit them to a thread pool.
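The submit, shutdown, and awaitTermination sequence used by runTasks is standard java.util.concurrent usage and can be tried on its own. This is a minimal sketch under stated assumptions: RunTasksSketch and runAll are hypothetical names, and each "map task" is reduced to incrementing a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RunTasksSketch {
  /** Runs one Runnable per task on a fixed pool and returns how many completed. */
  static int runAll(int numTasks, int poolSize) {
    AtomicInteger completed = new AtomicInteger();
    List<Runnable> runnables = new ArrayList<>();
    for (int i = 0; i < numTasks; i++) {
      runnables.add(completed::incrementAndGet); // stand-in for a MapTaskRunnable
    }

    ExecutorService service = Executors.newFixedThreadPool(poolSize);
    for (Runnable r : runnables) {
      service.submit(r);
    }
    service.shutdown(); // accept no new tasks; already queued tasks still run
    try {
      // Block until every submitted task has finished.
      service.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    } catch (InterruptedException ie) {
      service.shutdownNow(); // cancel whatever is still running
      Thread.currentThread().interrupt();
    }
    return completed.get();
  }

  public static void main(String[] args) {
    System.out.println("completed tasks: " + runAll(4, 2));
  }
}
```

Calling shutdown before awaitTermination matters: without it the pool never terminates and the wait would block indefinitely.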

Once the MapTaskRunnable objects have been submitted to the thread pool, the next method to follow is MapTaskRunnable#run.

org.apache.hadoop.mapred.LocalJobRunner.Job.MapTaskRunnable#run

public void run() {
    try {
      ••••••
      MapTask map = new MapTask(systemJobFile.toString(), mapId, taskId,
        info.getSplitIndex(), 1);
      ••••••
      try {
        ••••••
        map.run(localConf, Job.this);
        ••••••
    } catch (Throwable e) {
      this.storedException = e;
    }
  }
}

MapTaskRunnable#run creates a MapTask object and calls its MapTask#run method.

org.apache.hadoop.mapred.MapTask#run

@Override
public void run(final JobConf job, final TaskUmbilicalProtocol umbilical)
throws IOException, ClassNotFoundException, InterruptedException {

    ••••••
    // org.apache.hadoop.mapreduce.Job#submit enabled the new API, so this returns true
    boolean useNewApi = job.getUseNewMapper();

    ••••••

    if (useNewApi) {
        runNewMapper(job, splitMetaInfo, umbilical, reporter);
    } else {
        runOldMapper(job, splitMetaInfo, umbilical, reporter);
    }
    done(umbilical, reporter);
}

Because useNewApi is true, MapTask#run calls runNewMapper, so that is the next method to examine.

org.apache.hadoop.mapred.MapTask#runNewMapper

private <INKEY,INVALUE,OUTKEY,OUTVALUE> void runNewMapper(final JobConf job,
                    final TaskSplitIndex splitIndex,
                    final TaskUmbilicalProtocol umbilical,
                    TaskReporter reporter
                    ) throws IOException, ClassNotFoundException,
                             InterruptedException {
    // make a task context so we can get the classes
    org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
        new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job, 
                                                                    getTaskID(),
                                                                    reporter);
    // Instantiate the configured MapperClass via reflection
    org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE> mapper =
        (org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE>)
        ReflectionUtils.newInstance(taskContext.getMapperClass(), job);

    ••••••

    // Create the Context
    org.apache.hadoop.mapreduce.MapContext<INKEY, INVALUE, OUTKEY, OUTVALUE> 
    mapContext = 
        new MapContextImpl<INKEY, INVALUE, OUTKEY, OUTVALUE>(job, getTaskID(), 
            input, output, 
            committer, 
            reporter, split);

    org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE>.Context 
        mapperContext = 
            new WrappedMapper<INKEY, INVALUE, OUTKEY, OUTVALUE>().getMapContext(
                mapContext);

    try {
        ••••••
        mapper.run(mapperContext);
        ••••••
    } finally {
        ••••••
    }
}

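MapTask#runNewMapper uses reflection to create an instance of the configured MapperClass, that is, the GraphMapper class set in org.apache.giraph.job.GiraphJob#run. Once it has the GraphMapper object, the system calls its run method, at which point execution enters Giraph.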
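Instantiating a class that is only known as a Class object is the core of this step. The sketch below uses plain Java reflection instead of Hadoop's ReflectionUtils; SimpleMapper, UpperCaseMapper, and newInstance are hypothetical names, with UpperCaseMapper standing in for GraphMapper.

```java
import java.util.Locale;

class ReflectiveMapperSketch {
  /** Hypothetical minimal mapper contract; Hadoop's Mapper is far richer. */
  interface SimpleMapper {
    String run(String input);
  }

  /** Stand-in for GraphMapper: the class the job was configured with. */
  static class UpperCaseMapper implements SimpleMapper {
    @Override
    public String run(String input) {
      return input.toUpperCase(Locale.ROOT);
    }
  }

  /** Creates the mapper through its no-argument constructor. */
  static SimpleMapper newInstance(Class<? extends SimpleMapper> mapperClass) {
    try {
      return mapperClass.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + mapperClass, e);
    }
  }

  public static void main(String[] args) {
    // The framework only holds the Class object, as it would after reading the job config.
    Class<? extends SimpleMapper> configured = UpperCaseMapper.class;
    System.out.println(newInstance(configured).run("giraph"));
  }
}
```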

Giraph Executes the Job

org.apache.giraph.graph.GraphMapper#run

@Override
public void run(Context context) throws IOException, InterruptedException {
    // Notify the master quicker if there is worker failure rather than
    // waiting for ZooKeeper to timeout and delete the ephemeral znodes
    try {
        // Initialize
        setup(context);
        // Run the computation
        while (context.nextKeyValue()) {
            graphTaskManager.execute();
        }
        // Clean up
        cleanup(context);
    } catch (RuntimeException e) {
        ••••••
    }
}

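GraphMapper#run shows that the execution of a Giraph Job can be divided into three phases:

Initialization
Computation
Cleanup

The rest of this article analyzes the initialization phase.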
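The three phases follow the usual setup, loop, cleanup shape of a Mapper's run method. Here is a minimal sketch of that shape only; LifecycleSketch is a hypothetical name, the records are an ordinary Iterator instead of a Hadoop Context, and each phase just records its name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LifecycleSketch {
  /** Runs setup once, execute once per input record, and cleanup once. */
  static List<String> run(Iterator<String> records) {
    List<String> log = new ArrayList<>();
    log.add("setup");
    while (records.hasNext()) {
      records.next();     // corresponds to context.nextKeyValue() returning true
      log.add("execute"); // corresponds to graphTaskManager.execute()
    }
    log.add("cleanup");
    return log;
  }

  public static void main(String[] args) {
    // One input record, so execute runs exactly once in this demo.
    System.out.println(run(Arrays.asList("record").iterator()));
  }
}
```

The number of execute calls depends on how many records the input format delivers to the task.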

Initialization

org.apache.giraph.graph.GraphMapper#setup

@Override
public void setup(Context context)
  throws IOException, InterruptedException {
  // Execute all Giraph-related role(s) assigned to this compute node.
  // Roles can include "master," "worker," "zookeeper," or . . . ?
  graphTaskManager = new GraphTaskManager<I, V, E>(context);
  graphTaskManager.setup(
    DistributedCache.getLocalCacheArchives(context.getConfiguration()));
}

GraphMapper#setup creates a GraphTaskManager object and calls its setup method.

org.apache.giraph.graph.GraphTaskManager#setup

public void setup(Path[] zkPathList) throws IOException, InterruptedException {
    Configuration hadoopConf = context.getConfiguration();
    // Initialize the configuration
    conf = new ImmutableClassesGiraphConfiguration<I, V, E>(hadoopConf);
    ••••••
    // Read the Zookeeper connection information from the configuration; it is empty when no external Zookeeper is provided
    String serverPortList = conf.getZookeeperList();
    // Without an external Zookeeper, Giraph has to start one itself
    if (serverPortList.isEmpty()) {
        if (startZooKeeperManager()) {
            return; // ZK connect/startup failed
        }
    } else {
        createZooKeeperCounter(serverPortList);
    }
    ••••••
    this.graphFunctions = determineGraphFunctions(conf, zkManager);
    if (zkManager != null && this.graphFunctions.isMaster()) {
        // Mark the directories created by the Master for deletion; they are removed when the file system is closed
        zkManager.cleanupOnExit();
    }
    try {
        // Initialize the BSP service
        instantiateBspService();
    } catch (IOException e) {
        ••••••
    }
}

GraphTaskManager#setup does three main things:

Obtain the Zookeeper connection information
Decide the role of the process
Initialize the BSP service

Obtaining the Zookeeper Connection Information

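GraphTaskManager#setup obtains the Zookeeper connection information through conf.getZookeeperList(). If an external Zookeeper is provided, the call returns its connection information directly. If not, getZookeeperList() returns an empty value, and GraphTaskManager#setup calls startZooKeeperManager to start Zookeeper on one of the Tasks.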
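The decision between an external and an embedded Zookeeper comes down to whether the configured list is empty. The sketch below illustrates only that branch: ZkListSketch and resolve are hypothetical names, starting the embedded server is replaced by a Supplier, and the host:port strings are made-up example values.

```java
import java.util.function.Supplier;

class ZkListSketch {
  /**
   * Returns the configured server list if there is one; otherwise asks the
   * supplier to "start" an embedded server and returns its address.
   */
  static String resolve(String configuredList, Supplier<String> startEmbedded) {
    if (configuredList == null || configuredList.isEmpty()) {
      return startEmbedded.get();
    }
    return configuredList;
  }

  public static void main(String[] args) {
    // Example addresses only; neither is taken from a real deployment.
    System.out.println(resolve("", () -> "localhost:22181"));
    System.out.println(resolve("zk1.example.org:2181", () -> "localhost:22181"));
  }
}
```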

org.apache.giraph.graph.GraphTaskManager#startZooKeeperManager

/**
 * Instantiate and configure ZooKeeperManager for this job. This will
 * result in a Giraph-owned Zookeeper instance, a connection to an
 * existing quorum as specified in the job configuration, or task failure
 * @return true if this task should terminate
 */
private boolean startZooKeeperManager() throws IOException, InterruptedException {
    zkManager = new ZooKeeperManager(context, conf);
    context.setStatus("setup: Setting up Zookeeper manager.");
    zkManager.setup();
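    // If the computation is already done, there is no need to start Zookeeper again.
    // This probably matters mostly when no external Zookeeper is provided and a Task is restarted.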
    if (zkManager.computationDone()) {
        done = true;
        return true;
    }
    zkManager.onlineZooKeeperServer();
    // Update the Zookeeper connection information and create the counter
    String serverPortList = zkManager.getZooKeeperServerPortString();
    conf.setZookeeperList(serverPortList);
    createZooKeeperCounter(serverPortList);
    return false;
}

startZooKeeperManager first creates a ZooKeeperManager object and then calls its setup method.

org.apache.giraph.zk.ZooKeeperManager#setup

public void setup() throws IOException, InterruptedException {
    createCandidateStamp();
    getZooKeeperServerList();
}

ZooKeeperManager#setup first calls createCandidateStamp.

org.apache.giraph.zk.ZooKeeperManager#createCandidateStamp

/**
 * Create a HDFS stamp for this task.  If another task already
 * created it, then this one will fail, which is fine.
 */
public void createCandidateStamp() {
    ••••••
    fs.mkdirs(baseDirectory);
    ••••••
    fs.mkdirs(serverDirectory);
    ••••••
    if (!fs.getFileStatus(baseDirectory).isDir()) {
        throw new IllegalArgumentException(
            "createCandidateStamp: " + baseDirectory +
            " is not a directory, but should be.");
    }

    ••••••
    // Build the file name from the hostname and taskPartition
    Path myCandidacyPath = new Path(
        taskDirectory, myHostname +
        HOSTNAME_TASK_SEPARATOR + taskPartition);
    try {
        ••••••
        fs.createNewFile(myCandidacyPath);
    } catch (IOException e) {
        LOG.error("createCandidateStamp: Failed (maybe previous task " +
            "failed) to create filestamp " + myCandidacyPath, e);
    }
}

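In createCandidateStamp, each Task creates a file under _bsp/_defaultZkManagerDir/_task named after its own hostname and taskPartition. These files are used later when the system chooses the Task that starts the Zookeeper service. The result is shown in the figure below: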

The hostname in the figure is localhost because Hadoop was in Standalone mode when the source code was run.

After createCandidateStamp finishes, ZooKeeperManager#setup goes on to call getZooKeeperServerList.

org.apache.giraph.zk.ZooKeeperManager#getZooKeeperServerList

private void getZooKeeperServerList() throws IOException,
      InterruptedException {
    String serverListFile;

    // The Task whose taskPartition is 0 creates the zooKeeperServerList
    if (taskPartition == 0) {
      // If Task 0 was restarted and finds an existing serverList, it does not create a new one
      serverListFile = getServerListFile();
      if (serverListFile == null) {
        // Create the serverList
        createZooKeeperServerList();
      }
    }

    while (true) {
      // The other Tasks wait here for the serverList to be created
      serverListFile = getServerListFile();
      ••••••
      if (serverListFile != null) {
        break;
      }
      // Sleep between polls to reduce CPU usage
      try {
        Thread.sleep(pollMsecs);
      } catch (InterruptedException e) {
        LOG.warn("getZooKeeperServerList: Strange interrupted " +
            "exception " + e.getMessage());
      }

    }

    // Parse the information in the serverList file name
    String[] serverHostList = serverListFile.substring(
        ZOOKEEPER_SERVER_LIST_FILE_PREFIX.length()).split(
            HOSTNAME_TASK_SEPARATOR);
    ••••••

    // Hostname of the node that runs the Zookeeper service
    zkServerHost = serverHostList[0];
    // taskPartition of the Task that should start the Zookeeper service
    zkServerTask = Integer.parseInt(serverHostList[1]);

    // Each Task updates its own zkServerPortString
    updateZkPortString();
  }

getZooKeeperServerList branches on taskPartition. Task 0 first calls createZooKeeperServerList to create the serverListFile, whose name records the hostname and taskPartition of the Task that will run the Zookeeper service. Every other Task polls until it can read the name of the serverListFile. Once a Task has the file name, it parses it to update zkServerHost, zkServerTask, and zkServerPortString.
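Parsing the serverListFile name is a prefix strip followed by a split. The following sketch shows that step on its own. ServerListName and parse are hypothetical names; the prefix zkServerList_ appears in the comments of getServerListFile, while the single-space separator is an assumption about the value of HOSTNAME_TASK_SEPARATOR.

```java
class ServerListName {
  static final String PREFIX = "zkServerList_";
  static final String SEPARATOR = " "; // assumed value of HOSTNAME_TASK_SEPARATOR

  final String host;
  final int task;

  private ServerListName(String host, int task) {
    this.host = host;
    this.task = task;
  }

  /** Splits "<prefix><host><separator><task>" into its host and task parts. */
  static ServerListName parse(String fileName) {
    if (!fileName.startsWith(PREFIX)) {
      throw new IllegalArgumentException("Not a server list file: " + fileName);
    }
    String[] parts = fileName.substring(PREFIX.length()).split(SEPARATOR);
    if (parts.length != 2) {
      throw new IllegalArgumentException("Unexpected file name: " + fileName);
    }
    return new ServerListName(parts[0], Integer.parseInt(parts[1]));
  }

  public static void main(String[] args) {
    ServerListName name = parse(PREFIX + "localhost" + SEPARATOR + "0");
    System.out.println("host=" + name.host + ", task=" + name.task);
  }
}
```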

The next two methods, createZooKeeperServerList and getServerListFile, show how the system chooses the Task that starts the Zookeeper service.

org.apache.giraph.zk.ZooKeeperManager#createZooKeeperServerList

private void createZooKeeperServerList() throws IOException, InterruptedException {
  String host;
  String task;
  while (true) {
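    // List the metadata of the files under the Task directory; file names are filtered, dropping those that start with "." or end with "crc"
    FileStatus [] fileStatusArray = fs.listStatus(taskDirectory);
    if (fileStatusArray.length > 0) {
      // Choose the Task identified by the first entry to start the Zookeeper service
      FileStatus fileStatus = fileStatusArray[0];
      // Parse the host and task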
      String[] hostnameTaskArray =
          fileStatus.getPath().getName().split(
              HOSTNAME_TASK_SEPARATOR);
      ••••••
      host = hostnameTaskArray[0];
      task = hostnameTaskArray[1];
      break;
    }
    Thread.sleep(pollMsecs);
  }
  // Build the serverListFile name from the parsed information
  String serverListFile =
      ZOOKEEPER_SERVER_LIST_FILE_PREFIX + host +
      HOSTNAME_TASK_SEPARATOR + task;
  Path serverListPath =
      new Path(baseDirectory, serverListFile);
  ••••••
  }
  // Create the file
  fs.createNewFile(serverListPath);
}

createZooKeeperServerList lists the names of the files that the Tasks created in createCandidateStamp and uses the Task identified by the first element of the returned array to create the serverListFile.
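The stamp-and-pick protocol can be reproduced on a local file system. This is a sketch under stated assumptions, not Giraph code: it uses java.nio.file instead of the Hadoop FileSystem API, CandidateStampSketch and its methods are hypothetical names, the separator is assumed to be a single space, and the listing is sorted explicitly so that "the first entry" is deterministic, whereas the real code relies on whatever order listStatus returns.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CandidateStampSketch {
  static final String SEPARATOR = " "; // assumed value of HOSTNAME_TASK_SEPARATOR
  static final String SERVER_LIST_PREFIX = "zkServerList_";

  /** Each task drops an empty file named "<host><separator><taskPartition>". */
  static void createCandidateStamp(Path taskDir, String host, int taskPartition)
      throws IOException {
    Files.createDirectories(taskDir);
    Files.createFile(taskDir.resolve(host + SEPARATOR + taskPartition));
  }

  /** Task 0 picks the first candidate and derives the server list file name. */
  static String chooseServerListName(Path taskDir) throws IOException {
    try (Stream<Path> files = Files.list(taskDir)) {
      List<String> names = files
          .map(p -> p.getFileName().toString())
          .sorted()
          .collect(Collectors.toList());
      return SERVER_LIST_PREFIX + names.get(0);
    }
  }

  /** Two tasks register themselves, then the server list name is chosen. */
  static String demo() {
    try {
      Path taskDir = Files.createTempDirectory("zk-sketch").resolve("_task");
      createCandidateStamp(taskDir, "hostB", 0);
      createCandidateStamp(taskDir, "hostA", 1);
      return chooseServerListName(taskDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Note that the chosen Task is not necessarily Task 0: Task 0 only writes the serverListFile, while the Task named in it is whichever candidate comes first in the listing.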

org.apache.giraph.zk.ZooKeeperManager#getServerListFile

private String getServerListFile() throws IOException {
  String serverListFile = null;
  // baseDirectory is _bsp/_defaultZkManagerDir; list the metadata of the files in it
  FileStatus [] fileStatusArray = fs.listStatus(baseDirectory);
  for (FileStatus fileStatus : fileStatusArray) {
    // Pick the file whose name starts with zkServerList_, i.e. the serverListFile created by the Task with taskPartition 0
    if (fileStatus.getPath().getName().startsWith(
        ZOOKEEPER_SERVER_LIST_FILE_PREFIX)) {
      serverListFile = fileStatus.getPath().getName();
      break;
    }
  }
  return serverListFile;
}

getServerListFile lists the file metadata under baseDirectory, picks out the serverListFile, and returns its file name.

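Back in startZooKeeperManager: once the Task that will start the Zookeeper service has been chosen, the system first checks whether the computation has already finished. If it has, there is nothing more to run; otherwise it calls onlineZooKeeperServer to start the Zookeeper service.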

org.apache.giraph.zk.ZooKeeperManager#onlineZooKeeperServer

public void onlineZooKeeperServer() throws IOException {
  // If this Task's taskPartition equals zkServerTask, it has to start the Zookeeper service
  if (zkServerTask == taskPartition) {
    File zkDirFile = new File(this.zkDir);
    try {
      // Delete the old directory
      ••••••
      FileUtils.deleteDirectory(zkDirFile);
    } catch (IOException e) {
      ••••••
    }
    // Generate the Zookeeper configuration
    generateZooKeeperConfig();
    synchronized (this) {
      zkRunner = createRunner();
      // Start the Zookeeper service
      int port = zkRunner.start(zkDir, config);
      if (port > 0) {
        zkBasePort = port;
        updateZkPortString();
      }
    }

    // Once the server is up and running, notify that this server is up
    // and running by dropping a ready stamp.
    int connectAttempts = 0;
    final int maxConnectAttempts =
        conf.getZookeeperConnectionAttempts();
    while (connectAttempts < maxConnectAttempts) {
      try {
        ••••••
        // Try to connect to the Zookeeper service
        InetSocketAddress zkServerAddress =
            new InetSocketAddress(myHostname, zkBasePort);
        Socket testServerSock = new Socket();
        testServerSock.connect(zkServerAddress, 5000);
        ••••••
        break;
      } catch (SocketTimeoutException e) {
        LOG.warn("onlineZooKeeperServers: Got " +
            "SocketTimeoutException", e);
      } catch (ConnectException e) {
        LOG.warn("onlineZooKeeperServers: Got " +
            "ConnectException", e);
      } catch (IOException e) {
        LOG.warn("onlineZooKeeperServers: Got " +
            "IOException", e);
      }

      ++connectAttempts;
      try {
        Thread.sleep(pollMsecs);
      } catch (InterruptedException e) {
        LOG.warn("onlineZooKeeperServers: Sleep of " + pollMsecs +
            " interrupted - " + e.getMessage());
      }
    }
    // The maximum number of attempts was exceeded: the connection failed
    if (connectAttempts == maxConnectAttempts) {
      throw new IllegalStateException(
          "onlineZooKeeperServers: Failed to connect in " +
              connectAttempts + " tries!");
    }
    // Build the ready-stamp path from the hostname, taskPartition, and port
    Path myReadyPath = new Path(
        serverDirectory, myHostname +
        HOSTNAME_TASK_SEPARATOR + taskPartition +
        HOSTNAME_TASK_SEPARATOR + zkBasePort);
    try {
      ••••••
      // Create a file that signals the Zookeeper service is ready and carries the connection information
      fs.createNewFile(myReadyPath);
    } catch (IOException e) {
      ••••••
    }
  } else {
    // The other Tasks wait for the Zookeeper service to start
    int readyRetrievalAttempt = 0;
    String foundServer = null;
    while (true) {
      try {
        FileStatus [] fileStatusArray =
            fs.listStatus(serverDirectory);
        // Check whether a Zookeeper connection-information file has been created under serverDirectory
        if ((fileStatusArray != null) &&
            (fileStatusArray.length > 0)) {
          // Parse the connection information from the file name
          for (int i = 0; i < fileStatusArray.length; ++i) {
            String[] hostnameTaskArray =
                fileStatusArray[i].getPath().getName().split(
                    HOSTNAME_TASK_SEPARATOR);
            if (hostnameTaskArray.length != 3) {
              throw new RuntimeException(
                  "getZooKeeperServerList: Task 0 failed " +
                      "to parse " +
                      fileStatusArray[i].getPath().getName());
            }
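            // Host on which the Zookeeper service runs
            foundServer = hostnameTaskArray[0];
            // Port of the Zookeeper service
            zkBasePort = Integer.parseInt(hostnameTaskArray[2]);
            // Update the Zookeeper connection information
            updateZkPortString();
          }
          ••••••
          // Stop waiting once the hostname matches; no concrete scenario in which it would differ comes to mind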
          if (zkServerHost.equals(foundServer)) {
            break;
          }
        } else {
          ••••••
        }
        Thread.sleep(pollMsecs);
        ++readyRetrievalAttempt;
      } catch (IOException e) {
        throw new RuntimeException(e);
      } catch (InterruptedException e) {
        ••••••
      }
    }
  }
}

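onlineZooKeeperServer does two things. First, it starts the Zookeeper service on the chosen Task and creates a file that signals the service is ready. Second, every Task that did not start the Zookeeper service updates its Zookeeper connection information.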
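The readiness check on the chosen Task is a bounded connect-and-retry loop over a plain TCP socket. The sketch below reproduces that loop with standard java.net classes. PortProbeSketch, waitForPort, and demo are hypothetical names, and the demo probes a throwaway ServerSocket on the loopback interface rather than a real Zookeeper server.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class PortProbeSketch {
  /** Returns true once a TCP connection succeeds, false after maxAttempts failures. */
  static boolean waitForPort(InetAddress host, int port, int maxAttempts, long pollMsecs) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try (Socket probe = new Socket()) {
        probe.connect(new InetSocketAddress(host, port), 5000);
        return true;
      } catch (IOException e) {
        // SocketTimeoutException and ConnectException are both IOExceptions: retry.
      }
      try {
        Thread.sleep(pollMsecs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }

  /** Opens a listening socket on an ephemeral loopback port and probes it. */
  static boolean demo() {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
      return waitForPort(loopback, server.getLocalPort(), 3, 100);
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("server reachable: " + demo());
  }
}
```

A successful connect only shows that something is listening on the port, which is why the real method follows it with a ready stamp that the other Tasks can find.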

Assigning Roles

After the Zookeeper service has started, the system updates the Zookeeper-related configuration and returns to org.apache.giraph.graph.GraphTaskManager#setup, which then calls determineGraphFunctions.

org.apache.giraph.graph.GraphTaskManager#determineGraphFunctions

private static GraphFunctions determineGraphFunctions(
  ImmutableClassesGiraphConfiguration conf,
  ZooKeeperManager zkManager) {
  // Local mode or cluster mode; local mode starts only one Task
  boolean splitMasterWorker = conf.getSplitMasterWorker();
  // taskPartition of the current Task
  int taskPartition = conf.getTaskPartition();
  // Whether an external Zookeeper is provided
  boolean zkAlreadyProvided = conf.isZookeeperExternal();
  // Initial role of the Task
  GraphFunctions functions = GraphFunctions.UNKNOWN;
  
  if (!splitMasterWorker) {
    // In local mode the Task takes every role if Zookeeper is started internally; otherwise it acts as Master and Worker
    if ((zkManager != null) && zkManager.runsZooKeeper()) {
      functions = GraphFunctions.ALL;
    } else {
      functions = GraphFunctions.ALL_EXCEPT_ZOOKEEPER;
    }
  } else {
    if (zkAlreadyProvided) {
      // With an external Zookeeper, Task 0 is the Master and all others are Workers
      if (taskPartition == 0) {
        functions = GraphFunctions.MASTER_ONLY;
      } else {
        functions = GraphFunctions.WORKER_ONLY;
      }
    } else {
      // With an internally started Zookeeper service, the Task that runs it
      // acts as both Master and Zookeeper; all others are Workers
      if ((zkManager != null) && zkManager.runsZooKeeper()) {
        functions = GraphFunctions.MASTER_ZOOKEEPER_ONLY;
      } else {
        functions = GraphFunctions.WORKER_ONLY;
      }
    }
  }
  return functions;
}

determineGraphFunctions decides the role of each Task. The system defines six roles:

UNKNOWN: the role of the Task is not yet known
MASTER_ONLY: the Task is the Master
MASTER_ZOOKEEPER_ONLY: the Task is both the Master and Zookeeper
WORKER_ONLY: the Task is only a Worker
ALL: the Task is the Master, a Worker, and Zookeeper
ALL_EXCEPT_ZOOKEEPER: the Task is both the Master and a Worker
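The role decision is a pure function of four inputs, which makes it easy to restate as a table-like sketch. The code below mirrors the branches of determineGraphFunctions but is not the Giraph class: RoleSketch, Role, and determine are hypothetical names, and the isMaster and isWorker helpers are written from the role descriptions above rather than copied from GraphFunctions.

```java
class RoleSketch {
  enum Role {
    UNKNOWN, MASTER_ONLY, MASTER_ZOOKEEPER_ONLY, WORKER_ONLY, ALL, ALL_EXCEPT_ZOOKEEPER;

    boolean isMaster() {
      return this == MASTER_ONLY || this == MASTER_ZOOKEEPER_ONLY
          || this == ALL || this == ALL_EXCEPT_ZOOKEEPER;
    }

    boolean isWorker() {
      return this == WORKER_ONLY || this == ALL || this == ALL_EXCEPT_ZOOKEEPER;
    }
  }

  /**
   * @param splitMasterWorker false in local mode, where one Task plays several roles
   * @param zkExternal        true if an external Zookeeper is provided
   * @param runsZooKeeper     true if this Task started the embedded Zookeeper
   * @param taskPartition     index of this Task
   */
  static Role determine(boolean splitMasterWorker, boolean zkExternal,
      boolean runsZooKeeper, int taskPartition) {
    if (!splitMasterWorker) {
      return runsZooKeeper ? Role.ALL : Role.ALL_EXCEPT_ZOOKEEPER;
    }
    if (zkExternal) {
      return taskPartition == 0 ? Role.MASTER_ONLY : Role.WORKER_ONLY;
    }
    return runsZooKeeper ? Role.MASTER_ZOOKEEPER_ONLY : Role.WORKER_ONLY;
  }

  public static void main(String[] args) {
    // Local test mode with an embedded Zookeeper, as in the example Job.
    System.out.println(determine(false, false, true, 0));
    // Cluster mode with an embedded Zookeeper, on a Task that does not run it.
    System.out.println(determine(true, false, false, 3));
  }
}
```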

Initializing BSP

After the role of each Task has been decided, the system calls instantiateBspService to initialize the BSP service.

org.apache.giraph.graph.GraphTaskManager#instantiateBspService

private void instantiateBspService()
throws IOException, InterruptedException {
  if (graphFunctions.isMaster()) {
    ••••••
    // Create the Master object
    serviceMaster = new BspServiceMaster<I, V, E>(context, this);
    // The Master runs in its own thread
    masterThread = new MasterThread<I, V, E>(serviceMaster, context);
    masterThread.setUncaughtExceptionHandler(
        createUncaughtExceptionHandler());
    masterThread.start();
  }
  if (graphFunctions.isWorker()) {
    ••••••
    // Create the Worker object
    serviceWorker = new BspServiceWorker<I, V, E>(context, this);
    installGCMonitoring();
    ••••••
  }
}

For a Master, instantiateBspService creates the serviceMaster object and starts a MasterThread; for a Worker, it creates the serviceWorker object.
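Running the Master in its own thread means its failures do not propagate to the caller on their own, which is why an uncaught exception handler is installed before start. The following sketch shows that pattern with a plain Thread. MasterThreadSketch and runAndCapture are hypothetical names, and the handler only records the exception message, whereas what Giraph's handler does is not shown in the excerpt above.

```java
import java.util.concurrent.atomic.AtomicReference;

class MasterThreadSketch {
  /** Runs the given loop in a separate thread and returns the failure message, or "none". */
  static String runAndCapture(Runnable masterLoop) {
    AtomicReference<String> failure = new AtomicReference<>("none");
    Thread masterThread = new Thread(masterLoop, "master-sketch");
    // Without a handler, an exception here would only be printed by the default handler.
    masterThread.setUncaughtExceptionHandler(
        (thread, error) -> failure.set(error.getMessage()));
    masterThread.start();
    try {
      masterThread.join(); // the handler has run by the time the thread terminates
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return failure.get();
  }

  public static void main(String[] args) {
    System.out.println(runAndCapture(() -> {
      throw new IllegalStateException("master failed");
    }));
  }
}
```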

Summary

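In summary, Job submission and initialization in Giraph proceed as follows:

The user submits the Job to Giraph.
Giraph submits the Job to Hadoop.
Hadoop starts the MapTask and executes the run method of GraphMapper.
GraphMapper creates a GraphTaskManager object to perform the initialization.
Initialization first obtains the Zookeeper connection information; if there is no external Zookeeper, one Task is chosen from among all MapTasks to start the Zookeeper service.
Once the Zookeeper connection information is available, determineGraphFunctions assigns the roles, which separates the MapTasks into the Master and the Workers.
After the roles are assigned, instantiateBspService initializes the BSP service, which completes the initialization.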
