
Preface

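In an earlier article (Distributed Metadata Management in the Ozone Datanode), I analyzed how data is processed inside the Ozone Datanode, including the handling logic for Container-level and Chunk-file-level operations. This article covers another part of the Datanode's internals: how the Datanode service starts up and how it sends heartbeats to the SCM service. Understanding these two processes gives a clearer picture of how a Datanode behaves during normal operation. As the title says, the discussion is split into these two parts.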

Starting the Ozone Datanode service

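Although the Ozone Datanode shares its name with the HDFS Datanode, the two services are implemented quite differently. The Ozone Datanode provides container-level storage and is managed by SCM (Storage Container Manager).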

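Let's look at how the Datanode service actually starts. Overall, startup involves state transitions at two levels:

transitions of the Datanode State itself
transitions of the Endpoint State inside the Datanode's RUNNING state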

Both points come up again below. The startup sequence is as follows:

First the Datanode service itself starts, and it then launches the DatanodeStateMachine daemon thread.

  public void start() {
    ...
    OzoneConfiguration.activate();
    HddsUtils.initializeMetrics(conf, "HddsDatanode");
    try {
      ...
      datanodeStateMachine = new DatanodeStateMachine(datanodeDetails, conf,
          dnCertClient, this::terminateDatanode);
      try {
        httpServer = new HddsDatanodeHttpServer(conf);
        httpServer.start();
      } catch (Exception ex) {
        LOG.error("HttpServer failed to start.", ex);
      }
      startPlugins();
      // Start the Datanode state machine thread
      datanodeStateMachine.startDaemon();
    } catch (IOException e) {
      throw new RuntimeException("Can't start the HDDS datanode plugin", e);
    } catch (AuthenticationException ex) {
      throw new RuntimeException("Fail to authentication when starting" +
          " HDDS datanode plugin", ex);
    }
  }

The start method of DatanodeStateMachine then runs the state machine in a loop at a fixed frequency:

  /**
   * Runs the state machine at a fixed frequency.
   */
  private void start() throws IOException {
    long now = 0;
    ...
    // Keep looping until the Datanode reaches the SHUTDOWN state
    while (context.getState() != DatanodeStates.SHUTDOWN) {
      try {
        LOG.debug("Executing cycle Number : {}", context.getExecutionCount());
        long heartbeatFrequency = context.getHeartbeatFrequency();
        nextHB.set(Time.monotonicNow() + heartbeatFrequency);
        // Run the task for the Datanode's current state; in RUNNING this is the heartbeat task
        context.execute(executorService, heartbeatFrequency,
            TimeUnit.MILLISECONDS);
        now = Time.monotonicNow();
        if (now < nextHB.get()) {
          if(!Thread.interrupted()) {
            // Sleep until the next heartbeat is due
            Thread.sleep(nextHB.get() - now);
          }
        }
      } catch (InterruptedException e) {
        // Someone has sent an interrupt signal; this could be because
        // 1. a heartbeat should be triggered immediately, or
        // 2. shutdown has been initiated.
      } catch (Exception e) {
        LOG.error("Unable to finish the execution.", e);
      }
    }
    ...
  }
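To make the timing rule of this loop concrete, here is a small simulation of its scheduling arithmetic. It is a sketch, not Ozone code: HeartbeatTiming and simulateCycleStarts are hypothetical names, and a fake clock replaces Time.monotonicNow() and Thread.sleep so the behavior is deterministic. It shows that a cycle which finishes early sleeps until the deadline, while a cycle that overruns starts the next one immediately without trying to catch up. (Per the comments in the original, an interrupt can also cut the sleep short; the sketch does not model that.)

```java
// Hypothetical helper that replays the timing rule of DatanodeStateMachine.start()
// on a fake clock: nextHB = start + frequency; run the task; sleep only if time is left.
final class HeartbeatTiming {
  private HeartbeatTiming() {}

  // Returns the start time of each cycle, plus the time the following cycle would start.
  static long[] simulateCycleStarts(long frequencyMs, long[] taskDurationsMs) {
    long[] starts = new long[taskDurationsMs.length + 1];
    long now = 0L; // fake monotonic clock, in milliseconds
    for (int i = 0; i < taskDurationsMs.length; i++) {
      starts[i] = now;
      long nextHb = now + frequencyMs; // like nextHB.set(Time.monotonicNow() + heartbeatFrequency)
      now += taskDurationsMs[i];       // context.execute(...) takes this long
      if (now < nextHb) {
        now = nextHb;                  // like Thread.sleep(nextHB.get() - now)
      }
      // An overrunning cycle leaves now past nextHb, so the next cycle starts at once.
    }
    starts[taskDurationsMs.length] = now;
    return starts;
  }
}
```

For example, with a 100 ms frequency and task durations of 20, 150 and 30 ms, the cycles start at 0, 100, 250 and 350 ms. Because execute passes heartbeatFrequency as the await timeout, a real cycle normally does not run far past its deadline; the sketch does not model that bound.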

Stepping into the context's execute method called above:

  public void execute(ExecutorService service, long time, TimeUnit unit)
      throws InterruptedException, ExecutionException, TimeoutException {
    stateExecutionCount.incrementAndGet();
    // 1) Get the task for the current state
    DatanodeState<DatanodeStateMachine.DatanodeStates> task = getTask();

    // Adding not null check, in a case where datanode is still starting up, but
    // we called stop DatanodeStateMachine, this sets state to SHUTDOWN, and
    // there is a chance of getting task as null.
    if (task != null) {
      if (this.isEntering()) {
        task.onEnter();
      }
      // 2) Run the task
      task.execute(service);
      // 3) Wait for the task and get the Datanode's next state
      DatanodeStateMachine.DatanodeStates newState = task.await(time, unit);
      if (this.state != newState) {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Task {} executed, state transited from {} to {}",
              task.getClass().getSimpleName(), this.state, newState);
        }
        if (isExiting(newState)) {
          task.onExit();
        }
        // 4) Move the Datanode to the next state
        this.setState(newState);
      }
      ...
    }
  }
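The execute/getTask pair amounts to a simple loop: pick the task for the current state, run it, and adopt whatever state it reports. The sketch below models that loop in plain Java. MiniStateContext, DnState and the Supplier-based tasks are hypothetical simplifications (no executor, no await timeout, no onEnter/onExit hooks), not Ozone's StateContext API.

```java
import java.util.function.Supplier;

// Simplified Datanode-level states, mirroring INIT / RUNNING / SHUTDOWN in the article.
enum DnState { INIT, RUNNING, SHUTDOWN }

// Hypothetical, minimal model of StateContext.execute()/getTask().
// Each task is a Supplier that runs once and returns the next state.
class MiniStateContext {
  private DnState state = DnState.INIT;
  private int executionCount = 0;
  private final Supplier<DnState> initTask;
  private final Supplier<DnState> runningTask;

  MiniStateContext(Supplier<DnState> initTask, Supplier<DnState> runningTask) {
    this.initTask = initTask;
    this.runningTask = runningTask;
  }

  DnState getState() {
    return state;
  }

  int getExecutionCount() {
    return executionCount;
  }

  // Like getTask(): one task per state, and no task at all for SHUTDOWN.
  private Supplier<DnState> getTask() {
    switch (state) {
      case INIT:
        return initTask;
      case RUNNING:
        return runningTask;
      default:
        return null;
    }
  }

  // One cycle: run the current state's task and switch to the state it reports.
  void execute() {
    executionCount++;
    Supplier<DnState> task = getTask();
    if (task != null) {
      DnState newState = task.get();
      if (newState != state) {
        state = newState;
      }
    }
  }

  // Outer loop, like DatanodeStateMachine.start() without the heartbeat sleep.
  void runUntilShutdown() {
    while (state != DnState.SHUTDOWN) {
      execute();
    }
  }
}
```

For instance, an INIT task that returns RUNNING plus a RUNNING task that reports SHUTDOWN on its third run makes runUntilShutdown execute four cycles in total.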

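So the Datanode first executes the task that matches its current state. As getTask shows, only two states actually have a task to run (SHUTDOWN returns null):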

  public DatanodeState<DatanodeStateMachine.DatanodeStates> getTask() {
    switch (this.state) {
    case INIT:
      // Task for the initial state
      return new InitDatanodeState(this.conf, parent.getConnectionManager(),
          this);
    case RUNNING:
      // Task for a Datanode in normal RUNNING operation
      return new RunningDatanodeState(this.conf, parent.getConnectionManager(),
          this);
    case SHUTDOWN:
      return null;
    default:
      throw new IllegalArgumentException("Not Implemented yet.");
    }
  }

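InitDatanodeState's call method mainly performs basic setup: it reads and validates the SCM addresses, registers them (and the Recon address, if one is configured) with the connection manager, persists the Datanode details, and then moves on to the next state: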

  public DatanodeStateMachine.DatanodeStates call() throws Exception {
    Collection<InetSocketAddress> addresses = null;
    try {
      addresses = getSCMAddresses(conf);
    } catch (IllegalArgumentException e) {
      if(!Strings.isNullOrEmpty(e.getMessage())) {
        LOG.error("Failed to get SCM addresses: " + e.getMessage());
      }
      return DatanodeStateMachine.DatanodeStates.SHUTDOWN;
    }

    if (addresses.isEmpty()) {
      LOG.error("Null or empty SCM address list found.");
      return DatanodeStateMachine.DatanodeStates.SHUTDOWN;
    } else {
      for (InetSocketAddress addr : addresses) {
        if (addr.isUnresolved()) {
          LOG.warn("One SCM address ({}) can't (yet?) be resolved. Postpone "
              + "initialization.", addr);

          //skip any further initialization. DatanodeStateMachine will try it
          // again after the hb frequency
          return this.context.getState();
        }
      }
      for (InetSocketAddress addr : addresses) {
        connectionManager.addSCMServer(addr);
      }
      InetSocketAddress reconAddress = getReconAddresses(conf);
      if (reconAddress != null) {
        connectionManager.addReconServer(reconAddress);
      }
    }

    // If datanode ID is set, persist it to the ID file.
    persistContainerDatanodeDetails();

    return this.context.getState().getNextState();
  }
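The decision logic of InitDatanodeState.call can be isolated as a pure function. In the sketch below, InitCheck and InitOutcome are hypothetical names; the real method also handles the IllegalArgumentException from getSCMAddresses, registers the addresses, and persists the Datanode details, all of which is omitted here. It relies only on the JDK's InetSocketAddress.isUnresolved(), the same check the original uses.

```java
import java.net.InetSocketAddress;
import java.util.List;

// Hypothetical outcome type; the real method returns a DatanodeStates value.
enum InitOutcome { SHUTDOWN, STAY_IN_INIT, ADVANCE }

final class InitCheck {
  private InitCheck() {}

  static InitOutcome decide(List<InetSocketAddress> scmAddresses) {
    if (scmAddresses == null || scmAddresses.isEmpty()) {
      return InitOutcome.SHUTDOWN;      // no SCM configured: give up
    }
    for (InetSocketAddress addr : scmAddresses) {
      if (addr.isUnresolved()) {
        return InitOutcome.STAY_IN_INIT; // stay in INIT and retry on the next cycle
      }
    }
    return InitOutcome.ADVANCE;          // register the endpoints, then move on
  }
}
```

A literal IP such as 127.0.0.1 is parsed without a DNS lookup and counts as resolved, whereas InetSocketAddress.createUnresolved yields an address that stays unresolved, so all three outcomes can be exercised without network access. The port numbers in such a test are arbitrary.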

RunningDatanodeState is the part we care about most, but it divides its work further into per-Endpoint states, as shown below:

  public void execute(ExecutorService executor) {
    ecs = new ExecutorCompletionService<>(executor);
    for (EndpointStateMachine endpoint : connectionManager.getValues()) {
      // 1) Get the Endpoint task for this endpoint's current state
      Callable<EndPointStates> endpointTask = getEndPointTask(endpoint);
      if (endpointTask != null) {
        // 2) Submit the Endpoint task for execution
        ecs.submit(endpointTask);
      } else {
        // This can happen if a task is taking more time than the timeOut
        // specified for the task in await, and when it is completed the task
        // has set the state to Shutdown, we may see the state as shutdown
        // here. So, we need to Shutdown DatanodeStateMachine.
        LOG.error("State is Shutdown in RunningDatanodeState");
        context.setState(DatanodeStateMachine.DatanodeStates.SHUTDOWN);
      }
    }
  }

  private Callable<EndpointStateMachine.EndPointStates>
      getEndPointTask(EndpointStateMachine endpoint) {
    switch (endpoint.getState()) {
    case GETVERSION:
      return new VersionEndpointTask(endpoint, conf, context.getParent()
          .getContainer());
    case REGISTER:
      return RegisterEndpointTask.newBuilder()
          .setConfig(conf)
          .setEndpointStateMachine(endpoint)
          .setContext(context)
          .setDatanodeDetails(context.getParent().getDatanodeDetails())
          .setOzoneContainer(context.getParent().getContainer())
          .build();
    case HEARTBEAT:
      return HeartbeatEndpointTask.newBuilder()
          .setConfig(conf)
          .setEndpointStateMachine(endpoint)
          .setDatanodeDetails(context.getParent().getDatanodeDetails())
          .setContext(context)
          .build();
    case SHUTDOWN:
      break;
    default:
      throw new IllegalArgumentException("Illegal Argument.");
    }
    return null;
  }

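The four Endpoint states above are the successive steps a Datanode goes through before it settles into normal operation:

fetching the VERSION information
registration
regular heartbeat reporting
finally SHUTDOWN, which stops the service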

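After an Endpoint task finishes its work, it advances that Endpoint's state to the next one, and on the next cycle (getTask, then getEndPointTask) the Datanode creates a new task instance for that state. Apart from a SHUTDOWN caused by an error, a Datanode in steady state keeps running the HeartbeatEndpointTask inside RunningDatanodeState.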
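The Endpoint-level progression can be sketched the same way. MiniEndpoint, EpState, EndpointTasks and the diskFull flag are hypothetical: a Supplier stands in for the Callable returned by getEndPointTask, and diskFull simulates a disk-space failure. Placing that failure in the VERSION step is an assumption of the sketch. Each call to taskFor builds a fresh task from the endpoint's current state, just as a new task instance is created on each cycle.

```java
import java.util.function.Supplier;

// Endpoint states from the article: GETVERSION -> REGISTER -> HEARTBEAT, plus SHUTDOWN.
enum EpState { GETVERSION, REGISTER, HEARTBEAT, SHUTDOWN }

// Hypothetical stand-in for EndpointStateMachine: it only holds the current state.
class MiniEndpoint {
  EpState state = EpState.GETVERSION;
  boolean diskFull = false; // hypothetical switch that simulates a disk-space failure
}

final class EndpointTasks {
  private EndpointTasks() {}

  // Mirrors getEndPointTask(): pick a task based on the endpoint's current state.
  static Supplier<EpState> taskFor(MiniEndpoint ep) {
    switch (ep.state) {
      case GETVERSION:
        return () -> {
          // Assumed placement: a disk problem in this step moves the endpoint to SHUTDOWN.
          ep.state = ep.diskFull ? EpState.SHUTDOWN : EpState.REGISTER;
          return ep.state;
        };
      case REGISTER:
        return () -> {
          ep.state = EpState.HEARTBEAT; // registration done, start heartbeating
          return ep.state;
        };
      case HEARTBEAT:
        return () -> ep.state; // steady state: stay in HEARTBEAT
      default:
        return null; // SHUTDOWN: nothing left to run
    }
  }
}
```

Running taskFor repeatedly walks a healthy endpoint through GETVERSION and REGISTER to HEARTBEAT, where it then stays; with diskFull set, the first task moves it to SHUTDOWN, after which taskFor returns null.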

The flow of the whole startup process is shown in the flowchart below:

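Besides an administrator explicitly stopping the Datanode, errors such as running out of disk space can also switch the Datanode into the SHUTDOWN state and stop the service. The first case goes through stopDaemon: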

  /**
   * Stop the daemon thread of the datanode state machine.
   */
  public synchronized void stopDaemon() {
    try {
      supervisor.stop();
      // Stopping the daemon from outside switches the state to SHUTDOWN
      context.setState(DatanodeStates.SHUTDOWN);
      reportManager.shutdown();
      this.close();
      LOG.info("Ozone container server stopped.");
    } catch (IOException e) {
      LOG.error("Stop ozone container server failed.", e);
    }
  }

The other case is an Endpoint task entering its SHUTDOWN state, which in turn moves the Datanode to SHUTDOWN:

  public EndpointStateMachine.EndPointStates call() throws Exception {
    rpcEndPoint.lock();

    try {
      ...
    } catch (DiskOutOfSpaceException ex) {
      // On a disk-out-of-space error, set the Endpoint state to SHUTDOWN
      rpcEndPoint.setState(EndpointStateMachine.EndPointStates.SHUTDOWN);
    } catch(IOException ex) {
      rpcEndPoint.logIfNeeded(ex);
    } finally {
      rpcEndPoint.unlock();
    }
    return rpcEndPoint.getState();
  }
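
RunningDatanodeState then collects the results of all Endpoint tasks in computeNextContainerState: if any Endpoint reports SHUTDOWN, the whole Datanode moves to SHUTDOWN.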

  private DatanodeStateMachine.DatanodeStates
      computeNextContainerState(
      List<Future<EndPointStates>> results) {
    for (Future<EndPointStates> state : results) {
      try {
        if (state.get() == EndPointStates.SHUTDOWN) {
          // if any endpoint tells us to shutdown we move to shutdown state.
          return DatanodeStateMachine.DatanodeStates.SHUTDOWN;
        }
      } catch (InterruptedException | ExecutionException e) {
        LOG.error("Error in executing end point task.", e);
      }
    }
    return DatanodeStateMachine.DatanodeStates.RUNNING;
  }
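The end-to-end flow, submitting one task per Endpoint to an ExecutorCompletionService and reducing the results to a Datanode state, can be sketched as follows. EndpointAggregator, EndpointResult and NodeState are hypothetical names. Collecting the futures by polling the completion service until a deadline is an assumption modeled on the await(time, unit) call in StateContext.execute; the reduction itself follows computeNextContainerState: any SHUTDOWN wins, and a failed task is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

enum EndpointResult { HEARTBEAT, SHUTDOWN }

enum NodeState { RUNNING, SHUTDOWN }

final class EndpointAggregator {
  private EndpointAggregator() {}

  // Submit one task per endpoint, wait up to timeoutMs in total, then reduce.
  static NodeState runAndAggregate(ExecutorService pool,
      List<Callable<EndpointResult>> tasks, long timeoutMs) {
    CompletionService<EndpointResult> ecs = new ExecutorCompletionService<>(pool);
    for (Callable<EndpointResult> task : tasks) {
      ecs.submit(task);
    }
    List<Future<EndpointResult>> results = new ArrayList<>();
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    try {
      while (results.size() < tasks.size()) {
        long left = deadline - System.nanoTime();
        if (left <= 0) {
          break; // out of time: late endpoints are not counted this cycle
        }
        Future<EndpointResult> done = ecs.poll(left, TimeUnit.NANOSECONDS);
        if (done != null) {
          results.add(done);
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to callers
    }
    return computeNext(results);
  }

  // Same rule as computeNextContainerState: any SHUTDOWN result wins.
  static NodeState computeNext(List<Future<EndpointResult>> results) {
    for (Future<EndpointResult> f : results) {
      try {
        if (f.get() == EndpointResult.SHUTDOWN) {
          return NodeState.SHUTDOWN;
        }
      } catch (InterruptedException | ExecutionException e) {
        // The original only logs here; a failed endpoint task does not force SHUTDOWN.
      }
    }
    return NodeState.RUNNING;
  }
}
```

The design choice worth noticing is that one bad Endpoint result is enough to stop the whole Datanode, while an Endpoint task that merely throws is tolerated and retried on the next cycle.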

The Datanode heartbeat reporting process

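Now let's look at the other part, heartbeat reporting. In Ozone, the heartbeat a Datanode sends mainly carries three kinds of information: Container, Node, and Pipeline. Handling a heartbeat involves two steps on the Datanode side:

collect its own Container, Node, and Pipeline information and send it to SCM in a heartbeat;
receive the commands in SCM's response and carry out the corresponding actions.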

The code is as follows (the call method of HeartbeatEndpointTask):

  public EndpointStateMachine.EndPointStates call() throws Exception {
    rpcEndpoint.lock();
    SCMHeartbeatRequestProto.Builder requestBuilder = null;
    try {
      Preconditions.checkState(this.datanodeDetailsProto != null);

      // 1) Build the SCM heartbeat request
      requestBuilder = SCMHeartbeatRequestProto.newBuilder()
          .setDatanodeDetails(datanodeDetailsProto);
      // 2) Add reports and actions to the request
      addReports(requestBuilder);
      addContainerActions(requestBuilder);
      addPipelineActions(requestBuilder);
      SCMHeartbeatRequestProto request = requestBuilder.build();
      if (LOG.isDebugEnabled()) {
        LOG.debug("Sending heartbeat message :: {}", request.toString());
      }
      // 3) Send the heartbeat through the SCM client interface
      SCMHeartbeatResponseProto response = rpcEndpoint.getEndPoint()
          .sendHeartbeat(request);
      // 4) Process the commands returned in the heartbeat response
      processResponse(response, datanodeDetailsProto);
      rpcEndpoint.setLastSuccessfulHeartbeat(ZonedDateTime.now());
      rpcEndpoint.zeroMissedCount();
    } catch (IOException ex) {
      ...
      rpcEndpoint.logIfNeeded(ex);
    } finally {
      rpcEndpoint.unlock();
    }
    return rpcEndpoint.getState();
  }
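Stripped of protobuf and RPC, one heartbeat round is: build a request from the collected reports, send it, queue whatever commands come back, and reset the missed-heartbeat counter on success. The sketch below uses hypothetical stand-ins (HbRequest, HbResponse, HbTransport, MiniHeartbeatTask) rather than the real SCMHeartbeatRequestProto and SCM client types. Counting a failed send as a missed heartbeat is an assumption; the original only shows zeroMissedCount on success and elides part of the IOException branch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for SCMHeartbeatRequestProto.
final class HbRequest {
  final String datanodeId;
  final List<String> reports;

  HbRequest(String datanodeId, List<String> reports) {
    this.datanodeId = datanodeId;
    this.reports = reports;
  }
}

// Hypothetical stand-in for SCMHeartbeatResponseProto: just a list of command names.
final class HbResponse {
  final List<String> commands;

  HbResponse(List<String> commands) {
    this.commands = commands;
  }
}

// Hypothetical transport to SCM (the real code calls rpcEndpoint.getEndPoint().sendHeartbeat).
interface HbTransport {
  HbResponse sendHeartbeat(HbRequest request) throws IOException;
}

class MiniHeartbeatTask {
  private final String datanodeId;
  private final HbTransport scm;
  // Plays the role of the StateContext command queue read by the handler thread.
  final Queue<String> commandQueue = new ConcurrentLinkedQueue<>();
  int missedCount = 0;

  MiniHeartbeatTask(String datanodeId, HbTransport scm) {
    this.datanodeId = datanodeId;
    this.scm = scm;
  }

  // One heartbeat round; returns true if SCM answered.
  boolean beat(List<String> reports) {
    HbRequest request = new HbRequest(datanodeId, new ArrayList<>(reports));
    try {
      HbResponse response = scm.sendHeartbeat(request);
      commandQueue.addAll(response.commands); // hand commands to the handler thread
      missedCount = 0;                        // like rpcEndpoint.zeroMissedCount()
      return true;
    } catch (IOException e) {
      missedCount++;                          // assumption: a failure counts as a miss
      return false;
    }
  }
}
```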

For collecting Container Reports, the Datanode has a ReportManager service. Through ReportPublishers, ReportManager obtains the Container information the node maintains from OzoneContainer, the class that handles the Datanode's Container-layer logic.

On the response side, the heartbeat task puts the SCM commands it receives into a queue; the DatanodeStateMachine then pulls commands from that queue and dispatches each one to the CommandHandler that owns it.

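ReportPublisher and CommandHandler are therefore the two main links here, one on the sending side and one on the receiving side. Both are set up in the DatanodeStateMachine constructor: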

  public DatanodeStateMachine(DatanodeDetails datanodeDetails,
      Configuration conf, CertificateClient certClient,
      HddsDatanodeStopService hddsDatanodeStopService) throws IOException {
    OzoneConfiguration ozoneConf = new OzoneConfiguration(conf);
    DatanodeConfiguration dnConf =
        ozoneConf.getObject(DatanodeConfiguration.class);

    ...

    // Initialize the command handlers
    commandDispatcher = CommandDispatcher.newBuilder()
        .addHandler(new CloseContainerCommandHandler())
        .addHandler(new DeleteBlocksCommandHandler(container.getContainerSet(),
            conf))
        .addHandler(new ReplicateContainerCommandHandler(conf, supervisor))
        .addHandler(new DeleteContainerCommandHandler(
            dnConf.getContainerDeleteThreads()))
        .addHandler(new ClosePipelineCommandHandler())
        .addHandler(new CreatePipelineCommandHandler(conf))
        .setConnectionManager(connectionManager)
        .setContainer(container)
        .setContext(context)
        .build();

    // Initialize the report publishers
    reportManager = ReportManager.newBuilder(conf)
        .setStateContext(context)
        .addPublisherFor(NodeReportProto.class)
        .addPublisherFor(ContainerReportsProto.class)
        .addPublisherFor(CommandStatusReportsProto.class)
        .addPublisherFor(PipelineReportsProto.class)
        .build();
  }

The CommandHandlers run on a dedicated thread:

  /**
   * Create a command handler thread.
   *
   * @param config
   */
  private void initCommandHandlerThread(Configuration config) {
    ...
    Runnable processCommandQueue = () -> {
      long now;
      while (getContext().getState() != DatanodeStates.SHUTDOWN) {
        // 1) Take the next SCM command from the StateContext;
        //    HeartbeatEndpointTask puts commands into its command queue
        SCMCommand command = getContext().getNextCommand();
        if (command != null) {
          // 2) Let commandDispatcher handle the command
          commandDispatcher.handle(command);
          commandsHandled++;
        } else {
          try {
            // Sleep till the next HB + 1 second.
            now = Time.monotonicNow();
            if (nextHB.get() > now) {
              Thread.sleep((nextHB.get() - now) + 1000L);
            }
          } catch (InterruptedException e) {
            // Ignore this exception.
          }
        }
      }
    };

    // We will have only one thread for command processing in a datanode.
    cmdProcessThread = getCommandHandlerThread(processCommandQueue);
    cmdProcessThread.start();
  }
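The producer/consumer hand-off between the heartbeat task and the command-handler thread reduces to a queue plus a type-keyed handler table. MiniCommandFlow, Cmd and CmdType are hypothetical names; handlers are plain Consumers rather than Ozone's CommandHandler interface, and drain() performs one pass of the handler loop instead of sleeping until the next heartbeat. Commands without a registered handler are simply skipped in this sketch.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical command types, named after some of the handlers registered above.
enum CmdType { CLOSE_CONTAINER, DELETE_BLOCKS, REPLICATE_CONTAINER }

// Hypothetical SCM command: a type plus the container it targets.
final class Cmd {
  final CmdType type;
  final long containerId;

  Cmd(CmdType type, long containerId) {
    this.type = type;
    this.containerId = containerId;
  }
}

class MiniCommandFlow {
  // Shared FIFO between the heartbeat task (producer) and the handler thread (consumer).
  private final Queue<Cmd> commandQueue = new ConcurrentLinkedQueue<>();
  // Type-keyed dispatch table, in the spirit of CommandDispatcher.
  private final Map<CmdType, Consumer<Cmd>> handlers = new EnumMap<>(CmdType.class);
  private int handled = 0;

  void addHandler(CmdType type, Consumer<Cmd> handler) {
    handlers.put(type, handler);
  }

  // Producer side: the heartbeat task queues the commands from SCM's response.
  void onHeartbeatResponse(Iterable<Cmd> commands) {
    for (Cmd c : commands) {
      commandQueue.add(c);
    }
  }

  // Consumer side: one pass of the handler loop; returns how many commands ran.
  int drain() {
    int ran = 0;
    Cmd next;
    while ((next = commandQueue.poll()) != null) {
      Consumer<Cmd> handler = handlers.get(next.type);
      if (handler != null) {
        handler.accept(next);
        ran++;
      } // commands with no registered handler are skipped in this sketch
    }
    handled += ran;
    return ran;
  }

  int handledCount() {
    return handled;
  }
}
```

With a single consumer, as the original comment states ("only one thread for command processing"), commands are taken off the queue in the order SCM sent them; the sketch keeps that property by draining a FIFO queue from one caller.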

The whole heartbeat process is shown in the figure below:

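Of course, the Ozone Datanode has other internal processes as well, such as periodic data self-checks (something like a data health scanner) and the handling of data requests. The latter was covered in the earlier article Distributed Metadata Management in the Ozone Datanode; interested readers can continue there.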

Analysis of the Ozone Datanode Startup and Heartbeat Reporting Processes
