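Preface
In an earlier article (Distributed Metadata Management of the Ozone Datanode), I analyzed how data is processed inside the Ozone Datanode, including the handling logic for Container-level and Chunk-file-level operations. This article continues with another part of the Datanode's internals: how the Datanode service starts up and how it sends heartbeats to the SCM service. Understanding this part gives us a clearer picture of how a Datanode behaves under normal operation. As the title suggests, the discussion below is split into two parts.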
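Starting the Ozone Datanode service
Although the Ozone Datanode shares its name with the HDFS Datanode, the two services differ considerably in implementation. The Ozone Datanode provides Container-level storage and is managed by the SCM service.
Let's look at how the Datanode service actually starts. Overall, the Datanode goes through two kinds of state changes during startup:
- changes of the Datanode State
- changes of the Endpoint State inside the Datanode's RUNNING state
Both points come up again later in this article. The startup process goes as follows:
First the Datanode service starts, and it then starts the DatanodeStateMachine daemon thread.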
public void start() {
  ...
  OzoneConfiguration.activate();
  HddsUtils.initializeMetrics(conf, "HddsDatanode");
  try {
    ...
    datanodeStateMachine = new DatanodeStateMachine(datanodeDetails, conf,
        dnCertClient, this::terminateDatanode);
    try {
      httpServer = new HddsDatanodeHttpServer(conf);
      httpServer.start();
    } catch (Exception ex) {
      LOG.error("HttpServer failed to start.", ex);
    }
    startPlugins();
    // Start the Datanode state machine thread
    datanodeStateMachine.startDaemon();
  } catch (IOException e) {
    throw new RuntimeException("Can't start the HDDS datanode plugin", e);
  } catch (AuthenticationException ex) {
    throw new RuntimeException("Fail to authentication when starting" +
        " HDDS datanode plugin", ex);
  }
}
The start method of DatanodeStateMachine then executes the state machine periodically:
/**
 * Runs the state machine at a fixed frequency.
 */
private void start() throws IOException {
  long now = 0;
  ...
  // Keep looping as long as the Datanode is not in the SHUTDOWN state
  while (context.getState() != DatanodeStates.SHUTDOWN) {
    try {
      LOG.debug("Executing cycle Number : {}", context.getExecutionCount());
      long heartbeatFrequency = context.getHeartbeatFrequency();
      nextHB.set(Time.monotonicNow() + heartbeatFrequency);
      // Execute the task that belongs to the current Datanode state;
      // in the RUNNING state this is the heartbeat task
      context.execute(executorService, heartbeatFrequency,
          TimeUnit.MILLISECONDS);
      now = Time.monotonicNow();
      if (now < nextHB.get()) {
        if(!Thread.interrupted()) {
          // Sleep until the next heartbeat is due
          Thread.sleep(nextHB.get() - now);
        }
      }
    } catch (InterruptedException e) {
      // Some one has sent interrupt signal, this could be because
      // 1. Trigger heartbeat immediately
      // 2. Shutdown has be initiated.
    } catch (Exception e) {
      LOG.error("Unable to finish the execution.", e);
    }
  }
  ...
}
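The timing pattern in this loop (fix the deadline first, do the work, then sleep only for what is left of the period) can be tried out in isolation. The following is a minimal sketch, not Ozone code: the class `FixedRateLoopSketch` and its methods are hypothetical names, and `System.nanoTime()` stands in for the `Time.monotonicNow()` helper used above.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not Ozone code: run a task at a fixed frequency. */
final class FixedRateLoopSketch {

  /** Monotonic clock in milliseconds (stand-in for Time.monotonicNow()). */
  static long monotonicNowMs() {
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
  }

  /**
   * Runs the task the given number of times, with consecutive runs starting
   * roughly periodMs apart. Returns how many cycles were executed.
   */
  static int runCycles(Runnable task, int cycles, long periodMs) {
    int executed = 0;
    while (executed < cycles) {
      // The deadline of the next cycle is fixed before the work starts.
      long next = monotonicNowMs() + periodMs;
      try {
        task.run();
        executed++;
        long now = monotonicNowMs();
        if (now < next) {
          // Sleep only for whatever is left of the period.
          Thread.sleep(next - now);
        }
      } catch (InterruptedException e) {
        // An interrupt cuts the sleep short; the loop condition then decides
        // whether another cycle runs immediately or the loop ends.
      }
    }
    return executed;
  }

  public static void main(String[] args) {
    int n = runCycles(() -> System.out.println("cycle"), 3, 50);
    System.out.println("executed " + n + " cycles");
  }
}
```

Because the deadline is computed before the task runs, a slow task shortens the following sleep instead of stretching the period.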
Stepping into the context's execute method called above:
public void execute(ExecutorService service, long time, TimeUnit unit)
    throws InterruptedException, ExecutionException, TimeoutException {
  stateExecutionCount.incrementAndGet();
  // 1) Get the task that belongs to the current state
  DatanodeState<DatanodeStateMachine.DatanodeStates> task = getTask();
  // Adding not null check, in a case where datanode is still starting up, but
  // we called stop DatanodeStateMachine, this sets state to SHUTDOWN, and
  // there is a chance of getting task as null.
  if (task != null) {
    if (this.isEntering()) {
      task.onEnter();
    }
    // 2) Execute the task
    task.execute(service);
    // 3) Get the Datanode's next State once the task has run
    DatanodeStateMachine.DatanodeStates newState = task.await(time, unit);
    if (this.state != newState) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Task {} executed, state transited from {} to {}",
            task.getClass().getSimpleName(), this.state, newState);
      }
      if (isExiting(newState)) {
        task.onExit();
      }
      // 4) Set the Datanode's state to the next state
      this.setState(newState);
    }
    ...
  }
}
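The four numbered steps above boil down to "look up the task for the current state, run it, and adopt the state it reports". Here is a minimal sketch of that cycle under simplifying assumptions: all names (`StateCycleSketch`, `taskFor`, `executeOnce`) are hypothetical, a task is modeled as a plain `Supplier` that returns the next state rather than Ozone's `DatanodeState` with its separate `execute` and `await` calls, and the RUNNING task simply shuts down after a fixed number of cycles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch, not Ozone code: one state-machine cycle per call. */
final class StateCycleSketch {
  enum State { INIT, RUNNING, SHUTDOWN }

  private State state = State.INIT;
  private final List<String> trace = new ArrayList<>();
  private int heartbeatsLeft;

  StateCycleSketch(int heartbeats) {
    this.heartbeatsLeft = heartbeats;
  }

  /** Returns the task for the given state, or null once shut down. */
  private Supplier<State> taskFor(State s) {
    switch (s) {
      case INIT:
        // Initialization always succeeds in this sketch.
        return () -> State.RUNNING;
      case RUNNING:
        // Stay in RUNNING until the heartbeat budget is used up.
        return () -> --heartbeatsLeft > 0 ? State.RUNNING : State.SHUTDOWN;
      default:
        return null;
    }
  }

  /** Runs the current state's task once and applies the state it reports. */
  void executeOnce() {
    Supplier<State> task = taskFor(state);
    if (task == null) {
      return;
    }
    State next = task.get();
    trace.add(state + "->" + next);
    state = next;
  }

  /** Cycles until SHUTDOWN is reached and returns the recorded transitions. */
  List<String> runToShutdown() {
    while (state != State.SHUTDOWN) {
      executeOnce();
    }
    return trace;
  }

  public static void main(String[] args) {
    System.out.println(new StateCycleSketch(2).runToShutdown());
  }
}
```

The state machine itself never decides where to go next; each task reports the next state, which is what keeps the outer loop in DatanodeStateMachine so small.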
Here we can see that the Datanode first executes the task that corresponds to its own state. Concretely, two states have a task:
public DatanodeState<DatanodeStateMachine.DatanodeStates> getTask() {
  switch (this.state) {
  case INIT:
    // Task for the initial state
    return new InitDatanodeState(this.conf, parent.getConnectionManager(),
        this);
  case RUNNING:
    // Task for the Datanode's normal running state
    return new RunningDatanodeState(this.conf, parent.getConnectionManager(),
        this);
  case SHUTDOWN:
    return null;
  default:
    throw new IllegalArgumentException("Not Implemented yet.");
  }
}
The call method of InitDatanodeState mainly registers basic configuration, such as the SCM and Recon addresses, with the connection manager:
public DatanodeStateMachine.DatanodeStates call() throws Exception {
  Collection<InetSocketAddress> addresses = null;
  try {
    addresses = getSCMAddresses(conf);
  } catch (IllegalArgumentException e) {
    if(!Strings.isNullOrEmpty(e.getMessage())) {
      LOG.error("Failed to get SCM addresses: " + e.getMessage());
    }
    return DatanodeStateMachine.DatanodeStates.SHUTDOWN;
  }
  if (addresses.isEmpty()) {
    LOG.error("Null or empty SCM address list found.");
    return DatanodeStateMachine.DatanodeStates.SHUTDOWN;
  } else {
    for (InetSocketAddress addr : addresses) {
      if (addr.isUnresolved()) {
        LOG.warn("One SCM address ({}) can't (yet?) be resolved. Postpone "
            + "initialization.", addr);
        //skip any further initialization. DatanodeStateMachine will try it
        // again after the hb frequency
        return this.context.getState();
      }
    }
    for (InetSocketAddress addr : addresses) {
      connectionManager.addSCMServer(addr);
    }
    InetSocketAddress reconAddress = getReconAddresses(conf);
    if (reconAddress != null) {
      connectionManager.addReconServer(reconAddress);
    }
  }
  // If datanode ID is set, persist it to the ID file.
  persistContainerDatanodeDetails();
  return this.context.getState().getNextState();
}
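The method above has three possible outcomes: shut down when no SCM address is configured, stay in INIT and retry when an address cannot be resolved yet, and move on otherwise. The sketch below isolates just that decision. `InitCheckSketch`, `Outcome`, and `check` are hypothetical names, the host name and port in `main` are purely illustrative, and only the standard `InetSocketAddress` API is used.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Ozone code: the three outcomes of the INIT check. */
final class InitCheckSketch {
  enum Outcome { SHUTDOWN, RETRY, PROCEED }

  static Outcome check(List<InetSocketAddress> addresses) {
    if (addresses == null || addresses.isEmpty()) {
      // Nothing to connect to: the node cannot do useful work.
      return Outcome.SHUTDOWN;
    }
    for (InetSocketAddress addr : addresses) {
      if (addr.isUnresolved()) {
        // DNS may simply not be ready yet, so stay in INIT and try again.
        return Outcome.RETRY;
      }
    }
    return Outcome.PROCEED;
  }

  public static void main(String[] args) {
    // createUnresolved never performs a DNS lookup, so this is always RETRY.
    List<InetSocketAddress> addrs = Arrays.asList(
        InetSocketAddress.createUnresolved("scm.example.invalid", 9861));
    System.out.println(check(addrs));
  }
}
```

Returning the current state for the retry case is what lets the outer loop re-run InitDatanodeState on the next cycle without any dedicated retry logic.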
RunningDatanodeState is the one we mainly care about. Inside RunningDatanodeState, however, the state is subdivided further, as shown below:
public void execute(ExecutorService executor) {
  ecs = new ExecutorCompletionService<>(executor);
  for (EndpointStateMachine endpoint : connectionManager.getValues()) {
    // 1) Get the Endpoint task that needs to run now
    Callable<EndPointStates> endpointTask = getEndPointTask(endpoint);
    if (endpointTask != null) {
      // 2) Run the Endpoint task
      ecs.submit(endpointTask);
    } else {
      // This can happen if a task is taking more time than the timeOut
      // specified for the task in await, and when it is completed the task
      // has set the state to Shutdown, we may see the state as shutdown
      // here. So, we need to Shutdown DatanodeStateMachine.
      LOG.error("State is Shutdown in RunningDatanodeState");
      context.setState(DatanodeStateMachine.DatanodeStates.SHUTDOWN);
    }
  }
}

private Callable<EndpointStateMachine.EndPointStates>
    getEndPointTask(EndpointStateMachine endpoint) {
  switch (endpoint.getState()) {
  case GETVERSION:
    return new VersionEndpointTask(endpoint, conf, context.getParent()
        .getContainer());
  case REGISTER:
    return RegisterEndpointTask.newBuilder()
        .setConfig(conf)
        .setEndpointStateMachine(endpoint)
        .setContext(context)
        .setDatanodeDetails(context.getParent().getDatanodeDetails())
        .setOzoneContainer(context.getParent().getContainer())
        .build();
  case HEARTBEAT:
    return HeartbeatEndpointTask.newBuilder()
        .setConfig(conf)
        .setEndpointStateMachine(endpoint)
        .setDatanodeDetails(context.getParent().getDatanodeDetails())
        .setContext(context)
        .build();
  case SHUTDOWN:
    break;
  default:
    throw new IllegalArgumentException("Illegal Argument.");
  }
  return null;
}
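The four Endpoint states above are the stages each SCM endpoint goes through once the Datanode has entered the RUNNING state:
- fetching VERSION information
- registration
- regular heartbeat reporting
- and finally SHUTDOWN, which stops the service
After each Endpoint task has done its work, it updates the current Endpoint's State to the next State. On the next cycle the Datanode calls getTask again, and the new RunningDatanodeState picks the task for the new Endpoint State through getEndPointTask. Apart from a SHUTDOWN State caused by an exception, a Datanode running in steady state executes the HeartbeatEndpointTask inside RunningDatanodeState.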
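The progression of Endpoint states can be written down as a tiny enum. This is a sketch under stated assumptions rather than the real `EndPointStates` type: `EndpointStateSketch` and its `next()` method are hypothetical, and it models only the successful path, where a heartbeat leaves the endpoint in HEARTBEAT.

```java
/** Hypothetical sketch, not Ozone code: endpoint states advancing in order. */
enum EndpointStateSketch {
  GETVERSION, REGISTER, HEARTBEAT, SHUTDOWN;

  /** The state a successfully completed task moves the endpoint to. */
  EndpointStateSketch next() {
    switch (this) {
      case GETVERSION:
        return REGISTER;
      case REGISTER:
        return HEARTBEAT;
      default:
        // HEARTBEAT repeats on every cycle; SHUTDOWN is terminal.
        return this;
    }
  }

  public static void main(String[] args) {
    EndpointStateSketch s = GETVERSION;
    for (int cycle = 0; cycle < 4; cycle++) {
      System.out.println("cycle " + cycle + ": " + s);
      s = s.next();
    }
  }
}
```

Run as is, the loop prints GETVERSION, REGISTER, and then HEARTBEAT twice, which matches the steady state described above.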
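(Figure: flowchart of the Datanode through the whole startup process.)
In the process above, besides an admin explicitly stopping the Datanode, errors such as running out of disk space also switch the Datanode to the SHUTDOWN state, after which the service stops.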
/**
 * Stop the daemon thread of the datanode state machine.
 */
public synchronized void stopDaemon() {
  try {
    supervisor.stop();
    // Stopping the Datanode daemon from outside switches the state to SHUTDOWN
    context.setState(DatanodeStates.SHUTDOWN);
    reportManager.shutdown();
    this.close();
    LOG.info("Ozone container server stopped.");
  } catch (IOException e) {
    LOG.error("Stop ozone container server failed.", e);
  }
}
The other case is an Endpoint task's SHUTDOWN state triggering the switch to the Datanode's SHUTDOWN state:
public EndpointStateMachine.EndPointStates call() throws Exception {
  rpcEndPoint.lock();
  try {
    ...
  } catch (DiskOutOfSpaceException ex) {
    // When a disk-out-of-space exception is thrown, the Endpoint's state
    // is set to SHUTDOWN
    rpcEndPoint.setState(EndpointStateMachine.EndPointStates.SHUTDOWN);
  } catch(IOException ex) {
    rpcEndPoint.logIfNeeded(ex);
  } finally {
    rpcEndPoint.unlock();
  }
  return rpcEndPoint.getState();
}
private DatanodeStateMachine.DatanodeStates
    computeNextContainerState(
    List<Future<EndPointStates>> results) {
  for (Future<EndPointStates> state : results) {
    try {
      if (state.get() == EndPointStates.SHUTDOWN) {
        // if any endpoint tells us to shutdown we move to shutdown state.
        return DatanodeStateMachine.DatanodeStates.SHUTDOWN;
      }
    } catch (InterruptedException | ExecutionException e) {
      LOG.error("Error in executing end point task.", e);
    }
  }
  return DatanodeStateMachine.DatanodeStates.RUNNING;
}
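The rule "any endpoint in SHUTDOWN shuts the whole Datanode down" is easy to exercise with plain `java.util.concurrent` types. The following sketch uses hypothetical names (`AggregateStateSketch`, `aggregate`) and, for brevity, `ExecutorService.invokeAll` instead of the `ExecutorCompletionService` with timeouts that the real code uses. Treating an interrupt as SHUTDOWN is an assumption of this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch, not Ozone code: fold endpoint results into one state. */
final class AggregateStateSketch {
  enum EndpointState { HEARTBEAT, SHUTDOWN }
  enum NodeState { RUNNING, SHUTDOWN }

  static NodeState aggregate(List<Callable<EndpointState>> tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // invokeAll blocks until every endpoint task has completed.
      for (Future<EndpointState> result : pool.invokeAll(tasks)) {
        try {
          if (result.get() == EndpointState.SHUTDOWN) {
            // One endpoint asking for shutdown is enough.
            return NodeState.SHUTDOWN;
          }
        } catch (ExecutionException e) {
          // A failed task is reported and otherwise ignored.
          System.err.println("Endpoint task failed: " + e.getCause());
        }
      }
      return NodeState.RUNNING;
    } catch (InterruptedException e) {
      // Assumption of this sketch: an interrupted wait counts as shutdown.
      Thread.currentThread().interrupt();
      return NodeState.SHUTDOWN;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    List<Callable<EndpointState>> tasks = new ArrayList<>();
    tasks.add(() -> EndpointState.HEARTBEAT);
    tasks.add(() -> EndpointState.SHUTDOWN);
    System.out.println(aggregate(tasks));
  }
}
```

Note that a task which throws does not shut the node down: as in the original method, only an explicit SHUTDOWN result does.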
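The Datanode heartbeat reporting process
Now let's look at the other part: how the Datanode reports heartbeats. In Ozone, the heartbeat a Datanode sends mainly carries three kinds of information: Container, Node, and Pipeline. While processing a heartbeat, the Datanode has two main jobs:
- Collect its own Container, Node, and Pipeline information and send it to the SCM service in a heartbeat.
- Receive the commands in the SCM's response and carry out the corresponding actions.
The code is as follows (the call method of HeartbeatEndpointTask):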
public EndpointStateMachine.EndPointStates call() throws Exception {
  rpcEndpoint.lock();
  SCMHeartbeatRequestProto.Builder requestBuilder = null;
  try {
    Preconditions.checkState(this.datanodeDetailsProto != null);
    // 1) Build the SCM heartbeat request
    requestBuilder = SCMHeartbeatRequestProto.newBuilder()
        .setDatanodeDetails(datanodeDetailsProto);
    // 2) Add the reports and other heartbeat information to the request
    addReports(requestBuilder);
    addContainerActions(requestBuilder);
    addPipelineActions(requestBuilder);
    SCMHeartbeatRequestProto request = requestBuilder.build();
    if (LOG.isDebugEnabled()) {
      LOG.debug("Sending heartbeat message :: {}", request.toString());
    }
    // 3) Send the heartbeat through the SCM client interface
    SCMHeartbeatResponseProto reponse = rpcEndpoint.getEndPoint()
        .sendHeartbeat(request);
    // 4) Process the commands returned in the heartbeat response
    processResponse(reponse, datanodeDetailsProto);
    rpcEndpoint.setLastSuccessfulHeartbeat(ZonedDateTime.now());
    rpcEndpoint.zeroMissedCount();
  } catch (IOException ex) {
    ...
    rpcEndpoint.logIfNeeded(ex);
  } finally {
    rpcEndpoint.unlock();
  }
  return rpcEndpoint.getState();
}
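For collecting Container Report information, the Datanode has an internal ReportManager service. Through its ReportPublishers, ReportManager obtains the Container information the node maintains from OzoneContainer, the class responsible for the Datanode's Container-layer logic.
When handling the commands in a heartbeat response, the heartbeat task adds the SCM commands it received to a queue. The DatanodeStateMachine then pulls commands from this queue and dispatches each one to the CommandHandler responsible for it.
So ReportPublisher and CommandHandler are the two main links here, one before the heartbeat is sent and one after the response arrives.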
public DatanodeStateMachine(DatanodeDetails datanodeDetails,
    Configuration conf, CertificateClient certClient,
    HddsDatanodeStopService hddsDatanodeStopService) throws IOException {
  OzoneConfiguration ozoneConf = new OzoneConfiguration(conf);
  DatanodeConfiguration dnConf =
      ozoneConf.getObject(DatanodeConfiguration.class);
  ...
  // Initialize the Command Handlers
  commandDispatcher = CommandDispatcher.newBuilder()
      .addHandler(new CloseContainerCommandHandler())
      .addHandler(new DeleteBlocksCommandHandler(container.getContainerSet(),
          conf))
      .addHandler(new ReplicateContainerCommandHandler(conf, supervisor))
      .addHandler(new DeleteContainerCommandHandler(
          dnConf.getContainerDeleteThreads()))
      .addHandler(new ClosePipelineCommandHandler())
      .addHandler(new CreatePipelineCommandHandler(conf))
      .setConnectionManager(connectionManager)
      .setContainer(container)
      .setContext(context)
      .build();
  // Initialize the Report Publishers
  reportManager = ReportManager.newBuilder(conf)
      .setStateContext(context)
      .addPublisherFor(NodeReportProto.class)
      .addPublisherFor(ContainerReportsProto.class)
      .addPublisherFor(CommandStatusReportsProto.class)
      .addPublisherFor(PipelineReportsProto.class)
      .build();
}
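The dispatcher built above routes each SCM command to the handler registered for it. A minimal sketch of that idea follows; it assumes nothing about the real CommandDispatcher beyond "one handler per command type". `DispatcherSketch`, its `CommandType` enum, and the `Command` class are all hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch, not Ozone code: route commands to handlers by type. */
final class DispatcherSketch {
  enum CommandType { CLOSE_CONTAINER, DELETE_BLOCKS }

  /** A command is just a type plus an opaque payload in this sketch. */
  static final class Command {
    final CommandType type;
    final String payload;

    Command(CommandType type, String payload) {
      this.type = type;
      this.payload = payload;
    }
  }

  private final Map<CommandType, Consumer<Command>> handlers =
      new EnumMap<>(CommandType.class);

  /** Registers the handler for one command type; returns this for chaining. */
  DispatcherSketch addHandler(CommandType type, Consumer<Command> handler) {
    handlers.put(type, handler);
    return this;
  }

  /** Hands the command to its handler; returns false if none is registered. */
  boolean handle(Command command) {
    Consumer<Command> handler = handlers.get(command.type);
    if (handler == null) {
      return false;
    }
    handler.accept(command);
    return true;
  }

  public static void main(String[] args) {
    DispatcherSketch dispatcher = new DispatcherSketch()
        .addHandler(CommandType.CLOSE_CONTAINER,
            c -> System.out.println("closing " + c.payload));
    System.out.println(
        dispatcher.handle(new Command(CommandType.CLOSE_CONTAINER, "container-1")));
    System.out.println(
        dispatcher.handle(new Command(CommandType.DELETE_BLOCKS, "block-7")));
  }
}
```

Keeping the routing in a map means a new command type needs a new handler registration and no change to the dispatch logic.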
The CommandHandlers run in a separate thread:
/**
 * Create a command handler thread.
 *
 * @param config
 */
private void initCommandHandlerThread(Configuration config) {
  ...
  Runnable processCommandQueue = () -> {
    long now;
    while (getContext().getState() != DatanodeStates.SHUTDOWN) {
      // 1) Get the next SCM command to execute from the StateContext;
      // HeartbeatEndpointTask adds commands to the StateContext's
      // command queue
      SCMCommand command = getContext().getNextCommand();
      if (command != null) {
        // 2) Let the commandDispatcher handle the command
        commandDispatcher.handle(command);
        commandsHandled++;
      } else {
        try {
          // Sleep till the next HB + 1 second.
          now = Time.monotonicNow();
          if (nextHB.get() > now) {
            Thread.sleep((nextHB.get() - now) + 1000L);
          }
        } catch (InterruptedException e) {
          // Ignore this exception.
        }
      }
    }
  };
  // We will have only one thread for command processing in a datanode.
  cmdProcessThread = getCommandHandlerThread(processCommandQueue);
  cmdProcessThread.start();
}
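The hand-off between the heartbeat task (producer) and the command handler thread (consumer) is a classic queue pattern. The sketch below shows it with hypothetical names (`CommandQueueSketch`, `addCommand`, `awaitHandled`) and plain strings as commands. Unlike the code above, which sleeps until the next heartbeat when the queue is empty, this sketch uses a `BlockingQueue` poll with a short timeout, a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not Ozone code: one thread draining a command queue. */
final class CommandQueueSketch {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final List<String> handled = new CopyOnWriteArrayList<>();
  private volatile boolean shutdown;

  /** Producer side: what a heartbeat task does with each SCM command. */
  void addCommand(String command) {
    queue.add(command);
  }

  /** Consumer side: a single daemon thread handles commands in order. */
  Thread startHandlerThread() {
    Thread t = new Thread(() -> {
      while (!shutdown) {
        try {
          String command = queue.poll(50, TimeUnit.MILLISECONDS);
          if (command != null) {
            handled.add(command); // stands in for dispatching to a handler
          }
        } catch (InterruptedException e) {
          // Fall through and re-check the shutdown flag.
        }
      }
    }, "command-handler-sketch");
    t.setDaemon(true);
    t.start();
    return t;
  }

  void stop() {
    shutdown = true;
  }

  List<String> handled() {
    return new ArrayList<>(handled);
  }

  /** Waits until n commands were handled; false if the timeout elapses. */
  boolean awaitHandled(int n, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (handled.size() < n) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(5);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    CommandQueueSketch sketch = new CommandQueueSketch();
    sketch.startHandlerThread();
    sketch.addCommand("closeContainer");
    sketch.addCommand("deleteBlocks");
    System.out.println(sketch.awaitHandled(2, 1000) + " " + sketch.handled());
    sketch.stop();
  }
}
```

With a single consumer thread, commands are handled in the order they were queued, which mirrors the "only one thread for command processing" comment in the original code.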
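(Figure: overview of the whole heartbeat process.)
The Ozone Datanode of course has other internal processes as well, for example periodic self-checks of the data (something like a Data Healthy Scanner) and the handling of data requests. The latter was covered in the earlier article (Distributed Metadata Management of the Ozone Datanode); interested readers can continue with that article.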