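To understand the Nacos heartbeat mechanism, you first need to understand how Nacos service registration works; it helps to read https://blog.csdn.net/LiaoHongHB/article/details/103993074 first.
When a service registers with Nacos, NacosServiceRegistry calls its register() method, which delegates the actual registration logic to namingService.registerInstance().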
@Override
public void register(Registration registration) {
if (StringUtils.isEmpty(registration.getServiceId())) {
log.warn("No service to register for nacos client...");
return;
}
String serviceId = registration.getServiceId();
Instance instance = new Instance();
instance.setIp(registration.getHost());
instance.setPort(registration.getPort());
instance.setWeight(nacosDiscoveryProperties.getWeight());
instance.setClusterName(nacosDiscoveryProperties.getClusterName());
instance.setMetadata(registration.getMetadata());
try {
namingService.registerInstance(serviceId, instance);
log.info("nacos registry, {} {}:{} register finished", serviceId,
instance.getIp(), instance.getPort());
}
catch (Exception e) {
log.error("nacos registry, {} register failed...{},", serviceId,
registration.toString(), e);
}
}
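NacosNamingService implements the NamingService interface. Its registerInstance() method does two things. The first, performed only for ephemeral instances (see the isEphemeral() check), is to assemble a BeatInfo heartbeat packet and hand it to the BeatReactor so that heartbeats get sent: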
public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
if (instance.isEphemeral()) {
BeatInfo beatInfo = new BeatInfo();
beatInfo.setServiceName(NamingUtils.getGroupedName(serviceName, groupName));
beatInfo.setIp(instance.getIp());
beatInfo.setPort(instance.getPort());
beatInfo.setCluster(instance.getClusterName());
beatInfo.setWeight(instance.getWeight());
beatInfo.setMetadata(instance.getMetadata());
beatInfo.setScheduled(false);
this.beatReactor.addBeatInfo(NamingUtils.getGroupedName(serviceName, groupName), beatInfo);
}
this.serverProxy.registerService(NamingUtils.getGroupedName(serviceName, groupName), groupName, instance);
}
The NacosNamingService constructor calls init(), and init() creates the BeatReactor that runs the heartbeat threads.
The NacosNamingService constructor and init() method:
public NacosNamingService(Properties properties) {
this.init(properties);
}
private void init(Properties properties) {
this.serverList = properties.getProperty("serverAddr");
this.initNamespace(properties);
this.initEndpoint(properties);
this.initWebRootContext();
this.initCacheDir();
this.initLogName(properties);
this.eventDispatcher = new EventDispatcher();
this.serverProxy = new NamingProxy(this.namespace, this.endpoint, this.serverList);
this.serverProxy.setProperties(properties);
// create the BeatReactor, which runs the heartbeat threads
this.beatReactor = new BeatReactor(this.serverProxy, this.initClientBeatThreadCount(properties));
this.hostReactor = new HostReactor(this.eventDispatcher, this.serverProxy, this.cacheDir, this.isLoadCacheAtStart(properties), this.initPollingThreadCount(properties));
}
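The BeatReactor constructor creates a ScheduledExecutorService (a ScheduledThreadPoolExecutor) and schedules a BeatReactor.BeatProcessor on it. BeatProcessor.run() in turn schedules a BeatTask for each pending heartbeat, and BeatTask calls sendBeat() with the heartbeat packet as its argument.
The BeatReactor constructor, which creates the scheduled thread pool and schedules the BeatProcessor: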
public BeatReactor(NamingProxy serverProxy, int threadCount) {
this.clientBeatInterval = 5000L;
this.dom2Beat = new ConcurrentHashMap();
this.serverProxy = serverProxy;
// create the scheduled thread pool; the BeatProcessor is scheduled on it below
this.executorService = new ScheduledThreadPoolExecutor(threadCount, new ThreadFactory() {
public Thread newThread(Runnable r) {
Thread thread = new Thread(r);
thread.setDaemon(true);
thread.setName("com.alibaba.nacos.naming.beat.sender");
return thread;
}
});
this.executorService.schedule(new BeatReactor.BeatProcessor(), 0L, TimeUnit.MILLISECONDS);
}
BeatProcessor.run(): schedules a BeatTask for every BeatInfo that is not already scheduled, then reschedules itself:
public void run() {
try {
Iterator var1 = BeatReactor.this.dom2Beat.entrySet().iterator();
while(var1.hasNext()) {
Entry<String, BeatInfo> entry = (Entry)var1.next();
BeatInfo beatInfo = (BeatInfo)entry.getValue();
if (!beatInfo.isScheduled()) {
beatInfo.setScheduled(true);
// schedule a BeatTask for this BeatInfo
BeatReactor.this.executorService.schedule(BeatReactor.this.new BeatTask(beatInfo), 0L, TimeUnit.MILLISECONDS);
}
}
} catch (Exception var7) {
LogUtils.NAMING_LOGGER.error("[CLIENT-BEAT] Exception while scheduling beat.", var7);
} finally {
BeatReactor.this.executorService.schedule(this, BeatReactor.this.clientBeatInterval, TimeUnit.MILLISECONDS);
}
}
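The scan above relies on the scheduled flag to avoid submitting a second BeatTask for a beat that is still in flight. The following is a minimal sketch of that guard only, with no real thread pool; the class BeatScanSketch, its Beat stand-in, and the methods scanOnce() and beatFinished() are hypothetical names invented for illustration and are not part of the Nacos client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the scan performed in BeatProcessor.run().
class BeatScanSketch {
    // Stand-in for BeatInfo: only the "scheduled" flag matters here.
    static class Beat {
        volatile boolean scheduled;
    }

    // Plays the role of dom2Beat: service key -> pending beat.
    final Map<String, Beat> beats = new ConcurrentHashMap<>();

    // One scan pass: returns the keys for which a beat task would be submitted.
    List<String> scanOnce() {
        List<String> submitted = new ArrayList<>();
        for (Map.Entry<String, Beat> entry : beats.entrySet()) {
            Beat beat = entry.getValue();
            if (!beat.scheduled) {
                beat.scheduled = true; // guard: no second task while this one is pending
                submitted.add(entry.getKey());
            }
        }
        return submitted;
    }

    // Mirrors setScheduled(false) in BeatTask.run(): the beat may be picked up again.
    void beatFinished(String key) {
        Beat beat = beats.get(key);
        if (beat != null) {
            beat.scheduled = false;
        }
    }

    public static void main(String[] args) {
        BeatScanSketch sketch = new BeatScanSketch();
        sketch.beats.put("DEFAULT_GROUP@@demo-service", new Beat());
        System.out.println(sketch.scanOnce()); // the beat is submitted
        System.out.println(sketch.scanOnce()); // nothing new to submit
        sketch.beatFinished("DEFAULT_GROUP@@demo-service");
        System.out.println(sketch.scanOnce()); // submitted again after the task finished
    }
}
```

In the real BeatProcessor the same pass ends with the processor rescheduling itself in the finally block, so the scan repeats every clientBeatInterval milliseconds even if an exception was thrown.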
BeatTask: calls sendBeat()
class BeatTask implements Runnable {
BeatInfo beatInfo;
public BeatTask(BeatInfo beatInfo) {
this.beatInfo = beatInfo;
}
public void run() {
// call sendBeat()
long result = BeatReactor.this.serverProxy.sendBeat(this.beatInfo);
this.beatInfo.setScheduled(false);
if (result > 0L) {
BeatReactor.this.clientBeatInterval = result;
}
}
}
sendBeat() issues an HTTP PUT to /instance/beat, which is handled on the server by InstanceController.beat(), to report the heartbeat:
public long sendBeat(BeatInfo beatInfo) {
try {
LogUtils.NAMING_LOGGER.info("[BEAT] {} sending beat to server: {}", this.namespaceId, beatInfo.toString());
Map<String, String> params = new HashMap(4);
params.put("beat", JSON.toJSONString(beatInfo));
params.put("namespaceId", this.namespaceId);
params.put("serviceName", beatInfo.getServiceName());
// remote call over HTTP
String result = this.reqAPI(UtilAndComs.NACOS_URL_BASE + "/instance/beat", params, (String)"PUT");
JSONObject jsonObject = JSON.parseObject(result);
if (jsonObject != null) {
return jsonObject.getLong("clientBeatInterval").longValue();
}
} catch (Exception var5) {
LogUtils.NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: " + JSON.toJSONString(beatInfo), var5);
}
return 0L;
}
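Taken together, BeatTask and sendBeat() let the server tune the client's beat interval: sendBeat() returns the clientBeatInterval reported by the server, or 0 when the call fails, and BeatTask only overwrites the interval for a positive result. A minimal sketch of that rule, assuming a hypothetical class BeatIntervalSketch with a hypothetical method onBeatResult(); the 5000 ms starting value is the one set in the BeatReactor constructor shown earlier.

```java
// Hypothetical model of how BeatTask applies the value returned by sendBeat().
class BeatIntervalSketch {
    // Initial value, as in the BeatReactor constructor.
    private volatile long clientBeatInterval = 5000L;

    // serverResult models the return value of sendBeat():
    // the interval reported by the server, or 0 when the request failed.
    long onBeatResult(long serverResult) {
        if (serverResult > 0L) {
            clientBeatInterval = serverResult;
        }
        return clientBeatInterval;
    }

    public static void main(String[] args) {
        BeatIntervalSketch sketch = new BeatIntervalSketch();
        System.out.println(sketch.onBeatResult(0L));    // failed beat: interval unchanged
        System.out.println(sketch.onBeatResult(3000L)); // server asks for a shorter interval
        System.out.println(sketch.onBeatResult(0L));    // failure again: last good value kept
    }
}
```

The consequence is that a failed beat never resets the interval; the client keeps using the last value the server successfully returned.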
The InstanceController.beat() method
InstanceController.beat() calls service.processClientBeat(clientBeat), which calls HealthCheckReactor.scheduleNow(clientBeatProcessor) to run the ClientBeatProcessor task. That task looks up the instance matching the beat's ip and port, calls setLastBeat() on it, and so records the current time as the instance's last heartbeat time:
InstanceController.beat():
service.processClientBeat(clientBeat);
service.processClientBeat():
public void processClientBeat(final RsInfo rsInfo) {
ClientBeatProcessor clientBeatProcessor = new ClientBeatProcessor();
clientBeatProcessor.setService(this);
clientBeatProcessor.setRsInfo(rsInfo);
// run the ClientBeatProcessor task
HealthCheckReactor.scheduleNow(clientBeatProcessor);
}
HealthCheckReactor.scheduleNow:
public static ScheduledFuture<?> scheduleNow(Runnable task) {
return EXECUTOR.schedule(task, 0, TimeUnit.MILLISECONDS);
}
ClientBeatProcessor.run():
public void run() {
Service service = this.service;
if (Loggers.EVT_LOG.isDebugEnabled()) {
Loggers.EVT_LOG.debug("[CLIENT-BEAT] processing beat: {}", rsInfo.toString());
}
String ip = rsInfo.getIp();
String clusterName = rsInfo.getCluster();
int port = rsInfo.getPort();
Cluster cluster = service.getClusterMap().get(clusterName);
List<Instance> instances = cluster.allIPs(true);
for (Instance instance : instances) {
// find the instance by ip + port
if (instance.getIp().equals(ip) && instance.getPort() == port) {
if (Loggers.EVT_LOG.isDebugEnabled()) {
Loggers.EVT_LOG.debug("[CLIENT-BEAT] refresh beat: {}", rsInfo.toString());
}
// record the time of this heartbeat
instance.setLastBeat(System.currentTimeMillis());
if (!instance.isMarked()) {
if (!instance.isHealthy()) {
instance.setHealthy(true);
Loggers.EVT_LOG.info("service: {} {POS} {IP-ENABLED} valid: {}:{}@{}, region: {}, msg: client beat ok",
cluster.getService().getName(), ip, port, cluster.getName(), UtilsAndCommons.LOCALHOST_SITE);
getPushService().serviceChanged(service);
}
}
}
}
}
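The core of ClientBeatProcessor.run() can be reduced to: match by ip and port, refresh the last-beat timestamp, and flip an unhealthy, unmarked instance back to healthy. Below is a small sketch of that logic under simplifying assumptions; BeatRefreshSketch, Inst, and onBeat() are hypothetical names, the clock is passed in as a parameter, and the boolean return value stands in for the point where the real code calls getPushService().serviceChanged(service).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of ClientBeatProcessor.run().
class BeatRefreshSketch {
    // Stand-in for the server-side Instance.
    static class Inst {
        final String ip;
        final int port;
        long lastBeat;
        boolean healthy;
        boolean marked;

        Inst(String ip, int port, boolean healthy) {
            this.ip = ip;
            this.port = port;
            this.healthy = healthy;
        }
    }

    final List<Inst> instances = new ArrayList<>();

    // Returns true when an instance became healthy again,
    // i.e. where the real code would push a service-changed notification.
    boolean onBeat(String ip, int port, long now) {
        boolean changed = false;
        for (Inst inst : instances) {
            if (inst.ip.equals(ip) && inst.port == port) {
                inst.lastBeat = now; // always refreshed for a matching instance
                if (!inst.marked && !inst.healthy) {
                    inst.healthy = true;
                    changed = true;
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        BeatRefreshSketch sketch = new BeatRefreshSketch();
        sketch.instances.add(new Inst("10.0.0.1", 8080, false));
        System.out.println(sketch.onBeat("10.0.0.1", 8080, 1_000L)); // recovered
        System.out.println(sketch.onBeat("10.0.0.1", 8080, 6_000L)); // already healthy
    }
}
```

Note that the timestamp is refreshed even for marked instances; only the health flip is skipped for them, as in the original loop.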
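That completes the heartbeat-sending path in Nacos.
What remains to be analyzed is how Nacos periodically uses those heartbeats to decide whether an instance is still alive.
As mentioned above, namingService.registerInstance() does two things, the first being to assemble the BeatInfo heartbeat packet and send heartbeats.
The second is to register the instance with the Nacos server, again through an HTTP call; the request is handled by InstanceController.register():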
@PostMapping
public String register(HttpServletRequest request) throws Exception {
String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
serviceManager.registerInstance(namespaceId, serviceName, parseInstance(request));
return "ok";
}
This method calls serviceManager.registerInstance(), whose logic is as follows:
public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
// create the Service object if it does not exist yet
createEmptyService(namespaceId, serviceName, instance.isEphemeral());
Service service = getService(namespaceId, serviceName);
if (service == null) {
throw new NacosException(NacosException.INVALID_PARAM,
"service not found, namespace: " + namespaceId + ", service: " + serviceName);
}
// add the instance to the service held in memory
addInstance(namespaceId, serviceName, instance.isEphemeral(), instance);
}
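It first creates a Service object (if one does not exist yet) and then adds the instance to that service in memory. The Service object is created as follows: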
public void createEmptyService(String namespaceId, String serviceName, boolean local) throws NacosException {
createServiceIfAbsent(namespaceId, serviceName, local, null);
}
public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster) throws NacosException {
Service service = getService(namespaceId, serviceName);
if (service == null) {
Loggers.SRV_LOG.info("creating empty service {}:{}", namespaceId, serviceName);
service = new Service();
service.setName(serviceName);
service.setNamespaceId(namespaceId);
service.setGroupName(NamingUtils.getGroupName(serviceName));
// now validate the service. if failed, exception will be thrown
service.setLastModifiedMillis(System.currentTimeMillis());
service.recalculateChecksum();
if (cluster != null) {
cluster.setService(service);
service.getClusterMap().put(cluster.getName(), cluster);
}
service.validate();
putServiceAndInit(service);
if (!local) {
addOrReplaceService(service);
}
}
}
After the Service object is created, putServiceAndInit() is called:
private void putServiceAndInit(Service service) throws NacosException {
putService(service);
service.init();
consistencyService.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), true), service);
consistencyService.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), false), service);
Loggers.SRV_LOG.info("[NEW-SERVICE] {}", service.toJSON());
}
The key call here is service.init():
public void init() {
HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) {
entry.getValue().setService(this);
entry.getValue().init();
}
}
This method hands a clientBeatCheckTask to HealthCheckReactor.scheduleCheck(clientBeatCheckTask). Inside scheduleCheck:
public static void scheduleCheck(ClientBeatCheckTask task) {
futureMap.putIfAbsent(task.taskKey(), EXECUTOR.scheduleWithFixedDelay(task, 5000, 5000, TimeUnit.MILLISECONDS));
}
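So this method starts a scheduled task that runs ClientBeatCheckTask every 5s (a 5s initial delay, then a fixed 5s delay after each run finishes). Next, look at ClientBeatCheckTask.run():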
@Override
public void run() {
try {
if (!getDistroMapper().responsible(service.getName())) {
return;
}
if (!getSwitchDomain().isHealthCheckEnabled()) {
return;
}
List<Instance> instances = service.allIPs(true);
// first set health status of instances:
for (Instance instance : instances) {
if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
if (!instance.isMarked()) {
if (instance.isHealthy()) {
instance.setHealthy(false);
Loggers.EVT_LOG.info("{POS} {IP-DISABLED} valid: {}:{}@{}@{}, region: {}, msg: client timeout after {}, last beat: {}",
instance.getIp(), instance.getPort(), instance.getClusterName(), service.getName(),
UtilsAndCommons.LOCALHOST_SITE, instance.getInstanceHeartBeatTimeOut(), instance.getLastBeat());
getPushService().serviceChanged(service);
SpringContext.getAppContext().publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
}
}
}
}
if (!getGlobalConfig().isExpireInstance()) {
return;
}
// then remove obsolete instances:
for (Instance instance : instances) {
if (instance.isMarked()) {
continue;
}
if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
// delete instance
Loggers.SRV_LOG.info("[AUTO-DELETE-IP] service: {}, ip: {}", service.getName(), JSON.toJSONString(instance));
deleteIP(instance);
}
}
} catch (Exception e) {
Loggers.SRV_LOG.warn("Exception while processing client beat time out.", e);
}
}
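ClientBeatCheckTask does two main things:
First, it iterates over all instances and checks whether the time since the last heartbeat exceeds the configured heartbeat timeout; if so, it sets the instance's healthy flag to false.
Second, it iterates over all instances again and checks whether the time since the last heartbeat exceeds the configured delete timeout; if so, it removes the instance from memory. Instances flagged as marked are skipped in both passes.
Note that InstanceController.beat() also creates the instance automatically if it does not exist, through the same serviceManager.registerInstance() call that InstanceController.register() uses, so it is another entry point that starts the scheduled heartbeat check.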
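The two passes of ClientBeatCheckTask can be sketched as a single method that takes the current time as a parameter. The names BeatCheckSketch, Inst, and check() are hypothetical, and the 15s and 30s thresholds are assumptions chosen for illustration; in the real code the thresholds come from instance.getInstanceHeartBeatTimeOut() and instance.getIpDeleteTimeout().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of ClientBeatCheckTask.run().
class BeatCheckSketch {
    // Assumed thresholds for illustration only.
    static final long HEART_BEAT_TIMEOUT_MS = 15_000L;
    static final long IP_DELETE_TIMEOUT_MS = 30_000L;

    // Stand-in for the server-side Instance.
    static class Inst {
        long lastBeat;
        boolean healthy = true;
        boolean marked;

        Inst(long lastBeat) {
            this.lastBeat = lastBeat;
        }
    }

    final List<Inst> instances = new ArrayList<>();

    void check(long now) {
        // Pass 1: mark instances unhealthy once the heartbeat timeout has elapsed.
        for (Inst inst : instances) {
            if (now - inst.lastBeat > HEART_BEAT_TIMEOUT_MS && !inst.marked && inst.healthy) {
                inst.healthy = false;
            }
        }
        // Pass 2: drop instances whose last beat is older than the delete timeout.
        instances.removeIf(inst -> !inst.marked && now - inst.lastBeat > IP_DELETE_TIMEOUT_MS);
    }

    public static void main(String[] args) {
        BeatCheckSketch sketch = new BeatCheckSketch();
        sketch.instances.add(new Inst(0L));
        sketch.check(10_000L);
        System.out.println(sketch.instances.get(0).healthy); // still within the timeout
        sketch.check(20_000L);
        System.out.println(sketch.instances.get(0).healthy); // timed out, kept but unhealthy
        sketch.check(31_000L);
        System.out.println(sketch.instances.size());         // removed after the delete timeout
    }
}
```

Because an instance first turns unhealthy and is only removed later, a client that resumes beating between the two thresholds is restored by the beat-processing path rather than having to register again.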
Instance instance = serviceManager.getInstance(namespaceId, serviceName, clusterName, ip, port);
if (instance == null) {
if (clientBeat == null) {
result.put(CommonParams.CODE, NamingResponseCode.RESOURCE_NOT_FOUND);
return result;
}
instance = new Instance();
instance.setPort(clientBeat.getPort());
instance.setIp(clientBeat.getIp());
instance.setWeight(clientBeat.getWeight());
instance.setMetadata(clientBeat.getMetadata());
instance.setClusterName(clusterName);
instance.setServiceName(serviceName);
instance.setInstanceId(instance.getInstanceId());
instance.setEphemeral(clientBeat.isEphemeral());
serviceManager.registerInstance(namespaceId, serviceName, instance);
}
Service service = serviceManager.getService(namespaceId, serviceName);
if (service == null) {
throw new NacosException(NacosException.SERVER_ERROR,
"service not found: " + serviceName + "@" + namespaceId);
}
if (clientBeat == null) {
clientBeat = new RsInfo();
clientBeat.setIp(ip);
clientBeat.setPort(port);
clientBeat.setCluster(clusterName);
}
service.processClientBeat(clientBeat);
result.put(CommonParams.CODE, NamingResponseCode.OK);
result.put("clientBeatInterval", instance.getInstanceHeartBeatInterval());
result.put(SwitchEntry.LIGHT_BEAT_ENABLED, switchDomain.isLightBeatEnabled());
return result;
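The branch at the top of this beat() excerpt can be summarized as: an unknown instance is registered on the fly when the request carries a beat body, and rejected with RESOURCE_NOT_FOUND when it does not. A minimal sketch of that decision follows; BeatAutoRegisterSketch, its fields, and the string return codes are hypothetical stand-ins, and the registrations counter marks the place where the real code calls serviceManager.registerInstance(...).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the instance lookup in InstanceController.beat().
class BeatAutoRegisterSketch {
    // "ip:port" -> time of the last beat; stands in for the server's instance registry.
    final Map<String, Long> lastBeatByInstance = new HashMap<>();
    // Counts how often the registration path was taken.
    int registrations = 0;

    // hasBeatBody models whether the request carried a full beat (clientBeat != null).
    String beat(String ip, int port, boolean hasBeatBody, long now) {
        String key = ip + ":" + port;
        if (!lastBeatByInstance.containsKey(key)) {
            if (!hasBeatBody) {
                return "RESOURCE_NOT_FOUND"; // nothing to build an instance from
            }
            registrations++; // stands in for serviceManager.registerInstance(...)
        }
        lastBeatByInstance.put(key, now); // stands in for service.processClientBeat(...)
        return "OK";
    }

    public static void main(String[] args) {
        BeatAutoRegisterSketch sketch = new BeatAutoRegisterSketch();
        System.out.println(sketch.beat("10.0.0.1", 8080, false, 1_000L)); // unknown, no body
        System.out.println(sketch.beat("10.0.0.1", 8080, true, 2_000L));  // registered on the fly
        System.out.println(sketch.beat("10.0.0.1", 8080, false, 7_000L)); // known instance
        System.out.println(sketch.registrations);
    }
}
```

This is why a client whose instance was removed by the delete pass reappears on its next full beat: the beat itself triggers a fresh registration.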