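The Broker's HA strategy consists of two parts:
① Synchronizing metadata
② Synchronizing message data
Synchronizing metadata
When a slave starts, it launches a scheduled task that synchronizes metadata from the master: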
if (role == BrokerRole.SLAVE) {
if (null != slaveSyncFuture) {
slaveSyncFuture.cancel(false);
}
this.slaveSynchronize.setMasterAddr(null);
slaveSyncFuture = this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
try {
BrokerController.this.slaveSynchronize.syncAll();
}
catch (Throwable e) {
log.error("ScheduledTask SlaveSynchronize syncAll error.", e);
}
}
}, 1000 * 3, 1000 * 10, TimeUnit.MILLISECONDS);
}
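This sets up a scheduled task that runs the syncAll method of slaveSynchronize every 10 seconds, after an initial delay of 3 seconds.
Note that setMasterAddr first sets the master address to null. This is because another scheduled task, registerBrokerAll, later obtains the master address from the NameServer. For details, see:
【Source Code Analysis of Broker Startup in RocketMQ (Part 2)】
The syncAll method of SlaveSynchronize: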
public void syncAll() {
this.syncTopicConfig();
this.syncConsumerOffset();
this.syncDelayOffset();
this.syncSubscriptionGroupConfig();
}
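This method calls four methods in turn, each synchronizing one kind of information:
syncTopicConfig: synchronizes the topic configuration
syncConsumerOffset: synchronizes the Consumer offsets
syncDelayOffset: synchronizes the delay queue information
syncSubscriptionGroupConfig: synchronizes the subscription information
Since these methods are implemented in a similar way, only syncTopicConfig is examined here:
The syncTopicConfig method: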
private void syncTopicConfig() {
String masterAddrBak = this.masterAddr;
if (masterAddrBak != null && !masterAddrBak.equals(brokerController.getBrokerAddr())) {
try {
TopicConfigSerializeWrapper topicWrapper =
this.brokerController.getBrokerOuterAPI().getAllTopicConfig(masterAddrBak);
if (!this.brokerController.getTopicConfigManager().getDataVersion()
.equals(topicWrapper.getDataVersion())) {
this.brokerController.getTopicConfigManager().getDataVersion()
.assignNewOne(topicWrapper.getDataVersion());
this.brokerController.getTopicConfigManager().getTopicConfigTable().clear();
this.brokerController.getTopicConfigManager().getTopicConfigTable()
.putAll(topicWrapper.getTopicConfigTable());
this.brokerController.getTopicConfigManager().persist();
log.info("Update slave topic config from master, {}", masterAddrBak);
}
} catch (Exception e) {
log.error("SyncTopicConfig Exception, {}", masterAddrBak, e);
}
}
}
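It first reads the master address, masterAddr. Because of the registerBrokerAll scheduled task, even if masterAddr is not available this time, a later run will obtain it from the NameServer as long as the cluster has a master.
Once the master address is known, the getAllTopicConfig method of BrokerOuterAPI sends a request to the master.
The getAllTopicConfig method of BrokerOuterAPI: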
public TopicConfigSerializeWrapper getAllTopicConfig(
final String addr) throws RemotingConnectException, RemotingSendRequestException,
RemotingTimeoutException, InterruptedException, MQBrokerException {
RemotingCommand request = RemotingCommand.createRequestCommand(RequestCode.GET_ALL_TOPIC_CONFIG, null);
RemotingCommand response = this.remotingClient.invokeSync(MixAll.brokerVIPChannel(true, addr), request, 3000);
assert response != null;
switch (response.getCode()) {
case ResponseCode.SUCCESS: {
return TopicConfigSerializeWrapper.decode(response.getBody(), TopicConfigSerializeWrapper.class);
}
default:
break;
}
throw new MQBrokerException(response.getCode(), response.getRemark());
}
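It first builds a GET_ALL_TOPIC_CONFIG request command and then sends it synchronously through remotingClient's invokeSync. Note that MixAll's brokerVIPChannel method converts the master address into its VIP channel address, which is the same host with the port number minus 2; I covered this in an earlier post.
Synchronous sending is covered in detail in 【Source Code Analysis of Producer Message Sending in RocketMQ】.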
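To make the "port minus 2" rule concrete, here is a minimal sketch. The class and method names (VipChannelSketch, vipAddress) are hypothetical and this is not the MixAll source; it only assumes the address has the form host:port.

```java
// Hypothetical helper illustrating the "port minus 2" convention described above.
// It is an illustration only, not the actual MixAll.brokerVIPChannel implementation.
class VipChannelSketch {
    static String vipAddress(String brokerAddr) {
        int idx = brokerAddr.lastIndexOf(':');
        String host = brokerAddr.substring(0, idx);
        int port = Integer.parseInt(brokerAddr.substring(idx + 1));
        // The VIP channel listens on the broker port minus 2.
        return host + ":" + (port - 2);
    }
}
```

With a broker listening on port 10911, for example, this sketch yields port 10909 for the VIP channel.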
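After the request is sent to the master, let's see how the master handles it.
On receiving the request, the master dispatches on the request code in the processRequest method of AdminBrokerProcessor: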
case RequestCode.GET_ALL_TOPIC_CONFIG:
return this.getAllTopicConfig(ctx, request);
It then executes the getAllTopicConfig method:
private RemotingCommand getAllTopicConfig(ChannelHandlerContext ctx, RemotingCommand request) {
final RemotingCommand response = RemotingCommand.createResponseCommand(GetAllTopicConfigResponseHeader.class);
// final GetAllTopicConfigResponseHeader responseHeader =
// (GetAllTopicConfigResponseHeader) response.readCustomHeader();
String content = this.brokerController.getTopicConfigManager().encode();
if (content != null && content.length() > 0) {
try {
response.setBody(content.getBytes(MixAll.DEFAULT_CHARSET));
} catch (UnsupportedEncodingException e) {
log.error("", e);
response.setCode(ResponseCode.SYSTEM_ERROR);
response.setRemark("UnsupportedEncodingException " + e);
return response;
}
} else {
log.error("No topic in this broker, client: {}", ctx.channel().remoteAddress());
response.setCode(ResponseCode.SYSTEM_ERROR);
response.setRemark("No topic in this broker");
return response;
}
response.setCode(ResponseCode.SUCCESS);
response.setRemark(null);
return response;
}
This takes the topicConfigTable held in TopicConfigManager:
private final ConcurrentMap<String, TopicConfig> topicConfigTable =
new ConcurrentHashMap<String, TopicConfig>(1024);
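The encode method converts this map into a JSON string, which is then sent to the slave through Netty.
Back on the slave: because the send is synchronous, it blocks until the response comes back. On receiving the response: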
switch (response.getCode()) {
case ResponseCode.SUCCESS: {
return TopicConfigSerializeWrapper.decode(response.getBody(), TopicConfigSerializeWrapper.class);
}
default:
break;
}
decode parses the JSON string back into a map wrapped in a TopicConfigSerializeWrapper.
Back in the syncTopicConfig method:
Once the TopicConfigSerializeWrapper instance is obtained:
if (!this.brokerController.getTopicConfigManager().getDataVersion()
.equals(topicWrapper.getDataVersion())) {
this.brokerController.getTopicConfigManager().getDataVersion()
.assignNewOne(topicWrapper.getDataVersion());
this.brokerController.getTopicConfigManager().getTopicConfigTable().clear();
this.brokerController.getTopicConfigManager().getTopicConfigTable()
.putAll(topicWrapper.getTopicConfigTable());
this.brokerController.getTopicConfigManager().persist();
log.info("Update slave topic config from master, {}", masterAddrBak);
}
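The slave checks whether the data versions match. If they differ, it replaces its local table with the master's, so the slave's topic configuration stays in sync with the master.
The other three kinds of information are synchronized in the same way.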
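The version check and replace step can be sketched in isolation. This is a simplified stand-in under stated assumptions: a plain long replaces DataVersion, an Integer replaces TopicConfig, and the names TopicSyncSketch and applySnapshot are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the slave-side check: a long "version" replaces DataVersion
// and an Integer value replaces TopicConfig (both are assumptions for illustration).
class TopicSyncSketch {
    long localVersion = 0L;
    final Map<String, Integer> localTable = new ConcurrentHashMap<>();

    // Returns true when the local table was replaced by the master's snapshot.
    boolean applySnapshot(long masterVersion, Map<String, Integer> masterTable) {
        if (localVersion == masterVersion) {
            return false; // versions match, nothing to do
        }
        localVersion = masterVersion;
        localTable.clear();
        localTable.putAll(masterTable);
        return true;
    }
}
```

Comparing versions first means the common case, where nothing changed on the master, costs only one comparison on the slave.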
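Synchronizing message data
When the master starts, it launches an HA service thread (AcceptSocketService in HAService) built on JDK NIO to handle connections from slaves: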
public void run() {
log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
this.selector.select(1000);
Set<SelectionKey> selected = this.selector.selectedKeys();
if (selected != null) {
for (SelectionKey k : selected) {
if ((k.readyOps() & SelectionKey.OP_ACCEPT) != 0) {
SocketChannel sc = ((ServerSocketChannel) k.channel()).accept();
if (sc != null) {
HAService.log.info("HAService receive new connection, "
+ sc.socket().getRemoteSocketAddress());
try {
HAConnection conn = new HAConnection(HAService.this, sc);
conn.start();
HAService.this.addConnection(conn);
} catch (Exception e) {
log.error("new HAConnection exception", e);
sc.close();
}
}
} else {
log.warn("Unexpected ops in select " + k.readyOps());
}
}
selected.clear();
}
} catch (Exception e) {
log.error(this.getServiceName() + " service has exception.", e);
}
}
log.info(this.getServiceName() + " service end");
}
This is a very typical use of JDK NIO. When a connection is accepted and a SocketChannel obtained, it is wrapped in an HAConnection:
public HAConnection(final HAService haService, final SocketChannel socketChannel) throws IOException {
this.haService = haService;
this.socketChannel = socketChannel;
this.clientAddr = this.socketChannel.socket().getRemoteSocketAddress().toString();
this.socketChannel.configureBlocking(false);
this.socketChannel.socket().setSoLinger(false, -1);
this.socketChannel.socket().setTcpNoDelay(true);
this.socketChannel.socket().setReceiveBufferSize(1024 * 64);
this.socketChannel.socket().setSendBufferSize(1024 * 64);
this.writeSocketService = new WriteSocketService(this.socketChannel);
this.readSocketService = new ReadSocketService(this.socketChannel);
this.haService.getConnectionCount().incrementAndGet();
}
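The constructor configures the socketChannel and creates a WriteSocketService and a ReadSocketService, which are the foundation of the message synchronization that follows.
After the HAConnection is created, its start method is called: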
public void start() {
this.readSocketService.start();
this.writeSocketService.start();
}
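This starts two threads: one reads the data sent by the slave, the other sends data to the slave.
Let's pause the master-side analysis here and look at the slave.
When a slave starts, it launches the HAClient thread: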
public void run() {
log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
if (this.connectMaster()) {
if (this.isTimeToReportOffset()) {
boolean result = this.reportSlaveMaxOffset(this.currentReportedOffset);
if (!result) {
this.closeMaster();
}
}
this.selector.select(1000);
boolean ok = this.processReadEvent();
if (!ok) {
this.closeMaster();
}
if (!reportSlaveMaxOffsetPlus()) {
continue;
}
long interval =
HAService.this.getDefaultMessageStore().getSystemClock().now()
- this.lastWriteTimestamp;
if (interval > HAService.this.getDefaultMessageStore().getMessageStoreConfig()
.getHaHousekeepingInterval()) {
log.warn("HAClient, housekeeping, found this connection[" + this.masterAddress
+ "] expired, " + interval);
this.closeMaster();
log.warn("HAClient, master not response some time, so close connection");
}
} else {
this.waitForRunning(1000 * 5);
}
} catch (Exception e) {
log.warn(this.getServiceName() + " service has exception. ", e);
this.waitForRunning(1000 * 5);
}
}
log.info(this.getServiceName() + " service end");
}
In this while loop, connectMaster first checks whether a connection to the master exists.
The connectMaster method:
private boolean connectMaster() throws ClosedChannelException {
if (null == socketChannel) {
String addr = this.masterAddress.get();
if (addr != null) {
SocketAddress socketAddress = RemotingUtil.string2SocketAddress(addr);
if (socketAddress != null) {
this.socketChannel = RemotingUtil.connect(socketAddress);
if (this.socketChannel != null) {
this.socketChannel.register(this.selector, SelectionKey.OP_READ);
}
}
}
this.currentReportedOffset = HAService.this.defaultMessageStore.getMaxPhyOffset();
this.lastWriteTimestamp = System.currentTimeMillis();
}
return this.socketChannel != null;
}
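If socketChannel is null, no connection has been established yet, or the connection was dropped,
so a network connection must be established again using masterAddress.
Whenever a connection has to be established, the getMaxPhyOffset method of defaultMessageStore fetches the largest local offset and stores it in currentReportedOffset, to be reported to the master later. A timestamp, lastWriteTimestamp, is also saved for later checks.
Once the connection to the master is confirmed, isTimeToReportOffset checks whether the current maximum offset needs to be reported to the master.
The isTimeToReportOffset method: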
private boolean isTimeToReportOffset() {
long interval =
HAService.this.defaultMessageStore.getSystemClock().now() - this.lastWriteTimestamp;
boolean needHeart = interval > HAService.this.defaultMessageStore.getMessageStoreConfig()
.getHaSendHeartbeatInterval();
return needHeart;
}
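This compares lastWriteTimestamp with the current time to decide whether the reporting interval HaSendHeartbeatInterval (5 s by default) has elapsed.
If it has, reportSlaveMaxOffset sends the recorded maximum offset, currentReportedOffset, to the master.
The reportSlaveMaxOffset method: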
private boolean reportSlaveMaxOffset(final long maxOffset) {
this.reportOffset.position(0);
this.reportOffset.limit(8);
this.reportOffset.putLong(maxOffset);
this.reportOffset.position(0);
this.reportOffset.limit(8);
for (int i = 0; i < 3 && this.reportOffset.hasRemaining(); i++) {
try {
this.socketChannel.write(this.reportOffset);
} catch (IOException e) {
log.error(this.getServiceName()
+ "reportSlaveMaxOffset this.socketChannel.write exception", e);
return false;
}
}
return !this.reportOffset.hasRemaining();
}
Here reportOffset is a ByteBuffer dedicated to holding the offset:
private final ByteBuffer reportOffset = ByteBuffer.allocate(8);
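maxOffset is put into reportOffset, and socketChannel's write method then sends it to the master.
hasRemaining checks whether the current position has reached the buffer's limit. The loop uses it to retry the write (up to three times) so that the content of reportOffset is sent in full.
After a successful send, selector's select method polls with a timeout, waiting for data from the master.
From this we can see that once the slave is connected to the master, it periodically reports its current offset.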
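The position/limit handling around the 8-byte buffer is easy to misread, so here is a minimal sketch of just the encoding step. The class name OffsetReportSketch and the encode method are hypothetical; the socket write is left out so the buffer state can be inspected.

```java
import java.nio.ByteBuffer;

// Sketch of how an 8-byte offset is prepared for sending; the socket write is omitted.
class OffsetReportSketch {
    private final ByteBuffer reportOffset = ByteBuffer.allocate(8);

    // Prepares the buffer so that exactly the 8 bytes of maxOffset are ready to be written.
    ByteBuffer encode(long maxOffset) {
        reportOffset.position(0);
        reportOffset.limit(8);
        reportOffset.putLong(maxOffset); // position is now 8
        reportOffset.position(0);        // rewind so a channel write starts at byte 0
        reportOffset.limit(8);
        return reportOffset;
    }
}
```

After encode returns, hasRemaining() is true until a channel write has consumed all 8 bytes, which is exactly the condition the retry loop checks.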
Now let's see how the master handles the offset it receives:
On the master side, this is handled by the ReadSocketService thread mentioned earlier:
public void run() {
HAConnection.log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
this.selector.select(1000);
boolean ok = this.processReadEvent();
if (!ok) {
HAConnection.log.error("processReadEvent error");
break;
}
long interval = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now() - this.lastReadTimestamp;
if (interval > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaHousekeepingInterval()) {
log.warn("ha housekeeping, found this connection[" + HAConnection.this.clientAddr + "] expired, " + interval);
break;
}
} catch (Exception e) {
HAConnection.log.error(this.getServiceName() + " service has exception.", e);
break;
}
}
this.makeStop();
writeSocketService.makeStop();
haService.removeConnection(HAConnection.this);
HAConnection.this.haService.getConnectionCount().decrementAndGet();
SelectionKey sk = this.socketChannel.keyFor(this.selector);
if (sk != null) {
sk.cancel();
}
try {
this.selector.close();
this.socketChannel.close();
} catch (IOException e) {
HAConnection.log.error("", e);
}
HAConnection.log.info(this.getServiceName() + " service end");
}
The while loop here likewise starts by polling with selector's select method within the timeout.
The follow-up processing after polling is done by processReadEvent:
private boolean processReadEvent() {
int readSizeZeroTimes = 0;
if (!this.byteBufferRead.hasRemaining()) {
this.byteBufferRead.flip();
this.processPostion = 0;
}
while (this.byteBufferRead.hasRemaining()) {
try {
int readSize = this.socketChannel.read(this.byteBufferRead);
if (readSize > 0) {
readSizeZeroTimes = 0;
this.lastReadTimestamp = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now();
if ((this.byteBufferRead.position() - this.processPostion) >= 8) {
int pos = this.byteBufferRead.position() - (this.byteBufferRead.position() % 8);
long readOffset = this.byteBufferRead.getLong(pos - 8);
this.processPostion = pos;
HAConnection.this.slaveAckOffset = readOffset;
if (HAConnection.this.slaveRequestOffset < 0) {
HAConnection.this.slaveRequestOffset = readOffset;
log.info("slave[" + HAConnection.this.clientAddr + "] request offset " + readOffset);
}
HAConnection.this.haService.notifyTransferSome(HAConnection.this.slaveAckOffset);
}
} else if (readSize == 0) {
if (++readSizeZeroTimes >= 3) {
break;
}
} else {
log.error("read socket[" + HAConnection.this.clientAddr + "] < 0");
return false;
}
} catch (IOException e) {
log.error("processReadEvent exception", e);
return false;
}
}
return true;
}
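This method uses socketChannel's read to store the data sent by the slave in byteBufferRead.
Once at least 8 unprocessed bytes have arrived, it takes the most recent complete long offset value and saves it in HAConnection's slaveAckOffset field.
slaveRequestOffset handles synchronization for the first connection.
notifyTransferSome performs the corresponding wake-up when the broker is a synchronous master; an asynchronous master has no such requirement. This is analyzed later.
In other words, the ReadSocketService thread simply keeps reading and updating the offset sent by the slave.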
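The index arithmetic that picks the newest complete offset can be sketched on its own. This is an illustration under assumptions: AckParseSketch and latestOffset are hypothetical names, and the buffer is filled by hand instead of by a socket read.

```java
import java.nio.ByteBuffer;

// Sketch of extracting the newest complete 8-byte offset from a partially filled buffer.
class AckParseSketch {
    // 'processed' is the position up to which earlier offsets were already handled.
    // Returns the most recent complete offset, or -1 if fewer than 8 new bytes arrived.
    static long latestOffset(ByteBuffer buf, int processed) {
        if (buf.position() - processed < 8) {
            return -1L;
        }
        // Round the write position down to a multiple of 8, then read the long before it.
        int pos = buf.position() - (buf.position() % 8);
        return buf.getLong(pos - 8);
    }
}
```

If two offsets and a few bytes of a third have arrived, only the second offset is returned; the trailing partial bytes are left for the next read.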
Now let's see how the WriteSocketService thread sends data to the slave:
public void run() {
HAConnection.log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
this.selector.select(1000);
if (-1 == HAConnection.this.slaveRequestOffset) {
Thread.sleep(10);
continue;
}
if (-1 == this.nextTransferFromWhere) {
if (0 == HAConnection.this.slaveRequestOffset) {
long masterOffset = HAConnection.this.haService.getDefaultMessageStore().getCommitLog().getMaxOffset();
masterOffset =
masterOffset
- (masterOffset % HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
.getMapedFileSizeCommitLog());
if (masterOffset < 0) {
masterOffset = 0;
}
this.nextTransferFromWhere = masterOffset;
} else {
this.nextTransferFromWhere = HAConnection.this.slaveRequestOffset;
}
log.info("master transfer data from " + this.nextTransferFromWhere + " to slave[" + HAConnection.this.clientAddr
+ "], and slave request " + HAConnection.this.slaveRequestOffset);
}
if (this.lastWriteOver) {
long interval =
HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now() - this.lastWriteTimestamp;
if (interval > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
.getHaSendHeartbeatInterval()) {
// Build Header
this.byteBufferHeader.position(0);
this.byteBufferHeader.limit(headerSize);
this.byteBufferHeader.putLong(this.nextTransferFromWhere);
this.byteBufferHeader.putInt(0);
this.byteBufferHeader.flip();
this.lastWriteOver = this.transferData();
if (!this.lastWriteOver)
continue;
}
} else {
this.lastWriteOver = this.transferData();
if (!this.lastWriteOver)
continue;
}
SelectMappedBufferResult selectResult =
HAConnection.this.haService.getDefaultMessageStore().getCommitLogData(this.nextTransferFromWhere);
if (selectResult != null) {
int size = selectResult.getSize();
if (size > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize()) {
size = HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize();
}
long thisOffset = this.nextTransferFromWhere;
this.nextTransferFromWhere += size;
selectResult.getByteBuffer().limit(size);
this.selectMappedBufferResult = selectResult;
// Build Header
this.byteBufferHeader.position(0);
this.byteBufferHeader.limit(headerSize);
this.byteBufferHeader.putLong(thisOffset);
this.byteBufferHeader.putInt(size);
this.byteBufferHeader.flip();
this.lastWriteOver = this.transferData();
} else {
HAConnection.this.haService.getWaitNotifyObject().allWaitForRunning(100);
}
} catch (Exception e) {
HAConnection.log.error(this.getServiceName() + " service has exception.", e);
break;
}
}
HAConnection.this.haService.getWaitNotifyObject().removeFromWaitingThreadTable();
if (this.selectMappedBufferResult != null) {
this.selectMappedBufferResult.release();
}
this.makeStop();
readSocketService.makeStop();
haService.removeConnection(HAConnection.this);
SelectionKey sk = this.socketChannel.keyFor(this.selector);
if (sk != null) {
sk.cancel();
}
try {
this.selector.close();
this.socketChannel.close();
} catch (IOException e) {
HAConnection.log.error("", e);
}
HAConnection.log.info(this.getServiceName() + " service end");
}
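It begins by checking slaveRequestOffset, which is -1 only in its initial state.
That is, until the slave has sent an offset, the WriteSocketService thread just idles.
Once the slave has sent an offset,
it first checks nextTransferFromWhere, which, like slaveRequestOffset, is initialized to -1.
A value of -1 therefore means that master and slave have just connected and no messages have been synchronized yet.
In that case, the updated slaveRequestOffset is examined.
If it equals 0, the slave has no message history. The master then takes its own MaxOffset and uses it to compute: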
masterOffset = masterOffset
- (masterOffset % HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
.getMapedFileSizeCommitLog() /* 1G */);
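This yields the starting offset of the last file.
In other words, when the slave has no message history, the master sends message data only from the start of its last local CommitLog file.
If the slave already has data, sending starts from the offset the slave reported; nextTransferFromWhere records this offset.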
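The choice of the first transfer offset is plain arithmetic and can be sketched as follows. The names StartOffsetSketch and firstTransferOffset are hypothetical, and the file size is passed in as a parameter instead of being read from MessageStoreConfig.

```java
// Sketch of choosing where the master starts pushing data to a newly connected slave.
class StartOffsetSketch {
    static long firstTransferOffset(long slaveRequestOffset, long masterMaxOffset, long fileSize) {
        if (slaveRequestOffset != 0) {
            return slaveRequestOffset; // resume from the offset the slave reported
        }
        // Slave is empty: round the master's max offset down to the start of its last file.
        long start = masterMaxOffset - (masterMaxOffset % fileSize);
        return Math.max(start, 0L);
    }
}
```

With 1 GB files and a master max offset a little past 2 GB, an empty slave starts at exactly 2 GB, so everything in the master's earlier files is skipped.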
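Next, lastWriteOver is checked. It is a state flag indicating whether the previous send finished, and it is initialized to true.
If it is true, a time check follows: lastWriteTimestamp records the time of the last send
and is used to determine whether the interval haSendHeartbeatInterval (5 s by default) has been exceeded,
that is, whether the master has sent nothing to the slave for more than 5 s.
If so, a heartbeat packet is sent.
byteBufferHeader is a 12-byte ByteBuffer: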
private final int headerSize = 8 + 4;
private final ByteBuffer byteBufferHeader = ByteBuffer.allocate(headerSize);
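This simply builds a heartbeat packet; transferData then sends it.
If lastWriteOver is false, the previous data was not fully sent, so transferData is called to send the rest. As long as data remains, the loop just repeats until everything has been sent.
Reading on, the next part sends the actual message data:
First, using nextTransferFromWhere (the offset saved above), it calls DefaultMessageStore's getCommitLogData method, which in fact calls CommitLog's getData method. That method is covered in detail in
【Source Code Analysis of Broker Startup in RocketMQ (Part 2)】, in the section on message dispatch (ReputMessageService).
It locates the CommitLog file for the offset and reads all data from that offset onward into a ByteBuffer, wrapped in a SelectMappedBufferResult.
If the master has already synchronized all of its local data to the slave, the SelectMappedBufferResult is null, and it calls: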
HAConnection.this.haService.getWaitNotifyObject().allWaitForRunning(100);
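This blocks the thread with a 100 ms timeout. It either waits until the timeout expires or is woken by the synchronous master during the synchronous double write described later.
Once a SelectMappedBufferResult is obtained, the size of the data read is checked. If it exceeds haTransferBatchSize (32 KB by default), size is set to 32 KB. This caps the amount sent per transfer: larger data is split, and at most 32 KB is sent at a time.
thisOffset records nextTransferFromWhere, i.e. the offset.
nextTransferFromWhere is then advanced for the next lookup.
The data read, selectResult, is also saved in selectMappedBufferResult.
The message header is then built in the same format as the heartbeat packet: the first eight bytes hold the offset and the last four bytes hold the data size.
Finally, transferData is called to send it: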
private boolean transferData() throws Exception {
int writeSizeZeroTimes = 0;
// Write Header
while (this.byteBufferHeader.hasRemaining()) {
int writeSize = this.socketChannel.write(this.byteBufferHeader);
if (writeSize > 0) {
writeSizeZeroTimes = 0;
this.lastWriteTimestamp = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now();
} else if (writeSize == 0) {
if (++writeSizeZeroTimes >= 3) {
break;
}
} else {
throw new Exception("ha master write header error < 0");
}
}
if (null == this.selectMappedBufferResult) {
return !this.byteBufferHeader.hasRemaining();
}
writeSizeZeroTimes = 0;
// Write Body
if (!this.byteBufferHeader.hasRemaining()) {
while (this.selectMappedBufferResult.getByteBuffer().hasRemaining()) {
int writeSize = this.socketChannel.write(this.selectMappedBufferResult.getByteBuffer());
if (writeSize > 0) {
writeSizeZeroTimes = 0;
this.lastWriteTimestamp = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now();
} else if (writeSize == 0) {
if (++writeSizeZeroTimes >= 3) {
break;
}
} else {
throw new Exception("ha master write body error < 0");
}
}
}
boolean result = !this.byteBufferHeader.hasRemaining() && !this.selectMappedBufferResult.getByteBuffer().hasRemaining();
if (!this.selectMappedBufferResult.getByteBuffer().hasRemaining()) {
this.selectMappedBufferResult.release();
this.selectMappedBufferResult = null;
}
return result;
}
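First, the 12-byte header in byteBufferHeader is sent through socketChannel's write method.
Then the message data in selectMappedBufferResult's ByteBuffer is sent.
If selectMappedBufferResult is null, it is a heartbeat packet and only the header is sent.
Whatever is sent, the time is recorded in lastWriteTimestamp for the later heartbeat check.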
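The 12-byte header layout shared by data frames and heartbeats can be sketched like this. HaHeaderSketch, header, and heartbeat are hypothetical names; unlike the original, this sketch allocates a fresh buffer per call to stay simple.

```java
import java.nio.ByteBuffer;

// Sketch of the 12-byte frame header: 8-byte physical offset followed by 4-byte body size.
class HaHeaderSketch {
    static final int HEADER_SIZE = 8 + 4;

    static ByteBuffer header(long offset, int bodySize) {
        ByteBuffer h = ByteBuffer.allocate(HEADER_SIZE);
        h.putLong(offset);
        h.putInt(bodySize);
        h.flip(); // switch from writing to reading so a channel write can consume it
        return h;
    }

    // A heartbeat is a header whose body size is 0 and that carries no body.
    static ByteBuffer heartbeat(long nextTransferFromWhere) {
        return header(nextTransferFromWhere, 0);
    }
}
```

Because a heartbeat is just a header with size 0, the receiver does not need a separate message type to recognize it.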
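So once the WriteSocketService thread is running and the slave has sent its first offset, the thread keeps sending the content of its local CommitLog files, starting from the corresponding position, to the slave. Only when the slave is fully caught up does the thread ease off, alternating between blocking for 100 ms and sending a heartbeat every five seconds.
But as soon as a Producer sends a message to the master and the flush thread has persisted it, the WriteSocketService thread gets busy again. This is also where synchronous double write and asynchronous replication actually differ.
Before getting to that, let's see how the slave handles the messages it receives.
This happens in the processReadEvent method of the HAClient thread: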
private boolean processReadEvent() {
int readSizeZeroTimes = 0;
while (this.byteBufferRead.hasRemaining()) {
try {
int readSize = this.socketChannel.read(this.byteBufferRead);
if (readSize > 0) {
lastWriteTimestamp = HAService.this.defaultMessageStore.getSystemClock().now();
readSizeZeroTimes = 0;
boolean result = this.dispatchReadRequest();
if (!result) {
log.error("HAClient, dispatchReadRequest error");
return false;
}
} else if (readSize == 0) {
if (++readSizeZeroTimes >= 3) {
break;
}
} else {
log.info("HAClient, processReadEvent read socket < 0");
return false;
}
} catch (IOException e) {
log.info("HAClient, processReadEvent read socket exception", e);
return false;
}
}
return true;
}
After socketChannel's read method reads the data sent by the master into the byteBufferRead buffer, dispatchReadRequest does the further processing.
The dispatchReadRequest method:
private boolean dispatchReadRequest() {
final int msgHeaderSize = 8 + 4; // phyoffset + size
int readSocketPos = this.byteBufferRead.position();
while (true) {
int diff = this.byteBufferRead.position() - this.dispatchPostion;
if (diff >= msgHeaderSize) {
long masterPhyOffset = this.byteBufferRead.getLong(this.dispatchPostion);
int bodySize = this.byteBufferRead.getInt(this.dispatchPostion + 8);
long slavePhyOffset = HAService.this.defaultMessageStore.getMaxPhyOffset();
if (slavePhyOffset != 0) {
if (slavePhyOffset != masterPhyOffset) {
log.error("master pushed offset not equal the max phy offset in slave, SLAVE: "
+ slavePhyOffset + " MASTER: " + masterPhyOffset);
return false;
}
}
if (diff >= (msgHeaderSize + bodySize)) {
byte[] bodyData = new byte[bodySize];
this.byteBufferRead.position(this.dispatchPostion + msgHeaderSize);
this.byteBufferRead.get(bodyData);
HAService.this.defaultMessageStore.appendToCommitLog(masterPhyOffset, bodyData);
this.byteBufferRead.position(readSocketPos);
this.dispatchPostion += msgHeaderSize + bodySize;
if (!reportSlaveMaxOffsetPlus()) {
return false;
}
continue;
}
}
if (!this.byteBufferRead.hasRemaining()) {
this.reallocateByteBuffer();
}
break;
}
return true;
}
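It first extracts the 12-byte message header:
masterPhyOffset: the 8-byte offset; bodySize: the 4-byte message size.
The masterPhyOffset sent by the master is checked against the local slavePhyOffset so that the backup stays consistent.
Then the message data that follows the header in byteBufferRead is taken out and persisted to the CommitLog by calling appendToCommitLog: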
public boolean appendToCommitLog(long startOffset, byte[] data) {
if (this.shutdown) {
log.warn("message store has shutdown, so appendToPhyQueue is forbidden");
return false;
}
boolean result = this.commitLog.appendData(startOffset, data);
if (result) {
this.reputMessageService.wakeup();
} else {
log.error("appendToPhyQueue failed " + startOffset + " " + data.length);
}
return result;
}
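This actually calls commitLog's appendData method to write the data to disk; I covered that method in an earlier post:
【Source Code Analysis of Broker Flushing in RocketMQ】
After the write completes, reputMessageService (message dispatch) must be woken so that Consumers can consume the messages.
For details on message dispatch, see 【Source Code Analysis of Broker Startup in RocketMQ (Part 2)】.
As noted earlier, the master also sends heartbeat messages, but this code clearly has no special handling for them: appendToCommitLog is simply called with a zero-length byte array. That seems a bit odd, and I have not figured out why.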
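To see why a heartbeat falls through the same path as a data frame, here is a simplified decoding loop. It is a sketch under assumptions: FrameDecodeSketch and decodeBodySizes are hypothetical names, the offset check and the CommitLog append are omitted, and the method only records the body size of each complete frame.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of walking the frames accumulated in a read buffer.
class FrameDecodeSketch {
    static final int HEADER_SIZE = 8 + 4; // physical offset + body size

    // Decodes every complete frame between 'from' and buf.position(); returns their body sizes.
    static List<Integer> decodeBodySizes(ByteBuffer buf, int from) {
        List<Integer> sizes = new ArrayList<>();
        int dispatch = from;
        while (buf.position() - dispatch >= HEADER_SIZE) {
            int bodySize = buf.getInt(dispatch + 8);
            if (buf.position() - dispatch < HEADER_SIZE + bodySize) {
                break; // body not fully received yet, wait for more data
            }
            sizes.add(bodySize); // a heartbeat shows up here with bodySize == 0
            dispatch += HEADER_SIZE + bodySize;
        }
        return sizes;
    }
}
```

A frame with bodySize 0 always satisfies the length check as soon as its header has arrived, which is consistent with the zero-length append observed above.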
After that, reportSlaveMaxOffsetPlus is called:
private boolean reportSlaveMaxOffsetPlus() {
boolean result = true;
long currentPhyOffset = HAService.this.defaultMessageStore.getMaxPhyOffset();
if (currentPhyOffset > this.currentReportedOffset) {
this.currentReportedOffset = currentPhyOffset;
result = this.reportSlaveMaxOffset(this.currentReportedOffset);
if (!result) {
this.closeMaster();
log.error("HAClient, reportSlaveMaxOffset error, " + this.currentReportedOffset);
}
}
return result;
}
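Once data has been written, the offset obtained now is larger than the one saved in currentReportedOffset, so reportSlaveMaxOffset is called again to report the current offset to the master.
This in fact completes the asynchronous replication process of an asynchronous master.
Now let's see how synchronous double write is implemented:
Like flushing, it happens after the Producer has sent a message and the Broker has stored it:
【Source Code Analysis of Broker Message Storage in RocketMQ】
In CommitLog's handleHA method: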
public void handleHA(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) {
HAService service = this.defaultMessageStore.getHaService();
if (messageExt.isWaitStoreMsgOK()) {
// Determine whether to wait
if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) {
GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
service.putRequest(request);
service.getWaitNotifyObject().wakeupAll();
boolean flushOK =
request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
if (!flushOK) {
log.error("do sync transfer other node, wait return, but failed, topic: " + messageExt.getTopic() + " tags: "
+ messageExt.getTags() + " client address: " + messageExt.getBornHostNameString());
putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT);
}
}
// Slave problem
else {
// Tell the producer, slave not available
putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE);
}
}
}
}
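This checks the Broker's role; as you can see, only SYNC_MASTER, the synchronous master, does anything here.
The procedure is much like synchronous flushing:
【Source Code Analysis of Broker Flushing in RocketMQ】
A GroupCommitRequest record is created from Offset + WroteBytes and added to a List.
Then getWaitNotifyObject's wakeupAll method wakes all blocked WriteSocketService threads.
Because master and slaves are in a one-to-many relationship, there can be multiple slave connections and hence multiple WriteSocketService threads; waking them all lets the message be pushed to every slave.
After waking the WriteSocketService threads, request's waitForFlush method blocks the caller, which marks the real start of synchronous replication.
When HAService starts, it also starts a GroupTransferService thread: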
public void run() {
log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
this.waitForRunning(10);
this.doWaitTransfer();
} catch (Exception e) {
log.warn(this.getServiceName() + " service has exception. ", e);
}
}
log.info(this.getServiceName() + " service end");
}
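This works on basically the same principle as GroupCommitService for synchronous flushing, so I will not go over the similar parts in detail.
GroupTransferService likewise keeps two Lists: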
private volatile List<CommitLog.GroupCommitRequest> requestsWrite = new ArrayList<>();
private volatile List<CommitLog.GroupCommitRequest> requestsRead = new ArrayList<>();
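These two Lists are swapped in a double-buffering scheme, somewhat like the copying algorithm of the JVM young generation.
In handleHA, the GroupCommitRequest record that was created is added to the requestsWrite List.
The doWaitTransfer method: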
private void doWaitTransfer() {
synchronized (this.requestsRead) {
if (!this.requestsRead.isEmpty()) {
for (CommitLog.GroupCommitRequest req : this.requestsRead) {
boolean transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
for (int i = 0; !transferOK && i < 5; i++) {
this.notifyTransferObject.waitForRunning(1000);
transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
}
if (!transferOK) {
log.warn("transfer messsage to slave timeout, " + req.getNextOffset());
}
req.wakeupCustomer(transferOK);
}
this.requestsRead.clear();
}
}
}
As with flushing, requestsWrite and requestsRead are swapped beforehand, so requestsRead here actually holds the records just added.
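The two-list swap can be sketched in a few lines. This is an illustration only: SwapListsSketch, put, and swapAndDrain are hypothetical names, plain Long values stand in for GroupCommitRequest, and a single lock replaces the original's finer-grained synchronization.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the write/read list swap: producers append to one list while the
// service thread drains the other.
class SwapListsSketch {
    private List<Long> requestsWrite = new ArrayList<>();
    private List<Long> requestsRead = new ArrayList<>();

    synchronized void put(long nextOffset) {
        requestsWrite.add(nextOffset);
    }

    // Swap the lists, then hand back and clear the batch that was accumulated so far.
    synchronized List<Long> swapAndDrain() {
        List<Long> tmp = requestsWrite;
        requestsWrite = requestsRead;
        requestsRead = tmp;
        List<Long> batch = new ArrayList<>(requestsRead);
        requestsRead.clear();
        return batch;
    }
}
```

The point of the swap is that the consumer works on a stable batch and the cleared list is reused, so no new list is allocated per round.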
It first compares the record's NextOffset with push2SlaveMaxOffset.
The value of push2SlaveMaxOffset comes from the slaves, via this call in the ReadSocketService thread described earlier:
HAConnection.this.haService.notifyTransferSome(HAConnection.this.slaveAckOffset);
The notifyTransferSome method:
public void notifyTransferSome(final long offset) {
for (long value = this.push2SlaveMaxOffset.get(); offset > value; ) {
boolean ok = this.push2SlaveMaxOffset.compareAndSet(value, offset);
if (ok) {
this.groupTransferService.notifyTransferSome();
break;
} else {
value = this.push2SlaveMaxOffset.get();
}
}
}
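Even with multiple slave connections, push2SlaveMaxOffset always records the largest offset reported.
So doWaitTransfer makes its decision based on the current NextOffset (the master's local offset after the write completed).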
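The compare-and-set loop that keeps only the largest reported offset can be sketched as follows. MaxOffsetSketch and advance are hypothetical names; the wake-up of GroupTransferService is replaced by a boolean return value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a lock-free "keep the maximum" update, as used for push2SlaveMaxOffset.
class MaxOffsetSketch {
    private final AtomicLong push2SlaveMaxOffset = new AtomicLong(0);

    // Returns true if this call advanced the maximum (the point at which waiters would be woken).
    boolean advance(long offset) {
        for (long value = push2SlaveMaxOffset.get(); offset > value; ) {
            if (push2SlaveMaxOffset.compareAndSet(value, offset)) {
                return true;
            }
            value = push2SlaveMaxOffset.get(); // another thread won the race, re-read and retry
        }
        return false; // offset is not larger than the current maximum
    }

    long get() {
        return push2SlaveMaxOffset.get();
    }
}
```

A slave that is behind reports a smaller offset and never lowers the recorded maximum, so the check in doWaitTransfer passes as soon as any one slave has caught up.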
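The key point is how the WriteSocketService thread works: whenever the local files are updated, it sends the data to the slave. Because HA synchronization happens after flushing, a slave may already have synchronized the data and reported its offset to the master, updating push2SlaveMaxOffset, before doWaitTransfer runs.
In that case,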
boolean transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
this check evaluates to true, meaning a backup already exists in the cluster, so it directly calls
req.wakeupCustomer(transferOK);
to wake the thread blocked in handleHA.
If the check is false, no slave has finished synchronizing, so it needs to
for (int i = 0; !transferOK && i < 5; i++) {
this.notifyTransferObject.waitForRunning(1000);
transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
}
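block in waitForRunning with a timeout, waiting at most five times; if time runs out, FLUSH_SLAVE_TIMEOUT is returned to the Producer.
If, within the timeout, a slave finishes synchronizing and sends its offset to the master, then in the notifyTransferSome method: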
public void notifyTransferSome(final long offset) {
for (long value = this.push2SlaveMaxOffset.get(); offset > value; ) {
boolean ok = this.push2SlaveMaxOffset.compareAndSet(value, offset);
if (ok) {
this.groupTransferService.notifyTransferSome();
break;
} else {
value = this.push2SlaveMaxOffset.get();
}
}
}
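push2SlaveMaxOffset is updated, and notifyTransferSome wakes the blocked wait described above.
push2SlaveMaxOffset is then compared with getNextOffset again.
On success, the thread blocked in handleHA is woken, and the synchronous master's master-slave replication is complete.
Because the synchronous master flushes to disk before master-slave replication takes place, synchronous double write means that both master and slave have persisted the message.
This concludes the analysis of the Broker's HA strategy in RocketMQ.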