RocketMQ: Index Source Code Analysis

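RocketMQ is a high-performance, high-throughput message middleware open-sourced by Alibaba. Let's look at how it is implemented, focusing on its indexing.

We'll use a runnable example as the test case. The code is as follows: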

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.rocketmq.example.quickstart;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

import org.apache.rocketmq.client.exception.MQClientException;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.MessageQueueSelector;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageQueue;
import org.apache.rocketmq.remoting.common.RemotingHelper;

public class Producer {
    public static void main(String[] args) throws MQClientException, InterruptedException {
        SimpleDateFormat time = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        final DefaultMQProducer producer = new DefaultMQProducer("Producer");
        producer.setNamesrvAddr("localhost:9876");
        producer.start();

        // Passed to the selector below as `arg`; it decides the target queue.
        final int num = 2;
        for (int i = 0; i < 1; i++) {
            try {
                Message msg = new Message("Topic1" /* Topic */,
                    "TagA" /* Tag */,
                    ("Hello RocketMQ " + num + time.format(new Date()) + " " + i).getBytes(RemotingHelper.DEFAULT_CHARSET) /* Message body */
                );
                msg.putUserProperty("psly", "psly");

                // Route the message to queue number (arg % number of queues).
                producer.send(msg, new MessageQueueSelector() {
                    @Override
                    public MessageQueue select(final List<MessageQueue> mqs, final Message msg, final Object arg) {
                        System.out.println(arg);
                        return mqs.get(((Integer) arg) % mqs.size());
                    }
                }, num);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        producer.shutdown();
    }
}

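Next, set a breakpoint in DefaultMessageStore and run the example above.

We can see execution enter the method where the breakpoint was set.

Following the execution, we eventually arrive at the key doAppend method.

This is where the message data is actually assembled and saved into the memory mapping of the commit file.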

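How exactly is the message laid out? The calMsgLength method answers this directly: its comments list how many bytes each part of the stored message occupies.

A few of the fields deserve particular attention:

TOTALSIZE: the total number of bytes in the message. It is the first field, takes 4 bytes, and delimits the message boundary.
BODYCRC: a cyclic redundancy check used to detect whether the message body has been corrupted.
QUEUEOFFSET: a Long looked up per topic and queue that records the message's logical position in that queue's index. The queue itself is identified by the queue ID (0-3 with the default of 4 queues per topic), which later names the directory holding the index files.
PHYSICALOFFSET: the offset of the message in the commit file, used to locate it within the file block (MappedFile) once it has been persisted.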
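
To make the role of these fields concrete, here is a minimal sketch of a length-prefixed frame. It is not RocketMQ's actual layout: the frame below keeps only TOTALSIZE, BODYCRC, PHYSICALOFFSET, a body length and the body, and the class and method names (MessageFrameSketch, append, readBody) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical simplified frame: TOTALSIZE | BODYCRC | PHYSICALOFFSET | BODYLEN | BODY. */
public class MessageFrameSketch {
    // 4 (TOTALSIZE) + 4 (BODYCRC) + 8 (PHYSICALOFFSET) + 4 (BODYLEN)
    static final int HEADER_SIZE = 4 + 4 + 8 + 4;

    static int crc32(byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body);
        return (int) crc.getValue(); // keep the low 32 bits
    }

    /** Appends one frame at the buffer's current position. */
    static void append(ByteBuffer log, byte[] body) {
        long physicalOffset = log.position(); // where this frame starts in the "file"
        log.putInt(HEADER_SIZE + body.length);
        log.putInt(crc32(body));
        log.putLong(physicalOffset);
        log.putInt(body.length);
        log.put(body);
    }

    /** Reads the frame starting at the given offset and returns its body after checking the CRC. */
    static byte[] readBody(ByteBuffer log, int offset) {
        int storedCrc = log.getInt(offset + 4);
        int bodyLength = log.getInt(offset + 16);
        byte[] body = new byte[bodyLength];
        for (int i = 0; i < bodyLength; i++) {
            body[i] = log.get(offset + HEADER_SIZE + i);
        }
        if (crc32(body) != storedCrc) {
            throw new IllegalStateException("body CRC mismatch at offset " + offset);
        }
        return body;
    }

    public static void main(String[] args) {
        ByteBuffer log = ByteBuffer.allocate(1024);
        append(log, "hello".getBytes(StandardCharsets.UTF_8));
        append(log, "world!".getBytes(StandardCharsets.UTF_8));

        // TOTALSIZE of the first frame tells us where the second frame begins.
        int secondFrameOffset = log.getInt(0);
        System.out.println(new String(readBody(log, secondFrameOffset), StandardCharsets.UTF_8)); // prints "world!"
    }
}
```

Because every frame starts with its own total size, a reader can walk the file frame by frame without any separator bytes, and the CRC catches a body that was damaged after it was written.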

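What happens after the message has been formatted? At this point the message exists in memory, but we don't know when it will be consumed, so it has to be persisted, that is, flushed to a file. In fact, the memory in which we built the message is already mapped to a file.

So all that remains is to flush this file mapping, which ensures the message is saved to disk.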
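
As an illustration of flushing a file mapping, the sketch below uses only the standard java.nio API: bytes written into a MappedByteBuffer go to the page cache, and force() asks the operating system to write them to the storage device. The class name, the temporary file and the 4096-byte mapping size are assumptions made for the example, not values taken from RocketMQ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedFlushSketch {
    /** Writes data through a memory mapping, forces it to disk, then reads it back from the file. */
    static byte[] writeAndFlush(Path file, byte[] data, int mappedSize) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, mappedSize);
            mapped.put(data); // lands in the mapped memory, not necessarily on disk yet
            mapped.force();   // flush the mapping's dirty pages to the storage device
        }
        // The file is mappedSize bytes long; only the leading bytes hold our data.
        return Arrays.copyOf(Files.readAllBytes(file), data.length);
    }

    /** Convenience wrapper: writes text to a temporary file and returns what was read back. */
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("commitlog-sketch", ".bin");
            file.toFile().deleteOnExit();
            byte[] readBack = writeAndFlush(file, text.getBytes(StandardCharsets.UTF_8), 4096);
            return new String(readBack, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello RocketMQ"));
    }
}
```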

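RocketMQ's index design, in turn, works as follows:

First format the message (compute its total size, topic name, queue offset, offset on disk, and so on), then place it in the message block file (a large file, apparently 1 GB by default).
One thread asynchronously flushes the message built above to disk.
One thread puts the message's physicaloffset into the index file (a memory mapping) under the topic's directory. The physicaloffset is used to locate the message within the large block file.
One thread flushes the index files built above to disk.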

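So how is a message retrieved later?

As follows:

First, use the topic to go straight to the corresponding topic directory.
Then use the queue ID (0-3 by default) to pick the queue subdirectory.
The files in that directory (6,000,000 bytes each by default, about 5,860 KB) index the large commit file, using 20 bytes (CQ_STORE_UNIT_SIZE) per message by default.
So read the 20-byte index entry first, then use its offset and total-length fields to fetch the actual message from the commit file.
(Because this is implemented with MappedByteBuffer, these operations may well require no disk I/O.)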
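
The lookup steps above can be sketched with two in-memory buffers standing in for the commit file and one index file. The 20-byte entry is assumed here to be an 8-byte commit-file offset, a 4-byte message size and an 8-byte tag hash code. The first two are the offset and total-length fields mentioned above, and the third is my understanding of the remaining 8 bytes. All class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ConsumeQueueLookupSketch {
    static final int CQ_STORE_UNIT_SIZE = 20;         // bytes per index entry
    static final int INDEX_FILE_SIZE = 6_000_000;     // default index file size from the text
    static final int ENTRIES_PER_FILE = INDEX_FILE_SIZE / CQ_STORE_UNIT_SIZE; // 300000

    /** Appends a message to the commit buffer and its 20-byte entry to the index buffer. */
    static long appendMessage(ByteBuffer commitLog, ByteBuffer index, byte[] message, long tagsCode) {
        long commitLogOffset = commitLog.position();
        commitLog.put(message);
        index.putLong(commitLogOffset);  // 8 bytes: where the message starts
        index.putInt(message.length);    // 4 bytes: how long it is
        index.putLong(tagsCode);         // 8 bytes: assumed tag hash code
        return index.position() / CQ_STORE_UNIT_SIZE - 1; // logical position of this entry
    }

    /** Reads entry number queueOffset from the index, then fetches the message it points to. */
    static byte[] readMessage(ByteBuffer commitLog, ByteBuffer index, long queueOffset) {
        int entryPosition = (int) (queueOffset * CQ_STORE_UNIT_SIZE);
        long commitLogOffset = index.getLong(entryPosition);
        int size = index.getInt(entryPosition + 8);

        byte[] message = new byte[size];
        ByteBuffer view = commitLog.duplicate(); // independent position, shared content
        view.position((int) commitLogOffset);
        view.get(message);
        return message;
    }

    public static void main(String[] args) {
        ByteBuffer commitLog = ByteBuffer.allocate(1024);
        ByteBuffer index = ByteBuffer.allocate(10 * CQ_STORE_UNIT_SIZE);

        appendMessage(commitLog, index, "first".getBytes(StandardCharsets.UTF_8), 0L);
        long second = appendMessage(commitLog, index, "second message".getBytes(StandardCharsets.UTF_8), 0L);

        System.out.println(new String(readMessage(commitLog, index, second), StandardCharsets.UTF_8));
        System.out.println(ENTRIES_PER_FILE + " entries fit in one index file");
    }
}
```

Since every entry has the same size, entry n always sits at byte n * 20, so finding a message costs one fixed-position read in the index and one read in the commit file.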

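This raises a question: why not give each topic and queue its own file dedicated to storing its messages?

My guesses:

With too many topics there would be too many files, each holding a large amount of data, which is hard to maintain.
Splitting into per-topic files does not improve I/O efficiency, and may mean doing I/O on many open files at once.
All message bodies are stored in one directory, so reads and writes only need one file open for I/O, which is more efficient. The physical-location index is then placed in the corresponding topic directory.
This can be thought of as one heavyweight directory plus multiple lightweight index directories.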

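Finally, let's look at the code that implements this.

Extracting fields from the message and writing them into the index file (this only builds the ConsumeQueue in memory and stores the message index, so it is fast) is done by the ReputMessageService thread.

Because consumers need a fast response, the polling interval here is very short (Thread.sleep(1)).

Flushing the message content to disk is done by the FlushRealTimeService thread.

The interval here is somewhat longer, 500 milliseconds by default. A disk flush is expensive, so it pays to do as much work as possible per flush.

Flushing the index produced by ReputMessageService to the files in the corresponding topic directory is done by the FlushConsumeQueueService thread.

Since the ReputMessageService thread has already stored the index data in memory, the polling interval for this disk operation is also relatively long, 1000 milliseconds by default.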

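One important question remains: how do these four kinds of threads cooperate?

The SendMessageThread_* threads notify the ReputMessageService thread and the FlushRealTimeService thread through the wrotePosition variable.
FlushRealTimeService persists the message content, and ReputMessageService builds the memory mapping of the message index. The two can run concurrently.
Once ReputMessageService finishes, it in turn uses its own wrotePosition to notify FlushConsumeQueueService to flush the index.

So the dependencies are:

FlushRealTimeService depends on SendMessageThread_*, through the wrotePosition variable;
ReputMessageService depends on SendMessageThread_*, through the wrotePosition variable;
FlushConsumeQueueService depends on ReputMessageService, through the wrotePosition variable.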
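
The dependency chain above can be imitated with a few counters. This is a toy model, not RocketMQ's code: each "service" is a thread that polls an upstream position and advances its own, the message size is a fixed 100 bytes, and the polling intervals are scaled down to 1, 5 and 10 milliseconds so the example finishes quickly. All names other than the three service names are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

public class PositionHandoffSketch {
    static final int MESSAGE_SIZE = 100;   // hypothetical fixed message size in bytes
    static final int INDEX_UNIT_SIZE = 20; // one index entry per message

    /** Starts a daemon thread that polls source and advances target to transform(source). */
    static Thread follow(String name, AtomicLong source, AtomicLong target,
                         LongUnaryOperator transform, long intervalMillis) {
        Thread thread = new Thread(() -> {
            try {
                while (true) {
                    long reachable = transform.applyAsLong(source.get());
                    if (reachable > target.get()) {
                        target.set(reachable); // stands in for "build index entries" or "flush"
                    }
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // interrupted by the caller: stop polling
            }
        }, name);
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    /** Sends the given number of messages; returns {commitLogFlushed, indexWrote, indexFlushed}. */
    static long[] run(int messages) {
        AtomicLong commitLogWrote = new AtomicLong();   // advanced by the sender
        AtomicLong commitLogFlushed = new AtomicLong(); // advanced by FlushRealTimeService
        AtomicLong indexWrote = new AtomicLong();       // advanced by ReputMessageService
        AtomicLong indexFlushed = new AtomicLong();     // advanced by FlushConsumeQueueService

        LongUnaryOperator same = p -> p;
        LongUnaryOperator toIndexBytes = p -> p / MESSAGE_SIZE * INDEX_UNIT_SIZE;
        Thread[] followers = {
            follow("ReputMessageService", commitLogWrote, indexWrote, toIndexBytes, 1),
            follow("FlushRealTimeService", commitLogWrote, commitLogFlushed, same, 5),
            follow("FlushConsumeQueueService", indexWrote, indexFlushed, same, 10),
        };

        for (int i = 0; i < messages; i++) {
            commitLogWrote.addAndGet(MESSAGE_SIZE); // the sender only moves wrotePosition forward
        }

        long expectedLog = (long) messages * MESSAGE_SIZE;
        long expectedIndex = (long) messages * INDEX_UNIT_SIZE;
        long deadline = System.nanoTime() + 5_000_000_000L; // give up after 5 seconds
        try {
            while ((commitLogFlushed.get() < expectedLog || indexFlushed.get() < expectedIndex)
                    && System.nanoTime() < deadline) {
                Thread.sleep(1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Thread follower : followers) {
            follower.interrupt();
        }
        return new long[] {commitLogFlushed.get(), indexWrote.get(), indexFlushed.get()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(3)));
    }
}
```

Each position has exactly one writer and any number of readers, so the threads never need a lock to hand work to one another. A downstream thread simply compares its own position with the upstream one and processes the difference.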

That is the design of the index and storage services.
