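MongoDB deep pagination optimization: a cursor-based approach

MongoDB has no official cursor-scrolling feature for deep pagination. The usual recommendation is to pick a field such as _id and constrain that field on every query, rather than paging with skip.

I have not found a better approach, so this article makes a bold assumption and implements scroll pagination by hand, as a starting point for discussion.

The idea is simple: run the first query without a limit, iterate the cursor yourself until you have the requested number of rows, and return them to the caller. On the next call, look up the cursor from the previous call and iterate another limit rows.

The advantage is that the query is evaluated only once and later calls reuse its result. This relies on MongoDB's clientSession support.

The cost is that you have to manage the cursor instances yourself: opening, closing, expiry and so on. Any mistake there can leak memory on the client or on the mongo server.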

Steps:

1. Add the MongoDB driver:

        <!-- https://mvnrepository.com/artifact/org.mongodb/mongodb-driver-sync -->
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongodb-driver-sync</artifactId>
            <version>4.4.2</version>
        </dependency>
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongodb-driver-core</artifactId>
            <version>4.4.2</version>
        </dependency>
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>bson</artifactId>
            <version>4.4.2</version>
        </dependency>

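Declare all three artifacts explicitly and with the same version; otherwise mismatched transitive versions can conflict.

2. Create a test class:

This verifies that the connection to MongoDB works and loads a reasonable amount of test data.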

import static com.mongodb.client.model.Filters.eq;

import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.util.ArrayList;
import java.util.List;

public class MongoQuickStartTest {

    private MongoClient mongoClient;

    @Before
    public void setup() {
        // Replace with your MongoDB deployment's connection string
        String uri = "mongodb://localhost:27017";
        MongoClientSettings options = MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString(uri))
                .writeConcern(WriteConcern.W1).build();
        mongoClient = MongoClients.create(options);
    }

    @After
    public void tearDown() {
        // Release the connection pool after each test
        mongoClient.close();
    }

    @Test
    public void testFind() {
        MongoDatabase database = mongoClient.getDatabase("local");
        MongoCollection<Document> collection = database.getCollection("test01");
        Document doc = collection.find(eq("name", "zhangsan1")).first();
        if (doc != null) {
            System.out.println(doc.toJson());
        } else {
            System.out.println("No matching documents found.");
        }
    }

    @Test
    public void testInsert() {
        long startId = 60011122212L;
        int total = 500000;
        int batchSize = 1000;
        MongoDatabase database = mongoClient.getDatabase("local");
        MongoCollection<Document> collection = database.getCollection("test01");
        // Insert in batches: one round trip per row is needlessly slow
        List<Document> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < total; i++) {
            String id = String.valueOf(startId + i);
            // Build a fresh Document per row instead of mutating a shared one
            batch.add(new Document("_id", id)
                    .append("name", "name_" + id)
                    .append("title", "title_" + id));
            if (batch.size() == batchSize) {
                collection.insertMany(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            collection.insertMany(batch);
        }
        System.out.println("insert " + total + " rows");
    }
}

3. Implement the cursor-based paging query class

A Spring Boot controller is used to test across requests, with one fixed query being paged.

import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.WriteConcern;
import com.mongodb.client.*;
import org.bson.Document;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class MongoDbService {

    private MongoClient mongoClient;
    // Holds every open cursor, keyed by scroll id. Good enough for a simple test;
    // real cursor management is far more complex
    private final Map<String, MongoCursor<Document>> cursorHolder
            = new ConcurrentHashMap<>();

    // Create the client once and reuse it. Building a new client on every
    // request would leak connection pools
    public synchronized void ensureMongo() {
        if (mongoClient != null) {
            return;
        }
        // Replace with your MongoDB deployment's connection string
        String uri = "mongodb://localhost:27017";
        MongoClientSettings options = MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString(uri))
                .writeConcern(WriteConcern.W1).build();
        mongoClient = MongoClients.create(options);
    }

    // Hand-rolled cursor scroll query. The method is synchronized because a
    // cursor must not be iterated by two requests at once; this serializes all
    // callers, which is acceptable for a demo only
    public synchronized List<Document> findDataWithCursor(String searchAfter, int limit) {
        ensureMongo();
        MongoDatabase database = mongoClient.getDatabase("local");
        MongoCollection<Document> collection = database.getCollection("test01");
        List<Document> resultList = new ArrayList<>();
        MongoCursor<Document> cursor = cursorHolder.get(searchAfter);
        if (cursor == null) {
            // The first call runs the query; later calls reuse the cursor
            cursor = collection.find().sort(new Document("name", 1)).iterator();
            cursorHolder.put(searchAfter, cursor);
        }
        // Count rows ourselves and return as soon as the page is full
        while (resultList.size() < limit && cursor.hasNext()) {
            resultList.add(cursor.next());
        }
        // Once the cursor is exhausted, close it and forget it
        if (!cursor.hasNext()) {
            cursor.close();
            cursorHolder.remove(searchAfter);
        }
        return resultList;
    }
}

The controller that calls it:

    @Resource
    private MongoDbService mongoDbService;

    @GetMapping("/mongoPageScroll")
    @ResponseBody
    public Object mongoPageScroll(@RequestParam(required = false) String params,
                                  @RequestParam String scrollId) {
        return mongoDbService.findDataWithCursor(scrollId, 9);
    }

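4. Testing and usage

There are two kinds of call: the first query and the next-page query. Request http://localhost:8080/hello/mongoPageScroll?scrollId=c once for the first page, then call the same URL repeatedly for the following pages.

On the first query no cursor exists for the scrollId, so one is created; later calls reuse it. The first call may be slow, but subsequent calls are fast.

Conclusion: a simple implementation is possible, but it is not necessarily fit for production. Managing cursors is extremely complex: when to open them, when to close them, timeout handling, single points of failure, machine crashes and so on are all hard to solve. Treat this as an idea only!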
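To make the lifecycle problem a little more concrete, here is a minimal sketch of one piece of it: closing cursors that have been idle too long. The class name ExpiringCursorRegistry, its methods and the explicit nowMillis parameter are all hypothetical and not part of any MongoDB API; the registry accepts any AutoCloseable, which MongoCursor satisfies because it is Closeable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper: keeps cursors by scroll id and closes the idle ones.
class ExpiringCursorRegistry<C extends AutoCloseable> {

    private static final class Entry<C> {
        final C cursor;
        volatile long lastAccessMillis;

        Entry(C cursor, long nowMillis) {
            this.cursor = cursor;
            this.lastAccessMillis = nowMillis;
        }
    }

    private final Map<String, Entry<C>> entries = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    ExpiringCursorRegistry(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Returns the cursor for scrollId, opening it through the supplier on first use.
    C getOrOpen(String scrollId, Supplier<C> opener, long nowMillis) {
        Entry<C> entry = entries.computeIfAbsent(
                scrollId, key -> new Entry<>(opener.get(), nowMillis));
        entry.lastAccessMillis = nowMillis;
        return entry.cursor;
    }

    // Closes and forgets one cursor, e.g. once it is exhausted.
    void close(String scrollId) {
        Entry<C> entry = entries.remove(scrollId);
        if (entry != null) {
            closeQuietly(entry.cursor);
        }
    }

    // Closes every cursor idle for longer than the timeout; returns how many were closed.
    int evictIdle(long nowMillis) {
        int closed = 0;
        for (Map.Entry<String, Entry<C>> e : entries.entrySet()) {
            boolean idle = nowMillis - e.getValue().lastAccessMillis > idleTimeoutMillis;
            if (idle && entries.remove(e.getKey(), e.getValue())) {
                closeQuietly(e.getValue().cursor);
                closed++;
            }
        }
        return closed;
    }

    int size() {
        return entries.size();
    }

    private static void closeQuietly(AutoCloseable cursor) {
        try {
            cursor.close();
        } catch (Exception ignored) {
            // Nothing useful can be done if closing fails
        }
    }
}
```

In a service, evictIdle would presumably be called from a scheduled task with System.currentTimeMillis(). The sketch still leaves the hard parts open: a request can obtain a cursor just before eviction closes it, and nothing here survives a restart or works across several application instances.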

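5. Implementing the search_after mechanism

Handing the same job to the db server may be somewhat easier, but it has plenty of difficulties of its own, chiefly that the memory held on the server is hard to keep under control. That is why newer versions of es no longer recommend the scroll mechanism for deep pagination.

Newer versions of es recommend the search_after mechanism instead. How is it different? On the surface, each next-page query carries the sort values of the last record of the previous page; no records are skipped, and the query simply takes limit rows. What is the underlying principle: a caching mechanism, or query rewriting?

The details are left for later study. If it is query rewriting, there is something we can do ourselves; if it is caching, we may have to give up.

Here are some ideas for query rewriting:

1. A single sort field is relatively simple: generate a token that combines the sort field value and the _id value of the last row, and have the user send it with the next query. Both sorts must run in the same direction, so that the next offset knows whether to use greater-than or less-than. For example, for asc the rewrite adds the condition: and _id > 'last_id'; for desc it adds: and _id < 'last_id'; When the sort field is not _id itself, the same comparison has to be applied to the sort field first, with _id only breaking ties.

2. With multiple sort fields, if all directions are the same you can follow point 1 as a reference (only as a reference; it cannot be applied directly). If the directions are mixed, a single > or < is no longer enough to express the offset. Roughly speaking, the comparison has to be inverted per field, but working out how is the hard part.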

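Take sorting by 2 fields as an example:

Original sort: order by fd1 asc, fd2 desc;

To guarantee a stable order, the backend has to silently append a sort on _id, giving: order by fd1 asc, fd2 desc, _id asc; To keep the explanation simple, that case is left out of the discussion here, and we assume every record can be told apart by its sort fields alone.

The rewrite could then be: and ( (fd1 > 'last_fd1') or (fd1 = 'last_fd1' and fd2 < 'last_fd2') )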

Take sorting by 3 fields as an example:

Original sort: order by fd1 asc, fd2 desc, fd3 asc;

The rewrite could then be: and ( (fd1 > 'last_fd1') or (fd1 = 'last_fd1' and fd2 < 'last_fd2') or (fd1 = 'last_fd1' and fd2 = 'last_fd2' and fd3 > 'last_fd3') )
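One way to gain some confidence in the three-field rewrite is to check it against skip on a small in-memory data set. The sketch below is only an illustration under stated assumptions: rows are int arrays {fd1, fd2, fd3}, every row is unique (matching the assumption that the sort fields distinguish all records), and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

class KeysetRewriteCheck {

    // order by fd1 asc, fd2 desc, fd3 asc
    static int compare(int[] a, int[] b) {
        if (a[0] != b[0]) return Integer.compare(a[0], b[0]); // fd1 asc
        if (a[1] != b[1]) return Integer.compare(b[1], a[1]); // fd2 desc
        return Integer.compare(a[2], b[2]);                   // fd3 asc
    }

    // The rewritten condition, with 'last' being the final row of the previous page
    static boolean isAfter(int[] row, int[] last) {
        return row[0] > last[0]
                || (row[0] == last[0] && row[1] < last[1])
                || (row[0] == last[0] && row[1] == last[1] && row[2] > last[2]);
    }

    // 64 unique rows in a fixed pseudo-random order
    static List<int[]> sampleRows() {
        List<int[]> rows = new ArrayList<>();
        for (int a = 0; a < 4; a++) {
            for (int b = 0; b < 4; b++) {
                for (int c = 0; c < 4; c++) {
                    rows.add(new int[]{a, b, c});
                }
            }
        }
        Collections.shuffle(rows, new Random(42));
        return rows;
    }

    // Classic paging: sort everything, skip offset rows, take limit rows
    static List<int[]> pageBySkip(List<int[]> rows, int offset, int limit) {
        return rows.stream().sorted(KeysetRewriteCheck::compare)
                .skip(offset).limit(limit).collect(Collectors.toList());
    }

    // Rewritten paging: filter by the extra condition, sort, take limit rows
    static List<int[]> pageByRewrite(List<int[]> rows, int[] last, int limit) {
        return rows.stream().filter(r -> last == null || isAfter(r, last))
                .sorted(KeysetRewriteCheck::compare)
                .limit(limit).collect(Collectors.toList());
    }

    // Walks all pages both ways and reports whether every page is identical (limit must be > 0)
    static boolean pagesMatch(int limit) {
        List<int[]> rows = sampleRows();
        int[] last = null;
        for (int offset = 0; offset < rows.size(); offset += limit) {
            List<int[]> expected = pageBySkip(rows, offset, limit);
            List<int[]> actual = pageByRewrite(rows, last, limit);
            if (expected.size() != actual.size()) return false;
            for (int i = 0; i < expected.size(); i++) {
                if (!Arrays.equals(expected.get(i), actual.get(i))) return false;
            }
            last = actual.get(actual.size() - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("pages match: " + pagesMatch(9));
    }
}
```

The check only works because the rows are unique. With duplicate sort values the condition would drop or repeat rows at page boundaries, which is the reason for appending _id as a final sort key.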

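The same pattern extends to more fields. A row qualifies as soon as any one of the following holds:

1. The first field satisfies its comparison; stop.

2. The first field is equal and the second field satisfies its comparison; stop.

3. The first and second fields are equal and the third field satisfies its comparison; stop.

4. And so on: the rewrite has as many branches as there are sort fields.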
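The branch rule above can be generated mechanically. Below is a rough sketch that builds the condition text for any number of sort fields; KeysetPredicateBuilder and SortKey are hypothetical names, and the output uses the same SQL-like notation as the examples in this article rather than real MongoDB query syntax. Values are concatenated into the string purely for readability; real code should bind them as parameters.

```java
import java.util.ArrayList;
import java.util.List;

class KeysetPredicateBuilder {

    // One sort field: its name, its direction, and the value taken from the last row
    static final class SortKey {
        final String field;
        final boolean ascending;
        final String lastValue;

        SortKey(String field, boolean ascending, String lastValue) {
            this.field = field;
            this.ascending = ascending;
            this.lastValue = lastValue;
        }
    }

    // Builds one branch per sort field: all earlier fields equal, this field strictly past the last value
    static String build(List<SortKey> keys) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("at least one sort key is required");
        }
        List<String> branches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            List<String> terms = new ArrayList<>();
            for (int j = 0; j < i; j++) {
                SortKey earlier = keys.get(j);
                terms.add(earlier.field + " = '" + earlier.lastValue + "'");
            }
            SortKey current = keys.get(i);
            // asc moves forward with >, desc moves forward with <
            String op = current.ascending ? " > " : " < ";
            terms.add(current.field + op + "'" + current.lastValue + "'");
            branches.add("(" + String.join(" and ", terms) + ")");
        }
        return "and ( " + String.join(" or ", branches) + " )";
    }
}
```

Called with fd1 asc, fd2 desc and fd3 asc, the builder produces the same three branches as the three-field example. Appending an _id key at the end of the list gives the tie-breaking variant.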

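Is this rewrite any different from skip? Yes. skip has to locate all matching rows first and then step over the skipped ones, whereas the rewrite narrows the result set and so cuts the amount of work; it should perform somewhat better. Better still, put an index on the sort fields and the performance gap becomes much larger, just as the _id field optimization mentioned at the start has become best practice.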
