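ES aggregation queries involve many concepts, such as fielddata and DocValue, and they raise many problems, such as out-of-memory errors caused by aggregations. Without really understanding how aggregation queries work, these concepts and problems tend to stay shrouded in fog. In this article we walk through the ES aggregation source code and trace the flow of an aggregation query, cutting through the fog to see what aggregation really is.

Entry point of the aggregation query

The entry code of an ES aggregation query is as follows: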

public void execute(SearchContext searchContext) throws QueryPhaseExecutionException {
        aggregationPhase.preProcess(searchContext);  <1>
        boolean rescore = execute(searchContext, searchContext.searcher());<2>
        aggregationPhase.execute(searchContext);<3>
    }

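<1> Prepare for the aggregation
<2> Run the query with the given conditions and get the query results
<3> Aggregate the query results

<3> is the real entry point of aggregation, but to truly understand ES aggregation we also have to understand <1> and <2>. <1> provides what the aggregation needs: the collectors and the forward index. <2> provides the data foundation for the aggregation, that is, <3> works on the data collected in <2>. Below we use these three steps as an outline to analyze the ES aggregation source code.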
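To make the three steps concrete before diving into the real code, here is a minimal toy sketch of the same shape: build the aggregators, feed them every matching document during the query, then build the result. Every name in it (ThreePhaseSketch, ToyAggregator, ToyTermsAggregator, run) is hypothetical and is not an ES class; the "query" is just a predicate over doc IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntPredicate;

final class ThreePhaseSketch {
    /** Hypothetical stand-in for an aggregator: collects docs, then builds a result. */
    interface ToyAggregator {
        void collect(int doc);
        Map<String, Integer> buildAggregation();
    }

    /** Counts documents per term; the term of each doc is looked up by doc ID. */
    static final class ToyTermsAggregator implements ToyAggregator {
        private final String[] termByDoc;
        private final Map<String, Integer> counts = new TreeMap<>();

        ToyTermsAggregator(String[] termByDoc) {
            this.termByDoc = termByDoc;
        }

        @Override
        public void collect(int doc) {
            counts.merge(termByDoc[doc], 1, Integer::sum);
        }

        @Override
        public Map<String, Integer> buildAggregation() {
            return counts;
        }
    }

    static Map<String, Integer> run(String[] termByDoc, IntPredicate query) {
        // Step 1: prepare, i.e. build the aggregators.
        List<ToyAggregator> aggregators = new ArrayList<>();
        aggregators.add(new ToyTermsAggregator(termByDoc));

        // Step 2: run the query and hand every matching doc to the aggregators.
        for (int doc = 0; doc < termByDoc.length; doc++) {
            if (query.test(doc)) {
                for (ToyAggregator aggregator : aggregators) {
                    aggregator.collect(doc);
                }
            }
        }

        // Step 3: build the aggregation result from what was collected.
        return aggregators.get(0).buildAggregation();
    }

    public static void main(String[] args) {
        String[] termByDoc = {"red", "blue", "red", "green", "red"};
        // The query matches every doc except doc 3.
        System.out.println(run(termByDoc, doc -> doc != 3)); // prints {blue=1, red=3}
    }
}
```

The point of the sketch is only the ordering: nothing is counted in step 3, it merely packages what step 2 already collected.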

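Preparation before aggregation

The preparation before aggregation mainly comes down to one thing: building the collectors, aggregators.
aggregators is a list of aggregator objects. An aggregator wraps the implementation logic of an aggregation; since ES has many kinds of aggregations, there are many aggregators with different logic. During the query phase, ES calls the logic in the aggregator to collect data.

Building the aggregators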

AggregatorFactories factories = context.aggregations().factories();   <1>
aggregators = factories.createTopLevelAggregators(aggregationContext); <2>

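<1> Get the aggregator factories from the context
<2> The factories produce the aggregators

It is worth mentioning that step <2> looks like it only produces the aggregators. In fact it quietly does another important thing: it loads the Doc Values.
Doc Values and FieldData are the forward index of ES, and they play a crucial role in the performance of ES aggregation queries. So it is worth exploring how they are loaded.

Loading the DocValue

Source location of the DocValue loading: AggregatorFactories:createTopLevelAggregators -> AggregatorFactory:create -> ValuesSourceAggregatorFactory:createInternal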

public Aggregator createInternal(AggregationContext context, Aggregator parent, boolean collectsFromSingleBucket,
            List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
        VS vs = config.toValuesSource(context.getQueryShardContext()); <1>
        return doCreateInternal(vs, context, parent, collectsFromSingleBucket, pipelineAggregators, metaData); <2>
    }

<1> Get the values source from config; vs contains the DocValue
<2> The vs, which carries the fielddata, is passed into this method as an argument.

protected Aggregator doCreateInternal(ValuesSource valuesSource, Aggregator parent, boolean collectsFromSingleBucket,
            List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
 
       ...
                ValuesSource.Bytes.WithOrdinals valueSourceWithOrdinals = (ValuesSource.Bytes.WithOrdinals) valuesSource;
                IndexSearcher indexSearcher = context.searcher();
                maxOrd = valueSourceWithOrdinals.globalMaxOrd(indexSearcher);<1>
                ratio = maxOrd / ((double) indexSearcher.getIndexReader().numDocs());<2>
           ...
    }

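<1> Load the docValue from vs. It first tries the local cache; if the DocValue is not cached, it is read from the files on disk, which is why indexSearcher is passed in. Once the docValue is available we can get maxOrd, which represents the total number of terms in this field.
<2> ratio = total number of terms / total number of documents. The smaller the ratio, the fewer buckets the aggregation produces relative to the number of documents. Based on the value of ratio we should choose a suitable aggregation mode to optimize the performance of the aggregation query.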
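As a small worked example of the ratio, the sketch below computes it the same way as the snippet above and uses it in a cardinality check. The class and method names are hypothetical, and both thresholds are supplied by the caller; the values used in main are chosen for illustration only and are not claimed to be what ES uses.

```java
final class OrdinalRatioSketch {
    /** ratio = number of distinct terms / number of documents. */
    static double ratio(long maxOrd, int numDocs) {
        return maxOrd / ((double) numDocs);
    }

    /**
     * Hypothetical check: treat the field as "low cardinality" when both the ratio
     * and the absolute number of terms stay under caller-supplied thresholds.
     */
    static boolean isLowCardinality(long maxOrd, int numDocs, double maxRatio, long maxOrdLimit) {
        return ratio(maxOrd, numDocs) <= maxRatio && maxOrd <= maxOrdLimit;
    }

    public static void main(String[] args) {
        // 50 distinct terms over 1000 docs: each term is shared by many docs.
        System.out.println(isLowCardinality(50, 1000, 0.5, 2048));  // prints true
        // 900 distinct terms over 1000 docs: almost every doc has its own term.
        System.out.println(isLowCardinality(900, 1000, 0.5, 2048)); // prints false
    }
}
```

A field like a status code lands in the first case; a field like a primary key lands in the second, which is exactly the kind of field that causes trouble later in this article.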

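The loaded docValue really comes into play while the query is being executed.

Collecting data during the query

The entry code of this step is in the QueryPhase:execute method. The method is long and does a lot, but we only care about how it relates to aggregation, so we only need to look at one line of it: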

searcher.search(query, collector);

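This line is where ES actually starts the query. It calls Lucene's search API directly: the query parameter carries the query conditions, and collector wraps the aggregators and carries the aggregation logic. From here on, "the collector" refers to this object.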

protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector)
      throws IOException {
    for (LeafReaderContext ctx : leaves) { // search each subreader
      final LeafCollector leafCollector;
      try {
        leafCollector = collector.getLeafCollector(ctx);
      } catch (CollectionTerminatedException e) {
        continue;
      }
      BulkScorer scorer = weight.bulkScorer(ctx);<1>
      if (scorer != null) {
        try {
          scorer.score(leafCollector, ctx.reader().getLiveDocs());<2>
        } catch (CollectionTerminatedException e) {
          // collection was terminated prematurely
          // continue with the following leaf
        }
      }
    }
  }

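The logic of this code is clear at a glance. An ES index contains multiple shards and a shard contains multiple segments; leaves here is the set of segments. The code iterates over each segment, finds the documents in the segment that match the query, scores them, and uses the collector to gather the aggregation metrics of the matching documents.
<1> Find the set of documents that match the conditions
<2> Iterate over the matching documents and collect the metric data with the collector.
We only focus on <2>, which is the part related to aggregation: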

for (int doc = iterator.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = iterator.nextDoc()) {
          if (acceptDocs == null || acceptDocs.get(doc)) {
            collector.collect(doc);
          }
        }

public void collect(int doc) throws IOException {
      final LeafCollector[] collectors = this.collectors;
      int numCollectors = this.numCollectors;
      for (int i = 0; i < numCollectors; ) {
        final LeafCollector collector = collectors[i];
        try {
          collector.collect(doc);
          ++i;
        } catch (CollectionTerminatedException e) {
          removeCollector(i);
          numCollectors = this.numCollectors;
          if (numCollectors == 0) {
            throw new CollectionTerminatedException();
          }
        }
      }
    }

Iterate over the matching documents and use the collectors to collect the metrics of each document.
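The two loops above can be condensed into one toy sketch: for a single segment, skip deleted documents, hand each remaining match to every collector, and drop a collector once it signals that it wants no more documents. All names here (SegmentCollectSketch, Segment, LeafCollector, TerminatedException) are hypothetical simplifications and not the Lucene types of the same or similar name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class SegmentCollectSketch {
    /** Hypothetical signal that a collector wants no more docs from this segment. */
    static final class TerminatedException extends RuntimeException {
    }

    /** Hypothetical per-segment collector. */
    interface LeafCollector {
        void collect(int doc);
    }

    /** Hypothetical segment: matching doc IDs plus a live-docs bitset (null means nothing deleted). */
    static final class Segment {
        final int[] matchingDocs;
        final BitSet liveDocs;

        Segment(int[] matchingDocs, BitSet liveDocs) {
            this.matchingDocs = matchingDocs;
            this.liveDocs = liveDocs;
        }
    }

    /** Feeds every live matching doc of one segment to all collectors that are still active. */
    static void collectSegment(Segment segment, List<LeafCollector> collectors) {
        List<LeafCollector> active = new ArrayList<>(collectors);
        for (int doc : segment.matchingDocs) {
            if (segment.liveDocs != null && !segment.liveDocs.get(doc)) {
                continue; // deleted doc, skip it
            }
            for (int i = 0; i < active.size(); ) {
                try {
                    active.get(i).collect(doc);
                    ++i;
                } catch (TerminatedException e) {
                    active.remove(i); // drop this collector, keep the others
                }
            }
            if (active.isEmpty()) {
                return; // no collector is interested any more
            }
        }
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        live.set(0, 5);  // docs 0..4 are live
        live.clear(2);   // doc 2 has been deleted
        Segment segment = new Segment(new int[] {0, 2, 3, 4}, live);

        int[] counted = new int[1];
        int[] firstOnly = new int[1];
        List<LeafCollector> collectors = new ArrayList<>();
        collectors.add(doc -> counted[0]++);
        collectors.add(doc -> {
            if (firstOnly[0] == 1) {
                throw new TerminatedException(); // this collector only wants one doc
            }
            firstOnly[0]++;
        });

        collectSegment(segment, collectors);
        System.out.println(counted[0] + " " + firstOnly[0]); // prints 3 1
    }
}
```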

public void collect(int doc, long bucket) throws IOException {
                    assert bucket == 0;
                    final int ord = singleValues.getOrd(doc);<1>
                    if (ord >= 0) {
                        collectGlobalOrd(doc, ord, sub);<2>
                    }
                }

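<1> Use the forward index DocValue to find the term of the given document. This is where the DocValue loaded earlier comes into play
<2> Update the metric of that term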

private void collectGlobalOrd(int doc, long globalOrd, LeafBucketCollector sub) throws IOException {
      
          collectExistingBucket(sub, doc, globalOrd);
    }

public final void collectExistingBucket(LeafBucketCollector subCollector, int doc, long bucketOrd) throws IOException {
        docCounts.increment(bucketOrd, 1);
    }

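docCounts can be thought of as a Map with the term as key and the number of documents containing the term as value. The term here is actually a number, bucketOrd, which is the globally unique identifier of the term.
At this point data collection in the query phase is complete. docCounts is the data prepared for aggregation during the query phase, and the buckets of the aggregation are built on top of docCounts.

Memory limit on docCounts by the ES request circuit breaker

The size of docCounts depends on the number of terms. If an aggregation request touches a huge number of terms, docCounts takes up a huge amount of memory. Isn't there a risk of OOM? Fortunately ES already has an answer: the request circuit breaker limits the size of docCounts. The request circuit breaker keeps the data structures of each request (for example an aggregation request) from using more than a certain amount of memory.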
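The relationship between the forward index, the ordinals and docCounts can be sketched in a few lines. This is a toy model under simplifying assumptions: one segment, one value per document, a plain int[] instead of ES's IntArray, and ordinals taken as positions in a sorted term dictionary. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class DocCountsSketch {
    /** Sorted distinct terms; a term's position in this array is its ordinal. */
    static String[] buildDictionary(String[] termByDoc) {
        return Arrays.stream(termByDoc)
                .filter(term -> term != null)
                .distinct()
                .sorted()
                .toArray(String[]::new);
    }

    /** Forward index: doc ID -> ordinal of its term, or -1 when the doc has no value. */
    static int[] buildOrds(String[] termByDoc, String[] dictionary) {
        int[] ordByDoc = new int[termByDoc.length];
        for (int doc = 0; doc < termByDoc.length; doc++) {
            ordByDoc[doc] = termByDoc[doc] == null ? -1 : Arrays.binarySearch(dictionary, termByDoc[doc]);
        }
        return ordByDoc;
    }

    /** For every matching doc, look up its ordinal and increment that ordinal's counter. */
    static int[] countDocs(int[] ordByDoc, int[] matchingDocs, int valueCount) {
        int[] docCounts = new int[valueCount];
        for (int doc : matchingDocs) {
            int ord = ordByDoc[doc];
            if (ord >= 0) {
                docCounts[ord]++;
            }
        }
        return docCounts;
    }

    public static void main(String[] args) {
        String[] termByDoc = {"red", "blue", null, "red", "green"};
        String[] dictionary = buildDictionary(termByDoc);   // [blue, green, red]
        int[] ordByDoc = buildOrds(termByDoc, dictionary);  // [2, 0, -1, 2, 1]
        int[] docCounts = countDocs(ordByDoc, new int[] {0, 1, 2, 3, 4}, dictionary.length);
        System.out.println(Arrays.toString(docCounts));     // prints [1, 1, 2]
    }
}
```

Note that the counting loop never touches a string: it works on integers only, and the term text is needed again only when the final buckets are built.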

    private IntArray docCounts;

    public BucketsAggregator(String name, AggregatorFactories factories, SearchContext context, Aggregator parent,
            List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
        super(name, factories, context, parent, pipelineAggregators, metaData);
        bigArrays = context.bigArrays();
        docCounts = bigArrays.newIntArray(1, true);
    }

public IntArray newIntArray(long size, boolean clearOnResize) {
        if (size > INT_PAGE_SIZE) {
            // when allocating big arrays, we want to first ensure we have the capacity by
            // checking with the circuit breaker before attempting to allocate
            adjustBreaker(BigIntArray.estimateRamBytes(size), false);
            return new BigIntArray(size, this, clearOnResize);
        }
        ...
    }

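The code above shows that docCounts is of type IntArray. When an IntArray is created, adjustBreaker is called to estimate whether the memory in use, with this IntArray added, would reach the limit defined by the request circuit breaker. If the limit is exceeded, an exception is thrown and the query is terminated. This is how the circuit breaker protects memory.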
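The "estimate first, allocate second" idea can be shown with a minimal sketch. It is a simplification under stated assumptions: the names (RequestBreakerSketch, reserve, BreakerTrippedException) are hypothetical, the check runs on every allocation rather than only above a page size, and memory is never released.

```java
final class RequestBreakerSketch {
    /** Hypothetical exception thrown when an allocation would exceed the limit. */
    static final class BreakerTrippedException extends RuntimeException {
        BreakerTrippedException(String message) {
            super(message);
        }
    }

    private final long limitBytes;
    private long usedBytes;

    RequestBreakerSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Accounts for the estimated bytes, or throws without changing the accounting. */
    void reserve(long bytes) {
        long newUsed = usedBytes + bytes;
        if (newUsed > limitBytes) {
            throw new BreakerTrippedException("would use " + newUsed + " bytes, limit is " + limitBytes);
        }
        usedBytes = newUsed;
    }

    /** Checks with the breaker before allocating, so an oversized array is never created. */
    int[] newIntArray(int size) {
        reserve((long) size * Integer.BYTES);
        return new int[size];
    }

    long usedBytes() {
        return usedBytes;
    }

    public static void main(String[] args) {
        RequestBreakerSketch breaker = new RequestBreakerSketch(1024);
        breaker.newIntArray(100); // 400 bytes, fine
        try {
            breaker.newIntArray(200); // 800 more bytes would make 1200, over the limit
        } catch (BreakerTrippedException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(breaker.usedBytes()); // prints 400
    }
}
```

The important design choice is the order of operations: the request fails with an ordinary exception before the memory is allocated, instead of the whole node failing with an OutOfMemoryError afterwards.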

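Aggregating the data

After the previous two steps we have the base data for aggregation, and we can now start aggregating.
The key code of the aggregation: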

aggregations.add(aggregator.buildAggregation(0));

public InternalAggregation buildAggregation(long owningBucketOrdinal) throws IOException {
       
        final int size;
        ...
        BucketPriorityQueue<OrdBucket> ordered = new BucketPriorityQueue<>(size, order.comparator(this));<1>
        OrdBucket spare = new OrdBucket(-1, 0, null, showTermDocCountError, 0);<2>
        for (long globalTermOrd = 0; globalTermOrd < valueCount; ++globalTermOrd) {<3>
           
            final long bucketOrd = getBucketOrd(globalTermOrd);
            final int bucketDocCount = bucketOrd < 0 ? 0 : bucketDocCount(bucketOrd);
            if (bucketCountThresholds.getMinDocCount() > 0 && bucketDocCount == 0) {
                continue;
            }
            otherDocCount += bucketDocCount;
            spare.globalOrd = globalTermOrd;
            spare.bucketOrd = bucketOrd;
            spare.docCount = bucketDocCount;
            if (bucketCountThresholds.getShardMinDocCount() <= spare.docCount) {
                spare = ordered.insertWithOverflow(spare);
                if (spare == null) {
                    spare = new OrdBucket(-1, 0, null, showTermDocCountError, 0);
                }
            }
        }

        // Get the top buckets
        final StringTerms.Bucket[] list = new StringTerms.Bucket[ordered.size()];<4>
        long survivingBucketOrds[] = new long[ordered.size()];
        for (int i = ordered.size() - 1; i >= 0; --i) {
            final OrdBucket bucket = ordered.pop();
            survivingBucketOrds[i] = bucket.bucketOrd;
            BytesRef scratch = new BytesRef();
            copy(lookupGlobalOrd.apply(bucket.globalOrd), scratch);
            list[i] = new StringTerms.Bucket(scratch, bucket.docCount, null, showTermDocCountError, 0, format);
            list[i].bucketOrd = bucket.bucketOrd;
            otherDocCount -= list[i].docCount;
        }
        //replay any deferred collections
        runDeferredCollections(survivingBucketOrds);

        //Now build the aggs
        for (int i = 0; i < list.length; i++) {
            StringTerms.Bucket bucket = list[i];
            bucket.aggregations = bucket.docCount == 0 ? bucketEmptyAggregations() : bucketAggregations(bucket.bucketOrd);
            bucket.docCountError = 0;
        }

        return new StringTerms(name, order, bucketCountThresholds.getRequiredSize(), bucketCountThresholds.getMinDocCount(),
                pipelineAggregators(), metaData(), format, bucketCountThresholds.getShardSize(), showTermDocCountError,
                otherDocCount, Arrays.asList(list), 0);<5>
    }

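<1> Create a bucket queue, ordered, to hold the buckets
<2> Build an empty bucket object, spare
<3> Build all the buckets and add them to ordered. spare.globalOrd is the global ordinal of the term, spare.bucketOrd is the ordinal of the bucket for this term (the key used to look up its count in docCounts), and spare.docCount is the number of documents containing the term. All of this comes from the DocValue and docCounts.
<4> Create the bucket list, list, and move the buckets from ordered into list
<5> Finally wrap list in a StringTerms object and return it as the aggregation result.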
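The queue-then-list pattern of steps <1> to <4> can be sketched with the JDK's PriorityQueue. This is a toy version under assumptions: it orders buckets by doc count only (ties broken by ordinal), uses java.util.PriorityQueue instead of ES's BucketPriorityQueue, and all names (TopBucketsSketch, Bucket, topBuckets) are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopBucketsSketch {
    /** Hypothetical bucket: an ordinal plus the number of docs counted for it. */
    static final class Bucket {
        final long ord;
        final int docCount;

        Bucket(long ord, int docCount) {
            this.ord = ord;
            this.docCount = docCount;
        }
    }

    /** Keeps at most `size` buckets with the highest doc counts, best bucket first. */
    static List<Bucket> topBuckets(int[] docCounts, int size, int minDocCount) {
        // The head of the queue is always the weakest bucket:
        // lowest doc count, and on ties the higher ordinal.
        Comparator<Bucket> weakestFirst = (a, b) -> {
            int byCount = Integer.compare(a.docCount, b.docCount);
            return byCount != 0 ? byCount : Long.compare(b.ord, a.ord);
        };
        PriorityQueue<Bucket> ordered = new PriorityQueue<>(weakestFirst);

        for (int ord = 0; ord < docCounts.length; ord++) {
            if (docCounts[ord] < minDocCount) {
                continue;
            }
            ordered.add(new Bucket(ord, docCounts[ord]));
            if (ordered.size() > size) {
                ordered.poll(); // overflow: throw away the weakest bucket
            }
        }

        // Pop weakest first and fill the array from the back, so the best bucket ends up at index 0.
        Bucket[] list = new Bucket[ordered.size()];
        for (int i = list.length - 1; i >= 0; --i) {
            list[i] = ordered.poll();
        }
        return Arrays.asList(list);
    }

    public static void main(String[] args) {
        int[] docCounts = {5, 1, 9, 0, 5};
        for (Bucket bucket : topBuckets(docCounts, 3, 1)) {
            System.out.println(bucket.ord + ":" + bucket.docCount);
        }
        // prints 2:9, then 0:5, then 4:5 (one per line)
    }
}
```

In this sketch the queue never holds more than size + 1 buckets at once, however many ordinals the loop visits.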

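A reminder here: the bucket list, list, is held in memory. If the number of buckets in list is huge, say tens of millions or even hundreds of millions, it can easily end in an ES OOM. In fact this happens from time to time when operating ES: users who do not understand how ES aggregation works run aggregations on unique fields such as timestamps or primary keys over indices with hundreds of millions of documents, which generates hundreds of millions of buckets and eventually triggers an ES OOM. And then they complain that ES is hard to use. The main ways to avoid this today are: 1. train ES users so that such unreasonable queries are never submitted; 2. put a gateway or proxy in front of ES that filters out and rejects such malicious requests.
That said, I found that ES 6.2.0 introduced a search.max_buckets setting: if a query produces more buckets than the configured number, the query is terminated, which prevents an ES OOM.

At this point you may wonder: didn't we say the ES request circuit breaker protects memory? Why can't it stop the bucket list from blowing up memory? What we need to know is that the request circuit breaker only monitors the memory used by the docCounts produced during the query; it does not monitor the memory used by the bucket list in the aggregation phase. Never assume that the request circuit breaker monitors the memory of every data structure in an aggregation query!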

ES 5.6.4 source code analysis -- the aggregation query flow
